Nature Medicine (IF 58.7), Pub Date: 2024-09-12, DOI: 10.1038/s41591-024-03214-0
Authors (affiliation numbers in parentheses): Lukas Heumos (1, 2, 3); Philipp Ehmele (1); Tim Treis (1, 3); Julius Upmeier Zu Belzen (4); Eljas Roellin (1, 5); Lilly May (1, 5); Altana Namsaraeva (1, 6); Nastassya Horlava (1, 3); Vladimir A Shitov (1, 3); Xinyue Zhang (1); Luke Zappia (1, 5); Rainer Knoll (7); Niklas J Lang (2); Leon Hetzel (1, 5); Isaac Virshup (1); Lisa Sikkema (1, 3); Fabiola Curion (1, 5); Roland Eils (4, 8); Herbert B Schiller (2, 9); Anne Hilgendorff (2, 10); Fabian J Theis (1, 3, 5)
With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here we introduce ehrapy, a modular open-source Python framework designed for exploratory analysis of heterogeneous epidemiology and EHR data. ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrate ehrapy’s features in six distinct examples. We applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we reveal biomarkers for significant differences in survival among these groups. Additionally, we quantify medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. We reconstructed disease state trajectories in patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we conducted a case study to demonstrate how ehrapy can detect and mitigate biases in EHR data. ehrapy, thus, provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.
Title: An open-source framework for end-to-end analysis of electronic health record data
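The abstract lists survival analysis among the statistical steps used to compare patient groups. As a rough illustration of what such a step computes, here is a minimal sketch of a Kaplan-Meier survival estimate in plain Python. It is not ehrapy's API: the function name `kaplan_meier` and the toy follow-up data are assumptions made up for this example, and a real analysis would rely on the framework's own modules or an established survival library.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient (any consistent unit).
    events:    1 if the event (e.g. death) was observed, 0 if censored.
    Hypothetical helper for illustration only; not part of ehrapy.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    # Number of observed events at each time point.
    observed = Counter(t for t, e in zip(durations, events) if e)
    # Every patient leaves the risk set at their own follow-up time,
    # whether through the event or through censoring.
    leaving = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy, made-up follow-up data (days); not taken from the paper.
durations = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 1, 0, 1, 0]

for time, prob in kaplan_meier(durations, events):
    print(f"t={time}: S(t)={prob:.3f}")
```

Running the same function once per patient cluster and comparing the resulting curves is the basic idea behind looking for survival differences between groups. A formal comparison would additionally need a statistical test such as the log-rank test, which this sketch omits.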