Name: Automated extraction of post-stroke functional outcomes from unstructured electronic health records
Published: Oct. 2, 2025
License: https://github.com/bdsp-core/bdsp-license-and-dua

Model Credentialed Access

Marta Fernandes, Kaileigh Gallagher, Niels Turley, Aditya Gupta, M Brandon Westover, Aneesh Singhal, Sahar Zafar

Version: 1.0.0

When using this resource, please cite:
Fernandes, M., Gallagher, K., Turley, N., Gupta, A., Westover, M. B., Singhal, A., & Zafar, S. (2025). Automated extraction of post-stroke functional outcomes from unstructured electronic health records (version 1.0.0). Brain Data Science Platform. https://doi.org/10.60508/zksv-mq70.

MLA	Fernandes, Marta, et al. "Automated extraction of post-stroke functional outcomes from unstructured electronic health records" (version 1.0.0). Brain Data Science Platform (2025), https://doi.org/10.60508/zksv-mq70.
APA	Fernandes, M., Gallagher, K., Turley, N., Gupta, A., Westover, M. B., Singhal, A., & Zafar, S. (2025). Automated extraction of post-stroke functional outcomes from unstructured electronic health records (version 1.0.0). Brain Data Science Platform. https://doi.org/10.60508/zksv-mq70.
Chicago	Fernandes, Marta, Gallagher, Kaileigh, Turley, Niels, Gupta, Aditya, Westover, M Brandon, Singhal, Aneesh, and Sahar Zafar. "Automated extraction of post-stroke functional outcomes from unstructured electronic health records" (version 1.0.0). Brain Data Science Platform (2025). https://doi.org/10.60508/zksv-mq70.
Harvard	Fernandes, M., Gallagher, K., Turley, N., Gupta, A., Westover, M. B., Singhal, A., and Zafar, S. (2025) 'Automated extraction of post-stroke functional outcomes from unstructured electronic health records' (version 1.0.0), Brain Data Science Platform. Available at: https://doi.org/10.60508/zksv-mq70.
Vancouver	Fernandes M, Gallagher K, Turley N, Gupta A, Westover M B, Singhal A, Zafar S. Automated extraction of post-stroke functional outcomes from unstructured electronic health records (version 1.0.0). Brain Data Science Platform. 2025. Available from: https://doi.org/10.60508/zksv-mq70.

Additionally, please cite the original publication:

Fernandes M, Gallagher K, Turley N, et al. Automated extraction of post-stroke functional outcomes from unstructured electronic health records. European Stroke Journal. 2025;10(3):829-836. doi:10.1177/23969873251314340

Abstract

Purpose:

Population-level tracking of post-stroke functional outcomes is critical to guide interventions that reduce the burden of stroke-related disability. However, functional outcomes are often missing or documented only in unstructured notes. We developed a natural language processing (NLP) model that reads electronic health record (EHR) notes to automatically determine the modified Rankin Scale (mRS) score.

Method:

We included consecutive patients (⩾18 years) with acute stroke admitted to our center (2015–2024). mRS scores were obtained from the Get With The Guidelines registry and from clinical notes (if documented), and were used as the gold standard against which NLP-generated scores were compared. We used text-based features from notes, along with age, sex, discharge status, and outpatient follow-up, to train a logistic regression model for prediction of good (0–2) versus poor (3–6) mRS, and a linear regression model for the full range of mRS scores. The models were trained to predict mRS at hospital discharge and post-discharge. The models were externally validated in a dataset of patients with brain injuries from a different healthcare center.

Findings:

We included 5307 patients: 5006 in the train and test sets and 301 in the validation set. Average age was 69 (SD 15) and 65 (SD 17) years, respectively; 47% female. The logistic regression achieved an area under the receiver operating characteristic curve (AUROC) of 0.94 [CI 0.93–0.95] (test) and 0.94 [0.91–0.96] (validation), and the linear model a root mean squared error (RMSE) of 0.91 [0.87–0.94] (test) and 1.17 [1.06–1.28] (validation).

Discussion and Conclusion:

The NLP-based model is suitable for use in large-scale phenotyping of stroke functional outcomes and population health research.

Background

For each note type (physical therapy, occupational therapy, discharge summary, and other types), we selected the note from the calendar date closest to the time of gold standard mRS measurement. Models predicting discharge mRS used notes closest to the day of discharge, and models predicting post-discharge mRS used notes documented closest to the day of the post-discharge gold standard measurement. The data from our center were split into train (70%) and test (30%) sets, with unique patients in each set. With the train set we developed a logistic regression model for prediction of good (mRS 0–2) versus poor (mRS 3–6) mRS and a linear regression model for prediction of the full range of mRS 0–6.
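The two data-preparation steps described above can be illustrated with a minimal Python sketch. It keeps, for each note type, the note dated closest to the gold standard mRS measurement, and splits patients 70/30 so that no patient appears in both sets. The record layout, field names, function names, and random seed are assumptions made for this example and are not taken from the released code.

```python
import random
from datetime import date

def closest_note_per_type(notes, target_date):
    """Return {note_type: note}, keeping for each type the note whose
    calendar date is closest to target_date (ties keep the first note seen)."""
    best = {}
    for note in notes:
        gap = abs((note["date"] - target_date).days)
        current = best.get(note["type"])
        if current is None or gap < abs((current["date"] - target_date).days):
            best[note["type"]] = note
    return best

def split_patients(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle unique patient IDs and split them into disjoint train/test sets."""
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_train = round(train_fraction * len(unique_ids))
    return set(unique_ids[:n_train]), set(unique_ids[n_train:])

# Hypothetical notes for one encounter (illustrative values only).
notes = [
    {"type": "physical therapy", "date": date(2020, 3, 1), "text": "PT note A"},
    {"type": "physical therapy", "date": date(2020, 3, 9), "text": "PT note B"},
    {"type": "discharge summary", "date": date(2020, 3, 10), "text": "DC summary"},
]
selected = closest_note_per_type(notes, target_date=date(2020, 3, 10))

# Split 100 hypothetical patient IDs at the patient level.
train_ids, test_ids = split_patients(range(100))
```

Splitting on patient IDs rather than on individual notes or encounters is what keeps the two sets free of shared patients.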

Model Description

Our final model had three stages: (stage 1) for patients with a discharge status of deceased, we automatically assigned mRS 6 as the predicted score; (stage 2) for any encounter where mRS was documented by clinicians, regular expressions were used to extract the score; (stage 3) for all other encounters (patients alive at discharge and without mRS documentation), LASSO models were used for prediction.
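The three-stage cascade can be sketched in Python as follows. This is an illustration under stated assumptions, not the released implementation: the regular expression, the "deceased" status string, and the function names are hypothetical, and the stage 3 model is represented by a placeholder callable that stands in for the trained LASSO models.

```python
import re

# Assumed pattern: "mRS" or "modified Rankin (scale|score)", an optional
# connector ("is", "of", "=", ":"), then a single digit from 0 to 6.
MRS_PATTERN = re.compile(
    r"\b(?:mRS|modified\s+Rankin(?:\s+(?:scale|score))?)\b"
    r"\s*(?:score)?\s*(?:is|of|=|:)?\s*([0-6])\b",
    re.IGNORECASE,
)

def extract_documented_mrs(note_text):
    """Return the first clinician-documented mRS found in the text, else None."""
    match = MRS_PATTERN.search(note_text)
    return int(match.group(1)) if match else None

def predict_mrs(discharge_status, note_text, fallback_model):
    """Apply the three stages in order and report which stage produced the score."""
    # Stage 1: patients who died are assigned mRS 6.
    if discharge_status == "deceased":
        return 6, "stage 1: deceased"
    # Stage 2: use a documented score if a regular expression finds one.
    documented = extract_documented_mrs(note_text)
    if documented is not None:
        return documented, "stage 2: regex"
    # Stage 3: otherwise defer to the model (hypothetical callable).
    return fallback_model(note_text), "stage 3: model"

# Placeholder for the trained model; it always returns 4 in this example.
def stub_model(note_text):
    return 4

result_documented = predict_mrs("home", "Patient's mRS: 2 at discharge.", stub_model)
result_model = predict_mrs("home", "Ambulating with a walker.", stub_model)
```

A single pattern like this will miss many real-world phrasings (ranges, historical scores, negations), so a production extractor would need a broader and validated set of expressions.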

Technical Implementation

Both models used the least absolute shrinkage and selection operator (LASSO) to select the most informative predictors from among the text-based features, age, sex, patient discharge status, and an outpatient follow-up flag (yes/no) for predicting mRS scores. Age values were rescaled using min–max normalization. For each model, we performed 100 iterations of five-fold cross-validation on the training data to determine the best regularization parameter.
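The normalization and tuning procedure can be sketched in plain Python as below. The sketch shows min–max scaling (here mapped to the range 0 to 1, which is an assumption) and a repeated five-fold cross-validation loop that picks the regularization value with the best mean validation score. The function names, the candidate grid, and the selection criterion are assumptions; `score_fn` is a hypothetical hook where an L1-penalized model would be fit on the training indices and scored on the validation indices.

```python
import random

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=100, seed=0):
    """Yield (repeat, fold, train_indices, val_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # fresh shuffle for every repetition
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield repeat, k, train, folds[k]

def select_regularization(candidates, n_samples, score_fn, **cv_kwargs):
    """Return the candidate with the highest mean validation score."""
    splits = list(repeated_kfold_indices(n_samples, **cv_kwargs))
    mean_scores = {}
    for c in candidates:
        scores = [score_fn(c, train, val) for _, _, train, val in splits]
        mean_scores[c] = sum(scores) / len(scores)
    return max(mean_scores, key=mean_scores.get), mean_scores

ages_scaled = min_max_normalize([18, 59, 100])

# Dummy scorer used only to exercise the loop; a real one would fit and
# evaluate a LASSO model with regularization strength c on each split.
def dummy_score(c, train, val):
    return -abs(c - 0.1)

best_c, _ = select_regularization([0.01, 0.1, 1.0], n_samples=50, score_fn=dummy_score)
```

Reusing the same splits for every candidate value, as done here, keeps the comparison between regularization strengths paired across folds.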

Installation and Requirements

Python

Usage Notes

Python

Ethics

Ethical approval

In this dataset, all data were anonymized with all identifiable patient information removed. Scans were identified retrospectively from IRB-approved chart review under protocols approved by the BIDMC IRB (protocols #2022P000481, #2022P000417) and MGB IRB (protocol #2013P001024), which provided a waiver of consent for retrospective data analysis; no prospective data acquisition or participant recruitment was performed.

Informed consent

A waiver of informed consent was obtained for this observational study.

Acknowledgements

This project was supported by NIH R01NS131347 (PI Sahar F. Zafar).

Conflicts of Interest

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dr. Zafar is a clinical neurophysiologist for Corticare, received speaking honoraria from Marinus, and received royalties from Springer publishing, unrelated to this work. Dr. Westover is a co-founder, scientific advisor, and consultant to Beacon Biosignals and has a personal equity interest in the company. None of these interests played any role in the present work.

References

Fernandes M, Gallagher K, Turley N, et al. Automated extraction of post-stroke functional outcomes from unstructured electronic health records. European Stroke Journal. 2025;10(3):829-836. doi:10.1177/23969873251314340