Resources


Model Credentialed Access

Automated Prediction of Glasgow Coma Scale Scores from Unstructured Electronic Health Records: a Natural Language Processing Approach

Marta Fernandes, Niels Turley, Haoqi Sun, Shibani Mukerji, Lidia M. V. R. Moura, M Brandon Westover, Sahar Zafar

Prediction of Glasgow Coma Scale scores from unstructured electronic health records using NLP.

glasgow coma scale, natural language processing, ordinal regression, electronic health records, clinical notes

Published: April 17, 2026. Version: 1.0.0


Model Credentialed Access

Automated extraction of post-stroke functional outcomes from unstructured electronic health records

Marta Fernandes, Kaileigh Gallagher, Niels Turley, Aditya Gupta, M Brandon Westover, Aneesh Singhal, Sahar Zafar

This project automatically extracts modified Rankin Scale (mRS) scores for a post-stroke patient population from unstructured electronic health records using natural language processing.

stroke, natural language processing, modified rankin scale, machine learning

Published: Oct. 2, 2025. Version: 1.0.0


Model Credentialed Access

Automated extraction of stroke severity from unstructured electronic health records using natural language processing

Marta Fernandes, M Brandon Westover, Aneesh Singhal, Sahar Zafar

This project automatically extracts National Institutes of Health Stroke Scale (NIHSS) scores from unstructured electronic health records using natural language processing.

nihss, nlp, stroke

Published: Oct. 2, 2025. Version: 1.0.0


Database Credentialed Access

The Brain Imaging and Neurophysiology Database (BIND)

Charlotte Maschke, Peter Hadar, Yicheng Zhang, Jian Li, Gauri Ganjoo, Andrew Hoopes, Alessandro Guazzo, Aditya Gupta, Manohar Ghanta, Bruce Nearing, Christine Tsien Silvers, Bharath Gunapati, Robert Thomas, Jennifer Kim, Shibani Mukerji, Adrian Dalca, Sahar Zafar, Alice Lam, Emmanuel Mignot, M Brandon Westover

BIND Database 1: neuroimaging data (MRI, CT, PET, SPECT) that can be paired with the EEG and PSG recordings in the Harvard EEG Database (https://bdsp.io/content/harvard-eeg-db/4.1/). LLMs were used to help categorize pathology.

ct, mri, brain imaging

Published: Sept. 9, 2025. Version: 1.0


Database Credentialed Access

Identification of patients with epilepsy using automated electronic health records phenotyping - Data and Code

Marta Fernandes, Sahar Zafar, M Brandon Westover

Code and data for identifying patients with epilepsy using automated phenotyping of electronic health records.

nlp, ehr, epilepsy

Published: June 5, 2025. Version: 1.0


Database Credentialed Access

Automated Extraction of Seizures and Ictal-Interictal Continuum Patterns from EEG Reports to Enable Large-Scale Neurophysiology and Neurocritical Care Research - Data and Code

Shadi Sartipi, Deena S. Godfrey, Alexandra-Maria Tauțan, Marta P. Fernandes, Manohar Ghanta, Aditya Gupta, Bruce Nearing, Jennifer Kim, Aaron F. Struck, Tobias Loddenkemper, Jurriaan Peters, Jong Woo Lee, M. Brandon Westover, Sahar F. Zafar

Deidentified seizure and ictal-interictal continuum (IIC) data extracted from 156,582 EEG reports across three health systems (two adult, one pediatric), plus the LLM-based extraction pipeline. Companion to Sartipi et al. (IJMI, under review).

Published: June 1, 2026. Version: 1.0.0


Database Credentialed Access

The Human Sleep Project

Qichen Li, Shenghan Wen, Haoqi Sun, Wolfgang Ganglberger, Ayush Tripathi, Niels Turley, Samuel Waters, Arnav Gupta, Aditya Gupta, Manohar Ghanta, Bruce Nearing, Han Wu, Katie L. Stone, Chad Robichaux, Zhiyong Zhang, Qiao Li, Gauri Ganjoo, Christine Tsien Silvers, Bharath Gunapati, Kiran Maski, Samaneh Nasiri, Dennis Hwang, Lynn Marie Trotti, Umakanth Katwa, Gari D. Clifford, Emmanuel Mignot, Robert J. Thomas, M. Brandon Westover

Multi-center clinical polysomnography dataset of 119,234 sleep recordings from 90,166 patients across five U.S. sites spanning the human lifespan, with manual and CAISR automated annotations, per-session quality grades, and 22-category ICD-10 linkage.

Published: June 2, 2026. Version: 3.0


Database Credentialed Access

PRediction Of Disease PHEnoTypes (PROPHET)

Niels Turley, Marta Fernandes, Shadi Sartipi, Han Wu, Alice Lam, Lydia Petersen, Catherine Clive, Daniel Sumsion, Ruoqi Wei, Bram Overmeer, Jaden Searle, Gregory Hooke, Spencer Boris, Wan-Yee Kong, Arjun Singh, Marjan Sarami, Alihan Yaramis, Imad Akbar, Rebecca Milde, Jet Veltink, Elijah Davis, Aditya Gupta, Manohar Ghanta, Aidan McDonald Wojciechowski, Shibani Mukerji, Haoqi Sun, M Brandon Westover, Sahar Zafar

Multicenter expert-annotated EHR dataset and NLP phenotyping framework for 17 neurological conditions spanning diagnoses, severity scales, and outcomes across six U.S. health systems.

Published: March 31, 2026. Version: 1.0


Database Credentialed Access

Narcolepsy Risk Estimation from Clinical Notes

Niels Turley, Haoqi Sun, M Brandon Westover

Dataset and code for developing and validating machine learning models to phenotype narcolepsy type 1 (NT1) and narcolepsy type 2/idiopathic hypersomnia (NT2/IH) from multi-site electronic health record data, including cross-sectional classification

Published: March 2, 2026. Version: 1.0


Model Open Access

Automated phenotyping of mild cognitive impairment and Alzheimer's disease and related dementias using electronic health records

Ruoqi Wei, Niels Turley, Aditya Gupta, Manohar Ghanta, Robert Thomas, Sahar Zafar, Haoqi Sun, M Brandon Westover

An MCI/ADRD EHR phenotyping model trained with a Python scikit-learn pipeline and saved in joblib format.

Published: Sept. 25, 2025. Version: 1.1