Database Credentialed Access
Neurophysiologic Encephalopathy Severity Index (NESI) - Data and Code
Arka Roy, Katelyn Surrao, ChenXi Sun, Jin Jing, Tianyu Zhang, Aditya Gupta, Ryan A. Tesh, Imad Akbar, Kaileigh Gallagher, Marjan Sarami, Rajib Kanti Dey, Aysenur Yaramis, Alihan Yaramis, Zeeshan Haider, Utsav Patel, Laraib Jumani, Grace Bayas, Haoqi Sun, Wolfgang Ganglberger, Erika L. Juarez Martinez, Fábio A. Nascimento, Irfan S. Sheikh, Oluwaseun Akeju, Miles Berger, Eyal Y. Kimchi, Sahar F. Zafar, Christine A. Eckhardt, M. Brandon Westover
Published: June 14, 2026. Version: 1.0.0
When using this resource, please cite:
Roy, A., Surrao, K., Sun, C., Jing, J., Zhang, T., Gupta, A., Tesh, R. A., Akbar, I., Gallagher, K., Sarami, M., Dey, R. K., Yaramis, A., Yaramis, A., Haider, Z., Patel, U., Jumani, L., Bayas, G., Sun, H., Ganglberger, W., ... Westover, M. B. (2026). Neurophysiologic Encephalopathy Severity Index (NESI) - Data and Code (version 1.0.0). Brain Data Science Platform. https://doi.org/10.60508/rrbg-ba24.
Abstract
Disorders of consciousness are assessed with multiple bedside scales that measure overlapping but non-identical aspects of neurological function, producing fragmented and often discordant assessments in acute and critical care. This resource accompanies the Neurophysiologic Encephalopathy Severity Index (NESI), a continuous electroencephalography (EEG)-based measure of brain dysfunction that places patients on a single physiological continuum spanning alertness, delirium, sedation, and coma.
NESI is derived by extracting generalized EEG representations from a clinical EEG foundation model (MORGOTH), compressing them with a contrastive encoder, and learning a continuous severity score with a pairwise ranking head trained jointly across four clinical scales (RASS, GCS, CAM-S, and ICANS). On a held-out test set, NESI agreed strongly with all four scales (Spearman correlations 0.72-0.84), showed higher AUROC than GCS for in-hospital mortality, and tracked propofol effect-site concentration more closely than RASS.
This release provides the curated 10-minute EEG segments, the MORGOTH feature activations, the assembled training/validation/test data, trained model checkpoints, the propofol pharmacokinetic and medication-exposure derivatives, and per-segment metadata, together with the complete analysis code (mirrored on GitHub). It is intended to let credentialed users retrain the models and reproduce the figures and tables from the ground up.
Background
Accurate assessment of neurologic status is essential to the care of acute and critically ill patients, yet current practice approximates the continuum of acute brain dysfunction through a patchwork of behavioral scales developed in different eras for distinct populations: the Glasgow Coma Scale (GCS), the Richmond Agitation-Sedation Scale (RASS), the Confusion Assessment Method-Severity (CAM-S), and immune effector cell-associated neurotoxicity syndrome (ICANS) grading, among others. These scales measure overlapping but non-identical constructs, can be discordant, and are limited by inter-rater variability and by focal deficits that mask underlying status.
EEG provides a direct measure of cerebral activity that captures the physiology underlying acute encephalopathy, and prior machine-learning work has shown that EEG carries enough information to approximate individual behavioral scores. However, those instruments were built for specific populations and single targets and were never integrated into a common framework. We treat the scales as noisy observations of one shared latent construct - the severity of acute encephalopathy - and recover that physiologic dimension directly from EEG. The result, NESI, unifies the four scales on a single continuous axis, which lets scores from different instruments be related to one another under a shared construct.
Methods
Cohorts and data source
Data were drawn from the Harvard EEG Database (HEEDB), a large de-identified corpus of multichannel EEG from four Harvard-affiliated hospitals linked to electronic health records. Cohorts were assembled for RASS, GCS, and CAM-S by cross-matching behavioral-assessment timestamps with concurrent EEG; the ICANS cohort is a separate retrospective dataset with daily ICANS grades paired with continuous EEG and age/sex-matched outpatient controls. Inclusion required age ≥ 18, inpatient status, and concurrent assessment and EEG. Observations with RASS > 0 were excluded from downstream analysis.
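The cross-matching step can be pictured as an interval check: an assessment is usable only if some EEG recording covers the full 10 minutes before it. The following is a minimal sketch of that idea, not the authors' pipeline; the function names, the tuple layout of `recordings`, and the example timestamps are all hypothetical.

```python
from datetime import datetime, timedelta

SEGMENT = timedelta(minutes=10)  # segment length stated in the text


def eligible(age_years, is_inpatient):
    """Inclusion criteria named in the text: adult inpatients only."""
    return age_years >= 18 and is_inpatient


def match_assessment(assessment_time, recordings, segment=SEGMENT):
    """Return (recording_id, seg_start, seg_end) for the first recording that
    fully covers the window ending at the assessment, or None if none does.

    `recordings` is a hypothetical list of (id, start, end) tuples.
    """
    seg_start = assessment_time - segment
    for rec_id, rec_start, rec_end in recordings:
        if rec_start <= seg_start and assessment_time <= rec_end:
            return rec_id, seg_start, assessment_time
    return None


# Made-up recordings: one long enough, one too short to hold a 10-minute window.
recordings = [
    ("rec-A", datetime(2020, 1, 1, 8, 0), datetime(2020, 1, 1, 12, 0)),
    ("rec-B", datetime(2020, 1, 2, 9, 0), datetime(2020, 1, 2, 9, 5)),
]
hit = match_assessment(datetime(2020, 1, 1, 10, 30), recordings)
miss = match_assessment(datetime(2020, 1, 2, 9, 4), recordings)
print(hit)
print(miss)
```

How the released cohorts handle assessments that fall near recording boundaries is not described here, so the strict "fully covered" rule above is an assumption.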
Signal processing and quality assessment
EEG was recorded with 19 channels (international 10-20 system, 200 Hz). For RASS, GCS, and CAM-S, the 10-minute segment immediately preceding each assessment was extracted; for ICANS, which lacks per-grade timestamps, the 10-minute segment maximizing awake activity (via the MORGOTH sleep head) was selected. Signals were band-pass filtered (0.5-70 Hz) and notch-filtered (60 and 50 Hz). A signal-quality-assessment (SQA) block flagged segments with high-amplitude or flat artifacts and excluded low-quality segments.
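For readers who want to apply comparable filtering to their own recordings, the sketch below implements a 0.5-70 Hz band-pass followed by 60 Hz and 50 Hz notches with SciPy. Only the pass band, notch frequencies, channel count, and sampling rate come from the text; the Butterworth design, filter order, notch quality factor, and zero-phase filtering are assumptions, and the released preprocessing code may differ.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

FS = 200.0  # sampling rate from the text (Hz)


def preprocess(eeg, fs=FS, band=(0.5, 70.0), notch_freqs=(60.0, 50.0),
               order=4, q=30.0):
    """Band-pass then notch-filter a (channels x samples) array.

    `order` and `q` are illustrative choices, not values from the source.
    Filtering is zero-phase (forward and backward passes).
    """
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    out = sosfiltfilt(sos, eeg, axis=-1)
    for f0 in notch_freqs:
        b, a = iirnotch(f0, q, fs=fs)
        out = filtfilt(b, a, out, axis=-1)
    return out


# Synthetic 19-channel signal: a 10 Hz rhythm plus 60 Hz line noise, 20 s long.
t = np.arange(int(20 * FS)) / FS
raw = np.tile(np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 60 * t), (19, 1))
clean = preprocess(raw)
print(raw.shape, clean.shape)
```

On this synthetic input the 60 Hz component should be strongly attenuated while the 10 Hz rhythm passes nearly unchanged.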
Feature extraction and model
Each 10-minute segment was passed through the MORGOTH clinical-EEG foundation model, producing a 591×17 matrix of event-level feature activations. A supervised-contrastive ResNet encoder compresses this matrix to a 40-dimensional embedding; a pairwise ranking head (a parameter-shared Siamese MLP) then maps embeddings to a single scalar NESI value. Training uses within-scale ordinal pairs and cross-scale pairs (assembled into a unified dataset, SPECTRA) so that severity is learned on one scale-agnostic axis. Data were split subject-independently 70/10/20.
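The pairwise ranking idea can be illustrated without any deep-learning framework. The sketch below builds within-scale ordinal pairs and scores them with one shared function, so both members of a pair pass through the same parameters, as in a Siamese arrangement. Several parts are assumptions rather than details from the source: the logistic (RankNet-style) loss, the linear scorer standing in for the MLP, the convention that a higher score means more severe, and the per-scale direction table. Cross-scale pairs are not covered.

```python
import math
from itertools import combinations

# Assumed direction of each scale: -1 means lower raw scores are more severe.
SEVERITY_SIGN = {"RASS": -1, "GCS": -1, "CAMS": +1, "ICANS": +1}


def within_scale_pairs(samples):
    """Return (more_severe, less_severe) index pairs drawn within one scale.
    Ties are skipped because they carry no ordering information."""
    pairs = []
    for i, j in combinations(range(len(samples)), 2):
        a, b = samples[i], samples[j]
        if a["scale"] != b["scale"]:
            continue
        sev_a = SEVERITY_SIGN[a["scale"]] * a["score"]
        sev_b = SEVERITY_SIGN[b["scale"]] * b["score"]
        if sev_a > sev_b:
            pairs.append((i, j))
        elif sev_b > sev_a:
            pairs.append((j, i))
    return pairs


def score(embedding, weights):
    """Shared scorer applied to both members of a pair (linear stand-in)."""
    return sum(w * x for w, x in zip(weights, embedding))


def pairwise_loss(s_more, s_less):
    """Logistic ranking loss: small when the more severe item scores higher."""
    return math.log1p(math.exp(-(s_more - s_less)))


# Toy 2-dimensional embeddings (the real ones are 40-dimensional).
samples = [
    {"scale": "RASS", "score": -4, "emb": [2.0, 0.0]},
    {"scale": "RASS", "score": -1, "emb": [0.5, 0.0]},
    {"scale": "GCS", "score": 15, "emb": [0.0, 0.0]},
    {"scale": "GCS", "score": 6, "emb": [1.5, 0.0]},
]
weights = [1.0, 0.0]
pairs = within_scale_pairs(samples)
losses = [pairwise_loss(score(samples[i]["emb"], weights),
                        score(samples[j]["emb"], weights)) for i, j in pairs]
mean_loss = sum(losses) / len(losses)
print(pairs, round(mean_loss, 4))
```

Training would minimize the mean loss over pairs with respect to the scorer's parameters; the repository code is the reference for the actual architecture and objective.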
Sedation and medication analyses
Medication exposure was computed from the medication administration record over a 24-hour pre-EEG window. For propofol, effect-site concentration (Ce) was estimated with the Eleveld (2018) pharmacokinetic-pharmacodynamic model, and the variance in NESI and RASS attributable to Ce was decomposed with linear mixed-effects models (patient random intercept) and cluster-bootstrap confidence intervals.
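A cluster bootstrap resamples whole patients rather than individual observations, so repeated measurements from one patient stay together in every replicate. The sketch below shows the resampling logic with a percentile interval. It is illustrative only: the squared Pearson correlation stands in for the mixed-effects variance decomposition, the data are simulated, and the replicate count and interval method are assumptions.

```python
import random


def r_squared(rows):
    """Squared Pearson correlation between the columns of (x, y) rows."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    syy = sum((y - my) ** 2 for _, y in rows)
    return (sxy * sxy) / (sxx * syy)


def cluster_bootstrap_ci(by_patient, stat, n_boot=500, alpha=0.05, seed=0):
    """Percentile CI for `stat`, resampling patients with replacement."""
    rng = random.Random(seed)
    ids = sorted(by_patient)
    estimates = []
    for _ in range(n_boot):
        drawn = rng.choices(ids, k=len(ids))  # whole patients, not rows
        rows = [row for pid in drawn for row in by_patient[pid]]
        estimates.append(stat(rows))
    estimates.sort()
    lo = estimates[int(alpha / 2 * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Simulated (Ce, score) observations: 8 patients, 5 observations each.
rng = random.Random(1)
toy = {}
for pid in range(8):
    offset = rng.gauss(0.0, 0.3)  # patient-level shift
    toy[pid] = []
    for _ in range(5):
        ce = rng.uniform(0.0, 4.0)
        toy[pid].append((ce, 0.8 * ce + offset + rng.gauss(0.0, 0.2)))

point = r_squared([row for rows in toy.values() for row in rows])
lo, hi = cluster_bootstrap_ci(toy, r_squared)
print(round(point, 3), round(lo, 3), round(hi, 3))
```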
Code
All preprocessing, training, evaluation, figure, and table code is provided in the accompanying GitHub repository (see Usage Notes), with a REPRODUCE.md mapping each figure and table to its script.
Data Description
All files live under the credentialed S3 prefix yama/. The dataset is organized by
cohort, plus a NESI/ module with the assembled modeling data and the
sedation/medication derivatives. EEG segments are MATLAB .mat arrays (19 channels at
200 Hz); MORGOTH activations are the per-segment 591×17 feature matrices. Data are
de-identified (surrogate subject IDs; shifted dates).
Top-level layout
yama/
  README.md, LICENSE.txt
  segment_index.csv          unified per-segment index (all cohorts) + source-EEG provenance
  source_eeg_files.csv       unique continuous source EEG recordings (all cohorts)
  RASS/ GCS/ CAMS/ ICANS/    per cohort: *_EEG10minSegments/, MorgothActivations/, Cohort/ (HEEDB metadata), *Training_Final_Metadata.csv, ModelCheckpoints/
  Death/                     Death_EEG10minSegments/, MorgothActivations/, Cohort/ (cohort + labels)
  DiagnosisMetadtafiles/     admission-diagnosis tables
  NESI/
    Data/                    SPECTRA train/val/test feature pickles, embeddings, NESI result CSVs
    ModelCheckpoints/        trained universal model (ResNetGAP_BestModel.pth + NESI_best_model.pth)
    Bespoke_models/Results/  per-scale bespoke score CSVs
    NESI-Medication-Analysis/  propofol Eleveld PK + medication-exposure derivatives
Cohort sizes
CAM-S (526) and ICANS (784) segment counts equal the final analysis sets; RASS (118,330) and
GCS (124,295) contain the full pre-filtering set. The manuscript valid-observation counts
(RASS 97,620; GCS 110,836; CAM-S 526; ICANS 784; total 209,766) match
NESI/Data/NESI_*_results.csv.
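The counts above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in this section; reading the gap between released segments and valid observations as the number removed by filtering is an interpretation, not a figure reported by the authors.

```python
# Counts copied from the text above.
released = {"RASS": 118_330, "GCS": 124_295, "CAM-S": 526, "ICANS": 784}
valid = {"RASS": 97_620, "GCS": 110_836, "CAM-S": 526, "ICANS": 784}

total_valid = sum(valid.values())
# Released minus valid: zero for CAM-S and ICANS, whose releases are already final.
removed = {scale: released[scale] - valid[scale] for scale in valid}

print(total_valid)
print(removed)
```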
Index and source-EEG provenance
segment_index.csv has one row per paired observation with the segment filename,
subject/study id, assessment time, raw score, NESI value, and source-EEG provenance. Provenance is
resolved for every segment: complete for RASS/GCS (BidsFolder, SessionID, source
begin/end), and recovered for CAM-S (source start + 10-minute snippet window) and ICANS
(mapped to the companion ICANS dataset, s3://bdsp-opendata-credentialed/icans/, via
ICANSFiles.xlsx; BidsFolder gives the source path and
resolved_BDSPPatientID the patient id). source_eeg_files.csv lists the
unique continuous source recordings the segments were cut from.
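One use of the provenance columns is to group segments by the continuous recording they were cut from. The sketch below does this with the standard library on a small inline sample. `BidsFolder` and `SessionID` are column names mentioned above; the `segment_file` column name and all sample values are hypothetical, so check the real header of segment_index.csv before adapting it.

```python
import csv
import io
from collections import defaultdict

# Inline stand-in for segment_index.csv; values are invented for illustration.
SAMPLE = """segment_file,cohort,BidsFolder,SessionID
seg_0001.mat,RASS,sub-X001,ses-1
seg_0002.mat,RASS,sub-X001,ses-1
seg_0003.mat,GCS,sub-X002,ses-3
"""


def segments_by_source(rows):
    """Map (BidsFolder, SessionID) to the list of segment files cut from it."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["BidsFolder"], row["SessionID"])].append(row["segment_file"])
    return dict(groups)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
groups = segments_by_source(rows)
for source, files in groups.items():
    print(source, files)
```

To run it on the released index, replace `io.StringIO(SAMPLE)` with an open file handle for segment_index.csv.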
Notes
Death/Cohort/Death_labels.csv is a per-patient mortality reference; the exact
mortality-analysis cohort (5,675 subjects, 1,333 in-hospital deaths) is defined by the
sequential-logistic-regression code in the GitHub repository. Per-cohort de-identified HEEDB
metadata are under each <cohort>/Cohort/.
Usage Notes
Code on GitHub
The complete preprocessing, training, evaluation, and figure/table code is at
https://github.com/bdsp-core/NESI (public; browseable without credentialed access). See
NESI/REPRODUCE.md for a figure/table→script map and
NESI/requirements.txt for the Python 3.9 environment. Trained universal and bespoke
model weights are committed in the repository under NESI/model/ModelCheckpoints/ and
NESI/Bespoke_models/ModelCheckpoints/.
Downloading
Access requires credentialed approval and the Data Use Agreement. With credentials configured, sync a cohort with the AWS CLI, e.g.:
aws s3 sync s3://bdsp-opendata-credentialed/yama/CAMS/ ./CAMS/ --profile opendata
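To fetch several cohorts in one go, the same command can be generated per cohort from Python. The sketch below only builds and prints the commands; the helper name and the default of adding the AWS CLI `--dryrun` flag are choices made for this example, and the profile name is taken from the command above.

```python
import shlex

BUCKET_PREFIX = "s3://bdsp-opendata-credentialed/yama"
COHORTS = ("RASS", "GCS", "CAMS", "ICANS")


def sync_command(cohort, dest_root=".", profile="opendata", dry_run=True):
    """Build the `aws s3 sync` argument list for one cohort directory."""
    cmd = [
        "aws", "s3", "sync",
        f"{BUCKET_PREFIX}/{cohort}/",
        f"{dest_root}/{cohort}/",
        "--profile", profile,
    ]
    if dry_run:
        cmd.append("--dryrun")  # list what would be copied without downloading
    return cmd


for cohort in COHORTS:
    print(shlex.join(sync_command(cohort)))
```

Passing a returned list to `subprocess.run(cmd, check=True)` would execute it. The RASS and GCS cohorts each hold more than 100,000 segments, so a dry run first is a sensible precaution.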
Loading
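import pandas as pd
import scipy.io as sio

idx = pd.read_csv("segment_index.csv")  # per-segment index + provenance
meta = pd.read_csv("CAMS/CAMSTraining_Final_Metadata.csv")  # CAM-S per-segment metadata
# Replace <file> with a segment filename; the directory name follows the *_EEG10minSegments/ pattern in the layout above
eeg = sio.loadmat("CAMS/CAMS_EEG10minSegments/<file>.mat")  # dict of MATLAB variables; EEG is 19 x samples @ 200 Hz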
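`sio.loadmat` returns a dictionary that mixes MATLAB variables with bookkeeping entries (`__header__`, `__version__`, `__globals__`), so the EEG array still has to be picked out. The helper below is one hedged way to do that when the variable name is not known in advance. The variable name `data` in the demo and the possibility that arrays are stored samples-first are assumptions; the 19-channel and 200 Hz figures come from the text.

```python
import numpy as np

N_CHANNELS = 19                   # from the text
FS = 200                          # Hz, from the text
EXPECTED_SAMPLES = 10 * 60 * FS   # 10 minutes at 200 Hz = 120,000 samples


def extract_eeg(mat):
    """Return (variable_name, array) with the array as channels x samples.

    Skips loadmat bookkeeping keys and expects exactly one 2-D array that
    has a 19-long axis; raises ValueError otherwise.
    """
    candidates = {
        key: value for key, value in mat.items()
        if not key.startswith("__")
        and isinstance(value, np.ndarray)
        and value.ndim == 2
        and N_CHANNELS in value.shape
    }
    if len(candidates) != 1:
        raise ValueError(f"expected one 19-channel array, found {sorted(candidates)}")
    (name, data), = candidates.items()
    if data.shape[0] != N_CHANNELS:
        data = data.T  # stored samples-first; flip to channels x samples
    return name, data


# Stand-in for a loadmat result; "data" is a hypothetical variable name.
fake = {
    "__header__": b"",
    "__version__": "1.0",
    "__globals__": [],
    "data": np.zeros((EXPECTED_SAMPLES, N_CHANNELS)),
}
name, eeg_array = extract_eeg(fake)
print(name, eeg_array.shape)
```

Comparing `eeg_array.shape[1]` with `EXPECTED_SAMPLES` is a quick check that a file holds a full 10-minute segment.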
Assembled MORGOTH feature inputs and labels for modeling are in
NESI/Data/NESITripletIP_Morgoth_{train,val,test}_data.pkl with matching
NESI_{train,val,test}_results.csv.
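The internal structure of the pickles is not documented on this page, so it is worth inspecting an object before relying on it. The sketch below builds the per-split file paths from the names given above and summarizes whatever object a pickle contains. The helper names are hypothetical, and as with any pickle, files should only be loaded from a trusted source such as this credentialed release.

```python
import pickle
from pathlib import Path

SPLITS = ("train", "val", "test")


def split_paths(data_dir):
    """Map each split to its (feature pickle, results CSV) paths."""
    data_dir = Path(data_dir)
    return {
        split: (
            data_dir / f"NESITripletIP_Morgoth_{split}_data.pkl",
            data_dir / f"NESI_{split}_results.csv",
        )
        for split in SPLITS
    }


def load_split(path):
    """Unpickle one file; only use on files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def describe(obj):
    """One-line summary of an unknown object, to inspect before use."""
    if isinstance(obj, dict):
        return f"dict with keys {sorted(map(str, obj))}"
    if hasattr(obj, "shape"):
        return f"{type(obj).__name__} with shape {tuple(obj.shape)}"
    if hasattr(obj, "__len__"):
        return f"{type(obj).__name__} of length {len(obj)}"
    return type(obj).__name__


paths = split_paths("NESI/Data")
for split, (pkl_path, csv_path) in paths.items():
    print(split, pkl_path, csv_path)
print(describe({"labels": [0, 1], "features": [[0.1], [0.2]]}))
```

After downloading, `describe(load_split(paths["train"][0]))` gives a first look at the training inputs.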
Release Notes
Version 1.0.0 - initial release accompanying the NESI manuscript. Includes the curated 10-minute EEG segments, MORGOTH feature activations, assembled SPECTRA training/validation/test data, trained model checkpoints, propofol pharmacokinetic and medication-exposure derivatives, and per-segment metadata, together with the full analysis code on GitHub.
Ethics
The study was conducted with ethical approval from the institutional review boards of Beth Israel Deaconess Medical Center (protocols #2022P000481 and #2022P000417) and Stanford University (protocol #69873), with a waiver of informed consent for retrospective analysis. All data are de-identified.
Acknowledgements
Arka Roy and Katelyn Surrao contributed equally as co-first authors. Eyal Y. Kimchi, Sahar F. Zafar, Christine A. Eckhardt, and M. Brandon Westover contributed equally as co-senior authors.
This work received support from the National Institutes of Health (R01AG073410, R01HL161253, R01NS126282, R01AG073598, R01NS131347, R01NS130119).
Conflicts of Interest
Dr. Westover is a co-founder of, serves as a scientific advisor and consultant to, and has a personal equity interest in Beacon Biosignals. The remaining authors report no competing interests.
Access
Access Policy:
Only credentialed users who sign the DUA can access the files.
License (for files):
BDSP Credentialed Health Data License 1.5.0
Data Use Agreement:
BDSP Credentialed Health Data Use Agreement
Discovery
DOI:
https://doi.org/10.60508/rrbg-ba24
Project Website:
https://github.com/bdsp-core/NESI
Files
To access the files, you must:
- be a credentialed user
- sign the data use agreement for the project