Database Credentialed Access

Dementia detection from brain activity during sleep - data and code

Elissa Ye, Haoqi Sun, Wolfgang Ganglberger, Robert Thomas, Alice Lam, M Brandon Westover

Published: April 25, 2026. Version: 1.0.0


When using this resource, please cite:
Ye, E., Sun, H., Ganglberger, W., Thomas, R., Lam, A., & Westover, M. B. (2026). Dementia detection from brain activity during sleep - data and code (version 1.0.0). Brain Data Science Platform. https://doi.org/10.60508/dysk-je18.

Abstract

Study Objectives. Dementia is a growing cause of disability and loss of independence in the elderly, yet remains largely underdiagnosed. Early detection and classification of dementia can help close this diagnostic gap and improve management of disease progression. Altered oscillations in brain activity during sleep are an early feature of neurodegenerative diseases and can be used to identify those on the verge of cognitive decline.

Methods. Our observational cross-sectional study used a clinical dataset of 10,784 polysomnograms from 8,044 participants. Sleep macro- and micro-structural features were extracted from the electroencephalogram (EEG). Microstructural features were engineered from spectral band powers, EEG coherence, spindles, and slow oscillations. Participants were classified as dementia (DEM), mild cognitive impairment (MCI), or cognitively normal (CN) based on clinical diagnosis, Montreal Cognitive Assessment, Mini-Mental State Exam scores, clinical dementia rating, and prescribed medications. We trained logistic regression, support vector machine, and random forest models to classify patients into DEM, MCI, and CN groups.

Results. For discriminating DEM versus CN, the best model achieved an area under receiver operating characteristic curve (AUROC) of 0.78 and area under precision-recall curve (AUPRC) of 0.22. For discriminating MCI versus CN, the best model achieved an AUROC of 0.73 and AUPRC of 0.18. For discriminating DEM or MCI versus CN, the best model achieved an AUROC of 0.76 and AUPRC of 0.32.

Conclusions. Our dementia classification algorithms show promise for dementia screening using routine sleep EEG. The findings strengthen the concept of sleep as a window into neurodegenerative diseases.


Background

Among people aged 65 years and older in the United States, ~11% have dementia of various etiologies and ~16% have mild cognitive impairment (MCI). Despite its prevalence, dementia remains largely undiagnosed, with symptoms difficult to differentiate from normal consequences of aging. Early diagnosis can identify people at risk for complications and improve prognosis with early intervention.

Currently, clinical diagnosis of dementia relies on cognitive screening tests (Montreal Cognitive Assessment [MoCA], Mini-Mental State Examination [MMSE]), clinical history, laboratory tests, and neuroimaging. However, these tests are typically performed only when an underlying neurodegenerative disease has already manifested long enough to cause noticeable cognitive decline. A method to detect deteriorating brain health with high sensitivity before significant progression of cognitive decline is highly desirable.

Electroencephalography (EEG) is a low-cost, noninvasive technology that measures brain electrical activity. Quantitative EEG metrics have been identified as potential biomarkers for early detection of dementia. Several studies have used spectral and/or nonlinear features of awake EEG to detect early signs of dementia. However, few studies have systematically evaluated the discriminative power of sleep EEG features at scale, although sleep disturbances are recognized as both risk factors for and early clinical symptoms of neurodegenerative disorders. Sleep architecture changes (sleep fragmentation, increased awakenings, REM latency, reductions in slow-wave and REM sleep), micro-structural changes (EEG slowing, decreased θ and δ band power during slow-wave sleep, decreased EEG coherence), and deterioration of sleep spindles during NREM sleep have been observed in patients with dementia.

This project releases the polysomnography dataset and analysis code from Ye et al. 2023 to enable reproduction of the published results and to support follow-on work on sleep-EEG-based screening for neurodegenerative disease.


Methods

Dataset. Observational cross-sectional study using polysomnography (PSG) recorded for clinical purposes in the Sleep Laboratory at Massachusetts General Hospital from 2009 to 2019. The dataset includes 22,991 PSGs from 17,279 participants and contains three major types of sleep tests: diagnostic, full-night titration, and split-night titration. PSGs were recorded and scored adhering to American Academy of Sleep Medicine guidelines. Each PSG was annotated by an experienced sleep technician; every 30-second nonoverlapping epoch was classified as Wake (W), NREM stage 1 (N1), NREM stage 2 (N2), NREM stage 3 (N3), or REM.

Clinical data extraction. Demographics, encounter diagnoses, medications, and clinical notes were extracted from completed pre-sleep-study questionnaires and from the electronic medical record. Clinical Dementia Rating (CDR) global scale, MMSE, and MoCA scores were extracted from clinical notes, when available, using in-house software. Obstructive sleep apnea was defined as an apnea-hypopnea index ≥5 based on the PSG report.

Dementia staging. Inclusion criteria for DEM, MCI, and CN groups were based on data entered in the medical record before the sleep study or at most one year after the sleep study unless otherwise stated. We excluded any encounter diagnosis containing the keyword “family history.” Participants were assigned to a group using the criteria met closest to the date of sleep study. Cases with conflicting evidence were resolved by expert chart review and iteratively refined criteria, ultimately predicting DEM and MCI with a false positive rate of 0% and discriminating between them with accuracy >80%.

EEG preprocessing. Six EEG channels (F3-M2, F4-M1, C3-M2, C4-M1, O1-M2, O2-M1) were resampled to 200 Hz, notch-filtered at 60 Hz, and bandpass filtered at 0.5 to 20 Hz. To minimize artifacts, 30-second epochs with maximum absolute amplitude greater than 500 µV, or epochs containing more than 2 s of flat signal (standard deviation less than 0.2 µV), were excluded. Epochs with strong narrow-band spectral artifacts were also excluded.
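The two amplitude-based exclusion rules above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function name is hypothetical, and checking flatness in non-overlapping 2-second windows is an assumption about how "more than 2 s of flat signal" is evaluated. The narrow-band spectral artifact rule is not covered.

```python
from statistics import pstdev

FS = 200               # sampling rate after resampling (Hz), from the text
AMP_LIMIT_UV = 500.0   # maximum absolute amplitude (µV), from the text
FLAT_SD_UV = 0.2       # standard deviation below this counts as flat (µV), from the text
FLAT_WINDOW_S = 2      # assumption: flatness is checked in non-overlapping 2 s windows


def is_artifact_epoch(epoch_uv, fs=FS):
    """Return True if one channel's 30 s epoch (samples in µV) should be excluded."""
    # Rule 1: any sample exceeds the absolute amplitude limit.
    if max(abs(x) for x in epoch_uv) > AMP_LIMIT_UV:
        return True
    # Rule 2: any 2 s window is flat (standard deviation below the threshold).
    win = FLAT_WINDOW_S * fs
    for start in range(0, len(epoch_uv) - win + 1, win):
        if pstdev(epoch_uv[start:start + win]) < FLAT_SD_UV:
            return True
    return False
```

Because the windows do not overlap, a flat segment that straddles two windows may be missed; a sliding window would be stricter at higher cost.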

Macro-structure features. 36 features from the hypnogram including total resting time, total sleep time, duration of sleep stages, percent of time spent in sleep stages, sleep efficiency index, sleep onset latency, wake after sleep onset, REM latency, number of awakenings, number of stage shifts to N1 from NREM/REM sleep, and sleep fragmentation index.
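As an illustration of how such hypnogram summaries are derived, the sketch below computes a handful of them from a list of per-epoch stage labels. The definitions used here (sleep onset as the first non-Wake epoch, wake after sleep onset counted to the end of the recording, REM latency measured from sleep onset) are common conventions assumed for the example; the paper's exact definitions and the names of the released columns may differ.

```python
EPOCH_MIN = 0.5  # each hypnogram entry is one 30-second epoch


def macro_features(hypnogram):
    """Summarise a hypnogram given as labels "W", "N1", "N2", "N3", "REM"."""
    n = len(hypnogram)
    asleep = [stage != "W" for stage in hypnogram]
    if not any(asleep):
        raise ValueError("hypnogram contains no sleep epochs")
    onset = asleep.index(True)                      # first sleep epoch
    after_onset = asleep[onset:]
    tst = sum(asleep) * EPOCH_MIN                   # total sleep time (min)
    # An awakening is a sleep-to-wake transition after sleep onset.
    awakenings = sum(
        1 for prev, cur in zip(after_onset, after_onset[1:]) if prev and not cur
    )
    rem_latency = None
    if "REM" in hypnogram:
        rem_latency = (hypnogram.index("REM") - onset) * EPOCH_MIN
    return {
        "recording_time_min": n * EPOCH_MIN,
        "total_sleep_time_min": tst,
        "sleep_efficiency": tst / (n * EPOCH_MIN),
        "sleep_onset_latency_min": onset * EPOCH_MIN,
        "wake_after_sleep_onset_min": after_onset.count(False) * EPOCH_MIN,
        "rem_latency_min": rem_latency,
        "n_awakenings": awakenings,
    }
```

For the toy hypnogram W, W, N1, N2, N2, W, N2, N3, REM, W this gives 3.0 minutes of sleep in a 5.0-minute recording and two awakenings.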

Micro-structural features. 510 features per epoch (line length, kurtosis, sample entropy, min/max/mean/SD across 2-second sub-epochs, relative δ, θ, α band powers, δ/θ, δ/α, θ/α power ratios, kurtosis of δ, θ, α, σ bands), averaged over frontal/central/occipital regions and per sleep stage. Missing-stage features were imputed using the 10 nearest epochs by Euclidean distance.
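Two of the time-domain features and the regional averaging step can be sketched as below. This is an illustration under stated assumptions: the channel-to-region mapping is inferred from the electrode names listed under EEG preprocessing, the kurtosis shown is the Pearson form (a normal distribution gives 3) since the text does not say which form was used, and the function names are hypothetical.

```python
from statistics import mean

# Assumption: regions follow the electrode prefixes (F = frontal, C = central, O = occipital).
REGIONS = {
    "frontal": ("F3-M2", "F4-M1"),
    "central": ("C3-M2", "C4-M1"),
    "occipital": ("O1-M2", "O2-M1"),
}


def line_length(x):
    """Sum of absolute differences between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def kurtosis(x):
    """Pearson kurtosis: fourth central moment over squared variance."""
    m = mean(x)
    m2 = mean([(v - m) ** 2 for v in x])
    m4 = mean([(v - m) ** 4 for v in x])
    return m4 / m2 ** 2


def regional_feature(epoch_by_channel, feature):
    """Average a per-channel feature over the two channels of each region."""
    return {
        region: mean([feature(epoch_by_channel[ch]) for ch in channels])
        for region, channels in REGIONS.items()
    }
```

Here `epoch_by_channel` is a dictionary mapping each channel name to that channel's samples for one 30-second epoch; averaging the resulting regional values within each sleep stage would then give the per-stage summaries.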

Individual EEG frequencies and α sub-band powers. Transition frequency (TF) and individual alpha frequency (IAF) were extracted from Wake and N1 epochs. We defined α sub-band ranges α1 (TF to TF-IAF midpoint), α2 (TF-IAF midpoint to IAF), and α3 (IAF to IAF + 2 Hz), then computed the power spectral density of α1, α2, and α3 sub-bands and the α3/α2 ratio per stage and region (36 features).
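The sub-band definitions can be made concrete with a short sketch. It assumes TF and IAF have already been estimated (in Hz) and that a power spectral density is available on a uniform frequency grid; the function names, the half-open band edges, and the bin-summing approximation of band power are assumptions for illustration only.

```python
def alpha_subbands(tf_hz, iaf_hz):
    """Return (low, high) edges in Hz for the three alpha sub-bands."""
    if not tf_hz < iaf_hz:
        raise ValueError("transition frequency must be below individual alpha frequency")
    mid = (tf_hz + iaf_hz) / 2.0  # TF-IAF midpoint
    return {
        "alpha1": (tf_hz, mid),
        "alpha2": (mid, iaf_hz),
        "alpha3": (iaf_hz, iaf_hz + 2.0),
    }


def band_power(freqs, psd, low, high):
    """Approximate power in [low, high) by summing PSD bins on a uniform grid."""
    df = freqs[1] - freqs[0]
    return sum(p for f, p in zip(freqs, psd) if low <= f < high) * df


def alpha3_alpha2_ratio(freqs, psd, tf_hz, iaf_hz):
    """Ratio of alpha3 to alpha2 power for one spectrum."""
    bands = alpha_subbands(tf_hz, iaf_hz)
    return band_power(freqs, psd, *bands["alpha3"]) / band_power(freqs, psd, *bands["alpha2"])
```

With TF = 6 Hz and IAF = 10 Hz, the sub-bands are α1 = 6 to 8 Hz, α2 = 8 to 10 Hz, and α3 = 10 to 12 Hz.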

Spindle and slow-wave oscillation features. Spindle and slow-oscillation patterns during N2 sleep were detected using Luna software (228 features).

EEG coherence features. EEG coherence (Welch’s method) computed across all 15 channel pairs averaged for δ, θ, α, and σ bands per sleep stage (300 features).
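The count of 300 follows from the six channels listed under EEG preprocessing: 6 channels give 15 unordered pairs, and 15 pairs × 4 bands × 5 sleep stages = 300. The enumeration below checks this arithmetic; the feature-name format is hypothetical and need not match the column names in the released coherence table.

```python
from itertools import combinations, product

CHANNELS = ["F3-M2", "F4-M1", "C3-M2", "C4-M1", "O1-M2", "O2-M1"]
BANDS = ["delta", "theta", "alpha", "sigma"]
STAGES = ["W", "N1", "N2", "N3", "REM"]


def coherence_feature_names():
    """Enumerate one (hypothetical) name per channel pair, band, and stage."""
    pairs = list(combinations(CHANNELS, 2))  # 15 unordered channel pairs
    return [
        f"coh_{a}_{b}_{band}_{stage}"
        for (a, b), band, stage in product(pairs, BANDS, STAGES)
    ]
```

Calling `len(coherence_feature_names())` returns 300, matching the feature count stated above.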

Group matching and classification. Two matching individuals were sampled per DEM participant by sex and age within 5 years. Logistic regression, support vector machine, and random forest classifiers were trained for the binary tasks DEM vs CN, MCI vs CN, and DEM/MCI vs CN. Performance was reported as AUROC and AUPRC.
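One way to implement this kind of matching is sketched below. It is an assumption-laden illustration rather than the authors' procedure: it samples controls at random without replacement, processes cases in the order given, and uses hypothetical record keys ("id", "sex", "age"). The matched-pair script in the repository remains the reference.

```python
import random


def match_controls(cases, controls, n_per_case=2, max_age_gap=5, seed=0):
    """Sample up to n_per_case same-sex controls within max_age_gap years per case.

    cases and controls are lists of dicts with keys "id", "sex", and "age".
    Each control is used at most once (sampling without replacement).
    """
    rng = random.Random(seed)
    available = list(controls)
    matches = {}
    for case in cases:
        eligible = [
            c for c in available
            if c["sex"] == case["sex"] and abs(c["age"] - case["age"]) <= max_age_gap
        ]
        chosen = rng.sample(eligible, min(n_per_case, len(eligible)))
        matches[case["id"]] = [c["id"] for c in chosen]
        used = {c["id"] for c in chosen}
        available = [c for c in available if c["id"] not in used]
    return matches
```

A case with fewer than two eligible controls receives a shorter list, so callers may want to check the match counts before training.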


Data Description

The release contains both the analysis code and the underlying data.

Code. All preprocessing, feature extraction, model training, and evaluation scripts that produced the results in Ye et al. 2023 are available in the GitHub repository bdsp-core/dementia-detection-from-sleep. The repository's README documents how to clone, install dependencies, point at the data, and reproduce each figure and table in the paper.

Data. Data are hosted in AWS S3 at s3://bdsp-opendata-credentialed/sleep-dementia-detection/. Access is granted to credentialed users after acceptance of the BDSP Credentialed Health Data Use Agreement (DUA).

The release consists of precomputed feature tables and de-identified clinical metadata for the 10,784 polysomnograms from 8,044 participants used in the paper. Specifically:

Per-participant identification & outcome:

  • mastersheet_outcome_deid.xlsx — one row per participant, with de-identified ID, demographics (age binned, sex), assigned diagnostic group (DEM / MCI / CN), MoCA / MMSE / CDR scores when available, and dementia-medication flags.
  • study_groups_deid.csv — mapping from PSG/study to participant and the inclusion-criterion path used to assign group.
  • psg_manifest.csv and psg_manifest_unresolved.csv — PSG-level manifest listing the 10,784 included sleep studies with their dates and study types (diagnostic / full-night titration / split-night titration), and the smaller set of records that could not be unambiguously linked to a participant.

Clinical context:

  • comorbidities_deid.csv — per-participant comorbidity flags (obstructive sleep apnea, depression, hypertension, etc.) extracted from the EHR.
  • dementia_diagnosis_dates.csv — per-participant date(s) of first dementia / MCI diagnosis used in the inclusion-criterion logic.

Precomputed sleep-EEG feature tables (features computed per 30-second epoch and summarised to per-PSG averages per region and stage, exactly as used in the paper):

  • features_macro_deid.csv — 36 macro-structure (hypnogram) features.
  • features_alpha_deid.csv — 36 individual EEG-frequency and α sub-band features.
  • features_coherence_deid.csv — 300 EEG coherence features (15 channel pairs × 4 bands × 5 stages).
  • features_brain_age_deid.csv — sleep-EEG brain-age index features used in the companion analysis.
  • features_MGH_deid.csv — auxiliary MGH-specific feature set.
  • features_full_deid.csv — the unified feature matrix used by the published classifiers (510 + 228 + 300 + 36 + 36 = 1,110 features).

All studies were recorded in the Sleep Laboratory at Massachusetts General Hospital between 2009 and 2019. Raw polysomnography signals were de-identified, processed with the pipeline in the GitHub repository, and the resulting feature tables released here. Releasing the precomputed features allows users to reproduce the published classification results in minutes rather than re-running the (computationally expensive) feature-extraction stage on raw signals.


Usage Notes

To reproduce the published results:

  1. Clone the analysis code: git clone https://github.com/bdsp-core/dementia-detection-from-sleep
  2. After being granted credentialed access (sign the DUA on this page, then await approval), download the data with the AWS CLI:
    aws s3 cp s3://bdsp-opendata-credentialed/sleep-dementia-detection/ ./data --recursive
  3. Update file paths in the repository's configuration to point at your local ./data directory.
  4. Run the classification pipeline (features_full_deid.csv + mastersheet_outcome_deid.xlsx are sufficient inputs for the published classifiers).
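Before running the pipeline, it can help to confirm that the download is complete. The sketch below checks a local directory for the file names listed under Data Description. It is a convenience example, not part of the repository, and it searches subdirectories because the folder layout under the S3 prefix is not described here.

```python
from pathlib import Path

# File names as listed in the Data Description section.
EXPECTED_FILES = [
    "mastersheet_outcome_deid.xlsx",
    "study_groups_deid.csv",
    "psg_manifest.csv",
    "psg_manifest_unresolved.csv",
    "comorbidities_deid.csv",
    "dementia_diagnosis_dates.csv",
    "features_macro_deid.csv",
    "features_alpha_deid.csv",
    "features_coherence_deid.csv",
    "features_brain_age_deid.csv",
    "features_MGH_deid.csv",
    "features_full_deid.csv",
]


def missing_files(data_dir="./data"):
    """Return the expected file names not found anywhere under data_dir."""
    root = Path(data_dir)
    found = {p.name for p in root.rglob("*") if p.is_file()}
    return [name for name in EXPECTED_FILES if name not in found]
```

An empty list from `missing_files("./data")` means every listed file was found; for the published classifiers, only features_full_deid.csv and mastersheet_outcome_deid.xlsx need to be present.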

Key files for common analyses:

  • To reproduce the published classifiers: features_full_deid.csv + group labels from mastersheet_outcome_deid.xlsx + the matched-pair script in the repository.
  • To study one feature family: use the per-family files (features_macro_deid.csv, features_alpha_deid.csv, features_coherence_deid.csv, etc.).
  • To relabel participants under a different criterion (e.g. score-only thresholds): use mastersheet_outcome_deid.xlsx together with comorbidities_deid.csv and dementia_diagnosis_dates.csv.

Notes:

  • This release does not include raw EDF/HDF5 polysomnography signals. The pipeline that produced the released feature tables from raw signals is documented and runnable in the GitHub repository, but raw-signal access requires a separate request.
  • Inclusion/exclusion criteria for the DEM, MCI, and CN groups are documented in the paper (Table 1) and reproducible from the released metadata.
  • The matched-pair design used in the paper (two CN per DEM, sex- and age-matched within 5 years) is reproducible from mastersheet_outcome_deid.xlsx.

Ethics

The Mass General Brigham Institutional Review Board approved all study procedures and waived the requirement for informed consent for this retrospective study.

Acknowledgements

We thank the technologists of the Massachusetts General Hospital Sleep Laboratory for their work in acquiring and scoring the polysomnography studies that form the basis of this dataset.

Conflicts of Interest

M. Brandon Westover is a co-founder of and consultant for Beacon Biosignals, which develops EEG analytics software. The remaining authors declare no competing interests related to this work.

References

  1. Ye EM, Sun H, Krishnamurthy PV, Adra N, Ganglberger W, Thomas RJ, Lam AD, Westover MB. Dementia detection from brain activity during sleep. Sleep. 2023 Mar 9;46(3):zsac286. doi:10.1093/sleep/zsac286. PMID:36448766. PMCID:PMC9995788.

Access

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
BDSP Credentialed Health Data License 1.5.0

Data Use Agreement:
BDSP Credentialed Health Data Use Agreement
