Database Credentialed Access

Association of Sleep Electroencephalography-Based Brain Age Index With Dementia - Data and code

Elissa Ye, Haoqi Sun, Michael Leone, Luis Paixao, Robert Thomas, Alice Lam, M Brandon Westover

Published: April 25, 2026. Version: 1.0.0


When using this resource, please cite:
Ye, E., Sun, H., Leone, M., Paixao, L., Thomas, R., Lam, A., & Westover, M. B. (2026). Association of Sleep Electroencephalography-Based Brain Age Index With Dementia - Data and code (version 1.0.0). Brain Data Science Platform. https://doi.org/10.60508/17yf-6k87.

Abstract

Importance. Dementia is an increasing cause of disability and loss of independence in the elderly population yet remains largely underdiagnosed. A biomarker for dementia that can identify individuals with or at risk for developing dementia may help close this diagnostic gap.

Objective. To investigate the association between dementia and a sleep electroencephalography-based Brain Age Index (BAI), defined as brain age estimated from the sleep electroencephalogram minus chronological age.

Design, Setting, and Participants. A retrospective cross-sectional study of 9,834 polysomnograms (5,144 included in BAI examinations) acquired in the Sleep Laboratory at Massachusetts General Hospital from 2009 to 2017. Patients were grouped into dementia, mild cognitive impairment (MCI), symptomatic (cognitive symptoms but no diagnosis of MCI or dementia), nondementia, and healthy individuals using formal inclusion and exclusion criteria, determined by natural language processing applied to the electronic medical record.

Main Outcomes and Measures. The trend in BAI when moving from healthy through symptomatic to MCI to dementia, and pairwise comparisons of BAI among these groups.

Release. This project releases the de-identified dataset and analysis code used in the published study, so that the published findings can be independently reproduced from the released inputs and the BAI methodology can be applied to follow-on work. Please consult the published paper for the full results and clinical interpretation.


Background

Dementia is an increasing cause of disability and loss of independence in the elderly population. After age 65 years, the prevalence of dementia increases two-fold every five years. Despite this prevalence, dementia remains largely underdiagnosed. Biomarkers that use clinical testing commonly performed in elderly patients could help close this diagnostic gap. Electroencephalography (EEG) has been identified as a potential biomarker, but its use in dementia screening is not yet a part of clinical practice.

Sleep undergoes characteristic changes with age, reflected in the EEG: with aging, sleep becomes fragmented, the proportion of stage 1 sleep increases, and slow-wave sleep decreases. With neurodegenerative diseases, sleep becomes more fragmented, shows reduced slow-wave sleep and rapid eye movement (REM) sleep, and sleep spindles and vertex waves become less well-formed and less numerous.

The Brain Age Index (BAI) is the difference between brain age, estimated from the sleep EEG by a machine-learning model, and chronological age (Sun et al.). An elevated BAI signifies deviation from the reference range for typical brain aging, suggesting that BAI may reflect the presence and severity of dementia. Other approaches to estimating brain age rely on magnetic resonance imaging (MRI) or awake EEG, whereas this BAI is based on the sleep EEG. The gap between MRI-based brain age and chronological age is associated with risk of dementia and conversion of MCI to dementia. Sleep-EEG-based BAI has also been associated with neurological and psychiatric disease, hypertension, diabetes, and mortality.

This release accompanies an investigation of the association between sleep-EEG-based BAI and dementia, comparing the BAI of patients with dementia, patients with MCI, patients with cognitive symptoms but no diagnosis of dementia or MCI, and patients without dementia. The published paper additionally examines the association of BAI with neuropsychological scores, the contribution of dementia to increased BAI compared with other covariates, and correlations between dementia and EEG features in the brain-age algorithm.


Methods

Data set. The cross-sectional study retrospectively analyzed a data set of 9,834 polysomnograms (PSGs) acquired in the Sleep Laboratory at Massachusetts General Hospital from 2009 to 2017. PSGs were recorded in accordance with American Academy of Sleep Medicine (AASM) standards. The data set contains three types of sleep tests: diagnostic, full-night continuous positive airway pressure (CPAP), and split-night CPAP. PSGs were clinically annotated in 30-second epochs as wake, non-REM stage 1 (N1), non-REM stage 2 (N2), non-REM stage 3 (N3), or REM, per AASM standards.

Calculation of the Brain Age Index (BAI). The brain-age model is a generalized linear model that uses a softplus link function and is optimized to estimate an individual's chronological age from an overnight EEG. As defined in Sun et al., a variety of EEG features are extracted, including spectral powers (absolute powers, relative powers, and power ratios) and measures of signal complexity. The BAI for each PSG is the computed brain age minus the chronological age. Full technical details are provided in the eAppendix to the published paper and reproduced in the released code.
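
As a rough illustration of the definitions above, the sketch below passes a linear predictor through a softplus function and then forms BAI as brain age minus chronological age. The weights, intercept, and feature values are hypothetical placeholders, not the fitted model from Sun et al.; the released code holds the actual feature extraction and model fitting.

```python
import math

def softplus(z: float) -> float:
    """Numerically stable softplus: log(1 + exp(z)), never negative."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def predict_brain_age(features, weights, intercept):
    """Brain age in years from a linear predictor passed through softplus.

    `weights` and `intercept` are hypothetical stand-ins for fitted
    model parameters.
    """
    linear = intercept + sum(w * x for w, x in zip(weights, features))
    return softplus(linear)

def brain_age_index(brain_age_yr, age_at_psg):
    """BAI = predicted brain age minus chronological age (years)."""
    return brain_age_yr - age_at_psg

# Toy example with made-up numbers (not real EEG features).
features = [0.8, -1.2, 0.3]
weights = [5.0, -2.0, 10.0]
brain_age = predict_brain_age(features, weights, intercept=60.0)
bai = brain_age_index(brain_age, age_at_psg=65.0)
```

A positive BAI here means the predicted brain age exceeds the chronological age; softplus keeps the predicted age positive whatever the sign of the linear predictor.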

Clinical data extraction. Clinical data — demographics, diagnoses, medications, problem lists, and clinical notes — were extracted from questionnaires completed before the sleep study and from electronic medical records. Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA) scores, when available, were extracted from clinical notes before or within one year after the sleep study using regular expressions.
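
To make the regular-expression step concrete, here is a minimal sketch of pulling MMSE and MoCA scores out of free text. The pattern and the `extract_scores` helper are assumptions for illustration; the expressions actually used in the study are in the released code and are likely more thorough (for example, in handling dates and negations).

```python
import re

# Hypothetical pattern: test name, a short run of non-digit filler,
# then a one- or two-digit score with an optional "/30" suffix.
SCORE_RE = re.compile(
    r"\b(?P<test>MMSE|MoCA)\b[^0-9\n]{0,30}?(?P<score>\d{1,2})(?:\s*/\s*30)?",
    re.IGNORECASE,
)

def extract_scores(note: str):
    """Return (test, score) pairs for plausible 0-30 scores found in a note."""
    results = []
    for m in SCORE_RE.finditer(note):
        score = int(m.group("score"))
        if 0 <= score <= 30:
            test = "MoCA" if m.group("test").lower() == "moca" else "MMSE"
            results.append((test, score))
    return results

note = "Cognition: MoCA 22/30 today. Prior MMSE score of 27."
print(extract_scores(note))  # [('MoCA', 22), ('MMSE', 27)]
```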

Group assignment. Patients were categorized into dementia, MCI, symptomatic, nondementia, and healthy groups using formal inclusion and exclusion criteria (Table 1 of the published paper) determined by natural language processing applied to the electronic medical record using custom software. Briefly:

  • Dementia. Any of the following: ≥1 dementia-related medication together with a diagnosis containing any dementia keyword; a diagnosis in the problem list containing any dementia keyword; MoCA ≤19 with no MoCA score >27 after the sleep study; or MMSE ≤25.
  • MCI. Either a diagnosis in the problem list containing any MCI keyword, or MoCA 20–25 with no MoCA score >27 after the sleep study.
  • Symptomatic. Diagnosis containing any dementia or dementia-related keyword in the encounter diagnosis, problem list, and/or medical history (without meeting dementia/MCI criteria).
  • Nondementia. Does not belong to dementia, MCI, or symptomatic groups; may have a prior history of neurological or psychiatric disease in encounter diagnosis, problem list, and/or medical history.
  • Healthy. Subset of the nondementia group with no history of neurological or psychiatric disease in encounter diagnosis, problem list, and/or medical history.

Exclusions: age <50 years; diagnosis of developmental delay, brain tumor, or neoplasm; stroke, brain injury or trauma, or seizure before the sleep study. The study followed the STROBE reporting guideline for observational research.
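
The group rules and exclusions above can be sketched as a small decision function. This is one reading of the criteria, not the study's software: the `Evidence` field names are hypothetical, the way the dementia conditions are grouped is an interpretation of the list, and the order of precedence (dementia, then MCI, then symptomatic) is an assumption. Because healthy is a subset of nondementia, the function returns the more specific label.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Evidence:
    """Hypothetical per-patient flags; field names are illustrative only."""
    age: float
    excluded_dx: bool = False            # developmental delay, tumor, stroke, etc.
    dementia_med_with_dx: bool = False   # >=1 dementia medication plus keyword diagnosis
    dementia_problem_list: bool = False
    mci_problem_list: bool = False
    symptomatic_keyword: bool = False
    neuro_psych_history: bool = False
    moca: Optional[int] = None
    max_moca_after: Optional[int] = None  # highest MoCA after the sleep study
    mmse: Optional[int] = None

def assign_group(e: Evidence) -> Optional[str]:
    """Return a group label, or None if the patient is excluded."""
    if e.age < 50 or e.excluded_dx:
        return None
    # A later MoCA above 27 cancels the MoCA-based dementia/MCI criteria.
    recovered = e.max_moca_after is not None and e.max_moca_after > 27
    moca_usable = e.moca is not None and not recovered
    if (e.dementia_med_with_dx or e.dementia_problem_list
            or (moca_usable and e.moca <= 19)
            or (e.mmse is not None and e.mmse <= 25)):
        return "dementia"
    if e.mci_problem_list or (moca_usable and 20 <= e.moca <= 25):
        return "MCI"
    if e.symptomatic_keyword:
        return "symptomatic"
    return "nondementia" if e.neuro_psych_history else "healthy"
```

For example, `assign_group(Evidence(age=70, moca=22))` returns `"MCI"`, while the same patient with `max_moca_after=28` falls through to `"healthy"`.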

Cohort sizes (after inclusion/exclusion) used in the BAI examination: 88 dementia, 44 MCI, 1,075 symptomatic, 3,024 nondementia, and 2,336 healthy participants (per Table 1 of the paper).

Reference. The full feature set, model fitting procedure, and statistical analyses are documented in the published paper (Ye et al. 2020) and its supplemental material, and reproduced in the released analysis code.


Data Description

The release contains both the analysis code and the underlying data.

Code. The analysis pipeline used to produce the published BAI results is in the GitHub repository bdsp-core/bai-dementia. The repository's README documents how to clone, install dependencies, point at the data, and reproduce each figure and table in the paper.

Data. Data are hosted in AWS S3 at s3://bdsp-opendata-credentialed/bai-dementia-sleep/. Access is granted to credentialed users after acceptance of the BDSP Credentialed Health Data Use Agreement (DUA). The release contains three CSV tables:

  • bai_per_psg_deid.csv — one row per included PSG (n ≈ 2,864). Columns: BDSPPatientID, HashID, DOVshifted (date of visit, per-patient shifted), ShiftedDays (per-patient deterministic offset preserving within-patient intervals), AgeAtPSG, BrainAgeYr (predicted brain age from the sleep-EEG model), BAI (= BrainAgeYr − AgeAtPSG), NumMissingStage30sEpochs, and a free-text Note field.
  • bai_psg_manifest.csv — one row per PSG file (n ≈ 13,283). Columns: BDSPPatientID, HashID, FileNameNew (de-identified file name), DOVshifted, Sex, AgeAtPSG, PSGType (diagnostic / full-night CPAP / split-night), group (dementia / MCI / symptomatic / nondementia / healthy assignment), session, and s3_path (where the corresponding signal file is staged).
  • cohort_v1_deid.csv — one row per assessed PSG with the full natural-language-processing-derived group-assignment evidence (n ≈ 22,985). Columns include the assignment outcome and the underlying signals used to reach it: encounter, ICD, problem-list, and medication-based dementia / MCI / symptomatic flags; subtype-specific dementia flags (Alzheimer's disease, vascular dementia, frontotemporal dementia, dementia with Lewy bodies, Parkinson's disease); CDR / MMSE / MoCA scores with their dates and the highest-after-visit follow-up scores used in the rule logic; exclusion flags. Users who want to relabel the cohort under a different rule set can do so directly from this file.
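
A minimal sketch of how the tables might be joined, using only the Python standard library. The in-memory rows are made-up stand-ins, not real data, and treating the (BDSPPatientID, HashID) pair as a unique key is an assumption; check it against the released files before relying on it.

```python
import csv
import io

# Tiny made-up stand-ins for two of the released tables.
bai_csv = io.StringIO(
    "BDSPPatientID,HashID,AgeAtPSG,BrainAgeYr,BAI\n"
    "1001,abc,70.0,74.5,4.5\n"
    "1002,def,62.0,60.0,-2.0\n"
)
manifest_csv = io.StringIO(
    "BDSPPatientID,HashID,PSGType,group\n"
    "1001,abc,diagnostic,dementia\n"
    "1002,def,split-night,healthy\n"
)

def load_by_key(handle):
    """Index rows by the (BDSPPatientID, HashID) pair."""
    return {(r["BDSPPatientID"], r["HashID"]): r for r in csv.DictReader(handle)}

bai_rows = load_by_key(bai_csv)
manifest_rows = load_by_key(manifest_csv)

joined = []
for key, row in bai_rows.items():
    meta = manifest_rows.get(key)
    if meta is None:
        continue  # no manifest entry for this PSG
    # Check the documented identity BAI = BrainAgeYr - AgeAtPSG.
    expected = float(row["BrainAgeYr"]) - float(row["AgeAtPSG"])
    assert abs(float(row["BAI"]) - expected) < 1e-6
    joined.append({**row, **meta})
```

With the real files, replace the `io.StringIO` objects with `open(path, newline="")` handles.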

De-identification. All date/datetime values are shifted by a per-patient deterministic random offset (ShiftedDays column), so all within-patient intervals (time-to-event, longitudinal gaps) are preserved exactly. Direct identifiers were removed at source. The per-patient shift table is held only by the study PI under PHI controls and is not part of this release.
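
The interval-preserving property follows from applying one fixed offset to every date of a given patient. The sketch below uses made-up dates and a made-up offset; whether the released ShiftedDays value is added or subtracted is an assumption here and does not affect the argument.

```python
from datetime import date, timedelta

def shift_date(d: date, shifted_days: int) -> date:
    """Apply a patient's fixed offset (in days) to one date."""
    return d + timedelta(days=shifted_days)

# Made-up events for one patient and a made-up offset.
psg_visit = date(2012, 3, 14)
followup = date(2013, 1, 9)
offset = -417  # hypothetical ShiftedDays value

shifted_psg = shift_date(psg_visit, offset)
shifted_followup = shift_date(followup, offset)

# The gap between the two events is identical before and after shifting.
assert (followup - psg_visit) == (shifted_followup - shifted_psg)
```

Intervals between different patients are not preserved, since each patient has a different offset.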

The PSGs themselves were recorded in the Sleep Laboratory at Massachusetts General Hospital between 2009 and 2017, scored per AASM standards, and de-identified prior to release.


Usage Notes

To reproduce the published results:

  1. Clone the analysis code: git clone https://github.com/bdsp-core/bai-dementia
  2. After being granted credentialed access (sign the DUA on this page, then await approval), download the data with the AWS CLI:
    aws s3 cp s3://bdsp-opendata-credentialed/bai-dementia-sleep/ ./data --recursive
  3. Update file paths in the repository's configuration to point at the local ./data directory.
  4. Run the BAI computation and downstream analyses as documented in the repository's README. bai_per_psg_deid.csv together with cohort_v1_deid.csv are sufficient to reproduce the published group comparisons; bai_psg_manifest.csv is the lookup from PSG IDs to their staged signal files.
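
Before running the analyses, it may help to confirm that the download in step 2 produced the three tables. The check below is a small sketch; it assumes the CSV files sit at the top level of ./data, which depends on how the bucket prefix is laid out.

```python
from pathlib import Path

EXPECTED_FILES = (
    "bai_per_psg_deid.csv",
    "bai_psg_manifest.csv",
    "cohort_v1_deid.csv",
)

def missing_tables(data_dir):
    """Return the expected CSV names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]

missing = missing_tables("./data")
if missing:
    print("Missing tables:", ", ".join(missing))
```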

Notes for follow-on work:

  • cohort_v1_deid.csv preserves the full evidence used to assign each patient to dementia / MCI / symptomatic / nondementia / healthy, including subtype-specific dementia flags and underlying CDR / MMSE / MoCA scores. Users can re-classify under different rules from these fields and align the result with bai_per_psg_deid.csv via BDSPPatientID / HashID.
  • Because each patient's dates are shifted by a single offset, within-patient intervals are preserved, so longitudinal analyses (e.g. within-patient progression of BAI) give the same results as they would on the unshifted dates.
  • BAI is a deviation index (years above or below expected brain age for chronological age); it is most informative when interpreted relative to the healthy reference subset.
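
One simple way to read BAI relative to the healthy reference subset is to subtract the healthy-group mean from each group's mean. The sketch below does only that, on made-up values; it is not the statistical analysis from the paper, which is documented in the publication and the released code.

```python
from collections import defaultdict
from statistics import mean

def mean_bai_vs_healthy(rows):
    """Mean BAI per group, minus the healthy-group mean.

    `rows` is an iterable of (group, bai) pairs and must include at
    least one "healthy" row.
    """
    by_group = defaultdict(list)
    for group, bai in rows:
        by_group[group].append(bai)
    reference = mean(by_group["healthy"])
    return {g: mean(v) - reference for g, v in by_group.items()}

# Made-up values for illustration only; not study results.
rows = [
    ("healthy", -1.0), ("healthy", 1.0),
    ("symptomatic", 1.0), ("symptomatic", 2.0),
    ("dementia", 3.0), ("dementia", 5.0),
]
relative = mean_bai_vs_healthy(rows)
```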

Ethics

The Partners HealthCare institutional review board approved all study procedures and waived the requirement for informed consent for this retrospective data analysis because data were deidentified. The study is reported following the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline for observational research.

Acknowledgements

We thank the technologists of the Massachusetts General Hospital Sleep Laboratory for their work in acquiring and scoring the polysomnography studies that form the basis of this dataset.

Conflicts of Interest

M. Brandon Westover is a co-founder of, scientific advisor to, consultant for, and has a personal equity interest in Beacon Biosignals. The remaining authors declare no conflicts of interest related to this work.

References

  1. Ye E, Sun H, Leone MJ, Paixao L, Thomas RJ, Lam AD, Westover MB. Association of Sleep Electroencephalography-Based Brain Age Index With Dementia. JAMA Netw Open. 2020 Sep 1;3(9):e2017357. doi:10.1001/jamanetworkopen.2020.17357. PMID:32986106. PMCID:PMC7522697.
  2. Sun H, Paixao L, Oliva JT, et al. Brain age from the electroencephalogram of sleep. Neurobiol Aging. 2019;74:112-120. doi:10.1016/j.neurobiolaging.2018.10.016. (Original BAI paper.)

Access

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
BDSP Credentialed Health Data License 1.5.0

Data Use Agreement:
BDSP Credentialed Health Data Use Agreement
