Database Restricted Access

The Boston Childrens Hospital Sleep Corpus

Ayush Tripathi Wolfgang Ganglberger Haoqi Sun Callison Alcott Niels Turley Rebecca Fitzgerald Ayan Mitra Samuel Waters Arnav Gupta Aditya Gupta Manohar Ghanta Valdery Moura Junior Samaneh Nasiri Bruce Nearing Katie Stone Emmanuel Mignot Dennis Hwang Matthew Reyna Zuzana Koscova Chad Robichaux Zhiyong Zhang Qiao Li Gauri Ganjoo Lynn Marie Trotti Gari Clifford Christine Tsien Silvers Bharath Gunapati Robert Thomas M Brandon Westover Kiran Maski Umakanth Katwa

Published: Aug. 6, 2025. Version: 1.0.1


When using this resource, please cite:
Tripathi, A., Ganglberger, W., Sun, H., Alcott, C., Turley, N., Fitzgerald, R., Mitra, A., Waters, S., Gupta, A., Gupta, A., Ghanta, M., Moura Junior, V., Nasiri, S., Nearing, B., Stone, K., Mignot, E., Hwang, D., Reyna, M., Koscova, Z., ... Katwa, U. (2025). The Boston Childrens Hospital Sleep Corpus (version 1.0.1). Brain Data Science Platform. https://doi.org/10.60508/hjdt-fz71.

Abstract

The Boston Children’s Hospital (BCH) Sleep Corpus comprises 15,695 fully annotated pediatric polysomnography (PSG) recordings collected between 2010 and 2024. Each study includes multimodal physiological signals (EEG, EOG, EMG, ECG, respiratory effort, airflow, SpO₂, EtCO₂, etc.) along with expert sleep staging, arousal, and respiratory event annotations (apneas, hypopneas, RERAs). Designed to support research in pediatric sleep, neurodevelopment, and machine learning, the corpus is available through BDSP to registered users who sign the data use agreement, and it is accompanied by scripts for reproducing the figures in the manuscript.


Background

Pediatric sleep disorders are increasingly recognized as critical contributors to developmental and health outcomes. Despite this, large-scale pediatric PSG datasets remain scarce, particularly with comprehensive, high-quality annotations across modalities and age groups. The BCH Sleep Corpus addresses this gap by offering a richly labeled, multimodal PSG dataset spanning infancy through early adolescence. It enables deep exploration of developmental sleep patterns, event dynamics, and neurological correlates—under real-world clinical conditions—supporting both clinical and computational neuroscience research.


Methods

This retrospective data analysis study was conducted under IRB protocols (BIDMC: #2016P000058, MGH: #2013P001024), with the MGH and BIDMC IRBs granting a waiver of consent.

Data Collection

  • Sample: 15,695 PSGs from patients evaluated at BCH from 2010 to 2024.
  • Equipment: Standard clinical sleep systems capturing multimodal signals (EEG, EOG, EMG, ECG, airflow, effort, SpO₂, EtCO₂).
  • Scoring: Sleep stages, arousals, and respiratory events were annotated per the AASM guidelines applicable at the time of each study. Hypopneas were consistently scored using the criterion of ≥30% airflow reduction with ≥3% desaturation and/or arousal.
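
The hypopnea criterion stated above can be written as a simple predicate. The following is a minimal sketch, not the scoring software used at BCH: the function name and its arguments are hypothetical, and only the amplitude and association rule quoted above is modelled (event duration and other AASM requirements are not checked).

```python
def meets_hypopnea_criterion(airflow_drop_pct: float,
                             desaturation_pct: float,
                             has_arousal: bool) -> bool:
    """Check the rule: >=30% airflow reduction with >=3% desaturation and/or arousal.

    Hypothetical helper for illustration; duration rules are not modelled.
    """
    flow_reduced = airflow_drop_pct >= 30.0
    associated = desaturation_pct >= 3.0 or has_arousal
    return flow_reduced and associated


# Desaturation alone, arousal alone, and insufficient airflow reduction.
print(meets_hypopnea_criterion(35.0, 3.5, False))  # True
print(meets_hypopnea_criterion(35.0, 1.0, True))   # True
print(meets_hypopnea_criterion(25.0, 4.0, True))   # False
```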

De-identification and Processing

  • All EDF files were de-identified using MNE-Python, which by default resamples all channels to the highest native sampling rate for internal consistency.
  • Original annotations were preserved as scored; no retrospective re-scoring was conducted to harmonize across AASM versions.
  • The following processed derivative data are provided:
    • EDF: Full raw PSG signals.
    • HDF5: Harmonized versions at a common sampling frequency (~200 Hz) with event- and sleep-stage-aligned arrays.
    • Demographics Metadata: Including patient age, sex, clinical codes.

 


Data Description

Channels and Signal Availability: All core channels (C3, C4, P3, P4, O1, O2, F3, F4, LOC/E1, ROC/E2, CHIN1, CHIN2, ECG, SpO₂, LAT, RAT, pressure, etc.) are present in nearly all studies. Channel availability is summarized in Table 4.

Annotations:

  • Sleep stages: Wake, N1, N2, N3 (merged N3+N4), REM, and UNSCORED epochs (e.g., before lights off, interruptions).
  • Respiratory events: Central, obstructive, mixed apneas and hypopneas; and RERAs.
  • Arousals: Over 1.17 million annotated.
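
As an illustration of the stage scheme listed above, the sketch below maps per-epoch labels onto the five stages plus UNSCORED and totals the time spent in each. The label strings in `STAGE_MAP`, the helper names, and the 30-second epoch length are assumptions made for this example; the actual encoding should be checked in the `sleepannotations.csv` files.

```python
from collections import Counter

# Hypothetical raw labels; the real encoding in the annotation files may differ.
STAGE_MAP = {
    "W": "Wake",
    "N1": "N1",
    "N2": "N2",
    "N3": "N3",
    "N4": "N3",  # legacy stage 4 is merged into N3
    "R": "REM",
}


def harmonize_stages(labels):
    """Map raw labels to Wake/N1/N2/N3/REM; anything else becomes UNSCORED."""
    return [STAGE_MAP.get(label, "UNSCORED") for label in labels]


def stage_minutes(labels, epoch_sec=30):
    """Total minutes per stage, assuming fixed-length epochs."""
    counts = Counter(harmonize_stages(labels))
    return {stage: n * epoch_sec / 60 for stage, n in counts.items()}


print(stage_minutes(["W", "N2", "N3", "N4", "R", "?"]))
```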

Developmental Trends: Spectral analyses and event distributions demonstrate:

  • Rapid increase in EEG power during early infancy, plateauing by ~2 years.
  • Maturation of posterior dominant rhythm (PDR): emergence at 5–7 Hz in infants, shifting to ~8 Hz by early childhood.
  • Distinct patterns across sleep stages: N3 dominated by delta power, robust sigma spindles in N2, and age-related decreases in arousal frequency.
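
Band-limited EEG power of the kind referred to above can be estimated from a periodogram. The sketch below is not the analysis code from the manuscript: it uses a plain FFT periodogram on a synthetic signal, and the band edges (delta 0.5–4 Hz, sigma 11–16 Hz), the 200 Hz sampling rate, and the function name are assumptions chosen for illustration.

```python
import numpy as np


def band_power(signal, fs, f_lo, f_hi):
    """Integrated periodogram power in the band [f_lo, f_hi) Hz.

    The one-sided factor of 2 is omitted, so values are best used as ratios.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                      # remove DC offset
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * x.size)
    mask = (freqs >= f_lo) & (freqs < f_hi)
    return float(psd[mask].sum() * (fs / x.size))


# Synthetic 30-second epoch: a strong 2 Hz (delta-like) component
# plus a weaker 13 Hz (sigma-like) component.
fs = 200
t = np.arange(30 * fs) / fs
epoch = 50.0 * np.sin(2 * np.pi * 2 * t) + 10.0 * np.sin(2 * np.pi * 13 * t)

delta = band_power(epoch, fs, 0.5, 4.0)
sigma = band_power(epoch, fs, 11.0, 16.0)
print(delta / sigma)  # about 25, since power scales with amplitude squared
```

For real recordings, an averaged estimator (for example Welch's method or multitaper) would usually be preferred over a single periodogram, because it reduces the variance of the estimate.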

Access and Usage

  • Data can be downloaded after requesting access and following the AWS access instructions on the portal.
  • Full PSG recordings (EDF), harmonized HDF5 files, demographic metadata, and code repository included.
  • Supporting GitHub repository contains scripts for reproducing figures from the manuscript.

Directory Structure

I0003/
├── sub-I0003175516025/
│   ├── sub-I0003175516025_diseasediagnosis.csv
│   └── ses-1/
│       └── eeg/
│           ├── sub-I0003175516025_ses-1_task-psg_eeg.edf
│           ├── sub-I0003175516025_ses-1_sleepannotations.csv
│           ├── sub-I0003175516025_ses-1_eventannotations.csv
│           ├── sub-I0003175516025_ses-1_task-psg_channels.tsv
│           └── sub-I0003175516025_ses-1_task-psg_eeg.h5
├── sub-I0003175516118/
│   ├── sub-I0003175516118_diseasediagnosis.csv
│   ├── ses-1/
│   │   └── eeg/
│   │       ├── sub-I0003175516118_ses-1_task-psg_eeg.edf
│   │       ├── sub-I0003175516118_ses-1_sleepannotations.csv
│   │       ├── sub-I0003175516118_ses-1_eventannotations.csv
│   │       ├── sub-I0003175516118_ses-1_task-psg_channels.tsv
│   │       └── sub-I0003175516118_ses-1_task-psg_eeg.h5
│   └── ses-2/
│       └── ...
└── ...
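
The naming pattern in the tree above can be expressed as a small path builder. This is a sketch inferred only from the two example subjects shown; the function name, argument names, and dictionary keys are hypothetical, and the layout of other sites or sessions may differ.

```python
from pathlib import PurePosixPath


def session_files(site: str, subject_id: str, session: int) -> dict:
    """Build the expected relative paths for one subject and session.

    `subject_id` is the full identifier after "sub-", e.g. "I0003175516025".
    """
    sub = f"sub-{subject_id}"
    ses = f"ses-{session}"
    sub_dir = PurePosixPath(site) / sub
    eeg_dir = sub_dir / ses / "eeg"
    stem = f"{sub}_{ses}"
    return {
        "edf": eeg_dir / f"{stem}_task-psg_eeg.edf",
        "h5": eeg_dir / f"{stem}_task-psg_eeg.h5",
        "channels": eeg_dir / f"{stem}_task-psg_channels.tsv",
        "sleep": eeg_dir / f"{stem}_sleepannotations.csv",
        "events": eeg_dir / f"{stem}_eventannotations.csv",
        "diagnosis": sub_dir / f"{sub}_diseasediagnosis.csv",
    }


files = session_files("I0003", "I0003175516025", 1)
for kind, path in files.items():
    print(f"{kind:10s} {path}")
```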

 


Usage Notes

Data and code to generate all results and figures from the publication are provided at https://github.com/bdsp-core/BCH-PSG-dataset


Release Notes

In this dataset, all data were anonymized with all identifiable patient information removed.


Ethics

In this dataset, all data were anonymized with all identifiable patient information removed.


Acknowledgements

Thanks to the sleep technologists, attending physicians, and fellows who provide diagnostic services.


Conflicts of Interest

Dr. Westover is a co-founder, scientific advisor, and consultant to, and has a personal equity interest in, Beacon Biosignals. Dr. Clifford has received research funding from the NSF, NIH, and LifeBell AI, and unrestricted donations from AliveCor Inc, Amazon Research, the Center for Discovery, the Gates Foundation, Google, the Gordon and Betty Moore Foundation, MathWorks, Microsoft Research, NextSense Inc, One Mind Foundation, and the Rett Research Foundation. Dr. Clifford has advisory roles and financial interests in AliveCor Inc and NextSense Inc. He is also the CTO of MindChild Medical with significant stock. These relationships are unconnected to the current work. Dr. Thomas is co-inventor of: 1) the cardiopulmonary sleep spectrogram to assess sleep stability/quality and sleep apnea, licensed by the Beth Israel Deaconess Medical Center to MyCardio, LLC; 2) a patent for Enhanced Expiratory Rebreathing Space to treat high loop gain sleep apnea; 3) a patent for estimating respiratory self-similarity for detection of high loop gain sleep apnea. Dr. Thomas also reports general sleep medicine consulting for GLG Councils, Guidepoint, Beacon Biosignals, and Jazz Pharmaceuticals. Dr. Stone reports grant funding from Eli Lilly and is a consultant for Axsome Therapeutics. Dr. Maski 1) is a consultant for Alkermes, Avadel, Harmony Biosciences, Jazz Pharmaceuticals, and Takeda Pharmaceuticals; 2) has grant funding from Harmony Biosciences and Jazz Pharmaceuticals; 3) is DSMB chair for Idorsia; and 4) is a collaborator on clinical trials sponsored by Alkermes and Takeda. These relationships are unconnected to the current work.


Access

Access Policy:
Only registered users who sign the specified data use agreement can access the files.

License (for files):
BDSP Restricted Health Data License 1.0.0

Data Use Agreement:
BDSP Restricted Health Data Use Agreement
