Database Restricted Access
Harvard Electroencephalography Database
Sahar Zafar, Tobias Loddenkemper, Jong Woo Lee, Andrew Cole, Daniel Goldenholz, Jurriaan Peters, Alice Lam, Edilberto Amorim, Catherine Chu, Sydney Cash, Valdery Moura Junior, Aditya Gupta, Manohar Ghanta, Marta Fernandes, Haoqi Sun, Jin Jing, M Brandon Westover
Published: Nov. 7, 2023. Version: 2.0
When using this resource, please cite:
Zafar, S., Loddenkemper, T., Lee, J. W., Cole, A., Goldenholz, D., Peters, J., Lam, A., Amorim, E., Chu, C., Cash, S., Moura Junior, V., Gupta, A., Ghanta, M., Fernandes, M., Sun, H., Jing, J., & Westover, M. B. (2023). Harvard Electroencephalography Database (version 2.0). Brain Data Science Platform. https://doi.org/10.60508/g6m4-bf96.
Abstract
The Harvard EEG Database will encompass data gathered from four hospitals affiliated with Harvard University: Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), Beth Israel Deaconess Medical Center (BIDMC), and Boston Children's Hospital (BCH). The EEG data includes three types:
- rEEG: "routine EEGs" recorded in the outpatient setting.
- EMU: recordings obtained in the inpatient setting, within the Epilepsy Monitoring Unit (EMU).
- ICU/LTM: recordings obtained from acutely and critically ill patients within the intensive care unit (ICU).
Background
Electroencephalography (EEG) signals are valuable for diagnosing conditions such as epilepsy, identifying the causes of encephalopathy, predicting the chance of recovering consciousness in patients with prolonged coma after cardiac arrest, assessing the level of consciousness in patients under anesthesia, and assessing sleep quality, among many other medical applications. This repository offers a large and diverse collection of real-world EEG data as a resource for researchers developing more effective methods for analyzing EEG signals. By sharing this data, we aim to broaden access to accurate EEG interpretation worldwide, foster research on neurologic disorders, and generate new knowledge that improves brain health for all.
Methods
Most EEG data in this repository was recorded using the International 10-20 system for scalp electrode placement. The sampling rate of each recording is given in its EEG header file.
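Because sampling rates are stored in the headers rather than listed here, it can help to read them directly. The following is a minimal sketch, assuming the recordings are standard EDF files laid out as in the public EDF specification (a 256-byte fixed header followed by 256 bytes of header per signal). The function name `read_edf_sampling_rates` is hypothetical and is not part of the dataset's own tooling.

```python
def read_edf_sampling_rates(stream):
    """Return a list of (channel label, sampling rate in Hz) from an EDF header.

    `stream` is a binary file-like object positioned at the start of the file.
    Offsets follow the EDF specification; this is an illustrative sketch,
    not the loader shipped with the dataset.
    """
    fixed = stream.read(256)
    if len(fixed) < 256:
        raise ValueError("too short to contain an EDF fixed header")

    # Bytes 244-251: duration of one data record in seconds.
    # Bytes 252-255: number of signals (channels).
    record_duration = float(fixed[244:252].decode("latin-1").strip())
    n_signals = int(fixed[252:256].decode("latin-1").strip())
    if record_duration <= 0:
        raise ValueError("data-record duration must be positive")

    per_signal = stream.read(256 * n_signals)
    if len(per_signal) < 256 * n_signals:
        raise ValueError("truncated per-signal header")

    # Per-signal fields are stored field by field: all labels first
    # (16 bytes each), and the samples-per-record counts (8 bytes each)
    # start after 216 bytes per signal of earlier fields.
    samples_offset = 216 * n_signals
    rates = []
    for i in range(n_signals):
        label = per_signal[16 * i:16 * (i + 1)].decode("latin-1").strip()
        start = samples_offset + 8 * i
        n_samples = int(per_signal[start:start + 8].decode("latin-1").strip())
        rates.append((label, n_samples / record_duration))
    return rates
```

A file would be passed in as `open(path, "rb")`. Channels in one recording can have different rates, which is why the sketch returns one value per channel rather than a single number.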
Data Description
The Harvard Electroencephalography Database includes 164,707 EEG studies conducted on 65,167 distinct patients.
Dataset Folder Structure:
The folder structure follows the BIDS (Brain Imaging Data Structure) specification version 1.7.0 for organizing EEG (electroencephalogram) data collected from multiple sites.
The folder hierarchy has four main levels:
bids -> sub-ID -> ses-ID -> eeg
Bids-root-folder/
├── dataset_description.json
├── participants.json
├── participants.tsv
├── README
└── sub-SiteIdPatientId/
    └── ses-01/
        ├── sub-SiteIdPatientId_ses-01_scans.tsv
        └── eeg/
            ├── sub-SiteIdPatientId_ses-01_task-eeg_annotations.tsv
            ├── sub-SiteIdPatientId_ses-01_task-eeg_channels.tsv
            ├── sub-SiteIdPatientId_ses-01_task-eeg_eeg.edf
            ├── sub-SiteIdPatientId_ses-01_task-eeg_eeg.json
            └── sub-SiteIdPatientId_ses-01_task-eeg_pre.csv
Description:
1. Top Level: BIDS (root-folder)
The top-level files provide metadata and general information about the dataset:
- dataset_description.json: A description of the dataset.
- participants.json: Metadata definitions for columns in participants.tsv.
- participants.tsv: A list of participants with demographic and physical details.
- README: General information and notes about the dataset.
2. Subject Level: sub-SiteIdPatientId
Each folder at this level represents a distinct patient. The subject ID is a combination of the study site ID and the patient's unique ID. All studies related to a specific patient can be found within their corresponding folder.
3. Session Level: ses-XX
Within each participant's folder, individual sessions correspond to separate EEG studies, labeled in chronological order.
- sub-SiteIdPatientId_ses-01_scans.tsv: lists all EEG file names and their acquisition times for a session.
4. EEG Data Level: eeg/
The EEG sub-directory within each session contains:
- Annotations: e.g., sub-SiteIdPatientId_ses-01_task-eeg_annotations.csv.
- Data File: e.g., sub-SiteIdPatientId_ses-01_task-eeg_eeg.edf.
- Metadata: e.g., sub-SiteIdPatientId_ses-01_task-eeg_eeg.json.
- Channels Description: e.g., sub-SiteIdPatientId_ses-01_task-eeg_channels.tsv.
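The naming pattern described above can be turned into a small path helper. This is a sketch under stated assumptions: the function name `eeg_session_files`, the example root folder, and the example subject ID are hypothetical, and the annotations extension is left as a parameter because it should be checked against the downloaded files rather than assumed.

```python
from pathlib import Path


def eeg_session_files(bids_root, subject_id, session_label,
                      annotations_ext=".tsv"):
    """Build the expected file paths for one EEG session.

    subject_id is the SiteIdPatientId string without the "sub-" prefix,
    and session_label is the session number without "ses-" (e.g. "01").
    annotations_ext is an assumption; verify it against the actual files.
    """
    eeg_dir = (Path(bids_root) / f"sub-{subject_id}"
               / f"ses-{session_label}" / "eeg")
    stem = f"sub-{subject_id}_ses-{session_label}_task-eeg"
    return {
        "data": eeg_dir / f"{stem}_eeg.edf",
        "metadata": eeg_dir / f"{stem}_eeg.json",
        "channels": eeg_dir / f"{stem}_channels.tsv",
        "annotations": eeg_dir / f"{stem}_annotations{annotations_ext}",
    }


# Hypothetical root folder and subject ID, for illustration only.
paths = eeg_session_files("/data/bids", "S0001111", "01")
print(paths["data"].as_posix())
```

The helper only builds paths and does not check that the files exist, so a caller may want to test `paths["data"].exists()` before loading.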
Metadata File
Along with the dataset, a CSV file is provided to assist in identifying the locations of specific studies. This CSV can be found in the Files section below and is structured as follows:
Column Name | Description |
---|---|
SiteID | Unique identifier of the hospital where the EEG was recorded. |
BDSPPatientID | Unique identifier of the patient. |
BidsFolder | Folder where studies for a specific patient are available in the BDSP OpenData Repository. |
SessionID | Folder in the BDSP OpenData Repository containing a specific study and its auxiliary files for a particular patient. |
CreationTime | De-identified timestamp indicating when the EEG was recorded. |
StartTime | De-identified timestamp indicating when the EEG started. |
EndTime | De-identified timestamp indicating when the EEG finished. |
DurationInSecond | Duration of the EEG recording in seconds. |
HasXLTEKAnnotations | Flag indicating if the study has annotations created on Natus/XLTEK. |
HasPersystAnnotations | Flag indicating if the study has annotations created on Persyst. |
ServiceName | EEG type: Routine, LTM, or EMU. |
AgeAtVisit | Age of the patient at the time of the study. |
SexDSC | Patient-reported gender. |
BDSPLastModifiedDTS | The last time the record was updated. |
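As an illustration of how the columns above might be used to locate studies, the sketch below filters metadata rows by EEG type and minimum duration using only the Python standard library. The function names and the file path passed to `iter_metadata_rows` are hypothetical; only the column names come from the table.

```python
import csv


def iter_metadata_rows(path):
    """Yield each row of the metadata CSV as a dict keyed by column name."""
    with open(path, newline="") as handle:
        yield from csv.DictReader(handle)


def select_studies(rows, service_name, min_duration_s=0.0):
    """Return (BidsFolder, SessionID) pairs for matching studies.

    Rows with a missing or non-numeric DurationInSecond are skipped.
    """
    selected = []
    for row in rows:
        if row["ServiceName"] != service_name:
            continue
        try:
            duration = float(row["DurationInSecond"])
        except (TypeError, ValueError):
            continue
        if duration >= min_duration_s:
            selected.append((row["BidsFolder"], row["SessionID"]))
    return selected
```

For example, `select_studies(iter_metadata_rows("metadata.csv"), "Routine", 1200)` would list routine EEGs lasting at least 20 minutes, where `metadata.csv` stands in for wherever the downloaded file was saved.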
Usage Notes
Code for loading the EEG data is available in the associated GitHub repository (https://github.com/bdsp-core/Harvard-EEG-Database-Tools).
Release Notes
In this new release, the data has been converted to the EEG-BIDS format.
Ethics
In this dataset, all data were anonymized with all identifiable patient information removed.
Acknowledgements
Thanks to the EEG technologists, attending physicians, and fellows who provide EEG diagnostic services.
Conflicts of Interest
This work was supported by grants from the NIH (R01NS102190, R01NS102574, R01NS107291, RF1AG064312, RF1NS120947, R01AG073410, R01HL161253, R01NS126282, R01AG073598).
Access
Access Policy:
Only registered users who sign the specified data use agreement can access the files.
License (for files):
BDSP Restricted Health Data License 1.0.0
Data Use Agreement:
BDSP Restricted Health Data Use Agreement
Files
- sign the data use agreement for the project