Database Restricted Access
Harvard Electroencephalography Database
Sahar Zafar, Tobias Loddenkemper, Jong Woo Lee, Andrew Cole, Daniel Goldenholz, Jurriaan Peters, Alice Lam, Edilberto Amorim, Catherine Chu, Sydney Cash, Valdery Moura Junior, Aditya Gupta, Manohar Ghanta, Marta Fernandes, Haoqi Sun, Jin Jing, M Brandon Westover
Published: Nov. 7, 2023. Version: 2.0
When using this resource, please cite:
Zafar, S., Loddenkemper, T., Lee, J. W., Cole, A., Goldenholz, D., Peters, J., Lam, A., Amorim, E., Chu, C., Cash, S., Moura Junior, V., Gupta, A., Ghanta, M., Fernandes, M., Sun, H., Jing, J., & Westover, M. B. (2023). Harvard Electroencephalography Database (version 2.0). Brain Data Science Platform. https://doi.org/10.60508/g6m4-bf96.
The Harvard EEG Database will encompass data gathered from four hospitals affiliated with Harvard University: Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), Beth Israel Deaconess Medical Center (BIDMC), and Boston Children's Hospital (BCH). The EEG data includes three types:
- rEEG: "routine EEGs" recorded in the outpatient setting.
- EMU: recordings obtained in the inpatient setting, within the Epilepsy Monitoring Unit (EMU).
- ICU/LTM: recordings obtained from acutely and critically ill patients within the intensive care unit (ICU).
Electroencephalography (EEG) signals are valuable for diagnosing conditions such as epilepsy, identifying the causes of encephalopathy, predicting the chances of recovering consciousness in patients with prolonged coma after cardiac arrest, assessing the level of consciousness in patients under anesthesia, and assessing sleep quality, among many other medical applications. This repository offers a large and diverse collection of real-world EEG data as a resource for researchers developing more effective methods for analyzing EEG signals. By sharing this data, we aim to facilitate broader access to accurate EEG interpretation worldwide and to foster research advances in neurologic disorders. We hope that this repository will lead to new knowledge that improves brain health for all.
Most EEG data in this repository is recorded using the International 10-20 system for scalp electrode placement. Sampling rates of recordings are provided in the EEG header files.
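As an illustration of reading the sampling rate from a recording's header, the following is a minimal sketch that assumes "header files" refers to the fixed-layout ASCII header at the start of each EDF file, as defined in the public EDF specification. The function name `edf_sampling_rates` is hypothetical and is not part of the database tools; the rate of each channel is computed as samples per data record divided by the record duration.

```python
def edf_sampling_rates(edf_path):
    """Return {channel_label: sampling_rate_hz} parsed from an EDF header.

    Assumes a standard EDF/EDF+ file: a 256-byte fixed header followed by
    256 bytes of per-signal fields for each signal, stored field by field.
    """
    with open(edf_path, "rb") as f:
        fixed = f.read(256)
        if len(fixed) < 256:
            raise ValueError("file is too short to contain an EDF header")
        # Bytes 244-251: duration of one data record in seconds.
        record_duration = float(fixed[244:252].decode("ascii").strip())
        # Bytes 252-255: number of signals (channels).
        n_signals = int(fixed[252:256].decode("ascii").strip())
        per_signal = f.read(256 * n_signals)

    if len(per_signal) < 256 * n_signals:
        raise ValueError("truncated per-signal header")
    if record_duration <= 0:
        raise ValueError("record duration must be positive to compute a rate")

    rates = {}
    # Samples-per-record fields start after the label (16), transducer (80),
    # unit (8), physical min/max (8+8), digital min/max (8+8) and
    # prefiltering (80) fields of all signals: 216 bytes per signal.
    samples_offset = 216 * n_signals
    for i in range(n_signals):
        label = per_signal[16 * i:16 * (i + 1)].decode("ascii").strip()
        start = samples_offset + 8 * i
        n_samples = int(per_signal[start:start + 8].decode("ascii").strip())
        rates[label] = n_samples / record_duration
    return rates
```

Called on a downloaded `.edf` file, the function returns one entry per channel label, so channels recorded at different rates can be told apart. An established EDF reader is likely a better choice for actual signal loading; this sketch only inspects the header.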
The Harvard Electroencephalography Database includes 164,707 EEG studies conducted on 65,167 distinct patients.
Dataset Folder Structure:
The folder structure follows the BIDS (Brain Imaging Data Structure) specification version 1.7.0 for organizing EEG (electroencephalogram) data collected from multiple sites.
The folder hierarchy has four main levels:
bids -> sub-ID -> ses-ID -> eeg
Bids-root-folder/
├── dataset_description.json
├── participants.json
├── participants.tsv
├── README
└── sub-Id/
    └── ses-01/
        ├── sub-SiteIdPatientId_ses-01_scans.tsv
        └── eeg/
            ├── sub-Id_ses-1_task-eeg_annotations.tsv
            ├── sub-Id_ses-1_task-eeg_channels.tsv
            ├── sub-Id_ses-1_task-eeg_eeg.edf
            ├── sub-Id_ses-1_task-eeg_eeg.json
            └── sub-Id_ses-1_task-eeg_pre.csv
1. Top Level: BIDS (root-folder)
The top-level files provide metadata and general information about the dataset:
- dataset_description.json: A description of the dataset.
- participants.json: Metadata definitions for columns in participants.tsv.
- participants.tsv: A list of participants with demographic and physical details.
- README: General information and notes about the dataset.
2. Subject Level: sub-SiteIdPatientId
Each folder at this level represents a distinct patient. The subject ID is a combination of the study site ID and the patient's unique ID. All studies related to a specific patient can be found within their corresponding folder.
3. Session Level: ses-XX
Within each participant's folder, individual sessions correspond to separate EEG studies, labeled in chronological order.
- sub-SiteIdPatientId_ses-01_scans.tsv: lists all EEG file names and their acquisition times for a session.
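A session's scans file can be read with the standard library, as in the sketch below. It assumes the file follows the usual BIDS convention of a tab-separated table with `filename` and `acq_time` columns, where `acq_time` is an ISO 8601 timestamp or `n/a`; these column names come from the BIDS specification and have not been checked against this dataset. The function name `read_scans` is hypothetical.

```python
import csv
from datetime import datetime


def read_scans(scans_tsv):
    """Return a list of (filename, acquisition_time) tuples from a scans.tsv.

    acquisition_time is a datetime, or None when the value is missing,
    'n/a', or not an ISO 8601 timestamp.
    """
    rows = []
    with open(scans_tsv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            raw = (row.get("acq_time") or "").strip()
            try:
                acquired = datetime.fromisoformat(raw)
            except ValueError:
                acquired = None
            rows.append((row["filename"], acquired))
    return rows
```

Because the timestamps in this database are de-identified, the parsed values are best used for ordering and for intervals within a patient's recordings, not as real calendar dates.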
4. EEG Data Level: eeg/
The EEG sub-directory within each session contains:
- Annotations: e.g., sub-SiteIdPatientId_ses-01_task-eeg_annotations.csv.
- Data File: e.g., sub-SiteIdPatientId_ses-01_task-eeg_eeg.edf.
- Metadata: e.g., sub-SiteIdPatientId_ses-01_task-eeg_eeg.json.
- Channels Description: e.g., sub-SiteIdPatientId_ses-01_task-eeg_channels.tsv.
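The four-level hierarchy described above can be traversed with a single glob pattern. The following is a minimal sketch, assuming a local copy of the dataset laid out as `sub-*/ses-*/eeg/` with data files ending in `_eeg.edf`; the function name `iter_eeg_recordings` and the returned dictionary keys are hypothetical.

```python
from pathlib import Path


def iter_eeg_recordings(bids_root):
    """Yield one dict per EDF recording found under a BIDS root folder."""
    root = Path(bids_root)
    for edf in sorted(root.glob("sub-*/ses-*/eeg/*_eeg.edf")):
        session_dir = edf.parent.parent   # .../sub-ID/ses-ID
        subject_dir = session_dir.parent  # .../sub-ID
        # Common file-name prefix, e.g. "sub-ID_ses-01_task-eeg".
        prefix = edf.name[: -len("_eeg.edf")]
        yield {
            "subject": subject_dir.name,
            "session": session_dir.name,
            "edf": edf,
            "sidecar": edf.with_suffix(".json"),
            "channels": edf.with_name(prefix + "_channels.tsv"),
        }
```

Each yielded dictionary points at the data file and at the metadata and channel files expected next to it. The sidecar and channel paths are derived from the file name only, so it is worth checking that they exist before opening them.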
Along with the dataset, a CSV file is provided to assist in identifying the locations of specific studies. This CSV can be found in the Files section below and is structured as follows:
|Column|Description|
|---|---|
|SiteID|Unique identifier of the hospital where the EEG was recorded.|
|BDSPPatientID|Unique identifier of the patient.|
|BidsFolder|Folder where studies for a specific patient are available in the BDSP OpenData Repository.|
|SessionID|Folder in the BDSP OpenData Repository containing a specific study and its auxiliary files for a particular patient.|
|CreationTime|De-identified timestamp indicating when the EEG was recorded.|
|StartTime|De-identified timestamp indicating when the EEG started.|
|EndTime|De-identified timestamp indicating when the EEG finished.|
|DurationInSecond|Duration of the EEG recording in seconds.|
|HasXLTEKAnnotations|Flag indicating whether the study has annotations created on Natus/XLTEK.|
|HasPersystAnnotations|Flag indicating whether the study has annotations created on Persyst.|
|ServiceName|EEG type: Routine, LTM, or EMU.|
|AgeAtVisit|Age of the patient at the time of the study.|
|SexDSC|Patient-reported gender.|
|BDSPLastModifiedDTS|The last time the record was updated.|
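The index CSV can be filtered to locate studies of interest, for example by EEG type and minimum duration. The sketch below uses only the column names listed in the table above (`ServiceName`, `DurationInSecond`, `BidsFolder`, `SessionID`). The function name `find_studies` is hypothetical, and the exact values stored in each column (such as how `ServiceName` is spelled) are assumptions to verify against the actual file.

```python
import csv


def find_studies(index_csv, service_name=None, min_duration_s=0.0):
    """Return (BidsFolder, SessionID, duration_s) for studies matching the filters."""
    matches = []
    with open(index_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if service_name is not None and row.get("ServiceName") != service_name:
                continue
            try:
                duration = float(row.get("DurationInSecond") or "nan")
            except ValueError:
                continue  # skip rows with a non-numeric duration
            if not duration >= min_duration_s:
                continue  # too short, or duration missing (NaN)
            matches.append((row["BidsFolder"], row["SessionID"], duration))
    return matches
```

A call such as `find_studies("index.csv", service_name="Routine", min_duration_s=600)` would list routine studies of at least ten minutes; the file name `index.csv` is a placeholder. The returned folder and session values can then be combined to locate the study within the repository.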
Code for loading the EEG data is available in the associated GitHub repository (https://github.com/bdsp-core/Harvard-EEG-Database-Tools).
In this new release, the data has been converted to the EEG-BIDS format.
In this dataset, all data were anonymized with all identifiable patient information removed.
Thanks to the EEG technologists, attending physicians, and fellows who provide EEG diagnostic services.
Conflicts of Interest
This work was supported by grants from the NIH (R01NS102190, R01NS102574, R01NS107291, RF1AG064312, RF1NS120947, R01AG073410, R01HL161253, R01NS126282, R01AG073598).
Access to the files requires signing the data use agreement for the project.