Database Restricted Access

Harvard Electroencephalography Database

Sahar Zafar Tobias Loddenkemper Jong Woo Lee Andrew Cole Daniel Goldenholz Jurriaan Peters Alice Lam Edilberto Amorim Catherine Chu Sydney Cash Valdery Moura Junior Aditya Gupta Manohar Ghanta Marta Fernandes Haoqi Sun Jin Jing M Brandon Westover

Published: Nov. 7, 2023. Version: 2.0


When using this resource, please cite:
Zafar, S., Loddenkemper, T., Lee, J. W., Cole, A., Goldenholz, D., Peters, J., Lam, A., Amorim, E., Chu, C., Cash, S., Moura Junior, V., Gupta, A., Ghanta, M., Fernandes, M., Sun, H., Jing, J., & Westover, M. B. (2023). Harvard Electroencephalography Database (version 2.0). Brain Data Science Platform. https://doi.org/10.60508/g6m4-bf96.

Abstract

The Harvard EEG Database will encompass data gathered from four hospitals affiliated with Harvard University: Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), Beth Israel Deaconess Medical Center (BIDMC), and Boston Children's Hospital (BCH). The EEG data includes three types:

  • rEEG: "routine EEGs" recorded in the outpatient setting.
  • EMU: recordings obtained in the inpatient setting, within the Epilepsy Monitoring Unit (EMU).
  • ICU/LTM: recordings obtained from acutely and critically ill patients within the intensive care unit (ICU).

Background

Electroencephalography (EEG) signals are valuable for diagnosing conditions such as epilepsy, identifying the causes of encephalopathy, predicting the likelihood of recovering consciousness in patients with prolonged coma after cardiac arrest, assessing the level of consciousness in patients under anesthesia, and assessing sleep quality, among many other medical applications. This repository offers a large and diverse collection of real-world EEG data, serving as a resource for researchers to develop more effective methods for analyzing EEG signals. By sharing this data, our goal is to facilitate broader access to accurate EEG interpretation worldwide and to foster research advances in neurologic disorders. We hope that this repository will lead to new knowledge that improves brain health for all.


Methods

Most EEG data in this repository were recorded using the International 10-20 system for scalp electrode placement. The sampling rate of each recording is provided in its EEG header file.
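As a minimal sketch of how the sampling rate can be recovered, the example below parses the header of an EDF file using only the Python standard library. It assumes that "header" refers to the EDF file header and relies on the published EDF byte layout (a 256-byte fixed block followed by 256 bytes per signal). The function name read_edf_sampling_rates is hypothetical and is not part of the associated GitHub tools.

```python
def read_edf_sampling_rates(edf_path):
    """Return a list of (channel_label, sampling_rate_hz) from an EDF header.

    Hypothetical helper: follows the standard EDF header layout and does
    not read any signal data.
    """
    with open(edf_path, "rb") as f:
        fixed = f.read(256)  # fixed-size part of the EDF header
        if len(fixed) < 256:
            raise ValueError("file is too short to contain an EDF header")
        # Bytes 244-251: duration of one data record, in seconds.
        record_duration = float(fixed[244:252].decode("latin-1").strip())
        # Bytes 252-255: number of signals (channels).
        n_signals = int(fixed[252:256].decode("latin-1").strip())
        per_signal = f.read(256 * n_signals)  # per-channel header fields

    if len(per_signal) < 256 * n_signals:
        raise ValueError("EDF header is truncated")
    if record_duration <= 0:
        raise ValueError("data record duration must be positive")

    # Channel labels: 16 bytes each, stored first in the per-signal block.
    labels = [
        per_signal[i * 16:(i + 1) * 16].decode("latin-1").strip()
        for i in range(n_signals)
    ]
    # Samples per data record: 8 bytes each, stored after 216 bytes per
    # channel of other fields (label, transducer, units, ranges, filters).
    offset = 216 * n_signals
    samples = [
        int(per_signal[offset + i * 8:offset + (i + 1) * 8].decode("latin-1").strip())
        for i in range(n_signals)
    ]
    # Sampling rate = samples per record / record duration.
    return [(label, n / record_duration) for label, n in zip(labels, samples)]
```

The function returns one entry per channel because EDF allows channels within the same file to have different sampling rates.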


Data Description

The Harvard Electroencephalography Database includes 164,707 EEG studies conducted on 65,167 distinct patients.

Dataset Folder Structure:

The folder structure follows the BIDS (Brain Imaging Data Structure) specification version 1.7.0 for organizing EEG (electroencephalogram) data collected from multiple sites.

The folder hierarchy has four main levels:

bids -> sub-ID -> ses-ID -> eeg

Bids-root-folder/
	├── dataset_description.json
	├── participants.json
	├── participants.tsv
	├── README
	└── sub-SiteIdPatientId/
		└── ses-01/
			├── sub-SiteIdPatientId_ses-01_scans.tsv
			└── eeg/
				├── sub-SiteIdPatientId_ses-01_task-eeg_annotations.tsv
				├── sub-SiteIdPatientId_ses-01_task-eeg_channels.tsv
				├── sub-SiteIdPatientId_ses-01_task-eeg_eeg.edf
				├── sub-SiteIdPatientId_ses-01_task-eeg_eeg.json
				└── sub-SiteIdPatientId_ses-01_task-eeg_pre.csv


Description:

1. Top Level: BIDS (root-folder)

The top-level files provide metadata and general information about the dataset:

  • dataset_description.json: A description of the dataset.
  • participants.json: Metadata definitions for columns in participants.tsv.
  • participants.tsv: A list of participants with demographic and physical details.
  • README: General information and notes about the dataset.

2. Subject Level: sub-SiteIdPatientId

Each folder at this level represents a distinct patient. The subject ID is a combination of the study site ID and the patient's unique ID. All studies related to a specific patient can be found within their corresponding folder.

3. Session Level: ses-XX

Within each participant's folder, individual sessions correspond to separate EEG studies, labeled in chronological order.

  • sub-SiteIdPatientId_ses-01_scans.tsv: lists all EEG file names and their acquisition times for a session.
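The scans file can be read with the Python standard library, as in the sketch below. It assumes the column names defined by the BIDS specification for scans files (filename, and optionally acq_time); the actual columns in this dataset should be checked against a real file. The function name read_scans is hypothetical.

```python
import csv


def read_scans(scans_tsv_path):
    """Return a list of (filename, acq_time) pairs from a BIDS scans.tsv.

    Assumes BIDS column names; acq_time is None when the column is absent.
    """
    with open(scans_tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")  # scans files are tab-separated
        return [(row["filename"], row.get("acq_time")) for row in reader]
```

Under BIDS, each filename is given relative to the session folder, so it can be joined to that folder's path to locate the recording.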

4. EEG Data Level: eeg/

The EEG sub-directory within each session contains:

  • Annotations: e.g., sub-SiteIdPatientId_ses-01_task-eeg_annotations.csv.
  • Data File: e.g., sub-SiteIdPatientId_ses-01_task-eeg_eeg.edf.
  • Metadata: e.g., sub-SiteIdPatientId_ses-01_task-eeg_eeg.json.
  • Channels Description: e.g., sub-SiteIdPatientId_ses-01_task-eeg_channels.tsv.
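The four-level hierarchy described above can be traversed with a single glob pattern. The following is a minimal sketch, assuming the dataset has been downloaded to a local folder and that data files end in _eeg.edf as in the examples; find_eeg_recordings is a hypothetical helper name.

```python
from pathlib import Path


def find_eeg_recordings(bids_root):
    """List EDF recordings under a local BIDS root, one dict per file."""
    recordings = []
    # Pattern mirrors the hierarchy: root -> sub-* -> ses-* -> eeg -> data file.
    for edf in sorted(Path(bids_root).glob("sub-*/ses-*/eeg/*_eeg.edf")):
        session_dir = edf.parent.parent
        recordings.append({
            "subject": session_dir.parent.name,   # e.g. "sub-SiteIdPatientId"
            "session": session_dir.name,          # e.g. "ses-01"
            "edf": edf,
            # Metadata sidecar shares the stem; it may not exist for every file.
            "sidecar": edf.with_suffix(".json"),
        })
    return recordings
```

Grouping the returned entries by their "subject" value gives all studies for one patient, matching the subject-level organization.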

Metadata File

Along with the dataset, a CSV file is provided to assist in identifying the locations of specific studies. This CSV can be found in the Files section below and is structured as follows:

Column Name              Description
SiteID                   Unique identifier of the hospital where the EEG was recorded.
BDSPPatientID            Unique identifier of the patient.
BidsFolder               Folder where studies for a specific patient are available in the BDSP OpenData Repository.
SessionID                Folder in the BDSP OpenData Repository containing a specific study and its auxiliary files for a particular patient.
CreationTime             De-identified timestamp indicating when the EEG was recorded.
StartTime                De-identified timestamp indicating when the EEG started.
EndTime                  De-identified timestamp indicating when the EEG finished.
DurationInSecond         Duration of the EEG recording in seconds.
HasXLTEKAnnotations      Flag indicating if the study has annotations created on Natus/XLTEK.
HasPersystAnnotations    Flag indicating if the study has annotations created on Persyst.
ServiceName              EEG type; can be Routine, LTM, or EMU.
AgeAtVisit               Age of the patient at the time of the study.
SexDSC                   Patient-reported gender.
BDSPLastModifiedDTS      The last time the record was updated.
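As an illustration of how this CSV can be used to locate studies, the sketch below filters rows by EEG type and minimum duration. It uses only the column names listed above; the function name select_studies is hypothetical, and the assumptions that the file is comma-separated with a header row and that DurationInSecond parses as a number should be verified against the actual file.

```python
import csv


def select_studies(metadata_csv_path, service_name, min_duration_s=0.0):
    """Return (BidsFolder, SessionID, duration_s) for matching studies.

    service_name is compared case-insensitively against ServiceName
    (Routine, LTM, or EMU according to the column description).
    """
    selected = []
    with open(metadata_csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["ServiceName"].strip().lower() != service_name.lower():
                continue
            try:
                duration = float(row["DurationInSecond"])
            except (TypeError, ValueError):
                continue  # skip rows with a missing or malformed duration
            if duration >= min_duration_s:
                selected.append((row["BidsFolder"], row["SessionID"], duration))
    return selected
```

For example, select_studies(path, "Routine", min_duration_s=1200) would return routine EEGs lasting at least 20 minutes, each identified by its patient folder and session folder.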

Usage Notes

Code for loading the EEG data is available in the associated GitHub repository (https://github.com/bdsp-core/Harvard-EEG-Database-Tools). 
 


Release Notes

In this new release, the data has been converted to the EEG-BIDS format.


Ethics

In this dataset, all data were anonymized with all identifiable patient information removed.


Acknowledgements

Thanks to the EEG technologists, attending physicians, and fellows who provide EEG diagnostic services. 


Conflicts of Interest

This work was supported by grants from the NIH (R01NS102190, R01NS102574, R01NS107291, RF1AG064312, RF1NS120947, R01AG073410, R01HL161253, R01NS126282, R01AG073598). 
 


Access

Access Policy:
Only registered users who sign the specified data use agreement can access the files.

License (for files):
BDSP Restricted Health Data License 1.0.0

Data Use Agreement:
BDSP Restricted Health Data Use Agreement

Versions
  • 1.0 - June 15, 2023
  • 2.0 - Nov. 7, 2023

Files