Database Credentialed Access
Harvard-Emory ECG Database
Published: Sept. 8, 2023. Version: 1.0
When using this resource, please cite:
Moura Junior, V., Reyna, M., Hong, S., Gupta, A., Ghanta, M., Sameni, R., Rosand, J., Aguirre, A., Li, Q., Clifford, G., & Westover, M. B. (2023). Harvard-Emory ECG Database (version 1.0). Brain Data Science Platform. https://doi.org/10.60508/g072-7n95.
The Harvard-Emory ECG database (HEEDB) is a large collection of 12-lead electrocardiography (ECG) recordings, prepared through a collaboration between Harvard University and Emory University investigators.
These ECGs are currently provided without labels or metadata, to enable pre-training of ECG analysis models. Labels and metadata are being withheld while we prepare them for a public computing challenge, and will be released in a subsequent installment of this dataset. Stay tuned for an announcement about the challenge.
HEEDB is published as part of the Human Sleep Project (HSP), funded by a grant (R01HL161253) from the National Heart, Lung, and Blood Institute (NHLBI) of the NIH to Massachusetts General Hospital, Emory University, Stanford University, Kaiser Permanente, Boston Children's Hospital, and Beth Israel Deaconess Medical Center.
These data include clinical ECGs captured during routine clinical care over several decades, as well as associated covariates, treatments, diagnoses, and outcomes. They are intended to be used to determine associations between cardiac abnormalities (e.g., abnormal rhythms) and sleep, sleep-related medical conditions, and health outcomes.
Data acquisition: These ECGs are 12-lead recordings. Most are 10 seconds long and sampled at 500 Hz. They were collected from the 1990s to the present. At the time of initial publication, the database included 10,771,552 ECGs from 1,818,247 unique patients. New ECGs will be added periodically. All ECGs were collected in the course of routine clinical care.
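As a rough sense of scale, the figures above can be turned into a back-of-the-envelope estimate. The sketch below assumes that every recording is exactly 10 s at 500 Hz with 12 leads stored as raw 16-bit samples; the description says only that most recordings have this shape, and header files and any file-format overhead are ignored, so the result is an approximation rather than the actual size of the database.

```python
# Counts reported for v1.0 of the database.
N_ECGS = 10_771_552
N_PATIENTS = 1_818_247

# Assumed uniform recording shape (the description says "most", not "all").
FS_HZ = 500
DURATION_S = 10
N_LEADS = 12
BYTES_PER_SAMPLE = 2  # 16-bit samples

samples_per_lead = FS_HZ * DURATION_S
bytes_per_ecg = samples_per_lead * N_LEADS * BYTES_PER_SAMPLE
total_bytes = bytes_per_ecg * N_ECGS
ecgs_per_patient = N_ECGS / N_PATIENTS

print(f"samples per lead: {samples_per_lead}")
print(f"raw bytes per ECG: {bytes_per_ecg}")
print(f"approximate raw waveform size: {total_bytes / 1e12:.2f} TB")
print(f"mean ECGs per patient: {ecgs_per_patient:.2f}")
```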
Data preprocessing: Data were de-identified following the Safe Harbor method.
Data were converted to a format compatible with both WFDB (Waveform Database) and MATLAB (V4). Each ECG recording consists of one waveform data file (.mat) and one header file (.hea). The waveform data file can be read with the WFDB library functions, applications, and Toolbox, or loaded directly into MATLAB. Most waveform files contain synchronized 12-lead ECG signals recorded at 500 Hz for 10 s. The header file specifies the names of the associated waveform files and their attributes. It contains line-oriented and field-oriented ASCII text and can be read with the WFDB library or any text editor.
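Because the header is plain ASCII, its main fields can be read with a few lines of Python. The following is a minimal sketch, not the project's own code: the sample header text, the record name example_ecg, and all numeric values in it are invented for illustration, and the parser assumes the common WFDB layout in which the first line gives the record name, number of signals, sampling frequency, and number of samples, and each following line describes one signal with all fields present.

```python
from dataclasses import dataclass

LEADS = ["I", "II", "III", "aVR", "aVL", "aVF", "V1", "V2", "V3", "V4", "V5", "V6"]

# Hypothetical header text: record name, format, gain and other values are invented.
SAMPLE_HEADER = "example_ecg 12 500 5000\n" + "\n".join(
    f"example_ecg.mat 16+24 1000/mV 16 0 0 0 0 {lead}" for lead in LEADS
)


@dataclass
class SignalSpec:
    file_name: str
    fmt: str
    gain: float      # ADC units per physical unit
    baseline: int    # ADC value corresponding to 0 physical units
    units: str
    description: str


def parse_gain(field, adc_zero):
    """Split a field such as '1000/mV' or '200(5)/mV' into gain, baseline, units."""
    gain_part, _, units = field.partition("/")
    gain_str, _, rest = gain_part.partition("(")
    baseline = int(rest.rstrip(")")) if rest else adc_zero
    return float(gain_str), baseline, units or "mV"


def parse_header(text):
    """Parse the record line and signal lines of a simple single-segment header."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]  # skip comments
    rec = lines[0].split()
    n_sig, fs = int(rec[1]), float(rec[2])
    n_samples = int(rec[3]) if len(rec) > 3 else None
    signals = []
    for ln in lines[1:1 + n_sig]:
        p = ln.split(maxsplit=8)
        gain, baseline, units = parse_gain(p[2], int(p[4]))
        description = p[8] if len(p) > 8 else ""
        signals.append(SignalSpec(p[0], p[1], gain, baseline, units, description))
    return {"name": rec[0], "n_sig": n_sig, "fs": fs,
            "n_samples": n_samples, "signals": signals}


header = parse_header(SAMPLE_HEADER)
duration_s = header["n_samples"] / header["fs"]
print(header["name"], header["n_sig"], header["fs"], duration_s)
print([s.description for s in header["signals"]])
```

For real work, the WFDB library handles optional fields and multi-segment records that this sketch ignores; in Python, the wfdb package's rdrecord function reads the header and waveform together.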
General metadata: Information about each ECG, such as subject ID, sex, age, and (shifted) acquisition date. The subject ID is a de-identified unique identifier.
Data in WFDB format: Waveform data: 12-lead ECG signals recorded at 500 Hz for 10 s. The header file contains general information about the signals, such as the sampling rate, units, and channel names; the data file contains the 12-lead signals encoded as 16-bit samples.
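To illustrate how 16-bit samples relate to the units given in the header, here is a minimal sketch of decoding and scaling. It assumes the usual WFDB convention for 16-bit data: little-endian signed integers interleaved by sample across signals, converted with physical = (digital - baseline) / gain. The byte string, gain, and baseline below are invented, and any file-level offset (such as a MATLAB header preceding the samples) is assumed to have been skipped already.

```python
import struct


def decode_samples(raw, n_sig):
    """Unpack little-endian int16 values and de-interleave them into one list per signal."""
    n_values = len(raw) // 2
    values = struct.unpack(f"<{n_values}h", raw[: n_values * 2])
    return [list(values[ch::n_sig]) for ch in range(n_sig)]


def to_physical(digital, gain, baseline=0):
    """Convert ADC units to physical units (e.g. mV) using the header's gain and baseline."""
    return [(d - baseline) / gain for d in digital]


# Hypothetical data: 3 time points of a 2-signal recording, interleaved by sample.
raw = struct.pack("<6h", 1000, -500, 0, 2000, 500, -1000)
channels = decode_samples(raw, n_sig=2)
first_lead_mv = to_physical(channels[0], gain=1000.0, baseline=0)

print(channels)
print(first_lead_mv)
```

The same two functions apply unchanged to a 12-lead recording by passing n_sig=12 and the per-signal gain and baseline read from the header.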
Note that some metadata is being held back for now, because this dataset will be used in a public challenge in the future. For now, the raw signals are made available for pre-training of models.
HEEDB is intended to support a wide range of ECG studies, in particular those exploring the relationship between ECG conditions and sleep. Python code for working with the HEEDB is available on GitHub.
v1.0: ECGs: 10,771,552; unique subjects: 1,818,247
The study protocol was approved by the Institutional Review Board of Massachusetts General Hospital. The requirement for written informed consent was waived because of the retrospective study design and the minimal risk to participants. The study also complied with the Declaration of Helsinki.
Publication of HEEDB is supported by a grant (R01HL161253) from the National Heart, Lung, and Blood Institute (NHLBI) of the NIH to Massachusetts General Hospital, Emory University, Stanford University, Kaiser Permanente, Boston Children's Hospital, and Beth Israel Deaconess Medical Center.
Conflicts of Interest
The authors declare that they have no conflicts of interest.