Database Credentialed Access
Harvard-Emory ECG Database
Zuzana Koscova, Valdery Moura Junior, Matthew Reyna, Shenda Hong, Aditya Gupta, Manohar Ghanta, Reza Sameni, Jonathan Rosand, Aaron Aguirre, Qiao Li, Gari Clifford, M Brandon Westover
Published: Nov. 6, 2024. Version: 2.0
When using this resource, please cite:
Koscova, Z., Moura Junior, V., Reyna, M., Hong, S., Gupta, A., Ghanta, M., Sameni, R., Rosand, J., Aguirre, A., Li, Q., Clifford, G., & Westover, M. B. (2024). Harvard-Emory ECG Database (version 2.0). Brain Data Science Platform. https://doi.org/10.60508/13rj-5d45.
Abstract
The Harvard-Emory ECG database (HEEDB) is a large collection of 12-lead electrocardiography (ECG) recordings, prepared through a collaboration between Harvard University and Emory University investigators.
In version 1.0 of the database, these ECGs were provided without labels or metadata, to enable pre-training of ECG analysis models.
In version 2.0, labels and metadata are included.
HEEDB is published as part of the Human Sleep Project (HSP), funded by a grant (R01HL161253) from the National Heart, Lung, and Blood Institute (NHLBI) of the NIH to Massachusetts General Hospital, Emory University, Stanford University, Kaiser Permanente, Boston Children's Hospital, and Beth Israel Deaconess Medical Center.
Background
These data comprise ECGs captured during routine clinical care over several decades. They are intended to be used to determine associations between cardiac abnormalities (e.g., abnormal rhythms) and sleep, sleep-related medical conditions, and health outcomes.
Methods
Data acquisition: These ECGs are 12-lead recordings. Most are 10 seconds long and sampled at 500 Hz. They were collected from the 1990s to the present. At the time of initial publication, the database includes 10,771,552 ECGs from 1,818,247 unique patients. New ECGs will be added periodically. All ECGs were collected in the course of routine clinical care.
Data preprocessing: Data were de-identified following the Safe Harbor method.
Data Description
Data were converted to WFDB (Waveform Database) and MATLAB (V4) compatible format. Each ECG recording includes one waveform data file (.mat) and one header file (.hea). The waveform data file can be read with the WFDB library functions, applications, or Toolbox, or loaded into MATLAB directly. Most waveform files are synchronized 12-lead ECG signals recorded at 500 Hz for 10 s. The header file specifies the names of the associated waveform files and their attributes. It contains line-oriented and field-oriented ASCII text and can be read by the WFDB library or a generic text editor.
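Because the header file is plain line-oriented ASCII, its main fields can be inspected without any special library. The following is a minimal sketch, not the project's own code. It assumes the common WFDB layout in which the first line holds the record name, number of signals, sampling frequency, and number of samples, and each signal line holds the file name, storage format, gain/units, ADC resolution, ADC zero, initial value, checksum, block size, and description, with all fields present. The record name `rec0001`, the gain of 1000/mV, the `16+24` format string, and the lead names are hypothetical values chosen for illustration and are not taken from HEEDB.

```python
import re

# Hypothetical lead names and header values, used only to build a demo header.
LEADS = ["I", "II", "III", "aVR", "aVL", "aVF", "V1", "V2", "V3", "V4", "V5", "V6"]


def make_sample_header(name="rec0001"):
    """Build a hypothetical 12-lead header (500 Hz, 5000 samples) for the demo."""
    lines = [f"{name} {len(LEADS)} 500 5000"]
    lines += [f"{name}.mat 16+24 1000/mV 16 0 0 0 0 {lead}" for lead in LEADS]
    return "\n".join(lines) + "\n"


# Gain field: gain, optional "(baseline)", optional "/units".
GAIN_RE = re.compile(r"([0-9.eE+-]+)(?:\((-?\d+)\))?(?:/(\S+))?")


def parse_header(text):
    """Parse a simple single-segment header in which all fields are present."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Skip blank lines and comment lines, which start with '#'.
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    rec = lines[0].split()
    n_sig = int(rec[1])
    header = {
        "name": rec[0],
        "n_sig": n_sig,
        "fs": float(rec[2].split("/")[0]),
        "n_samples": int(rec[3]),
        "signals": [],
    }
    for ln in lines[1:1 + n_sig]:
        f = ln.split(maxsplit=8)
        m = GAIN_RE.fullmatch(f[2])
        adc_zero = int(f[4])
        header["signals"].append({
            "file": f[0],
            "format": f[1],
            "gain": float(m.group(1)),
            # When no explicit baseline is given, it defaults to the ADC zero.
            "baseline": int(m.group(2)) if m.group(2) is not None else adc_zero,
            "units": m.group(3),
            "name": f[8] if len(f) > 8 else "",
        })
    return header


header = parse_header(make_sample_header())
duration_s = header["n_samples"] / header["fs"]
```

For real use, the WFDB library functions mentioned above are the more robust choice, since they handle optional fields and multi-segment records that this sketch ignores.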
General metadata: Information about ECG data, such as subject ID, sex, age, and (shifted) acquisition date. The subject ID is a de-identified unique identifier.
Waveform data: ECG data in WFDB format are 12-lead ECG signals recorded at 500 Hz for 10 s. The header file contains general information about the signal, such as the sampling rate, units, and channel names, and the data file contains the 12-lead signals encoded as 16-bit samples.
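As a rough worked example of what these numbers imply, the sketch below computes the number of samples and the raw payload size of a typical recording, and converts stored integer samples to physical units using the standard WFDB relation physical = (digital - baseline) / gain. The gain of 1000 ADC units per mV and the sample values are assumptions for illustration; the actual gain and baseline should be read from each record's header file.

```python
FS_HZ = 500            # sampling rate stated in the description
DURATION_S = 10        # typical recording length
N_LEADS = 12
BYTES_PER_SAMPLE = 2   # 16-bit samples

samples_per_lead = FS_HZ * DURATION_S
# Raw signal payload only; any file header bytes are not counted here.
payload_bytes = samples_per_lead * N_LEADS * BYTES_PER_SAMPLE


def to_physical(digital, gain, baseline=0):
    """Convert ADC units to physical units: (digital - baseline) / gain."""
    return [(d - baseline) / gain for d in digital]


# Hypothetical samples and gain (1000 ADC units per mV), for illustration only.
millivolts = to_physical([0, 500, -1000], gain=1000.0)
```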
Metadata files:
- demographics_ECG.csv: the column names are as follows (self-explanatory). Note that all dates are de-identified (shifted). However, all dates for a given subject are shifted in a consistent manner.
  - BDSPPatientID
  - DateOfBirth
  - DateOfDeath
  - DateOfDeathMARegistryData
  - LastKnownVisitDate
  - PatientRace
  - EthnicGroupDSC
  - MaritalStatusDSC
  - ReligionDSC
  - LanguageDSC
  - VeteranStatusDSC
  - SexDSC
  - PrimaryCauseOfDeathDSC
  - UNOSPrimaryCauseOfDeathTXT
  - FirstContributoryCauseOfDeathDSC
  - UNOSContributoryCauseOfDeath01TXT
  - SecondContributoryCauseOfDeathDSC
  - UNOSContributoryCauseOfDeath02TXT
  - EducationLevelDSC
  - GenderIdentityDSC
  - SexAssignedAtBirthDSC
  - BDSPLastModifiedDTS
  - RecordingTime
  - FileName
  - Age
- ecg_interpretations.psv: pipe (|) separated value file containing ECG interpretations ("diagnoses"), expressed in the Marquette 12SL ("12SL") ECG Analysis Program (GE Healthcare) version 4 language. These annotations consist of textual reports that describe the morphology, rhythm, and diagnostic information of the ECGs. Each annotation includes statement numbers that correspond to human-readable diagnoses. Columns of this file are:
  - BDSPID: subject ID
  - file_source: name of the ECG file
  - diagnostic_codes: 12SL codes
  - diagnostic_text: human-readable 12SL text
  - measurements: HR, PR, QRSD, QT, QTc, Pax, Rax, Tax
- diagnoses_dictionary.csv: columns include
  - codes (12SL integer codes)
  - acronym (human readable)
  - diagnoses (human readable)
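The interpretation and dictionary files described above can be combined to turn 12SL codes into readable diagnoses. The sketch below uses only the Python standard library and small in-memory stand-ins for the two files. The column names come from the description above; everything else is an assumption: the rows, the codes 9001 and 9002, and their labels are invented placeholders, and the way multiple codes are packed into `diagnostic_codes` (here a semicolon-separated string) should be checked against the actual file.

```python
import csv
import io

# Invented rows for illustration only; they are not taken from HEEDB.
INTERP_PSV = """\
BDSPID|file_source|diagnostic_codes|diagnostic_text|measurements
1001|ecg_a|9001;9002|placeholder text|placeholder
"""

DICT_CSV = """\
codes,acronym,diagnoses
9001,EX1,Example diagnosis one
9002,EX2,Example diagnosis two
"""


def load_dictionary(fh):
    """Map each code (as a string) to its (acronym, diagnosis) pair."""
    return {
        row["codes"].strip(): (row["acronym"], row["diagnoses"])
        for row in csv.DictReader(fh)
    }


def iter_interpretations(fh, code_sep=";"):
    """Yield (subject ID, file name, list of codes) per row.

    code_sep is an assumption about how codes are delimited in the column.
    """
    for row in csv.DictReader(fh, delimiter="|"):
        codes = [c.strip() for c in row["diagnostic_codes"].split(code_sep) if c.strip()]
        yield row["BDSPID"], row["file_source"], codes


lookup = load_dictionary(io.StringIO(DICT_CSV))
decoded = [
    (subject, source, [lookup.get(c, (None, "unknown code"))[1] for c in codes])
    for subject, source, codes in iter_interpretations(io.StringIO(INTERP_PSV))
]
```

With the real files, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.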
Usage Notes
HEEDB is intended to support a wide range of ECG studies, in particular those exploring the relationship between ECG conditions and sleep. Python code for working with the HEEDB is available on GitHub.
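One practical consequence of the date handling in demographics_ECG.csv is that intervals within a subject remain meaningful even though absolute dates do not. The sketch below illustrates this under the assumption that the shift is a fixed per-subject offset in days; the source states only that dates are shifted consistently for each subject. The dates and the offset are invented for illustration.

```python
from datetime import date, timedelta


def shift_date(d, offset_days):
    """Apply a fixed offset in days, as a stand-in for the de-identification shift."""
    return d + timedelta(days=offset_days)


def days_between(start, end):
    return (end - start).days


# Hypothetical true dates for one subject and a hypothetical per-subject offset.
ecg_1 = date(2010, 6, 1)
ecg_2 = date(2012, 6, 1)
offset = -1234

shifted_1 = shift_date(ecg_1, offset)
shifted_2 = shift_date(ecg_2, offset)

true_gap = days_between(ecg_1, ecg_2)
shifted_gap = days_between(shifted_1, shifted_2)

# The interval survives the shift; the absolute dates do not.
interval_preserved = true_gap == shifted_gap
```

Under the same assumption, comparing absolute dates across different subjects is not meaningful, since each subject may carry a different offset.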
Release Notes
v1.0: ECGs: 10,771,552; unique subjects: 1,818,247
v2.0: new data added: ecg_interpretations.psv, diagnoses_dictionary.csv, demographics_ECG.csv
Ethics
The study protocol was approved by the Institutional Review Board of Massachusetts General Hospital. The requirement for written informed consent was waived because of the retrospective study design, which posed minimal risk to participants. The study also complied with the Declaration of Helsinki.
Acknowledgements
Publication of HEEDB is supported by a grant (R01HL161253) from the National Heart, Lung, and Blood Institute (NHLBI) of the NIH to Massachusetts General Hospital, Emory University, Stanford University, Kaiser Permanente, Boston Children's Hospital, and Beth Israel Deaconess Medical Center.
Conflicts of Interest
Dr. Westover is a co-founder, scientific advisor, consultant to, and has personal equity interest in Beacon Biosignals. The other authors declare that they have no conflicts of interest.
Access
Access Policy:
Only credentialed users who sign the DUA can access the files.
License (for files):
BDSP Credentialed Health Data License 1.5.0
Data Use Agreement:
BDSP Credentialed Health Data Use Agreement
Required training:
CITI Data or Specimens Only Research
Files
To access the files, you must:
- be a credentialed user
- complete required training:
  - CITI Data or Specimens Only Research
- sign the data use agreement for the project