Database Credentialed Access

Harvard-Emory ECG Database

Zuzana Koscova, Valdery Moura Junior, Matthew Reyna, Shenda Hong, Aditya Gupta, Manohar Ghanta, Reza Sameni, Jonathan Rosand, Aaron Aguirre, Qiao Li, Gari Clifford, M Brandon Westover

Published: Nov. 6, 2024. Version: 2.0


When using this resource, please cite:
Koscova, Z., Moura Junior, V., Reyna, M., Hong, S., Gupta, A., Ghanta, M., Sameni, R., Rosand, J., Aguirre, A., Li, Q., Clifford, G., & Westover, M. B. (2024). Harvard-Emory ECG Database (version 2.0). Brain Data Science Platform. https://doi.org/10.60508/13rj-5d45.

Additionally, please cite the original publication:

The Harvard-Emory ECG Database. Zuzana Koscova, Qiao Li, Chad Robichaux, Valdery Moura Junior, Manohar Ghanta, Aditya Gupta, Jonathan Rosand, Aaron Aguirre, Shenda Hong, David E. Albert, Joel Xue, Aarya Parekh, Reza Sameni, Matthew A. Reyna, M. Brandon Westover, Gari D. Clifford. medRxiv 2024.09.27.24314503; doi: https://doi.org/10.1101/2024.09.27.24314503

Abstract

The Harvard-Emory ECG database (HEEDB) is a large collection of 12-lead electrocardiography (ECG) recordings, prepared through a collaboration between Harvard University and Emory University investigators.

In version 1.0 of the database, these ECGs were provided without labels or metadata, to enable pre-training of ECG analysis models.

In version 2.0, labels and metadata are included.

HEEDB is published as part of the Human Sleep Project (HSP), funded by a grant (R01HL161253) from the National Heart, Lung, and Blood Institute (NHLBI) of the NIH to Massachusetts General Hospital, Emory University, Stanford University, Kaiser Permanente, Boston Children's Hospital, and Beth Israel Deaconess Medical Center.


Background

The database consists of clinical ECGs captured during routine clinical care over several decades. The data are intended to support studies of associations between cardiac abnormalities (e.g., abnormal rhythms) and sleep, sleep-related medical conditions, and health outcomes.


Methods

Data acquisition: These ECGs are 12-lead recordings. Most are 10 seconds long and sampled at 500 Hz. They were collected from the 1990s to the present. At the time of initial publication, the database included 10,771,552 ECGs from 1,818,247 unique patients. New ECGs will be added periodically. All ECGs were collected in the course of routine clinical care.

Data preprocessing: Data were de-identified following the Safe Harbor method.


Data Description

Data were converted to a WFDB (Waveform Database) and MATLAB (V4) compatible format. Each ECG recording includes one waveform data file (.mat) and one header file (.hea). The waveform data file can be read with the WFDB library functions, applications, or Toolbox, or loaded into MATLAB directly. Most waveform files contain synchronized 12-lead ECG signals recorded at 500 Hz for 10 s. The header file specifies the names of the associated waveform files and their attributes. It contains line-oriented and field-oriented ASCII text and can be read by the WFDB library or any generic text editor.
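Because the header file is plain line-oriented, field-oriented ASCII text, its main attributes can be inspected without any special tooling. The following is a minimal sketch in Python, assuming the standard WFDB header layout (a record line followed by one line per signal). The header text, record name, and field values are synthetic placeholders and are not taken from HEEDB; only two leads are shown for brevity.

```python
import re

# Synthetic two-lead header for illustration only. Real HEEDB headers describe
# 12 leads, and their exact field values may differ from these placeholders.
SAMPLE_HEADER = """\
example_ecg 2 500 5000
example_ecg.mat 16+24 200/mV 16 0 12 345 0 I
example_ecg.mat 16+24 200/mV 16 0 -7 678 0 II
# comment lines start with '#'
"""

def parse_wfdb_header(text):
    """Parse the record line and the signal lines of a WFDB-style header."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]

    # Record line: name, number of signals, sampling frequency, sample count.
    rec = lines[0].split()
    record = {
        "name": rec[0],
        "n_signals": int(rec[1]),
        "fs": float(rec[2].split("/")[0]),
        "n_samples": int(rec[3]),
    }

    signals = []
    for line in lines[1 : 1 + record["n_signals"]]:
        fields = line.split(maxsplit=8)
        # Gain field looks like "200/mV" or "200(0)/mV":
        # gain, optional baseline in parentheses, optional units.
        gain, baseline, units = re.fullmatch(
            r"([-+.\deE]+)(?:\((-?\d+)\))?(?:/(\S+))?", fields[2]
        ).groups()
        signals.append({
            "file": fields[0],
            "format": fields[1],
            "gain": float(gain),
            # If no baseline is given, the ADC zero field is used instead.
            "baseline": int(baseline) if baseline is not None else int(fields[4]),
            "units": units or "mV",
            "name": fields[8] if len(fields) > 8 else "",
        })
    return record, signals

record, signals = parse_wfdb_header(SAMPLE_HEADER)
duration_s = record["n_samples"] / record["fs"]
print(record["name"], duration_s, [s["name"] for s in signals])
```

For real analyses, the WFDB library is the safer choice, since it handles optional header fields that this sketch ignores.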

General metadata: Information about the ECG data, such as subject ID, sex, age, and (shifted) acquisition date. The subject ID is a de-identified unique identifier.

Waveform data (WFDB format): 12-lead ECG signals recorded at 500 Hz for 10 s. The header file contains general information about the signal, such as the sampling rate, units, and channel names. The data file contains the 12-lead signals, encoded as 16-bit samples.
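To make the 16-bit encoding concrete, the sketch below decodes raw 16-bit samples and converts them to physical units with the usual WFDB relation, physical = (digital - baseline) / gain. It is an illustration under stated assumptions rather than a description of the HEEDB files themselves: the samples are assumed to be little-endian and interleaved sample by sample across leads, and the gain of 200 counts per mV, the zero baseline, and the sample values are synthetic. The helper names are hypothetical.

```python
import struct

def decode_int16_interleaved(raw, n_signals):
    """Unpack little-endian 16-bit samples stored sample by sample across leads."""
    n_values = len(raw) // 2
    values = struct.unpack(f"<{n_values}h", raw[: 2 * n_values])
    n_frames = n_values // n_signals
    # Each frame holds one sample for every lead: [lead0, lead1, ...].
    return [
        list(values[i * n_signals : (i + 1) * n_signals]) for i in range(n_frames)
    ]

def to_physical(frames, gains, baselines):
    """Convert ADC counts to physical units: (count - baseline) / gain."""
    return [
        [(v - b) / g for v, g, b in zip(frame, gains, baselines)]
        for frame in frames
    ]

# Synthetic data: 2 leads, 3 samples each (assumed gain of 200 counts per mV).
raw = struct.pack("<6h", 200, -100, 0, 400, -200, 100)
frames = decode_int16_interleaved(raw, n_signals=2)
millivolts = to_physical(frames, gains=[200.0, 200.0], baselines=[0, 0])
print(millivolts)  # [[1.0, -0.5], [0.0, 2.0], [-1.0, 0.5]]

# Size of the signal payload for a typical record: 12 leads, 500 Hz, 10 s, 2 bytes.
samples_per_lead = 500 * 10
payload_bytes = 12 * samples_per_lead * 2
print(samples_per_lead, payload_bytes)  # 5000 120000
```

In practice, the Python WFDB package performs this conversion when a record is loaded with `wfdb.rdrecord`, which exposes the physical signals as `p_signal`.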

Metadata files: 

  • demographics_ECG.csv: the column names, listed below, are self-explanatory. All dates are de-identified (shifted), but all dates for a given subject are shifted by the same amount, so intervals between dates are preserved.

    • BDSPPatientID
    • DateOfBirth
    • DateOfDeath
    • DateOfDeathMARegistryData
    • LastKnownVisitDate
    • PatientRace
    • EthnicGroupDSC
    • MaritalStatusDSC
    • ReligionDSC
    • LanguageDSC
    • VeteranStatusDSC
    • SexDSC
    • PrimaryCauseOfDeathDSC
    • UNOSPrimaryCauseOfDeathTXT
    • FirstContributoryCauseOfDeathDSC
    • UNOSContributoryCauseOfDeath01TXT
    • SecondContributoryCauseOfDeathDSC
    • UNOSContributoryCauseOfDeath02TXT
    • EducationLevelDSC
    • GenderIdentityDSC
    • SexAssignedAtBirthDSC
    • BDSPLastModifiedDTS
    • RecordingTime
    • FileName
    • Age

  • ecg_interpretations.psv: pipe (|) separated value file containing ECG interpretations ("diagnoses"), expressed in the Marquette 12SL ("12SL") ECG Analysis Program (GE Healthcare) version 4 language. These annotations consist of textual reports that describe the morphology, rhythm, and diagnostic information of the ECGs. Each annotation includes statement numbers that correspond to human-readable diagnoses. Columns of this file are: 

    • BDSPID: subject ID

    • file_source: name of the ECG file

    • diagnostic_codes: 12SL codes

    • diagnostic_text: human-readable 12SL text

    • measurements: HR, PR, QRSD, QT, QTc, Pax, Rax, Tax

  • diagnoses_dictionary.csv: columns include

    • codes (12SL integer codes)

    • acronym (human readable)

    • diagnoses (human readable)
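The three metadata files can be combined with standard CSV tooling. The following is a minimal sketch in Python using only the standard library. The file contents are synthetic stand-ins with a subset of the columns listed above, and several details are assumptions rather than documented facts: that `file_source` matches `FileName`, that dates use ISO format, that multiple codes in `diagnostic_codes` are separated by semicolons, and that the codes and acronyms shown are placeholders rather than real 12SL values.

```python
import csv
import io
from datetime import datetime

# Synthetic stand-ins for the three metadata files (placeholder values only).
DEMOGRAPHICS = """\
BDSPPatientID,DateOfBirth,DateOfDeath,SexDSC,RecordingTime,FileName
1001,1950-03-02,2021-07-15,Female,2020-07-15,ecg_a
"""
INTERPRETATIONS = """\
BDSPID|file_source|diagnostic_codes|diagnostic_text|measurements
1001|ecg_a|101;102|example statement A; example statement B|HR=60
"""
DICTIONARY = """\
codes,acronym,diagnoses
101,DXA,example diagnosis A
102,DXB,example diagnosis B
"""

def read_rows(text, delimiter=","):
    """Read delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def days_between(start, end):
    """Days from start to end; valid on shifted dates of the same subject."""
    if not start or not end:
        return None
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).days

code_to_acronym = {row["codes"]: row["acronym"] for row in read_rows(DICTIONARY)}
demographics_by_file = {row["FileName"]: row for row in read_rows(DEMOGRAPHICS)}

summary = []
# The interpretations file is pipe-separated, so pass the delimiter explicitly.
for row in read_rows(INTERPRETATIONS, delimiter="|"):
    codes = [c.strip() for c in row["diagnostic_codes"].split(";") if c.strip()]
    acronyms = [code_to_acronym.get(code, "UNKNOWN") for code in codes]
    demo = demographics_by_file[row["file_source"]]
    # Dates are shifted, but consistently per subject, so intervals stay meaningful.
    days_to_death = days_between(demo["RecordingTime"], demo["DateOfDeath"])
    summary.append((row["file_source"], acronyms, days_to_death))

print(summary)  # [('ecg_a', ['DXA', 'DXB'], 365)]
```

Absolute dates should not be interpreted directly, since they are shifted. Only differences between dates of the same subject carry meaning.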


Usage Notes

HEEDB is intended to support a wide range of ECG studies, in particular those exploring the relationship between ECG findings and sleep. Python code for working with HEEDB is available on GitHub.


Release Notes

v1.0: ECGs: 10,771,552; unique subjects: 1,818,247

v2.0: new data added: ecg_interpretations.psv, diagnoses_dictionary.csv, demographics_ECG.csv


Ethics

The study protocol was approved by the Institutional Review Board of Massachusetts General Hospital. The requirement for written informed consent was waived because of the retrospective study design and the minimal risk to participants. The study also complied with the Declaration of Helsinki.


Acknowledgements

Publication of HEEDB is supported by a grant (R01HL161253) from the National Heart, Lung, and Blood Institute (NHLBI) of the NIH to Massachusetts General Hospital, Emory University, Stanford University, Kaiser Permanente, Boston Children's Hospital, and Beth Israel Deaconess Medical Center.


Conflicts of Interest

Dr. Westover is a co-founder of, scientific advisor and consultant to, and holds a personal equity interest in Beacon Biosignals. The other authors declare that they have no conflicts of interest.


Access

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
BDSP Credentialed Health Data License 1.5.0

Data Use Agreement:
BDSP Credentialed Health Data Use Agreement

Required training:
CITI Data or Specimens Only Research

Versions
  • 1.0 - Sept. 8, 2023
  • 2.0 - Nov. 6, 2024
