Database Restricted Access

Measuring Expertise in Identifying Interictal Epileptiform Discharges

Nitish Harid, Jin Jing, Jacob Hogan, Fabio Nascimento, Wei-Long Zheng, Wendong Ge, Sahar Zafar, Jennifer Kim, Alice Lam, Aline Herlopian, Douglas Maus, Ioannis Karakis, Marcus Ng, Shenda Hong, Peter Kaplan, Sydney Cash, Mouhsin Shafi, Jonathan Halford, M Brandon Westover

Published: April 17, 2026. Version: 1.0.0


When using this resource, please cite:
Harid, N., Jing, J., Hogan, J., Nascimento, F., Zheng, W., Ge, W., Zafar, S., Kim, J., Lam, A., Herlopian, A., Maus, D., Karakis, I., Ng, M., Hong, S., Kaplan, P., Cash, S., Shafi, M., Halford, J., & Westover, M. B. (2026). Measuring Expertise in Identifying Interictal Epileptiform Discharges (version 1.0.0). Brain Data Science Platform. https://doi.org/10.60508/dfp7-m853.

Additionally, please cite the original publication:

Harid NM, Jing J, Hogan J, Nascimento FA, Ouyang A, Zheng WL, Ge W, Zafar SF, Kim JA, Lam AD, Herlopian A, Maus D, Karakis I, Ng M, Hong S, Yu Z, Kaplan PW, Cash S, Shafi M, Martz G, Halford JJ, Westover MB. Measuring expertise in identifying interictal epileptiform discharges. Epileptic Disord. 2022;24(3):496-506.

Abstract

Objective. Interictal epileptiform discharges (IEDs) in EEGs are integral to diagnosing epilepsy. However, EEGs are interpreted by readers with and without specialty training, and there is no accepted method to assess skill in interpretation. We aimed to develop a test to quantify IED recognition skills.

Methods. 13,262 candidate IEDs were selected from EEGs and scored by eight fellowship-trained reviewers to establish a gold standard. An online test was developed to assess how well readers with different training levels could distinguish IEDs from non-epileptiform candidate waveforms. Sensitivity, false positive rate, and calibration were calculated for each reader. A simple mathematical model was developed to estimate each reader's skill and threshold in identifying an IED, and to generate receiver operating characteristic (ROC) curves for each reader. We investigated the number of IEDs needed to measure skill level with acceptable precision.

Results. 29 raters completed the test: 9 experts, 7 experienced readers, and 13 novices. Median calibration errors for experts, experienced raters, and novices were −0.056, 0.012, and 0.046; median sensitivities were 0.800, 0.811, and 0.715; and median false positive rates were 0.177, 0.272, and 0.396, respectively. Measuring these scores with acceptable precision required 549 test questions. Model fitting identified novices as having higher perceptual noise/uncertainty than experienced and expert readers. Using the fitted noise and threshold levels, receiver operating characteristic (ROC) curves were constructed, showing increasing median area under the curve from novices (0.735), to experienced readers (0.852), to experts (0.891).

Significance. Expert and non-expert readers can be distinguished based on ability to identify IEDs. This type of assessment could also be used to identify and correct differences in thresholds in identifying IEDs.


Background

Identification of interictal epileptiform discharges (IEDs) is key in diagnosing epilepsy and is currently performed by visual analysis of the EEG. Although attempts have been made to standardize criteria for identifying IEDs, accurate identification remains largely a matter of apprenticeship and experience. Formal training and clinical exposure are assumed to increase skill in EEG interpretation, but no objective method exists to measure that skill; the only formal constraint on practice is the minimum set by ACGME fellowship requirements.

Both fellowship-trained and non-fellowship-trained neurologists are permitted to interpret EEGs in the United States, and neurology residencies provide widely varying amounts of exposure to EEG reading. A majority of neurology residents report low confidence in interpreting EEGs independently, and varied training translates into varied tendencies toward over-calling IEDs by classifying benign sharp transients or artifacts as IEDs. Misdiagnosis of epilepsy has been attributed to EEG over-interpretation in a substantial fraction of cases, with serious downstream consequences including unnecessary medication, driving restrictions, and delayed identification of the true underlying diagnosis.

A valid IED skill-assessment tool would quantify both a reader's accuracy and their consistency, distinguishing over-callers (high sensitivity, low specificity) from under-callers (high specificity, low sensitivity) across a wide range of obvious and ambiguous IEDs. This project releases an online test and the underlying annotated EEG dataset so that the broader community can evaluate and improve IED recognition skill.


Methods

IED database. The dataset contains 13,262 candidate IEDs (including true IEDs and benign variants) collected from 991 abnormal and 60 normal consecutively selected routine and continuous scalp EEGs of pediatric and adult patients performed at Massachusetts General Hospital between 2012 and 2016. Candidate IEDs include focal, regional, and generalized discharges. All EEGs were recorded in the standard 10-20 system (19-electrode array). Each candidate was independently scored as epileptiform or non-epileptiform by eight epilepsy/CNP fellowship-trained physicians (the “Original 8”); each candidate was then assigned a consensus probability of being an IED equal to the proportion of Original 8 raters (0/8, 1/8, …, 8/8) who scored it as an IED. For binary analyses, the correct binary label is “IED” for candidates receiving at least 5/8 yes votes and “non-IED” otherwise.
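The vote-to-label mapping described above can be sketched as follows. This is a minimal illustration under the stated rule (probability = yes votes / 8; binary IED label for ≥ 5/8 votes); the function name and array handling are assumptions, not part of the released code.

```python
import numpy as np

def gold_standard_labels(votes, n_raters=8, threshold=5):
    """Map per-candidate yes-vote counts from the Original 8 reviewers
    to a consensus IED probability and a binary gold-standard label."""
    votes = np.asarray(votes)
    prob = votes / n_raters          # consensus probability: 0/8, 1/8, ..., 8/8
    is_ied = votes >= threshold      # binary label: IED iff >= 5/8 yes votes
    return prob, is_ied

prob, is_ied = gold_standard_labels([0, 3, 5, 8])
# prob -> [0.0, 0.375, 0.625, 1.0]; is_ied -> [False, False, True, True]
```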

Online IED classification test. Each rater is presented with 1,000 candidate IEDs drawn at random from the pool so that the nine probability categories (0/8, 1/8, …, 8/8) are equally represented. EEG is displayed in a 10-second window in multiple montages (physical C2 referential, common average, longitudinal bipolar), with one ECG channel. A high-pass filter at 1 Hz and a 60 Hz notch filter are applied after resampling to 128 Hz; amplitude scaling is adjustable but filter cutoffs are not. For each candidate, the rater classifies the waveform as epileptiform or non-epileptiform and receives immediate feedback (smile / frown / neutral, relative to the gold standard) before moving to the next sample.

Performance metrics. For each rater we compute sensitivity, false positive rate, and a calibration error (CE): within each of the nine probability bins, the deviation between the rater's proportion of yes votes and the gold-standard bin value, averaged across bins. We identify the minimum number of candidate IEDs needed to estimate sensitivity, false positive rate, and calibration error with acceptable precision (95% CI width < 0.1) via bootstrap.
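The per-rater metrics can be sketched as follows, assuming a boolean vector of one rater's yes/no calls aligned with the 8-vote gold standard. The signed (rather than absolute) per-bin deviation is an assumption consistent with the negative median calibration errors reported for experts.

```python
import numpy as np

def rater_metrics(rater_yes, votes, threshold=5):
    """Sensitivity, false positive rate, and mean calibration error
    for one rater's binary calls against the 8-vote gold standard."""
    rater_yes = np.asarray(rater_yes, dtype=bool)
    votes = np.asarray(votes)
    is_ied = votes >= threshold
    sens = rater_yes[is_ied].mean()       # yes-rate on gold-standard IEDs
    fpr = rater_yes[~is_ied].mean()       # yes-rate on gold-standard non-IEDs
    # calibration error: per-bin deviation between the rater's yes-rate
    # and the bin's gold-standard probability, averaged over the 9 bins
    ce = np.mean([rater_yes[votes == b].mean() - b / 8
                  for b in range(9) if np.any(votes == b)])
    return sens, fpr, ce
```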

Decision model. To disentangle skill from individual threshold preferences, we fit a simple decision model per rater: a noisy percept z′ = z + n (where z is the log-odds of the gold-standard probability and n is Gaussian noise with standard deviation σ) is compared to a rater-specific threshold θ; samples with z′ > θ are classified as IEDs. We estimate (σ, θ) per rater via grid search matching the rater's observed sensitivity and false positive rate, then generate an ROC curve by varying θ for the fitted σ.
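The decision model and grid search can be sketched as below. Clipping the extreme bins (0/8, 8/8) to keep the log-odds finite is an assumption not specified in the source, as are the function names and grid ranges; an ROC curve follows by sweeping theta at the fitted sigma.

```python
import numpy as np
from scipy.stats import norm

def predicted_rates(sigma, theta, votes):
    """Predicted sensitivity and false positive rate for a rater with
    percept noise sigma and decision threshold theta."""
    votes = np.asarray(votes, dtype=float)
    p = np.clip(votes / 8, 1e-3, 1 - 1e-3)   # avoid infinite log-odds at 0/8, 8/8
    z = np.log(p / (1 - p))                   # log-odds of gold-standard probability
    is_ied = votes >= 5
    p_yes = norm.sf((theta - z) / sigma)      # P(z + n > theta), n ~ N(0, sigma^2)
    return p_yes[is_ied].mean(), p_yes[~is_ied].mean()

def fit_rater(sens_obs, fpr_obs, votes, sigmas, thetas):
    """Grid search for the (sigma, theta) pair whose predicted rates
    best match the rater's observed sensitivity and false positive rate."""
    best, best_err = None, np.inf
    for s in sigmas:
        for t in thetas:
            sens, fpr = predicted_rates(s, t, votes)
            err = (sens - sens_obs) ** 2 + (fpr - fpr_obs) ** 2
            if err < best_err:
                best, best_err = (s, t), err
    return best
```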

Ethics. The institutional review board at Massachusetts General Hospital approved the study and waived the requirement for informed consent because the study poses no risk to patients.


Data Description

The dataset is hosted at s3://bdsp-opendata-restricted/spike-test/ and consists of a single HDF5 file:

  • SN1_deid.h5 (1.2 GB) — De-identified archive of the 13,262 candidate interictal epileptiform discharge (IED) waveforms used in the study, together with gold-standard expert annotations and per-candidate metadata. Each candidate contains a 10-second, 19-channel scalp EEG window recorded in the standard 10-20 montage (plus one ECG channel), resampled to 128 Hz with a 1 Hz high-pass and 60 Hz notch filter applied. Each candidate is accompanied by the number of yes votes from the eight fellowship-trained reviewers who defined the gold standard (score in {0, 1, 2, …, 8}).

All 19-channel EEG traces have been de-identified. The gold-standard annotations partition candidates into nine probability bins (0/8 through 8/8) used by the online IED-classification test. All patient-level identifiers have been removed per HIPAA Safe Harbor.

The candidate EEGs originate from 991 abnormal and 60 normal consecutively selected routine and continuous scalp EEGs of pediatric and adult patients performed at Massachusetts General Hospital between 2012 and 2016.


Usage Notes

The online IED classification test itself runs at https://cdac.massgeneral.org/tools/spikedetector/spikeTest. Code to reproduce all analyses in the manuscript (calibration curves, sensitivity/false-positive-rate estimation, decision-model fitting, ROC generation) is available at https://github.com/bdsp-core/Measuring-Expertise-in-Identifying-Interictal-Epileptiform-Discharges.

The HDF5 file can be read directly with h5py or pandas. For example (inspect f.keys() to confirm the exact dataset names in the release):

import h5py
with h5py.File('SN1_deid.h5', 'r') as f:
    print(list(f.keys()))          # confirm dataset names
    eeg = f['eeg'][:]              # shape: (N, channels, samples)
    votes = f['gold_standard'][:]  # shape: (N,), integers 0-8

Typical use cases include: (1) training or evaluating automated IED detectors against a large expert-labeled benchmark; (2) developing or benchmarking similar expertise-assessment tools for other EEG findings; (3) studying inter-rater variability and the effect of training on diagnostic skill.


Ethics

The study was approved by the Massachusetts General Hospital Institutional Review Board, which waived the requirement for informed consent because the study posed no risk to patients. All data have been de-identified.


Acknowledgements

The authors thank the IED test participants not included in authorship whose data contributed to this work: Sam Terman, Rani Sarkis, Dan Rubin, Garland Tang, Lindsay Joslyn, Maurice Abou Jaoude, Valeria Sacca, Anil Palepu, Elissa Ye, Julia Carlson, Lisa Dümmer, Ryan Gaudreau, Sean Bullock, Ziwei Fan, and Abubakar Muhammad Ayub.


Conflicts of Interest

None of the authors have potential conflicts of interest to disclose.


Access

Access Policy:
Only registered users who sign the specified data use agreement can access the files.

License (for files):
BDSP Restricted Health Data License 1.0.0

Data Use Agreement:
BDSP Restricted Health Data Use Agreement

