Database Restricted Access

A randomized controlled educational pilot trial of interictal epileptiform discharge identification for neurology residents - Data and Code

Fabio Nascimento, Jin Jing, M. Brandon Westover

Published: April 22, 2026. Version: 1.0.0


When using this resource, please cite:
Nascimento, F., Jing, J., & Westover, M. B. (2026). A randomized controlled educational pilot trial of interictal epileptiform discharge identification for neurology residents - Data and Code (version 1.0.0). Brain Data Science Platform. https://doi.org/10.60508/q0h6-9676.

Additionally, please cite the original publication:

Nascimento FA, Jing J, Traner C, Kong WY, Olandoski M, Kapur S, Duhaime E, Strowd R, Moeller J, Westover MB. A randomized controlled educational pilot trial of interictal epileptiform discharge identification for neurology residents. Epileptic Disord. 2024;26(4):444-459.

Abstract

Objective. To assess the effectiveness of an educational program leveraging technology-enhanced learning and retrieval practice to teach trainees how to correctly identify interictal epileptiform discharges (IEDs).

Methods. Bi-institutional prospective randomized controlled trial with junior neurology residents. The intervention included three video tutorials on International Federation of Clinical Neurophysiology (IFCN) criteria for IED identification plus rating 500 candidate IEDs with immediate feedback via a web browser (intervention 1) or an iOS app (intervention 2). The control group received no intervention. All participants completed surveys and tests at baseline and study completion.

Results. Twenty-one residents enrolled (control n = 8; intervention 1 n = 6; intervention 2 n = 7); 19 had no prior EEG experience. Intervention 1 showed significant improvements in AUC (0.74 → 0.85), sensitivity (0.53 → 0.75), and confidence in IED identification (1.33 → 2.33; all p < 0.05). Intervention 2 improved AUC (0.81 → 0.86) and confidence in IED and spike-wave identification (2.00 → 3.14; p < 0.05). Controls showed no significant improvements.

Significance. This program led to significant subjective and objective improvements in IED identification. Rating candidate IEDs with instant feedback on a web browser (intervention 1) generated greater objective improvement than rating candidate IEDs on an iOS app.


Background

Identification of interictal epileptiform discharges (IEDs) is a core skill in EEG interpretation and epilepsy diagnosis. Accurate IED recognition is largely a matter of apprenticeship and experience, yet neurology residency programs in the United States have historically provided widely varying amounts of exposure to EEG reading, and many residents report low confidence in interpreting EEGs independently. The consequences of misinterpretation — both over-calling benign transients as IEDs and under-calling true IEDs — are well documented and include unnecessary antiseizure medication, driving restrictions, and delayed or incorrect diagnosis.

Technology-enhanced learning combined with retrieval practice (rating candidate waveforms with immediate feedback) has shown promise as a supplemental EEG teaching modality. Two delivery platforms have been developed by our group: a web-browser version and an iOS app. This project releases the data and code from a randomized controlled pilot trial designed to evaluate these two delivery formats against a no-intervention control, with the aim of generating evidence to guide the design of future formal EEG training curricula.


Methods

Study design. Bi-institutional, prospective, randomized controlled educational pilot trial. Junior neurology residents at Massachusetts General Hospital and Yale School of Medicine were randomized to one of three arms:

  • Control (n = 8): no intervention between baseline and follow-up testing.
  • Intervention 1 — Web browser (n = 6): three video tutorials on the IFCN six-feature IED criteria, followed by rating 500 candidate IEDs with immediate feedback delivered through a web-browser interface.
  • Intervention 2 — iOS app (n = 7): the same three video tutorials plus 500 candidate IED ratings with immediate feedback, delivered through the DiagnosUs iOS app (Centaur Labs).

All three arms took identical pre- and post-tests. Candidate IEDs were drawn from the combined SN1 EEG corpus (see Data Description). The gold-standard label for each candidate was the majority vote of eight fellowship-trained expert electroencephalographers.

Outcome measures. Primary (objective) outcomes were area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, calibration index, threshold/bias parameter, and detection-noise (uncertainty) level, computed pre- and post-intervention on the same fixed test set. Secondary (subjective) outcomes were self-reported level of confidence in IED identification, level of confidence in spike-and-wave discharge identification, and free-text survey feedback about the educational experience.
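The basic confusion-matrix metrics can be sketched in a few lines of NumPy. This is only an illustration for a single binary rater (it is not the paper's latent-trait pipeline, which additionally estimates calibration, bias, and noise parameters); the function name and the use of −1 for unscored trials follow the release's score conventions:

```python
import numpy as np

def binary_rater_metrics(calls, gold):
    """Sensitivity, specificity, accuracy, and one-point AUC for a single
    binary rater. `calls` and `gold` are 0/1 arrays; trials with
    calls == -1 (unscored) are dropped first."""
    calls, gold = np.asarray(calls), np.asarray(gold)
    m = calls != -1
    calls, gold = calls[m], gold[m]
    tp = np.sum((calls == 1) & (gold == 1))
    tn = np.sum((calls == 0) & (gold == 0))
    fp = np.sum((calls == 1) & (gold == 0))
    fn = np.sum((calls == 0) & (gold == 1))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    # Area under the ROC polygon through a single operating point
    auc = 0.5 * (sens + spec)
    return dict(sensitivity=sens, specificity=spec, accuracy=acc, auc=auc)
```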

Statistical analysis. Within-arm pre-vs-post comparisons were performed using paired non-parametric tests at α = 0.05. Between-arm comparisons of change scores were performed using Mann–Whitney U tests.
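Assuming per-resident metrics are available as plain arrays, both comparisons can be sketched with SciPy; the numbers below are illustrative placeholders, not trial data:

```python
import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

# Hypothetical per-resident AUCs for one intervention arm (n = 6)
pre  = np.array([0.70, 0.74, 0.76, 0.72, 0.78, 0.73])
post = np.array([0.82, 0.85, 0.89, 0.80, 0.87, 0.87])

# Within-arm pre-vs-post: paired non-parametric (Wilcoxon signed-rank)
_, p_within = wilcoxon(pre, post)

# Between-arm comparison of change scores (vs a hypothetical control, n = 8)
change_arm  = post - pre
change_ctrl = np.array([0.01, -0.02, 0.00, 0.03, -0.01, 0.02, 0.01, 0.00])
_, p_between = mannwhitneyu(change_arm, change_ctrl, alternative='two-sided')
```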

Ethics. The study was approved by the Institutional Review Boards at Massachusetts General Hospital and Yale School of Medicine. Written informed consent was obtained from all participating residents.


Data Description

The data package is hosted at s3://bdsp-opendata-restricted/spike-test/. The canonical release for this paper is the v2/ subfolder, which stitches the SpikeEd RCT rater responses into the unified SN1 combined-spike archive.

Files (v2/ subfolder):

  • v2/SN1_combined_v2.h5 (3.08 GB) — 20,521 ten-second, 20-channel EEG clips at 128 Hz, scored across 2,574 rater-sessions. HDF5 groups:
    • /eeg/signals — (20521, 1281, 20) float32, 10-20 montage + EKG at channel 19.
    • /segments/file_key — stimulus identifier. scr{0..8}_{####} for SN1 spike candidates (13,262 from the earlier SN1 bank), scr-1_{####} for 3,293 new benign-variant / spike-mimic stimuli added for this study, and Bonobo####_* for patient-paired candidates.
    • /experts/* — de-identified rater metadata: name (hash or R## for RCT residents), affiliation, years_eeg, neurologist, epileptologist, board_certified, group (Crowd / Super8 / Bonobo / New28 / spikeed_*), trial_pool, rct_arm (Control / Int1 / Int2, populated for the 21 SpikeEd residents).
    • /scores/matrix — (N × M) int8: −1 unscored, 0 called-not-spike, 1 called-spike.
    • /scores/user_time — (N × M) float32, per-trial response time in seconds.
  • v2/linking_local_v2_public.csv (947 KB) — row-wise segment metadata (no PHI).
  • v2/raters_v2.csv (23 KB) — SpikeEd-added rater-session metadata with RCT arm labels for the 21 residents × 2 sessions + expert reference raters.
  • v2/README.md — full v2 schema and provenance documentation.
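As an illustration of how gold-standard labels (the expert majority vote described in Methods) can be derived from /scores/matrix, here is a minimal sketch; the choice of which columns count as the eight reference experts is an assumption of this example, so consult v2/README.md for the canonical definition:

```python
import numpy as np

def majority_vote_labels(scores, expert_cols):
    """Gold label per segment: majority vote over the given expert columns
    of the (N, M) /scores/matrix, treating -1 as missing.
    Returns 1 (spike), 0 (not spike), or -1 (no expert votes / tie)."""
    sub = scores[:, expert_cols].astype(float)
    sub[sub == -1] = np.nan
    n_votes = np.sum(~np.isnan(sub), axis=1)   # experts who scored the segment
    n_yes = np.nansum(sub, axis=1)             # experts who called "spike"
    labels = np.full(scores.shape[0], -1, dtype=np.int8)
    voted = n_votes > 0
    labels[voted & (n_yes * 2 > n_votes)] = 1
    labels[voted & (n_yes * 2 < n_votes)] = 0
    return labels
```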

Arm mapping (for reproducing the paper): G1 = Int1 (jj, web-browser + expert feedback, n = 6); G2 = Int2 (centaur, DiagnosUs iOS app, n = 7); G3 = Control (no intervention, n = 8). The per-resident de-identified response vectors used in the paper's MATLAB pipeline live in the companion GitHub repository under Data-DeIDed/.

Legacy v1 files are still present at the top level of the S3 prefix (SN1_combined_public.h5, crosswalk_public.csv, expertise_levels.csv, missing_eeg_public.csv) for users of earlier versions. New work should use v2/.

De-identification. All patient identifiers are pseudonyms. Real medical-record numbers and absolute timestamps have been removed from the public release. Rater names from published studies are retained as they appear in the original papers; RCT-cohort residents are identified only as R01–R21.


Usage Notes

Code to reproduce the pilot-trial analyses (MATLAB pipeline with step1_getRealPerformance.m, step2_fitLatentTraitModel.m, step3_getFigures.m and the supporting callbacks) is available at https://github.com/bdsp-core/spike-test-pilot-trial. The 21 residents' de-identified response vectors are shipped in-repo under Data-DeIDed/{pre,post}-study-test/G{1,2,3}/, so the three MATLAB scripts can reproduce Figures 2–6 without any external data.

To work with the stimulus EEG signals themselves — re-score them, extract new features, or recompute the analysis in Python — load the v2 HDF5 archive:

import h5py
import numpy as np
import pandas as pd

with h5py.File('v2/SN1_combined_v2.h5', 'r') as f:
    file_key    = f['/segments/file_key'].asstr()[:].astype(str)  # stimulus id per segment
    scores      = f['/scores/matrix'][:]                          # (N, M) int8; -1 = unscored
    rater_group = f['/experts/group'].asstr()[:].astype(str)
    rct_arm     = f['/experts/rct_arm'].asstr()[:].astype(str)
    # Identify the SpikeEd RCT rater sessions
    spikeed_idx = np.where(np.char.startswith(rater_group, 'spikeed_'))[0]
    # Identify the SN1 spike candidates vs the new benign-variant stimuli
    sn1_idx = np.where(np.char.startswith(file_key, 'scr')
                       & ~np.char.startswith(file_key, 'scr-1_'))[0]
    # /eeg/signals is ~2 GB; slice what you need while the file is still open
    example_clip = f['/eeg/signals'][int(sn1_idx[0])]             # (1281, 20) float32

raters_meta   = pd.read_csv('v2/raters_v2.csv')
segments_meta = pd.read_csv('v2/linking_local_v2_public.csv')

Typical use cases:

  • Replicating the pilot-trial pre/post analyses and extending them with alternative latent-trait or ROC-fitting methods.
  • Training or evaluating automated IED detectors on a unified, multi-rater-labeled benchmark spanning both the SN1 spike candidates and the newly added benign-variant / spike-mimic stimuli.
  • Studying inter-rater reliability across expertise levels (novice residents, experienced non-experts, Super-8 / New28 experts, and ~2,270 crowdsourced DiagnosUs users).
  • Designing larger-scale RCTs of EEG-education interventions.
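For the inter-rater-reliability use case, a minimal NumPy sketch of pairwise Cohen's kappa between two rater columns of /scores/matrix, masking unscored (−1) trials; the function name is illustrative:

```python
import numpy as np

def cohens_kappa(a, b):
    """Cohen's kappa between two binary raters, ignoring trials where
    either rating is -1 (unscored)."""
    a, b = np.asarray(a), np.asarray(b)
    m = (a != -1) & (b != -1)
    a, b = a[m], b[m]
    po = np.mean(a == b)                          # observed agreement
    pe = (np.mean(a) * np.mean(b)                 # chance agreement: both "spike"
          + (1 - np.mean(a)) * (1 - np.mean(b)))  # ... or both "not spike"
    return (po - pe) / (1 - pe)
```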

Ethics

This study was approved by the Institutional Review Boards at Massachusetts General Hospital and Yale School of Medicine. Written informed consent was obtained from all 21 participating residents. All EEG data have been de-identified; patient identifiers in the released HDF5 archive are pseudonyms only.


Acknowledgements

The authors thank the 21 neurology residents at Massachusetts General Hospital and Yale School of Medicine who volunteered to participate in this educational pilot trial, as well as Centaur Labs for providing the DiagnosUs iOS app used in intervention 2.


Conflicts of Interest

S. Kapur and E. Duhaime of Centaur Labs (co-authors on the underlying manuscript) developed and have a financial interest in the DiagnosUs app used as the iOS intervention delivery platform. The remaining authors declare no competing interests.


Access

Access Policy:
Only registered users who sign the specified data use agreement can access the files.

License (for files):
BDSP Restricted Health Data License 1.0.0

Data Use Agreement:
BDSP Restricted Health Data Use Agreement
