OHDSI Waveform WG
The OMOP CDM Waveform Extension provides a standardized framework for integrating physiological waveform data with electronic health records in observational research. By extending the OMOP Common Data Model with four interconnected tables, the extension enables researchers to combine high-resolution time-series data (ECG, EEG, arterial blood pressure, etc.) with structured clinical data for population-level analyses.
The Waveform Extension consists of four tables that work together to represent the complete lifecycle of waveform data:
┌─────────────────────────┐
│ waveform_occurrence     │ ← Clinical & temporal context
│ (acquisition session)   │
└────────────┬────────────┘
             │
             │ 1:N
             ▼
┌─────────────────────────┐
│ waveform_registry       │ ← Individual files
│ (file metadata)         │
└────────────┬────────────┘
             │
             │ 1:N
             ▼
┌─────────────────────────┐
│ waveform_channel_       │ ← Per-signal metadata
│ metadata                │
└────────────┬────────────┘
             │
             │ 1:N
             ▼
┌─────────────────────────┐
│ waveform_feature        │ ← Derived measurements
│                         │
└─────────────────────────┘
Purpose: waveform_occurrence establishes the clinical and temporal context for a waveform recording session.
Key Concepts:
- One occurrence = one clinical acquisition event (e.g., “12-lead diagnostic ECG” or “ICU telemetry session”)
- Links to PERSON and VISIT_OCCURRENCE to establish who, when, and where
- May contain one or many files
- Defines the temporal boundaries for the entire acquisition
Example: An ICU patient has continuous bedside monitoring from 8:00 AM to 8:00 PM on 2025-01-15. This entire 12-hour session is one waveform_occurrence, even if the data is split across multiple files.
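Deciding which files make up one occurrence is an ETL judgment call. The sketch below shows one possible heuristic, under stated assumptions. Each file is assumed to be a hypothetical `(person_id, start, end)` tuple read from its header. Files for the same person are grouped into one session when the gap between them is at most `max_gap`. Both the function names and the five-minute default are illustrative assumptions, not part of the specification.

```python
from datetime import datetime, timedelta

# Hypothetical (person_id, start, end) tuple describing one file.
FileWindow = tuple[int, datetime, datetime]


def group_into_occurrences(files: list[FileWindow],
                           max_gap: timedelta = timedelta(minutes=5)) -> list[list[FileWindow]]:
    """Group files into acquisition sessions (candidate waveform_occurrence rows).

    A file joins the current session if it belongs to the same person and starts
    no later than max_gap after the session's latest end time (assumed heuristic).
    """
    sessions: list[list[FileWindow]] = []
    for f in sorted(files, key=lambda w: (w[0], w[1])):
        if sessions:
            current = sessions[-1]
            same_person = current[0][0] == f[0]
            session_end = max(w[2] for w in current)
            if same_person and f[1] - session_end <= max_gap:
                current.append(f)
                continue
        sessions.append([f])
    return sessions


def occurrence_bounds(session: list[FileWindow]) -> tuple[datetime, datetime]:
    """The occurrence spans from the earliest file start to the latest file end."""
    return min(w[1] for w in session), max(w[2] for w in session)


# Twelve contiguous hourly files from the ICU example collapse into one occurrence.
day = datetime(2025, 1, 15)
hourly = [(1, day.replace(hour=h), day.replace(hour=h) + timedelta(hours=1)) for h in range(8, 20)]
sessions = group_into_occurrences(hourly)
print(len(sessions), occurrence_bounds(sessions[0]))
```

A real ETL might also split sessions by device or care unit. The gap rule is only one way to draw the occurrence boundary.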
Purpose: waveform_registry registers individual waveform files along with their storage locations and temporal boundaries.
Key Concepts:
- One row per file
- Links to waveform_occurrence (many files can belong to one occurrence)
- Stores both the source (original) and target (standardized) file paths
- Captures file-level start and end timestamps, which may cover only part of the occurrence's time span
- Records the file format/extension
Example: The 12-hour ICU telemetry session is split into 12 hourly files. Each file gets one row in waveform_registry, all linked to the same waveform_occurrence.
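The one-row-per-file rule can be sketched as follows. The dataclasses carry a hypothetical subset of columns, chosen for illustration; see Table Specifications for the real schema. The file names are invented. The check enforces the constraint stated above: a file's window may be a subset of the occurrence window, but never extends outside it.

```python
import os
from dataclasses import dataclass
from datetime import datetime


@dataclass
class WaveformOccurrence:
    # Hypothetical subset of columns, for illustration only.
    waveform_occurrence_id: int
    person_id: int
    start_datetime: datetime
    end_datetime: datetime


@dataclass
class WaveformRegistryRow:
    # Hypothetical subset of columns, for illustration only.
    waveform_registry_id: int
    waveform_occurrence_id: int  # many files point at one occurrence
    source_file_path: str
    start_datetime: datetime
    end_datetime: datetime
    file_extension: str


def register_files(occ: WaveformOccurrence,
                   files: list[tuple[str, datetime, datetime]],
                   first_id: int = 1) -> list[WaveformRegistryRow]:
    """Create one registry row per (path, start, end) file, all linked to occ."""
    rows = []
    for offset, (path, start, end) in enumerate(files):
        # A file may cover part of the occurrence window, but never fall outside it.
        if not (occ.start_datetime <= start < end <= occ.end_datetime):
            raise ValueError(f"{path} lies outside occurrence {occ.waveform_occurrence_id}")
        extension = os.path.splitext(path)[1]
        rows.append(WaveformRegistryRow(first_id + offset, occ.waveform_occurrence_id,
                                        path, start, end, extension))
    return rows


day = datetime(2025, 1, 15)
occ = WaveformOccurrence(1, 42, day.replace(hour=8), day.replace(hour=20))
# Hypothetical file names for the twelve hourly files of the ICU session.
hourly = [(f"icu_bed3_{h:02d}.dat", day.replace(hour=h), day.replace(hour=h + 1)) for h in range(8, 20)]
rows = register_files(occ, hourly)
print(len(rows), rows[0].file_extension)
```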
Purpose: waveform_channel_metadata describes the per-channel signal characteristics needed to interpret raw waveform data.
Key Concepts:
- One row per metadata attribute, per channel, per file
- Captures sampling rate, gain, calibration, units, and signal quality
- Links signals to devices and procedures when known
- Supports both numeric metadata (e.g., sampling_rate = 500 Hz) and categorical metadata (e.g., filter_type = “High Pass”)
Example: For a 12-lead ECG file with leads I, II, III, aVR, aVL, aVF, V1-V6, there would be at least 12 rows (one per channel) describing each lead’s sampling rate. Additional rows capture gain, units, etc.
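The "one row per attribute, per channel, per file" layout is a long (entity-attribute-value) format. The sketch below generates it for the 12-lead example. It assumes hypothetical column names. Numeric values are routed to `value_as_number` and categorical values to `value_as_string`, loosely mirroring the pattern used elsewhere in OMOP. The gain and units values are illustrative only.

```python
LEADS = ["I", "II", "III", "aVR", "aVL", "aVF", "V1", "V2", "V3", "V4", "V5", "V6"]


def channel_metadata_rows(registry_id: int, channels: list[str],
                          attributes: dict[str, object]) -> list[dict[str, object]]:
    """Emit one row per attribute, per channel, for a single file (long format)."""
    rows = []
    for channel in channels:
        for name, value in attributes.items():
            # Numeric and categorical values go to separate (hypothetical) columns.
            is_numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
            rows.append({
                "waveform_registry_id": registry_id,
                "channel_name": channel,
                "metadata_name": name,
                "value_as_number": value if is_numeric else None,
                "value_as_string": None if is_numeric else str(value),
            })
    return rows


rows = channel_metadata_rows(
    registry_id=101,
    channels=LEADS,
    attributes={"sampling_rate_hz": 500, "gain": 1000.0, "units": "uV", "filter_type": "High Pass"},
)
sampling_rows = [r for r in rows if r["metadata_name"] == "sampling_rate_hz"]
print(len(rows), len(sampling_rows))
```

With four attributes on twelve leads, this yields 48 rows. Twelve of them carry the sampling rate, matching the "at least 12 rows" count above.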
Purpose: waveform_feature stores measurements and features derived from waveform signals.
Key Concepts:
- Links back to specific files, channels, and time windows
- Supports both traditional features (HR, QT interval, HRV) and AI-derived embeddings
- Can optionally link to the MEASUREMENT or OBSERVATION tables for vocabulary integration
- Records the algorithm or method used to derive each feature
- Supports both scalar values and file-based features (spectrograms, embeddings)
Example: From Lead II of an ECG, derive: mean heart rate (75 bpm), QTc interval (412 ms), SDNN (45 ms), PVC count (3). Each derived measure gets one row in waveform_feature.
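Two of the features in this example can be computed directly from beat-to-beat (RR) intervals. This is a minimal sketch: mean heart rate is taken as 60,000 / mean RR in ms, and SDNN as the sample standard deviation of normal-to-normal intervals. The input is assumed to be already cleaned of ectopic beats. The row layout, including `algorithm` and the window columns, uses hypothetical names to illustrate the "one row per derived measure" rule.

```python
import statistics
from datetime import datetime


def rr_features(rr_ms: list[float]) -> dict[str, float]:
    """Mean heart rate and SDNN from normal-to-normal RR intervals (ms).

    Assumes ectopic beats (e.g. PVCs) were already removed, since SDNN is
    defined on normal-to-normal intervals.
    """
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    mean_rr = statistics.mean(rr_ms)
    return {
        "mean_heart_rate_bpm": 60000.0 / mean_rr,  # 60,000 ms per minute
        "sdnn_ms": statistics.stdev(rr_ms),        # sample standard deviation
    }


def feature_rows(registry_id: int, channel: str, window_start: datetime,
                 window_end: datetime, features: dict[str, float], algorithm: str) -> list[dict]:
    """One waveform_feature-style row per derived value (hypothetical column names)."""
    return [
        {
            "waveform_registry_id": registry_id,
            "channel_name": channel,
            "window_start": window_start,
            "window_end": window_end,
            "feature_name": name,
            "value_as_number": value,
            "algorithm": algorithm,
        }
        for name, value in features.items()
    ]


rows = feature_rows(101, "II", datetime(2025, 1, 15, 8, 0), datetime(2025, 1, 15, 8, 5),
                    rr_features([780, 820, 800, 800]), algorithm="example-rr-v0")
for row in rows:
    print(row["feature_name"], round(row["value_as_number"], 1))
```

Averaging RR intervals first and then converting to bpm is not the same as averaging instantaneous heart rates. Whichever convention is chosen is the kind of detail the algorithm field should record.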
       Raw Waveform Files
                ↓
        [File Discovery]
                ↓
 [Extract Metadata from Headers]
                ↓
     [Link to Patient/Visit]
                ↓
┌───────────────────────────────┐
│ Populate                      │
│ waveform_occurrence           │
└───────────────┬───────────────┘
                ↓
┌───────────────────────────────┐
│ Populate                      │
│ waveform_registry             │
└───────────────┬───────────────┘
                ↓
┌───────────────────────────────┐
│ Populate                      │
│ waveform_channel_metadata     │
└───────────────┬───────────────┘
                ↓
   [Apply Feature Extraction]
                ↓
┌───────────────────────────────┐
│ Populate                      │
│ waveform_feature              │
└───────────────┬───────────────┘
                ↓
[Join with OMOP Clinical Tables]
See the Implementation Guide for detailed ETL specifications.
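The "Link to Patient/Visit" step often comes down to matching a recording's timestamps against VISIT_OCCURRENCE windows. The sketch below uses the standard VISIT_OCCURRENCE columns (`visit_occurrence_id`, `person_id`, `visit_start_datetime`, `visit_end_datetime`). The matching rule is an assumption: the recording start must fall inside exactly one visit for that person, and anything else is left unlinked for review. The Implementation Guide may specify different logic.

```python
from datetime import datetime
from typing import Optional


def link_to_visit(person_id: int, recording_start: datetime,
                  visits: list[dict]) -> Optional[int]:
    """Return the visit_occurrence_id whose window contains recording_start.

    Returns None when no visit, or more than one overlapping visit, matches, so
    that ambiguous recordings can be reviewed instead of silently linked.
    """
    matches = [
        v["visit_occurrence_id"]
        for v in visits
        if v["person_id"] == person_id
        and v["visit_start_datetime"] <= recording_start <= v["visit_end_datetime"]
    ]
    return matches[0] if len(matches) == 1 else None


visits = [
    {"visit_occurrence_id": 9001, "person_id": 42,
     "visit_start_datetime": datetime(2025, 1, 14, 22, 0),
     "visit_end_datetime": datetime(2025, 1, 18, 10, 0)},
    {"visit_occurrence_id": 9002, "person_id": 42,
     "visit_start_datetime": datetime(2025, 2, 3, 9, 0),
     "visit_end_datetime": datetime(2025, 2, 3, 11, 0)},
]
print(link_to_visit(42, datetime(2025, 1, 15, 8, 0), visits))
```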
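The Waveform Extension connects to core OMOP tables: waveform_occurrence links to PERSON and VISIT_OCCURRENCE, waveform_channel_metadata can link signals to the devices and procedures involved, and waveform_feature can optionally link to MEASUREMENT or OBSERVATION.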
Scenario: Predict hemodynamic instability in ICU patients
Data Integration:
- Waveforms: Continuous arterial blood pressure, ECG
- Clinical: Vasopressor doses, fluid administration, lab results
- Outcomes: MAP < 65 mmHg for > 15 minutes
Analysis:
1. Extract HRV, BP variability, and trend features from the waveforms
2. Join with medication timing and lab values
3. Train a prediction model on the combined features
4. Validate across multiple sites using the standardized data
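The outcome definition (MAP < 65 mmHg for > 15 minutes) has to be turned into concrete episodes before it can be modeled. Below is a minimal sketch. It assumes time-sorted `(timestamp, MAP)` samples and treats a run as ending at the first sample back at or above the threshold. Gaps in monitoring are not handled; a production definition would need a rule for them.

```python
from datetime import datetime, timedelta


def hypotension_episodes(samples: list[tuple[datetime, float]],
                         threshold_mmhg: float = 65.0,
                         min_duration: timedelta = timedelta(minutes=15)) -> list[tuple[datetime, datetime]]:
    """Find runs of MAP below threshold lasting longer than min_duration.

    samples must be sorted by time. A run starts at the first sample below the
    threshold and ends at the first sample back at or above it (or at the last
    sample if the series ends mid-run). Gaps in monitoring are not handled.
    """
    episodes = []
    run_start = None
    for t, map_mmhg in samples:
        if map_mmhg < threshold_mmhg:
            if run_start is None:
                run_start = t
        elif run_start is not None:
            if t - run_start > min_duration:
                episodes.append((run_start, t))
            run_start = None
    if run_start is not None:
        last_t = samples[-1][0]
        if last_t - run_start > min_duration:
            episodes.append((run_start, last_t))
    return episodes


# One MAP value per minute: a 20-minute dip (counted) and a 5-minute dip (ignored).
t0 = datetime(2025, 1, 15, 8, 0)
samples = [(t0 + timedelta(minutes=m), 60.0 if 10 <= m < 30 or 41 <= m < 46 else 70.0)
           for m in range(50)]
for start, end in hypotension_episodes(samples):
    print(start.time(), "to", end.time())
```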
Scenario: Detect QT prolongation associated with specific medications
Data Integration:
- Waveforms: Serial 12-lead ECGs
- Clinical: Medication exposures, electrolyte levels, comorbidities
- Outcomes: QTc > 500 ms or an increase > 60 ms from baseline
Analysis:
1. Extract QT intervals from ECG Lead II
2. Calculate QTc using Bazett’s or Fridericia’s formula
3. Link to DRUG_EXPOSURE within a temporal window
4. Adjust for confounders (electrolytes, renal function)
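Both correction formulas divide QT by a power of the RR interval expressed in seconds. Bazett uses the square root and Fridericia the cube root, so QTc equals QT at exactly 60 bpm (RR = 1 s). The sketch below implements both formulas and the outcome definition above. The function names and the worked QT values are illustrative.

```python
import math


def qtc_bazett(qt_ms: float, rr_ms: float) -> float:
    """Bazett: QTc = QT / sqrt(RR), with RR in seconds."""
    return qt_ms / math.sqrt(rr_ms / 1000.0)


def qtc_fridericia(qt_ms: float, rr_ms: float) -> float:
    """Fridericia: QTc = QT / RR^(1/3), with RR in seconds."""
    return qt_ms / (rr_ms / 1000.0) ** (1.0 / 3.0)


def is_qt_prolonged(qtc_ms: float, baseline_qtc_ms: float,
                    absolute_ms: float = 500.0, delta_ms: float = 60.0) -> bool:
    """Outcome from the scenario: QTc > 500 ms, or an increase > 60 ms over baseline."""
    return qtc_ms > absolute_ms or (qtc_ms - baseline_qtc_ms) > delta_ms


# Heart rate of 75 bpm corresponds to RR = 800 ms.
baseline = qtc_fridericia(380, 800)
follow_up = qtc_fridericia(440, 800)
print(round(baseline), round(follow_up), is_qt_prolonged(follow_up, baseline))
```

Bazett is known to overcorrect at high heart rates, which is one reason studies often report Fridericia alongside it.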
Scenario: Predict seizure onset in epilepsy patients
Data Integration:
- Waveforms: Continuous EEG monitoring
- Clinical: Antiepileptic medication levels, sleep patterns, triggers
- Outcomes: Clinically documented seizures
Analysis:
1. Extract power spectral features and entropy from the EEG
2. Detect interictal spikes and slow-wave patterns
3. Align with medication timing and sleep stages
4. Train an LSTM model to predict seizures 30 minutes in advance
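Step 1 can be illustrated with two common features: band power and normalized spectral entropy. This is a stdlib-only sketch using a naive DFT, which is acceptable for short epochs in an example; in practice you would use an FFT library and typically Welch averaging. The band edges follow common clinical conventions and are an assumption, since studies vary.

```python
import cmath
import math

# Conventional EEG bands in Hz; exact edges vary between studies (assumption).
BANDS_HZ = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def power_spectrum(signal: list[float], fs: float) -> list[tuple[float, float]]:
    """Power at each non-negative DFT frequency, via a naive O(N^2) DFT."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC offset
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k * fs / n, abs(coeff) ** 2 / n))
    return spectrum


def band_powers(spectrum: list[tuple[float, float]], bands=BANDS_HZ) -> dict[str, float]:
    """Sum spectral power within each [low, high) frequency band."""
    return {name: sum(p for f, p in spectrum if lo <= f < hi) for name, (lo, hi) in bands.items()}


def spectral_entropy(spectrum: list[tuple[float, float]]) -> float:
    """Shannon entropy of the normalized spectrum, scaled to [0, 1]."""
    total = sum(p for _, p in spectrum)
    probs = [p / total for _, p in spectrum if p > 0]
    return -sum(p * math.log2(p) for p in probs) / math.log2(len(spectrum))


# Two-second epoch at 128 Hz containing a pure 10 Hz (alpha-band) oscillation.
fs = 128.0
epoch = [math.sin(2 * math.pi * 10.0 * t / fs) for t in range(256)]
spectrum = power_spectrum(epoch, fs)
powers = band_powers(spectrum)
print(max(powers, key=powers.get), round(spectral_entropy(spectrum), 3))
```

A pure tone concentrates all power in one bin, so its entropy is near 0. A flat, noise-like spectrum approaches 1.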
See Table Specifications for complete schema details.