Pedagogical OMOP in Your Browser (GiBleed / Eunomia)

Mini-course on observational data analysis with the GiBleed Eunomia OMOP CDM, using Ibis and DuckDB in the browser. Covers joins, cohort construction, time-at-risk, and outcome counting.
Note: View over HTTP

Serve the built docs folder (e.g. cd docs && python -m http.server 4522), then open http://localhost:4522/tutorials/wasm_example_analysis.html. Run each code cell in order.

This page is a pedagogical mini-course in doing observational data analysis on an OMOP CDM. Code runs in your browser (Pyodide + Ibis + DuckDB + pandas). We use the real GiBleed Eunomia example dataset so you can work with an actual OMOP CDM without installing anything locally.


1. What you’ll learn

This tutorial teaches:

  • Ibis + DuckDB: lazy, backend-agnostic table operations that compile to SQL when you execute.
  • Core OMOP tables: person, visit_occurrence, condition_occurrence, drug_exposure, measurement, and concept.
  • Observational workflow: define a cohort (e.g. GI bleed), time-at-risk, baseline characteristics, outcome counts, and rates.
  • Patterns: joins to concept, cohort construction, time-window joins, and a small “study report” function.
  • SQL when needed: using SQLGlot to translate SQL between dialects (e.g. DuckDB → Postgres).

2. Prerequisites

You should be comfortable with:

  • basic Python
  • basic pandas or Ibis for tables and joins

Everything else is taught as we go.

3. Install and load packages

In the browser we use Pyodide plus micropip to install Ibis (not included in the default Pyodide bundle). Run the cell below once; it installs ibis-framework[duckdb] then imports the tools we need to download and analyze the GiBleed Eunomia example data in-browser.

4. Get the GiBleed Eunomia example and connect

Eunomia provides synthetic OMOP CDM datasets for teaching and testing. In this browser demo we download the real GiBleed 5.3 Eunomia archive, cache its parquet files in Pyodide’s local filesystem, and register the core OMOP tables in DuckDB.

4.1 What is a “CDM reference”?

In CDMConnector you get a single object (for example cdm) whose elements are lazy Ibis table references. Here we use a plain Ibis connection and the GiBleed OMOP tables loaded from Eunomia parquet files; we query them with the same Ibis expressions.

5. First contact with OMOP data

5.1 person: who are the individuals?

Key OMOP idea: “concept_id everywhere”

OMOP stores coded values as integer concept IDs. For example, gender_concept_id references concept_id in the concept table, so you decode gender by joining person to concept.

5.2 Visits: where do records cluster in time?

6. Ibis on a database: mental model

An Ibis expression is lazy: nothing runs until you execute it (e.g. con.to_pandas(expr)). Against a real CDM you would call collect() or execute() instead. The idea is the same: build the query first, execute only when you need results. You can also inspect the compiled SQL; here we show a typical filter.

7. A first descriptive analysis: “Top conditions”

Count the most frequent condition concepts, then join to concept to decode the IDs into names.

8. The observational workflow: define a cohort, then analyze

Common pattern:

  1. Define an index event (cohort entry).
  2. Apply optional inclusion/exclusion criteria.
  3. Define time-at-risk (TAR).
  4. Estimate outcomes, covariates, and rates.

We build a GI bleed cohort from condition concept 192671 (Gastrointestinal hemorrhage): first occurrence per person, cohort end = start + 10 days.

8.1 Define a “GI bleed” cohort

A cohort table has at minimum: cohort_definition_id, subject_id, cohort_start_date, cohort_end_date.

8.2 Cohort size and person-time

9. Baseline characteristics at cohort entry

Compute age and sex at cohort_start_date. Age is approximated from the year, month, and day of birth stored in person.
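The age arithmetic can be sketched in plain Python. The helper name `age_at` is hypothetical, and treating a missing month or day as 1 is an assumption of this sketch, not a rule from the tutorial.

```python
from datetime import date


def age_at(index_date, year_of_birth, month_of_birth=None, day_of_birth=None):
    """Age in completed years at index_date.

    Missing month/day default to 1 (an assumption made for this sketch).
    """
    month = month_of_birth or 1
    day = day_of_birth or 1
    # Has the birthday already occurred in the index year?
    had_birthday = (index_date.month, index_date.day) >= (month, day)
    return index_date.year - year_of_birth - (0 if had_birthday else 1)


print(age_at(date(2020, 6, 1), 1960, 6, 2))  # day before the 60th birthday
print(age_at(date(2020, 6, 2), 1960, 6, 2))  # on the 60th birthday
```

The same comparison of (month, day) pairs can be written as an Ibis expression when the calculation needs to stay in the database.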

10. Outcomes: count events during time-at-risk

For each cohort entry, look for records between cohort_start_date and cohort_end_date.

10.1 Death during cohort window (if available)

10.2 Condition outcome during cohort window

Join cohort to condition_occurrence, filter condition dates within TAR, then summarise. (Here we use the same concept 192671 as a toy outcome.)

11. Rates: events per person-time

Crude incidence rate = events / total time-at-risk (e.g. per person-year).

\[\text{rate} = \frac{\text{events}}{\text{total time-at-risk (years)}}\]
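The formula above can be worked through in a few lines of plain Python. The helper name `crude_rate`, the 365.25 days-per-year conversion, and the toy windows are assumptions for illustration.

```python
from datetime import date


def crude_rate(n_events, windows, days_per_year=365.25):
    """Events per person-year, given (start, end) time-at-risk windows."""
    total_days = sum((end - start).days for start, end in windows)
    person_years = total_days / days_per_year
    return n_events / person_years


# Two cohort entries, each with a 10-day window: 20 person-days in total.
windows = [
    (date(2020, 1, 1), date(2020, 1, 11)),
    (date(2020, 2, 10), date(2020, 2, 20)),
]
rate = crude_rate(3, windows)
print(f"{rate:.2f} events per person-year")
```

With 3 events over 20 person-days, the rate is 3 / (20 / 365.25), roughly 54.79 events per person-year; very short windows produce large annualized rates, so report the person-time alongside the rate.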

12. Practical Ibis-on-CDM skills

12.1 semi_join: restrict to cohort members

Filter any CDM table to people in your cohort. Example: drug exposures among cohort members (any time).

12.2 Time-window joins

Restrict drug exposures to the cohort TAR window: drug_exposure_start_date between cohort_start_date and cohort_end_date.

13. When you have SQL: translate with SQLGlot

When you do have raw SQL (e.g. from CIRCE cohort definitions or legacy scripts), use SQLGlot to keep it generic and translate between dialects (DuckDB, Postgres, Spark, etc.). That way one SQL source can run on different backends.

Use Ibis as the primary way to express queries (backend-agnostic, composable). Use SQLGlot when you need to work with existing SQL strings and run them on another dialect.

14. Materializing intermediate results

On large CDMs, keep queries lazy and materialize an intermediate result only when you will reuse it. Here we create a table from an Ibis expression for reuse.

15. Putting it together: a tiny “study report” function

A realistic workflow: a function that takes the connection and cohort table name and returns a summary (using Ibis).

16. Next steps

To grow this into a full observational analysis course:

  1. Cohort design rigor: washout periods, prior observation, inclusion criteria, censoring.
  2. Confounding + covariates: baseline covariate construction from condition/drug/measurement history.
  3. Comparative effectiveness: target trial emulation, exposure cohorts, propensity scores.
  4. Outcome modeling: incidence rates, Kaplan–Meier, Cox models (with careful design).
  5. Reusable pipelines: parameterized functions producing standardized outputs.

17. Clean up


Provenance: This in-browser lesson uses the real GiBleed Eunomia example dataset and mirrors the Pedagogical OMOP (GiBleed) tutorial. For local work, use CDMConnector with Eunomia (for example eunomia_dir("GiBleed")) and run the same workflow with Ibis + DuckDB or your backend. Prefer Ibis for queries; use SQLGlot to translate existing SQL between dialects when needed.