CDMConnector

CDMConnector is a Python port of the R CDMConnector and omopgenerics packages. It provides tools for working with observational health data in the OMOP Common Data Model (CDM) format, using Ibis for lazy SQL pushdown in place of the dplyr/dbplyr layer used by the R packages.

What it does

  • Cdm: A single object holding OMOP table references (Ibis tables) plus metadata (name, version, schemas).
  • CdmSource / DbCdmSource: Backed by an Ibis connection (DuckDB, Postgres, etc.) with write-schema support.
  • Cohort tables: The OMOP cohort structure (cohort_definition_id, subject_id, cohort_start_date, cohort_end_date), plus attrition and cohort counts.
  • Eunomia: Helpers to download and use example OMOP datasets (e.g. GiBleed).
  • Schema-aware: cdm_schema, write_schema, optional achilles_schema; works with Ibis backends.
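
To make the cohort structure listed above concrete, here is a minimal plain-Python sketch of the four OMOP cohort columns and a per-cohort count. It does not use CDMConnector's API: the CohortRow class and cohort_counts helper are hypothetical names for illustration, the rows are made up, and the count keys (number_records, number_subjects) are an assumption modelled on the R package.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CohortRow:
    """One row of an OMOP cohort table (hypothetical illustration class)."""
    cohort_definition_id: int
    subject_id: int
    cohort_start_date: date
    cohort_end_date: date


def cohort_counts(rows):
    """Count records and distinct subjects per cohort_definition_id."""
    records = defaultdict(int)
    subjects = defaultdict(set)
    for row in rows:
        records[row.cohort_definition_id] += 1
        subjects[row.cohort_definition_id].add(row.subject_id)
    return {
        cid: {"number_records": records[cid], "number_subjects": len(subjects[cid])}
        for cid in sorted(records)
    }


# Made-up rows: subject 101 enters cohort 1 twice
rows = [
    CohortRow(1, 101, date(2020, 1, 1), date(2020, 3, 1)),
    CohortRow(1, 101, date(2021, 1, 1), date(2021, 2, 1)),
    CohortRow(1, 102, date(2020, 5, 1), date(2020, 6, 1)),
    CohortRow(2, 103, date(2019, 7, 1), date(2019, 8, 1)),
]
counts = cohort_counts(rows)
print(counts)
```

A subject may appear in the same cohort more than once, which is why the sketch tracks records and distinct subjects separately.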

Install

# Roughly equivalent to R's devtools::install_github(...)
pip install "cdmconnector @ git+https://github.com/OHDSI/pyCDMConnector.git"

With optional backends:

# Included by default (no extra needed)
# duckdb

# Extras currently exposed by the package
pip install "cdmconnector[postgres] @ git+https://github.com/OHDSI/pyCDMConnector.git"
pip install "cdmconnector[snowflake] @ git+https://github.com/OHDSI/pyCDMConnector.git"
pip install "cdmconnector[bigquery] @ git+https://github.com/OHDSI/pyCDMConnector.git"

# Also supported through Ibis backends used by the package
# redshift
# sqlserver
# spark / databricks

# Development and documentation extras
pip install "cdmconnector[dev,docs] @ git+https://github.com/OHDSI/pyCDMConnector.git"

Supported database backends: duckdb, postgres, redshift, sqlserver, snowflake, bigquery, and spark / databricks.

Quick example

import cdmconnector as cc
import ibis

# From the Eunomia GiBleed example in DuckDB
path = cc.eunomia_dir("GiBleed", cdm_version="5.3")
con = ibis.duckdb.connect(path)
cdm = cc.cdm_from_con(
    con,
    cdm_schema="main",
    write_schema="main",
    cdm_name="GiBleed",
)

# Access tables (Ibis expressions) by attribute or by name
person = cdm.person
person = cdm["person"]

# Lazy query (Ibis)
result = (
    cdm.person
    .filter(cdm.person.year_of_birth >= 1990)
    .select("person_id", "year_of_birth")
)
# Nothing has run yet. Execute with result.execute(), or materialize with cc.compute(...)

Next steps

License

Apache 2.0.