Cdm

Cdm(
    tables,
    *,
    cdm_name,
    cdm_version=None,
    cdm_schema=None,
    write_schema=None,
    achilles_schema=None,
    source,
)

OMOP CDM reference: holds a mapping of table names to Ibis table expressions plus metadata (name, version, schemas, source).

Attributes

Name Description
achilles_schema Schema for Achilles tables (optional).
cdm_schema Schema where CDM tables live.
con Underlying Ibis connection, or None if not database-backed.
name CDM name (e.g. from cdm_source or user).
source Source (connection) for this CDM.
tables List of table names in this CDM.
version OMOP CDM version (5.3 or 5.4).
write_schema Schema for write/cohort tables.

Methods

Name Description
disconnect Disconnect the underlying source.
flatten Flatten the CDM into a single observation table.
sample Subset the CDM to a random sample of n persons.
select_tables Return a new Cdm with only the given tables (subset).
snapshot Snapshot CDM metadata. Executes and returns a 1-row pandas DataFrame.
subset Subset the CDM to a set of person IDs.
subset_cohort Subset the CDM to individuals in one or more cohorts.

disconnect

Cdm.disconnect(drop_write_schema=False)

Disconnect the underlying source.

Parameters

Name Type Description Default
drop_write_schema bool If True, drop write-schema tables before disconnecting (default False). False

flatten

Cdm.flatten(domain=None, include_concept_name=True)

Flatten the CDM into a single observation table.

Transforms selected domain tables into a common schema (person_id, observation_concept_id, start_date, end_date, type_concept_id, domain) and unions them. Recommended only for filtered or small CDMs.

Parameters

Name Type Description Default
domain list[str] or None Domains to include. Must be a subset of: "condition_occurrence", "drug_exposure", "procedure_occurrence", "measurement", "visit_occurrence", "death", "observation". If None, defaults to condition_occurrence, drug_exposure, and procedure_occurrence. None
include_concept_name bool If True (default), add observation_concept_name and type_concept_name via the concept table. True

Returns

Name Type Description
ibis.expr.types.Table Lazy table expression; use collect() to materialize.
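
The common-schema-and-union step can be illustrated without a database. The following is a minimal sketch, not the library's implementation: it uses plain Python dicts as stand-in rows, the helper name `flatten_rows` is hypothetical, and the per-domain source columns are assumed from the standard OMOP CDM column names. The concept IDs are made-up placeholders.

```python
# Illustration only: map each domain's columns onto one shared schema,
# then concatenate (union) the rows. `flatten_rows` is a hypothetical helper.
COLUMN_MAP = {
    # domain: (concept column, start column, end column, type column)
    "condition_occurrence": (
        "condition_concept_id", "condition_start_date",
        "condition_end_date", "condition_type_concept_id",
    ),
    "drug_exposure": (
        "drug_concept_id", "drug_exposure_start_date",
        "drug_exposure_end_date", "drug_type_concept_id",
    ),
}


def flatten_rows(tables, domain=None):
    """Return rows from the chosen domains in a single common schema."""
    domain = list(COLUMN_MAP) if domain is None else domain
    flat = []
    for name in domain:
        concept_col, start_col, end_col, type_col = COLUMN_MAP[name]
        for row in tables.get(name, []):
            flat.append({
                "person_id": row["person_id"],
                "observation_concept_id": row[concept_col],
                "start_date": row[start_col],
                "end_date": row[end_col],
                "type_concept_id": row[type_col],
                "domain": name,
            })
    return flat


# Made-up rows with placeholder concept IDs.
tables = {
    "condition_occurrence": [{
        "person_id": 1, "condition_concept_id": 111,
        "condition_start_date": "2020-01-05",
        "condition_end_date": "2020-01-20",
        "condition_type_concept_id": 9,
    }],
    "drug_exposure": [{
        "person_id": 1, "drug_concept_id": 222,
        "drug_exposure_start_date": "2020-01-06",
        "drug_exposure_end_date": "2020-02-06",
        "drug_type_concept_id": 9,
    }],
}

flat = flatten_rows(tables)
```

Every output row has the same six keys regardless of its source domain, which is what makes the union possible.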

sample

Cdm.sample(n, seed=None, name='person_sample')

Subset the CDM to a random sample of n persons.

Persons are drawn from the person table. The sample table is inserted into the write schema under the given name and all clinical tables are filtered to those persons.

Parameters

Name Type Description Default
n int Number of persons to include. required
seed int or None Random seed for reproducibility; None uses a random seed. None
name str Name of the table storing the sampled person_ids (default “person_sample”). 'person_sample'

Returns

Name Type Description
Cdm New CDM with tables subset to the sampled persons (and the sample table added).
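
The sampling behaviour described above (draw n persons, store their IDs under a table name, filter the other tables) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `sample_persons` is a hypothetical helper, tables are lists of dicts, and every table in the example carries a person_id column.

```python
import random


def sample_persons(tables, n, seed=None, name="person_sample"):
    """Illustration only: sample n person_ids and filter every table to them."""
    rng = random.Random(seed)  # seed=None lets Python pick a random seed
    person_ids = sorted(row["person_id"] for row in tables["person"])
    sampled = set(rng.sample(person_ids, n))

    # Filter each table to the sampled persons.
    result = {
        table_name: [row for row in rows if row["person_id"] in sampled]
        for table_name, rows in tables.items()
    }
    # Store the sampled IDs as their own table, as the method description says.
    result[name] = [{"person_id": pid} for pid in sorted(sampled)]
    return result


# Made-up data: five persons, a few condition rows.
tables = {
    "person": [{"person_id": i} for i in range(1, 6)],
    "condition_occurrence": [
        {"person_id": 1, "condition_concept_id": 111},
        {"person_id": 3, "condition_concept_id": 111},
        {"person_id": 5, "condition_concept_id": 222},
    ],
}

first = sample_persons(tables, n=2, seed=42)
second = sample_persons(tables, n=2, seed=42)
```

With the same seed, `first` and `second` contain the same persons; with `seed=None` the draw would generally differ between calls.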

select_tables

Cdm.select_tables(*names)

Return a new Cdm with only the given tables (subset).

Parameters

Name Type Description Default
*names str Table names to keep. ()

Returns

Name Type Description
Cdm New CDM with subset of tables.

Raises

Name Type Description
TableNotFoundError If any name is not in this CDM.
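
The keep-or-raise contract can be sketched with a plain dict of tables. The code below is a stand-in, not the library's implementation: the `TableNotFoundError` class defined here is a local placeholder (the real exception's base class and import path are not stated in this reference), and `select_tables` is written as a free function for illustration.

```python
class TableNotFoundError(KeyError):
    """Local placeholder for the library's exception of the same name."""


def select_tables(tables, *names):
    """Illustration only: return a new mapping with just the requested tables."""
    missing = [name for name in names if name not in tables]
    if missing:
        # Fail before building anything if any requested name is unknown.
        raise TableNotFoundError(f"tables not in this CDM: {missing}")
    return {name: tables[name] for name in names}


tables = {"person": [], "observation_period": [], "death": []}
picked = select_tables(tables, "person", "death")
```

The original mapping is left untouched; the result is a new object holding only the requested names.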

snapshot

Cdm.snapshot(compute_data_hash=False)

Snapshot CDM metadata. Executes and returns a 1-row pandas DataFrame.

Call with parentheses: cdm.snapshot(). Without () you get the method object, not the DataFrame.

Requires person, observation_period, cdm_source, and vocabulary tables. Column order matches R CDMConnector snapshot(): cdm_name, cdm_source_name, cdm_description, cdm_documentation_reference, cdm_version, cdm_holder, cdm_release_date, vocabulary_version, person_count, observation_period_count, earliest_observation_period_start_date, latest_observation_period_end_date, snapshot_date, cdm_data_hash.

Parameters

Name Type Description Default
compute_data_hash bool If True, include data hash in snapshot (default False). False

Returns

Name Type Description
pandas.DataFrame Single-row DataFrame with CDM metadata.
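
The note about parentheses is ordinary Python behaviour and can be seen with any class. The sketch below uses a hypothetical stand-in class, `FakeCdm`, whose `snapshot` returns a dict rather than a DataFrame; it is not the library's `Cdm`.

```python
class FakeCdm:
    """Hypothetical stand-in used only to show method object vs. call."""

    def snapshot(self):
        return {"cdm_name": "demo"}


cdm = FakeCdm()

method_object = cdm.snapshot   # no parentheses: a bound method, nothing runs
result = cdm.snapshot()        # parentheses: the method executes

is_method = callable(method_object)
```

Here `method_object` is callable and holds no metadata, while `result` is the returned value.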

subset

Cdm.subset(person_id)

Subset the CDM to a set of person IDs.

Returns a new CDM where all clinical tables are filtered to rows whose person_id (or subject_id) is in the given set. Requires a database-backed CDM with write_schema.

Parameters

Name Type Description Default
person_id list[int] or array-like Person IDs to include. required

Returns

Name Type Description
Cdm New CDM with tables subset to the given persons.
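
The filtering rule (match on person_id, or on subject_id for cohort-style tables) can be sketched in plain Python. This is an illustration, not the library's implementation: `subset_rows` is a hypothetical helper, tables are lists of dicts, and keeping tables that have neither column unchanged is an assumption made for the example.

```python
def subset_rows(tables, person_id):
    """Illustration only: keep rows whose person_id or subject_id is wanted."""
    keep = set(person_id)
    result = {}
    for name, rows in tables.items():
        filtered = []
        for row in rows:
            if "person_id" in row:
                key = row["person_id"]
            elif "subject_id" in row:
                key = row["subject_id"]
            else:
                # Assumption: rows without a person key are left as they are.
                filtered.append(row)
                continue
            if key in keep:
                filtered.append(row)
        result[name] = filtered
    return result


# Made-up data.
tables = {
    "person": [{"person_id": 1}, {"person_id": 2}, {"person_id": 3}],
    "cohort": [
        {"cohort_definition_id": 1, "subject_id": 2},
        {"cohort_definition_id": 1, "subject_id": 3},
    ],
    "concept": [{"concept_id": 111}],
}

small = subset_rows(tables, [2])
```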

subset_cohort

Cdm.subset_cohort(cohort_table='cohort', cohort_id=None, verbose=False)

Subset the CDM to individuals in one or more cohorts.

Returns a new CDM where all clinical tables are filtered to persons present in the given cohort table (optionally restricted by cohort_id). The subset is lazy: nothing is computed until the tables are used.

Parameters

Name Type Description Default
cohort_table str Name of the cohort table in this CDM (default “cohort”). 'cohort'
cohort_id int or list[int] or None Cohort definition ID(s) to include; None uses all cohorts in the table. None
verbose bool If True, log subset size (default False). False

Returns

Name Type Description
Cdm New CDM subset to cohort persons.
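
How cohort_id narrows the set of persons can be sketched in plain Python. The code is an illustration under assumptions, not the library's implementation: `cohort_person_ids` is a hypothetical helper, and the cohort rows use the standard OMOP cohort columns cohort_definition_id and subject_id.

```python
def cohort_person_ids(cohort_rows, cohort_id=None):
    """Illustration only: collect subject_ids, optionally for given cohort IDs."""
    if cohort_id is None:
        wanted = None                 # None means every cohort in the table
    elif isinstance(cohort_id, int):
        wanted = {cohort_id}          # a single ID
    else:
        wanted = set(cohort_id)       # a list of IDs
    return {
        row["subject_id"]
        for row in cohort_rows
        if wanted is None or row["cohort_definition_id"] in wanted
    }


# Made-up cohort table and one clinical table.
cohort = [
    {"cohort_definition_id": 1, "subject_id": 10},
    {"cohort_definition_id": 1, "subject_id": 11},
    {"cohort_definition_id": 2, "subject_id": 12},
]
drug_exposure = [{"person_id": pid} for pid in (10, 11, 12, 13)]

all_ids = cohort_person_ids(cohort)
one_cohort = cohort_person_ids(cohort, cohort_id=2)
several = cohort_person_ids(cohort, cohort_id=[1, 2])

# Filter a clinical table to the persons in cohort 1.
in_cohort_1 = cohort_person_ids(cohort, cohort_id=1)
filtered = [row for row in drug_exposure if row["person_id"] in in_cohort_1]
```

Person 13 appears in no cohort, so that row is dropped for every choice of cohort_id.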