Cdm

Cdm(
    tables,
    *,
    cdm_name,
    cdm_version=None,
    cdm_schema=None,
    write_schema=None,
    achilles_schema=None,
    source,
)

OMOP CDM reference: holds a mapping of table names to Ibis table expressions plus metadata (name, version, schemas, source).

Attributes

Name	Description
achilles_schema	Schema for Achilles tables (optional).
cdm_schema	Schema where CDM tables live.
con	Underlying Ibis connection, or None if not database-backed.
name	CDM name (e.g. taken from the cdm_source table or supplied by the user).
source	Source (connection) for this CDM.
tables	List of table names in this CDM.
version	OMOP CDM version (5.3 or 5.4).
write_schema	Schema for write/cohort tables.

Methods

Name	Description
disconnect	Disconnect the underlying source.
flatten	Flatten the CDM into a single observation table.
sample	Subset the CDM to a random sample of n persons.
select_tables	Return a new Cdm containing only the given tables.
snapshot	Snapshot CDM metadata. Executes and returns a 1-row pandas DataFrame.
subset	Subset the CDM to a set of person IDs.
subset_cohort	Subset the CDM to individuals in one or more cohorts.

disconnect

Cdm.disconnect(drop_write_schema=False)

Disconnect the underlying source.

Parameters

Name	Type	Description	Default
drop_write_schema	bool	If True, drop write-schema tables before disconnecting (default False).	`False`

flatten

Cdm.flatten(domain=None, include_concept_name=True)

Flatten the CDM into a single observation table.

Transforms selected domain tables into a common schema (person_id, observation_concept_id, start_date, end_date, type_concept_id, domain) and unions them. Recommended only for filtered or small CDMs.

Parameters

Name	Type	Description	Default
domain	list[str] or None	Domains to include. Must be a subset of: “condition_occurrence”, “drug_exposure”, “procedure_occurrence”, “measurement”, “visit_occurrence”, “death”, “observation”. Default is condition_occurrence, drug_exposure, procedure_occurrence.	`None`
include_concept_name	bool	If True (default), add observation_concept_name and type_concept_name via the concept table.	`True`

Returns

Name	Type	Description
	ibis.expr.types.Table	Lazy table expression; use collect() to materialize.
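
The reshape-and-union step described above can be illustrated with plain Python dictionaries standing in for table rows. This is only a sketch of the idea, not the library's implementation: the helper `flatten_rows` and the `DOMAIN_COLUMNS` mapping are hypothetical, only two domains are covered, and the source column names are assumed to follow the standard OMOP CDM naming.

```python
# Hypothetical mapping from a domain table to its OMOP source columns:
# (concept id, start date, end date, type concept id).
DOMAIN_COLUMNS = {
    "condition_occurrence": (
        "condition_concept_id",
        "condition_start_date",
        "condition_end_date",
        "condition_type_concept_id",
    ),
    "drug_exposure": (
        "drug_concept_id",
        "drug_exposure_start_date",
        "drug_exposure_end_date",
        "drug_type_concept_id",
    ),
}


def flatten_rows(tables, domain=None):
    """Rename each domain's columns to the common schema and union the rows."""
    domains = list(DOMAIN_COLUMNS) if domain is None else list(domain)
    unknown = [d for d in domains if d not in DOMAIN_COLUMNS]
    if unknown:
        raise ValueError(f"Unsupported domains: {unknown}")
    flat = []
    for name in domains:
        concept_col, start_col, end_col, type_col = DOMAIN_COLUMNS[name]
        for row in tables.get(name, []):
            flat.append({
                "person_id": row["person_id"],
                "observation_concept_id": row[concept_col],
                "start_date": row[start_col],
                "end_date": row[end_col],
                "type_concept_id": row[type_col],
                "domain": name,  # records which table the row came from
            })
    return flat


# Made-up rows for illustration only.
demo_tables = {
    "condition_occurrence": [{
        "person_id": 1,
        "condition_concept_id": 201826,
        "condition_start_date": "2020-01-01",
        "condition_end_date": "2020-01-10",
        "condition_type_concept_id": 32817,
    }],
    "drug_exposure": [{
        "person_id": 1,
        "drug_concept_id": 1503297,
        "drug_exposure_start_date": "2020-01-02",
        "drug_exposure_end_date": "2020-02-01",
        "drug_type_concept_id": 32817,
    }],
}
flat_rows = flatten_rows(demo_tables)
```

Every output row has the same six keys regardless of its source table, which is what makes the union possible.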

sample

Cdm.sample(n, seed=None, name='person_sample')

Subset the CDM to a random sample of n persons.

Persons are drawn from the person table. The sample table is inserted into the write schema under the given name and all clinical tables are filtered to those persons.

Parameters

Name	Type	Description	Default
n	int	Number of persons to include.	required
seed	int or None	Random seed for reproducibility; None uses a random seed.	`None`
name	str	Name of the table storing the sampled person_ids (default “person_sample”).	`'person_sample'`

Returns

Name	Type	Description
	Cdm	New CDM with tables subset to the sampled persons (and the sample table added).
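
As a rough illustration of seeded sampling followed by filtering, the sketch below uses in-memory lists instead of database tables. The helpers `sample_person_ids` and `filter_to_persons` are hypothetical, and capping `n` at the population size is an assumption of this sketch, not documented behaviour of `Cdm.sample`.

```python
import random


def sample_person_ids(person_ids, n, seed=None):
    """Draw up to n distinct person IDs; a fixed seed gives a repeatable draw."""
    rng = random.Random(seed)  # seed=None seeds from system entropy
    pool = sorted(set(person_ids))  # sort so the draw does not depend on input order
    return rng.sample(pool, min(n, len(pool)))  # assumption: cap n at pool size


def filter_to_persons(rows, keep):
    """Keep only the rows whose person_id is in the sampled set."""
    keep = set(keep)
    return [row for row in rows if row["person_id"] in keep]


# Made-up data: 10 persons, each with 3 visits.
person_rows = [{"person_id": i} for i in range(1, 11)]
visit_rows = [{"person_id": i % 10 + 1, "visit_occurrence_id": i} for i in range(30)]

all_ids = [p["person_id"] for p in person_rows]
first_draw = sample_person_ids(all_ids, n=3, seed=42)
second_draw = sample_person_ids(all_ids, n=3, seed=42)
sampled_visits = filter_to_persons(visit_rows, first_draw)
```

With the same seed, `first_draw` and `second_draw` contain the same IDs, which is the point of passing `seed`.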

select_tables

Cdm.select_tables(*names)

Return a new Cdm containing only the given tables.

Parameters

Name	Type	Description	Default
*names	str	Table names to keep.	`()`

Returns

Name	Type	Description
	Cdm	New CDM containing only the requested tables.

Raises

Name	Type	Description
	TableNotFoundError	If any name is not in this CDM.
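
The select-or-raise behaviour can be sketched with a plain dictionary in place of the CDM's table mapping. The `TableNotFoundError` class below is a local stand-in (its `KeyError` base class is an assumption), and the free function `select_tables` is a hypothetical illustration, not the method itself.

```python
class TableNotFoundError(KeyError):
    """Local stand-in for the library's exception of the same name."""


def select_tables(tables, *names):
    """Return a new mapping with only the requested tables, or raise."""
    missing = [name for name in names if name not in tables]
    if missing:
        raise TableNotFoundError(f"Tables not found: {missing}")
    # Build a new dict so the original mapping is left untouched.
    return {name: tables[name] for name in names}


# Strings stand in for table expressions.
all_tables = {"person": "t_person", "observation_period": "t_op", "death": "t_death"}
kept = select_tables(all_tables, "person", "death")

try:
    select_tables(all_tables, "person", "note")
    missing_raised = False
except TableNotFoundError:
    missing_raised = True
```

Checking all names before building the result means a single bad name fails the whole call instead of returning a partial subset.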

snapshot

Cdm.snapshot(compute_data_hash=False)

Snapshot CDM metadata. Executes and returns a 1-row pandas DataFrame.

Call with parentheses: cdm.snapshot(). Without () you get the method object, not the DataFrame.

Requires person, observation_period, cdm_source, and vocabulary tables. Column order matches R CDMConnector snapshot(): cdm_name, cdm_source_name, cdm_description, cdm_documentation_reference, cdm_version, cdm_holder, cdm_release_date, vocabulary_version, person_count, observation_period_count, earliest_observation_period_start_date, latest_observation_period_end_date, snapshot_date, cdm_data_hash.

Parameters

Name	Type	Description	Default
compute_data_hash	bool	If True, include data hash in snapshot (default False).	`False`

Returns

Name	Type	Description
	pandas.DataFrame	A one-row DataFrame of CDM metadata.
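
The note about parentheses is ordinary Python behaviour and can be seen with any class. The `FakeCdm` class below is a hypothetical stand-in that returns a dict instead of a DataFrame; it only illustrates the difference between referencing a method and calling it.

```python
class FakeCdm:
    """Hypothetical stand-in; the real snapshot() returns a pandas DataFrame."""

    def snapshot(self):
        return {"cdm_name": "demo", "person_count": 3}


fake_cdm = FakeCdm()
without_parens = fake_cdm.snapshot   # a bound method object, nothing has run yet
with_parens = fake_cdm.snapshot()    # the call executes and returns the data
```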

subset

Cdm.subset(person_id)

Subset the CDM to a set of person IDs.

Returns a new CDM where all clinical tables are filtered to rows whose person_id (or subject_id) is in the given set. Requires a database-backed CDM with write_schema.

Parameters

Name	Type	Description	Default
person_id	list[int] or array-like	Person IDs to include.	required

Returns

Name	Type	Description
	Cdm	New CDM with tables subset to the given persons.
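
The filtering rule (match on person_id, or on subject_id where that is the person key) can be sketched on in-memory rows. The helper `subset_tables` is hypothetical and ignores the database and write_schema requirements; passing through tables that have neither column is an assumption of this sketch.

```python
def subset_tables(tables, person_ids):
    """Filter every table to rows belonging to the given persons."""
    keep = set(person_ids)
    result = {}
    for name, rows in tables.items():
        filtered = []
        for row in rows:
            if "person_id" in row:
                include = row["person_id"] in keep
            elif "subject_id" in row:  # cohort-style tables key on subject_id
                include = row["subject_id"] in keep
            else:
                include = True  # assumption: tables without a person key are kept whole
            if include:
                filtered.append(row)
        result[name] = filtered
    return result


# Made-up rows for illustration only.
source_tables = {
    "person": [{"person_id": 1}, {"person_id": 2}, {"person_id": 3}],
    "cohort": [
        {"cohort_definition_id": 1, "subject_id": 2},
        {"cohort_definition_id": 1, "subject_id": 3},
    ],
    "concept": [{"concept_id": 8507}],
}
small_tables = subset_tables(source_tables, [1, 2])
```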

subset_cohort

Cdm.subset_cohort(cohort_table='cohort', cohort_id=None, verbose=False)

Subset the CDM to individuals in one or more cohorts.

Returns a new CDM where all clinical tables are filtered to persons present in the given cohort table (optionally restricted by cohort_id). The subset is evaluated lazily, when the tables are used.

Parameters

Name	Type	Description	Default
cohort_table	str	Name of the cohort table in this CDM (default “cohort”).	`'cohort'`
cohort_id	int or list[int] or None	Cohort definition ID(s) to include; None uses all cohorts in the table.	`None`
verbose	bool	If True, log subset size (default False).	`False`

Returns

Name	Type	Description
	Cdm	New CDM subset to cohort persons.
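
How the three accepted forms of cohort_id (a single int, a list, or None) might resolve to a set of persons is sketched below. The helper `cohort_person_ids` is hypothetical, and the column names `cohort_definition_id` and `subject_id` are assumed from the standard OMOP cohort table layout.

```python
def cohort_person_ids(cohort_rows, cohort_id=None):
    """Collect the subject IDs in a cohort table, optionally limited by cohort ID."""
    if cohort_id is None:
        wanted = None  # None means every cohort in the table
    elif isinstance(cohort_id, int):
        wanted = {cohort_id}
    else:
        wanted = set(cohort_id)
    return {
        row["subject_id"]
        for row in cohort_rows
        if wanted is None or row["cohort_definition_id"] in wanted
    }


# Made-up cohort rows for illustration only.
cohort_rows = [
    {"cohort_definition_id": 1, "subject_id": 10},
    {"cohort_definition_id": 1, "subject_id": 11},
    {"cohort_definition_id": 2, "subject_id": 11},
    {"cohort_definition_id": 3, "subject_id": 12},
]
everyone = cohort_person_ids(cohort_rows)
only_two = cohort_person_ids(cohort_rows, cohort_id=2)
two_and_three = cohort_person_ids(cohort_rows, cohort_id=[2, 3])
```

The resulting set of IDs plays the same role as the person_id argument of subset: every clinical table is then restricted to those persons.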