Transforms selected domain tables into a common schema (person_id, observation_concept_id, start_date, end_date, type_concept_id, domain) and unions them. Because every row from every selected domain ends up in one table, this is recommended only for filtered or small CDMs.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| domain | list[str] or None | Domains to include. Must be a subset of: “condition_occurrence”, “drug_exposure”, “procedure_occurrence”, “measurement”, “visit_occurrence”, “death”, “observation”. None (the default) selects condition_occurrence, drug_exposure, and procedure_occurrence. | None |
| include_concept_name | bool | If True (default), add observation_concept_name and type_concept_name via the concept table. | True |
Returns

| Name | Type | Description |
|---|---|---|
| | ibis.expr.types.Table | Lazy table expression; use collect() to materialize. |
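The rename-then-union idea can be sketched in plain Python. This is an illustrative, eager sketch, not the library's implementation: it assumes tables are lists of row dicts (the real method builds a lazy ibis expression), covers only two domains, skips the concept-name join, and uses a hypothetical column mapping `DOMAIN_COLUMNS` and helper `flatten_domains`. Concept IDs in the sample rows are illustrative values.

```python
from datetime import date

# Hypothetical mapping from the common schema to each domain's OMOP columns.
# Only two domains are shown; the library's own mapping may differ.
DOMAIN_COLUMNS = {
    "condition_occurrence": {
        "observation_concept_id": "condition_concept_id",
        "start_date": "condition_start_date",
        "end_date": "condition_end_date",
        "type_concept_id": "condition_type_concept_id",
    },
    "drug_exposure": {
        "observation_concept_id": "drug_concept_id",
        "start_date": "drug_exposure_start_date",
        "end_date": "drug_exposure_end_date",
        "type_concept_id": "drug_type_concept_id",
    },
}


def flatten_domains(tables, domains):
    """Rename each domain's columns to the common schema and union the rows."""
    out = []
    for domain in domains:
        mapping = DOMAIN_COLUMNS[domain]
        for row in tables[domain]:
            common = {"person_id": row["person_id"]}
            for target, source in mapping.items():
                common[target] = row[source]
            common["domain"] = domain  # record which table the row came from
            out.append(common)
    return out


# Illustrative rows only.
tables = {
    "condition_occurrence": [
        {"person_id": 1, "condition_concept_id": 201826,
         "condition_start_date": date(2020, 1, 5),
         "condition_end_date": date(2020, 1, 5),
         "condition_type_concept_id": 32020},
    ],
    "drug_exposure": [
        {"person_id": 1, "drug_concept_id": 1503297,
         "drug_exposure_start_date": date(2020, 1, 6),
         "drug_exposure_end_date": date(2020, 2, 5),
         "drug_type_concept_id": 32838},
    ],
}

rows = flatten_domains(tables, ["condition_occurrence", "drug_exposure"])
```

Every output row has the same six columns regardless of its source table, which is what makes the union possible.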
sample
Cdm.sample(n, seed=None, name='person_sample')
Subset the CDM to a random sample of n persons.
Persons are drawn from the person table. The sample table is inserted into the write schema under the given name and all clinical tables are filtered to those persons.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| n | int | Number of persons to include. | required |
| seed | int or None | Random seed for reproducibility; None uses a random seed. | None |
| name | str | Name of the table storing the sampled person_ids (default “person_sample”). | 'person_sample' |
Returns

| Name | Type | Description |
|---|---|---|
| | Cdm | New CDM with tables subset to the sampled persons (and the sample table added). |
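The seeded draw followed by a per-table filter can be sketched as below. This is a minimal illustration, assuming in-memory lists of row dicts instead of database tables; `sample_persons` and `filter_to_persons` are hypothetical helpers, and the library itself writes the sample to the write schema rather than returning a set.

```python
import random


def sample_persons(person_ids, n, seed=None):
    """Draw n distinct person_ids; the same seed yields the same sample."""
    rng = random.Random(seed)  # seed=None seeds from system randomness
    # sorted() makes the draw independent of the input order.
    return set(rng.sample(sorted(person_ids), n))


def filter_to_persons(tables, keep):
    """Keep only rows whose person_id is in `keep`, for every clinical table."""
    return {
        name: [row for row in rows if row["person_id"] in keep]
        for name, rows in tables.items()
    }


persons = list(range(1, 101))
tables = {
    "condition_occurrence": [{"person_id": p, "condition_concept_id": 1} for p in persons],
}
chosen = sample_persons(persons, n=10, seed=123)
subset = filter_to_persons(tables, chosen)
```

Passing the same `seed` reproduces the same sample, which mirrors the reproducibility promise of the `seed` parameter.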
select_tables
Cdm.select_tables(*names)
Return a new Cdm with only the given tables (subset).
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| *names | str | Table names to keep. | () |
Returns

| Name | Type | Description |
|---|---|---|
| | Cdm | New CDM with subset of tables. |
Raises

| Name | Type | Description |
|---|---|---|
| | TableNotFoundError | If any name is not in this CDM. |
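The select-or-raise behaviour can be sketched with a plain dict standing in for the CDM. The exception class here is a hypothetical stand-in for the library's TableNotFoundError (its real base class is not documented in this section), and `select_tables` is a free function, not the library's method.

```python
class TableNotFoundError(KeyError):
    """Hypothetical stand-in for the library's TableNotFoundError."""


def select_tables(tables, *names):
    """Return a new dict holding only `names`; fail if any name is missing."""
    missing = [name for name in names if name not in tables]
    if missing:
        raise TableNotFoundError(f"Tables not in CDM: {missing}")
    # Build a new dict so the original "CDM" is left untouched.
    return {name: tables[name] for name in names}


cdm = {"person": [], "observation_period": [], "death": []}
selected = select_tables(cdm, "person", "death")
```

Checking all names before building the result means a single bad name fails the whole call instead of returning a partial subset.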
snapshot
Cdm.snapshot(compute_data_hash=False)
Snapshot CDM metadata. Executes and returns a 1-row pandas DataFrame.
Call with parentheses: cdm.snapshot(). Without () you get the method object, not the DataFrame.
Requires person, observation_period, cdm_source, and vocabulary tables. Column order matches R CDMConnector snapshot(): cdm_name, cdm_source_name, cdm_description, cdm_documentation_reference, cdm_version, cdm_holder, cdm_release_date, vocabulary_version, person_count, observation_period_count, earliest_observation_period_start_date, latest_observation_period_end_date, snapshot_date, cdm_data_hash.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| compute_data_hash | bool | If True, include data hash in snapshot (default False). | False |
Returns

| Name | Type | Description |
|---|---|---|
| | pandas.DataFrame | A DataFrame with exactly one row of CDM metadata. |
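The count and date columns of the snapshot can be illustrated with a small sketch that builds them as a single record. This assumes person_count is the number of rows in the person table and that the earliest/latest dates are the min and max over observation_period; `snapshot_counts` is hypothetical, returns a dict rather than a DataFrame, and omits the cdm_source, vocabulary, and hash fields.

```python
from datetime import date


def snapshot_counts(person_rows, observation_period_rows):
    """Compute a few snapshot fields as one record (the single output row)."""
    starts = [r["observation_period_start_date"] for r in observation_period_rows]
    ends = [r["observation_period_end_date"] for r in observation_period_rows]
    return {
        "person_count": len(person_rows),
        "observation_period_count": len(observation_period_rows),
        "earliest_observation_period_start_date": min(starts),
        "latest_observation_period_end_date": max(ends),
        "snapshot_date": date.today().isoformat(),  # assumed YYYY-MM-DD format
    }


persons = [{"person_id": 1}, {"person_id": 2}]
observation_periods = [
    {"person_id": 1, "observation_period_start_date": date(2010, 1, 1),
     "observation_period_end_date": date(2015, 12, 31)},
    {"person_id": 2, "observation_period_start_date": date(2012, 6, 1),
     "observation_period_end_date": date(2019, 3, 31)},
]
record = snapshot_counts(persons, observation_periods)
```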
subset
Cdm.subset(person_id)
Subset the CDM to a set of person IDs.
Returns a new CDM where all clinical tables are filtered to rows whose person_id (or subject_id) is in the given set. Requires a database-backed CDM with write_schema.
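The person_id/subject_id rule can be sketched as follows, assuming tables are lists of row dicts and that a table with neither column (for example, a vocabulary table) is passed through unchanged. Both helpers are hypothetical; the real method works against the database and write_schema.

```python
def person_column(rows):
    """Return the column that identifies a person, or None if there is none."""
    if not rows:
        return None
    for col in ("person_id", "subject_id"):
        if col in rows[0]:
            return col
    return None


def subset_by_person_ids(tables, person_ids):
    """Filter every person-level table to the given IDs."""
    keep = set(person_ids)
    out = {}
    for name, rows in tables.items():
        col = person_column(rows)
        if col is None:
            out[name] = list(rows)  # not person-level (e.g. concept): unchanged
        else:
            out[name] = [r for r in rows if r[col] in keep]
    return out


tables = {
    "person": [{"person_id": 1}, {"person_id": 2}, {"person_id": 3}],
    "cohort": [{"cohort_definition_id": 1, "subject_id": 2},
               {"cohort_definition_id": 1, "subject_id": 3}],
    "concept": [{"concept_id": 8507}],
}
sub = subset_by_person_ids(tables, [1, 2])
```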
Subset the CDM to individuals in one or more cohorts.
Returns a new CDM where all clinical tables are filtered to persons present in the given cohort table (optionally restricted by cohort_id). Subset is lazy until tables are used.
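Resolving which persons belong to the selected cohorts can be sketched as below, assuming the cohort table uses the standard cohort_definition_id and subject_id columns and that cohort_id may be an int, a list of ints, or None (all cohorts). The helpers are hypothetical and take cohort rows directly rather than a cohort table name.

```python
def cohort_person_ids(cohort_rows, cohort_id=None):
    """Collect subject_ids, optionally restricted to some cohort IDs."""
    if cohort_id is None:
        wanted = None  # None means every cohort in the table
    elif isinstance(cohort_id, int):
        wanted = {cohort_id}
    else:
        wanted = set(cohort_id)
    return {
        row["subject_id"]
        for row in cohort_rows
        if wanted is None or row["cohort_definition_id"] in wanted
    }


def subset_to_cohort(tables, cohort_rows, cohort_id=None):
    """Filter every table to persons found in the selected cohorts."""
    keep = cohort_person_ids(cohort_rows, cohort_id)
    return {
        name: [r for r in rows if r["person_id"] in keep]
        for name, rows in tables.items()
    }


cohort = [
    {"cohort_definition_id": 1, "subject_id": 10},
    {"cohort_definition_id": 1, "subject_id": 11},
    {"cohort_definition_id": 2, "subject_id": 12},
]
```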
Parameters
Name
Type
Description
Default
cohort_table
str
Name of the cohort table in this CDM (default “cohort”).
'cohort'
cohort_id
int or list[int] or None
Cohort definition ID(s) to include; None uses all cohorts in the table.