from pathlib import Path
import cdmconnector as cc
import ibis
path = cc.eunomia_dir("synpuf-1k", cdm_version="5.3")
con = ibis.duckdb.connect(path)
cdm = cc.cdm_from_con(con, cdm_schema="main", write_schema="main", cdm_name="eunomia")
Cohort Characterization: Table 1
Compute a Table 1 (demographics + baseline comorbidities) from a cohort table using synpuf-1k.
You will learn
- How to join a cohort table (or membership set) to person and observation_period
- How to compute Table 1: sex, age, observation period duration
- How to add baseline comorbidities (top 10 conditions in the baseline window)
- Optional: pretty output via great-tables (guard with try/except and fall back to a plain DataFrame)
Story question
“What do cohort members look like (demographics) and what are the top baseline conditions?”
Setup
This notebook uses synpuf-1k and assumes a cohort table exists (e.g. from generate_cohort_set or a pre-built cohort). For this skeleton we use either an existing cohort table or the first N persons as a demo “cohort”.
Explore: Cohort membership
If a cohort table exists (e.g. “cohort”), use it. Otherwise define a minimal cohort set (e.g. first 100 person_ids with observation_period) for demo.
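To make the shape of the demo cohort concrete, here is a minimal standalone sketch in plain Python (no database needed). The rows and the helper name `build_demo_cohort` are hypothetical and exist only for illustration; the sketch collapses observation periods to one row per person, with the earliest start as index_date and the latest end as obs_end.

```python
from datetime import date

# Hypothetical observation_period rows: (person_id, start_date, end_date)
observation_periods = [
    (1, date(2008, 1, 1), date(2008, 12, 31)),
    (1, date(2009, 3, 1), date(2010, 6, 30)),
    (2, date(2008, 5, 15), date(2010, 12, 31)),
]


def build_demo_cohort(rows, limit=100):
    """Collapse observation periods to one row per person (hypothetical helper)."""
    by_person = {}
    for person_id, start, end in rows:
        if person_id not in by_person:
            by_person[person_id] = {
                "person_id": person_id,
                "index_date": start,
                "obs_end": end,
            }
        else:
            entry = by_person[person_id]
            entry["index_date"] = min(entry["index_date"], start)
            entry["obs_end"] = max(entry["obs_end"], end)
    # Sort by person_id so the limit picks the same persons on every run
    return [by_person[pid] for pid in sorted(by_person)][:limit]


demo_cohort = build_demo_cohort(observation_periods)
for member in demo_cohort:
    print(member)
```

Sorting before applying the limit is a design choice of this sketch: a limit without an explicit order can return a different set of persons from run to run.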
# Option A: use existing cohort table if present
# cohort_members = cdm["cohort"]
# Option B: define demo cohort = persons with at least one observation_period (limit 100)
person = cdm.person
op = cdm.observation_period
# Simplified: one row per person with an observation_period; index_date = earliest observation start
cohort_demo = op.group_by(op.person_id).aggregate(
    index_date=op.observation_period_start_date.min(),
    obs_end=op.observation_period_end_date.max(),
).order_by("person_id").limit(100)  # order first so the 100-person sample is deterministic
cc.collect(cohort_demo.limit(5))
Build: Table 1 (sex, age, observation period duration)
Join cohort to person and observation_period; compute age at index, sex (join to concept), and obs period duration.
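Age at index and observation duration are simple date arithmetic. The standalone sketch below shows both in plain Python. The function names are hypothetical, and defaulting a missing month or day of birth to 1 is an assumption of this sketch, not a rule taken from the tutorial. The person table's year_of_birth, month_of_birth and day_of_birth columns would supply the inputs.

```python
from datetime import date


def age_at_index(index_date, year_of_birth, month_of_birth=None, day_of_birth=None):
    """Age in completed years at index_date.

    A missing month or day of birth defaults to 1 (an assumption of this sketch).
    """
    month = month_of_birth or 1
    day = day_of_birth or 1
    age = index_date.year - year_of_birth
    # Subtract one year if the birthday has not yet occurred in the index year
    if (index_date.month, index_date.day) < (month, day):
        age -= 1
    return age


def duration_days(index_date, obs_end):
    """Number of days from index_date to obs_end."""
    return (obs_end - index_date).days


print(age_at_index(date(2009, 3, 1), 1940, 6, 15))  # 68
print(age_at_index(date(2009, 3, 1), 1940))  # 69
print(duration_days(date(2008, 1, 1), date(2008, 12, 31)))  # 365
```

When only year_of_birth is used, the result is the plain year difference, which can overstate age by up to one year.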
person = cdm.person
concept = cdm.concept
cohort_with_person = cohort_demo.join(person, cohort_demo.person_id == person.person_id, how="left")
# Age at index: year of index_date minus year_of_birth (approximate; ignores month and day)
cohort_with_person = cohort_with_person.mutate(
    age_at_index=cohort_with_person.index_date.year() - cohort_with_person.year_of_birth
)
# Sex: look up the name of the gender concept
cohort_with_sex = cohort_with_person.join(
    concept,
    cohort_with_person.gender_concept_id == concept.concept_id,
    how="left",
)
table1_expr = cohort_with_sex.aggregate(
    n=cohort_with_sex.person_id.count(),
    mean_age=cohort_with_sex.age_at_index.mean(),
    median_age=cohort_with_sex.age_at_index.median(),
)
# Sex distribution: number of cohort members per gender concept name
sex_counts = cohort_with_sex.group_by("concept_name").aggregate(
    n=cohort_with_sex.person_id.count()
)
# Duration: whole days from index_date to obs_end, computed from cohort_demo
cohort_duration = cohort_demo.mutate(
    duration_days=cohort_demo.obs_end.delta(cohort_demo.index_date, "day")
)
duration_summary = cohort_duration.aggregate(
    mean_duration_days=cohort_duration.duration_days.mean(),
)
cc.collect(table1_expr)
cc.collect(sex_counts)
cc.collect(duration_summary)
Interpret: Baseline comorbidities (top 10 conditions)
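Count conditions that start in the baseline window (here, any time before index_date), then join to concept for names. Because the demo cohort sets index_date to each person's earliest observation_period start, few or no conditions may precede it; a cohort with a later index date gives a more informative baseline.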
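The baseline logic (restrict to cohort members, keep conditions before index, count per concept, take the top N) can be sketched in plain Python. All rows, the concept IDs, and the name `top_baseline_conditions` are hypothetical and for illustration only. The optional `lookback_days` argument shows one way to bound the window to a fixed number of days before index, with the index date itself excluded.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical cohort: person_id -> index_date
cohort = {1: date(2009, 6, 1), 2: date(2009, 1, 10)}

# Hypothetical rows: (person_id, condition_concept_id, condition_start_date)
conditions = [
    (1, 320128, date(2009, 1, 15)),
    (1, 201826, date(2007, 2, 1)),   # more than 365 days before index
    (1, 320128, date(2009, 6, 1)),   # on the index date, so not baseline
    (2, 320128, date(2008, 12, 1)),
    (2, 320128, date(2008, 11, 1)),
    (2, 201826, date(2008, 3, 5)),
    (3, 320128, date(2008, 12, 1)),  # person 3 is not in the cohort
]


def top_baseline_conditions(conditions, cohort, lookback_days=None, top_n=10):
    """Count condition occurrences before index and return the top_n concepts."""
    counts = Counter()
    for person_id, concept_id, start in conditions:
        index_date = cohort.get(person_id)
        if index_date is None:
            continue  # same effect as an inner join to the cohort
        if start >= index_date:
            continue  # baseline means strictly before index
        if lookback_days is not None:
            window_start = index_date - timedelta(days=lookback_days)
            if start < window_start:
                continue  # outside the lookback window
        counts[concept_id] += 1
    return counts.most_common(top_n)


print(top_baseline_conditions(conditions, cohort))
print(top_baseline_conditions(conditions, cohort, lookback_days=365))
```

This sketch counts occurrences, like the tutorial code. Counting distinct persons per condition instead would mean tracking a set of person_ids per concept.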
cond = cdm.condition_occurrence
concept = cdm.concept
# Conditions that start before cohort index_date (baseline)
baseline_cond = cond.join(cohort_demo, cond.person_id == cohort_demo.person_id, how="inner")
baseline_cond = baseline_cond.filter(cond.condition_start_date < cohort_demo.index_date)  # baseline = before index
top_conditions = (
    baseline_cond
    .group_by(cond.condition_concept_id)
    .aggregate(n=cond.condition_occurrence_id.count())
    .order_by(ibis.desc("n"))
    .limit(10)
)
top_with_name = top_conditions.join(
    concept,
    top_conditions.condition_concept_id == concept.concept_id,
    how="left",
)
cc.collect(top_with_name.select("condition_concept_id", "concept_name", "n"))
Optional: Pretty output with great-tables
If great-tables is installed, render Table 1 as a styled table; otherwise show a plain DataFrame.
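One way to structure the guard is to put the optional import inside a function and return a fallback when it fails. The sketch below is illustrative only: `render_table` and `format_plain` are hypothetical helpers, and the fallback here is a plain-text table rather than a DataFrame so that the example runs with the standard library alone.

```python
def format_plain(rows):
    """Format a list of dicts as an aligned plain-text table (fallback path)."""
    if not rows:
        return ""
    columns = list(rows[0])
    widths = {c: max(len(c), *(len(str(r[c])) for r in rows)) for c in columns}
    lines = ["  ".join(c.ljust(widths[c]) for c in columns).rstrip()]
    for r in rows:
        lines.append("  ".join(str(r[c]).ljust(widths[c]) for c in columns).rstrip())
    return "\n".join(lines)


def render_table(rows):
    """Return a great-tables GT object if possible, else a plain-text table."""
    try:
        import pandas as pd
        from great_tables import GT
    except ImportError:
        # Optional dependency missing: fall back to plain text
        return format_plain(rows)
    return GT(pd.DataFrame(rows))


# Hypothetical summary values, for illustration only
summary = [
    {"characteristic": "n", "value": 100},
    {"characteristic": "mean_age", "value": 71.5},
]
print(format_plain(summary))
# In a notebook, render_table(summary) would display the styled table when
# great-tables and pandas are installed, and return the plain text otherwise.
```

Keeping the import inside the function means the rest of the notebook still runs when the optional package is missing.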
try:
    from great_tables import GT
    # df = cc.collect(table1_expr)  # or combine demographics into one DataFrame
    # GT(df)  # render as HTML table
except ImportError:
    pass  # fallback: use plain pandas DataFrame display
# For skeleton we skip actual GT() call; use cc.collect(...) and display(df)
Exercises
- Add more Table 1 columns: race, ethnicity (join to concept), and year of birth distribution.
- Define baseline as -365 to 0 days before index and recompute top conditions.
- Export Table 1 to CSV and (if great-tables) to HTML.
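As a starting point for the export exercise, here is a minimal sketch that writes Table 1 rows to CSV with the standard library. The rows and the helper name `table_to_csv` are hypothetical. If the collected result is a pandas DataFrame, its own `to_csv` method is likely the more direct route.

```python
import csv
import io


def table_to_csv(rows, fileobj):
    """Write a list of dicts to fileobj as CSV, using the first row's keys as the header."""
    columns = list(rows[0])
    writer = csv.DictWriter(fileobj, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical Table 1 rows, for illustration only
table1_rows = [
    {"characteristic": "n", "value": 100},
    {"characteristic": "mean_age", "value": 71.5},
]

# Write to an in-memory buffer; for a real file use open(path, "w", newline="")
buffer = io.StringIO()
table_to_csv(table1_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```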
What we learned
- Table 1: join cohort to person and observation_period; compute sex (concept), age at index, obs period duration.
- Baseline comorbidities: filter condition_occurrence to before index_date; aggregate by condition_concept_id; join to concept for names.
- great-tables: optional pretty tables; guard with try/except and fall back to plain DataFrame.
Cleanup
cdm.disconnect()