Exporting Results and Tables

Export results to CSV/Parquet, save rendered tables (HTML with great-tables), and bundle outputs for reproducibility.

You will learn

- How to export query results to ./outputs/ (CSV, Parquet)
- How to save rendered tables (HTML, if great-tables is installed)
- A simple “results bundle” pattern for reproducible outputs

Story question

“How do we save analysis results and tables for sharing or reporting?”

Setup

We use the GiBleed example dataset, create an outputs directory, and export a few results.

from pathlib import Path

import cdmconnector as cc
import ibis

OUTPUT_DIR = Path("./outputs")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

path = cc.eunomia_dir("GiBleed", cdm_version="5.3")
con = ibis.duckdb.connect(path)
cdm = cc.cdm_from_con(con, cdm_schema="main", write_schema="main", cdm_name="eunomia")
Explore: Export to CSV and Parquet
Run a small query, collect to DataFrame, then write to CSV and Parquet.
person = cdm.person
result = person.select("person_id", "gender_concept_id", "year_of_birth").limit(100)
df = cc.collect(result)
df.to_csv(OUTPUT_DIR / "person_preview.csv", index=False)
df.to_parquet(OUTPUT_DIR / "person_preview.parquet", index=False)
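Before sharing exported files, it is worth confirming they round-trip cleanly. A minimal sketch with a stand-in DataFrame (in the lesson this would be `df`; Parquet support assumes pyarrow or fastparquet is installed, so the sketch degrades gracefully without it):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the collected result; in the lesson this would be `df`.
df_demo = pd.DataFrame({"person_id": [1, 2, 3], "year_of_birth": [1970, 1985, 1990]})

out = Path(tempfile.mkdtemp())

# CSV round-trip: write, read back, and confirm the data survived.
df_demo.to_csv(out / "demo.csv", index=False)
csv_back = pd.read_csv(out / "demo.csv")
assert csv_back.equals(df_demo)

# Parquet round-trip: requires a Parquet engine (pyarrow or fastparquet).
try:
    df_demo.to_parquet(out / "demo.parquet", index=False)
    pq_back = pd.read_parquet(out / "demo.parquet")
    assert pq_back.equals(df_demo)
    print("CSV and Parquet round-trip OK")
except ImportError:
    print("no Parquet engine installed; CSV round-trip OK")
```

Parquet preserves dtypes exactly, while CSV re-infers them on read, so `equals` checks like these catch silent type drift early.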
print("Exported to", OUTPUT_DIR)

Build: Rendered tables (HTML with great-tables)
If great-tables is installed, build a GT table and save as HTML; otherwise save a plain CSV.
try:
    from great_tables import GT

    gt_tbl = GT(df)
    html_path = OUTPUT_DIR / "person_preview.html"
    # GT.save() renders an image; write the raw HTML instead.
    html_path.write_text(gt_tbl.as_raw_html(), encoding="utf-8")
    print("Saved HTML table to", html_path)
except ImportError:
    print("great-tables not installed; skipping HTML export. Use CSV/Parquet.")
except Exception as e:
    print("great-tables export failed:", e)

Interpret: Results bundle pattern
Create a timestamped or run-named folder and write all outputs there (cohort membership, top conditions, Table 1) for reproducibility.
from datetime import datetime
run_name = datetime.now().strftime("%Y%m%d_%H%M%S")
bundle_dir = OUTPUT_DIR / run_name
bundle_dir.mkdir(parents=True, exist_ok=True)
# Export cohort membership (skeleton: use actual cohort table if available)
# cohort_df = cc.collect(cdm["cohort"].limit(100))
# cohort_df.to_csv(bundle_dir / "cohort_membership.csv", index=False)
# Export top conditions (skeleton)
# top_cond_df.to_csv(bundle_dir / "top_conditions.csv", index=False)
df.to_csv(bundle_dir / "person_preview.csv", index=False)
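A bundle is easier to hand off if it describes itself. A minimal sketch of a README.txt manifest (the file name and fields here are assumptions, not part of any lesson API; in the lesson the target would be `bundle_dir`):

```python
import tempfile
from datetime import datetime
from pathlib import Path

# Stand-in bundle directory; in the lesson this would be `bundle_dir`.
bundle = Path(tempfile.mkdtemp())

# Hypothetical README.txt: run timestamp, dataset name, and file listing.
lines = [
    f"run: {datetime.now().strftime('%Y%m%d_%H%M%S')}",
    "dataset: GiBleed (Eunomia example data)",
    "files: person_preview.csv",
]
(bundle / "README.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
print((bundle / "README.txt").read_text(encoding="utf-8").splitlines()[1])
```

Anyone opening the folder months later can then see what produced it without rerunning the notebook.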
print("Results bundle:", bundle_dir)

Exercises
- Export cohort membership (subject_id, cohort_start_date, cohort_end_date) to CSV and Parquet.
- Export a “top conditions” table (concept_id, concept_name, n) to CSV; if great-tables is installed, also save it as HTML.
- Add a README.txt in the bundle with run timestamp and dataset name (e.g. GiBleed).
What we learned
- CSV/Parquet: collect the expression to a DataFrame, then .to_csv() / .to_parquet() into ./outputs/.
- HTML tables: build with great_tables.GT(df) and write out its HTML when installed; guard the import with try/except.
- Results bundle: one folder per run with all outputs for reproducibility.
Cleanup
cdm.disconnect()