Exporting Results and Tables

Export results to CSV/Parquet, save rendered tables (HTML with great-tables), and bundle outputs for reproducibility.

You will learn

  • How to export query results to ./outputs/ (CSV, Parquet)
  • How to save rendered tables (HTML if great-tables installed)
  • A simple “results bundle” pattern for reproducible outputs

Story question

“How do we save analysis results and tables for sharing or reporting?”


Setup

We use the GiBleed example dataset, create an outputs directory, and export a few results.

from pathlib import Path
import cdmconnector as cc
import ibis

OUTPUT_DIR = Path("./outputs")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

path = cc.eunomia_dir("GiBleed", cdm_version="5.3")
con = ibis.duckdb.connect(path)
cdm = cc.cdm_from_con(con, cdm_schema="main", write_schema="main", cdm_name="eunomia")

Explore: Export to CSV and Parquet

Run a small query, collect to DataFrame, then write to CSV and Parquet.

person = cdm.person
result = person.select("person_id", "gender_concept_id", "year_of_birth").limit(100)
df = cc.collect(result)
df.to_csv(OUTPUT_DIR / "person_preview.csv", index=False)
df.to_parquet(OUTPUT_DIR / "person_preview.parquet", index=False)  # requires pyarrow (or fastparquet)
print("Exported to", OUTPUT_DIR)
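One reason to prefer Parquet over CSV is that Parquet preserves column dtypes, while CSV flattens everything to text. A standalone pandas sketch (it does not use the CDM connection; the tiny frame is illustrative) shows the round-trip difference:

```python
import io
import pandas as pd

# A tiny frame mimicking the person preview: an int column plus a datetime column.
df = pd.DataFrame({
    "person_id": [1, 2, 3],
    "birth_datetime": pd.to_datetime(["1970-01-01", "1980-06-15", "1990-12-31"]),
})

# Round-trip through CSV: the datetime dtype is lost.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
back = pd.read_csv(buf)
print(back["birth_datetime"].dtype)  # object: CSV stored plain strings

# When reading CSV you must re-parse dates explicitly.
buf.seek(0)
back = pd.read_csv(buf, parse_dates=["birth_datetime"])
print(back["birth_datetime"].dtype)  # datetime64[ns]
```

Parquet needs no such re-parsing: a frame written with .to_parquet() comes back with the same dtypes.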

Build: Rendered tables (HTML with great-tables)

If great-tables is installed, build a GT table and save as HTML; otherwise save a plain CSV.

try:
    from great_tables import GT
    gt_tbl = GT(df)
    html_path = OUTPUT_DIR / "person_preview.html"
    html_path.write_text(gt_tbl.as_raw_html())  # GT.save() renders an image; as_raw_html() gives HTML
    print("Saved HTML table to", html_path)
except ImportError:
    # Fall back to a plain CSV when great-tables is not available
    df.to_csv(OUTPUT_DIR / "person_preview_table.csv", index=False)
    print("great-tables not installed; wrote CSV fallback instead.")
except Exception as e:
    print("great-tables export failed:", e)

Interpret: Results bundle pattern

Create a timestamped or run-named folder and write all outputs there (cohort membership, top conditions, Table 1) for reproducibility.

from datetime import datetime

run_name = datetime.now().strftime("%Y%m%d_%H%M%S")
bundle_dir = OUTPUT_DIR / run_name
bundle_dir.mkdir(parents=True, exist_ok=True)
# Export cohort membership (skeleton: use actual cohort table if available)
# cohort_df = cc.collect(cdm["cohort"].limit(100))
# cohort_df.to_csv(bundle_dir / "cohort_membership.csv", index=False)
# Export top conditions (skeleton)
# top_cond_df.to_csv(bundle_dir / "top_conditions.csv", index=False)
df.to_csv(bundle_dir / "person_preview.csv", index=False)
print("Results bundle:", bundle_dir)
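A bundle is easier to trace later if it carries a small machine-readable manifest alongside the data files. A minimal, standalone sketch (the field names are illustrative, and a temporary directory stands in for the bundle_dir created above):

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

# Stand-in for the per-run bundle folder from the chapter
bundle_dir = Path(tempfile.mkdtemp()) / "20240101_000000"
bundle_dir.mkdir(parents=True, exist_ok=True)

# Illustrative manifest fields: run name, creation time, dataset, file list
manifest = {
    "run": bundle_dir.name,
    "created": datetime.now().isoformat(timespec="seconds"),
    "dataset": "GiBleed",
    "cdm_version": "5.3",
    "files": ["person_preview.csv"],
}
(bundle_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
print(json.loads((bundle_dir / "manifest.json").read_text())["dataset"])  # GiBleed
```

Because the manifest is JSON, downstream scripts can discover which dataset and files a bundle contains without parsing free text.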

Exercises

  • Export cohort membership (subject_id, cohort_start_date, cohort_end_date) to CSV and Parquet.
  • Export “top conditions” table (concept_id, concept_name, n) to CSV; if great-tables installed, also to HTML.
  • Add a README.txt in the bundle with run timestamp and dataset name (e.g. GiBleed).

What we learned

  • CSV/Parquet: collect expression to DataFrame, then .to_csv() / .to_parquet() to ./outputs/.
  • HTML tables: write GT(df).as_raw_html() to a file when great-tables is installed; guard with try/except.
  • Results bundle: one folder per run with all outputs for reproducibility.
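Reproducibility also benefits from being able to verify that bundle files were not altered after a run. A stdlib-only sketch (a temporary directory with one sample file stands in for a real bundle) writes a checksum file next to the outputs:

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Stand-in bundle with one output file
bundle_dir = Path(tempfile.mkdtemp())
(bundle_dir / "person_preview.csv").write_text("person_id\n1\n2\n")

# One "<hash>  <filename>" line per file, then written as CHECKSUMS.txt
lines = [f"{sha256_file(p)}  {p.name}" for p in sorted(bundle_dir.iterdir())]
(bundle_dir / "CHECKSUMS.txt").write_text("\n".join(lines) + "\n")
print((bundle_dir / "CHECKSUMS.txt").read_text())
```

Collaborators can then re-hash the files and compare against CHECKSUMS.txt to confirm the outputs match the original run.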

Cleanup

cdm.disconnect()