Summarise Database Characteristics for OMOP CDM

Usage

databaseCharacteristics(
  cdm,
  omopTableName = c("visit_occurrence", "visit_detail", "condition_occurrence",
    "drug_exposure", "procedure_occurrence", "device_exposure", "measurement",
    "observation", "death"),
  sample = NULL,
  sex = FALSE,
  ageGroup = NULL,
  dateRange = NULL,
  interval = "overall",
  conceptIdCounts = FALSE,
  ...
)

Arguments

cdm

A cdm_reference object. Use CDMConnector to create a reference to a database or omock to create a reference to synthetic data.

omopTableName

A character vector of the names of the tables to summarise in the cdm object. Run clinicalTables() to check the available options.

sample

Either an integer or a character string.

  • If an integer (n > 0), the function will first sample n distinct person_ids from the person table and then subset the input tables to those subjects.

  • If a character string, it must be the name of a cohort in the cdm; in this case, the input tables are subset to subjects (subject_id) belonging to that cohort.

  • Use NULL to disable subsetting (default value).
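
The integer case can be pictured with a small base R sketch. The data frames `person` and `drug_exposure` below are toy stand-ins rather than real cdm tables, and the code only illustrates the two steps described above; it is not OmopSketch's internal implementation.

```r
# Toy stand-ins for the person table and one clinical table (hypothetical data).
person <- data.frame(person_id = 1:10)
drug_exposure <- data.frame(
  person_id = c(1, 1, 2, 3, 3, 3, 7, 9, 10, 10),
  drug_concept_id = 101:110
)

set.seed(42)  # only to make the sketch reproducible
n <- 4

# Step 1: sample n distinct person_ids from the person table.
sampledIds <- sample(unique(person$person_id), size = n)

# Step 2: keep only the clinical records of the sampled subjects.
drugSample <- drug_exposure[drug_exposure$person_id %in% sampledIds, , drop = FALSE]
```

In the character case, the second step would presumably use the subject_id values of the named cohort in place of `sampledIds`.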

sex

Logical; whether to stratify results by sex (TRUE) or not (FALSE).

ageGroup

A list of age groups to stratify the results by. Each element represents a specific age range. You can give them specific names, e.g. ageGroup = list(children = c(0, 17), adult = c(18, Inf)).
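
As a rough illustration of how such a list partitions ages, the sketch below labels a few ages with base R. `labelAge()` is a hypothetical helper written for this example, not an OmopSketch function, and it assumes that both bounds of each range are inclusive.

```r
# Named age groups, as in the argument description above.
ageGroup <- list(children = c(0, 17), adult = c(18, Inf))

# Hypothetical helper (not part of OmopSketch): return the name of the first
# group whose range contains `age`, assuming both bounds are inclusive.
labelAge <- function(age, groups) {
  for (name in names(groups)) {
    bounds <- groups[[name]]
    if (age >= bounds[1] && age <= bounds[2]) {
      return(name)
    }
  }
  NA_character_
}

ages <- c(3, 17, 18, 64)
labels <- vapply(ages, labelAge, character(1), groups = ageGroup)
```

Using `Inf` as the upper bound of the last group avoids silently dropping very old subjects.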

dateRange

A vector of two dates defining the desired study period. Only the start_date column of the OMOP table is checked to ensure it falls within this range. If dateRange is NULL, no restriction is applied.
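
The following base R sketch shows what "only the start date is checked" implies for individual records. The `records` data frame is hypothetical, and treating both ends of the range as inclusive is an assumption of this example.

```r
dateRange <- as.Date(c("2010-01-01", "2019-12-31"))

# Toy records (hypothetical data).
records <- data.frame(
  record_id = 1:4,
  start_date = as.Date(c("2009-12-30", "2010-01-01", "2015-06-15", "2020-01-01")),
  end_date = as.Date(c("2010-02-01", "2010-01-10", "2021-01-01", "2020-01-05"))
)

# Only the start date is compared against the range; the end date is ignored.
# Assumption: both ends of the range are inclusive.
inRange <- records$start_date >= dateRange[1] & records$start_date <= dateRange[2]
kept <- records[inRange, ]
```

Record 1 is dropped even though it ends inside the range, while record 3 is kept even though it ends after it.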

interval

Time interval to stratify by. One of "overall" (default), "years", "quarters" or "months".
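
To make the effect of each option concrete, the sketch below assigns a few dates to time buckets with base R. `bucket()` is a hypothetical helper, and its label formats are illustrative only; they are not necessarily the labels OmopSketch produces.

```r
dates <- as.Date(c("2019-02-14", "2019-11-03", "2020-05-20"))

# Hypothetical helper: map each date to a bucket label for a given interval.
# The label formats are illustrative, not OmopSketch's.
bucket <- function(dates, interval) {
  switch(
    interval,
    overall = rep("overall", length(dates)),
    years = format(dates, "%Y"),
    quarters = paste(format(dates, "%Y"), quarters(dates)),
    months = format(dates, "%Y-%m"),
    stop("unknown interval: ", interval)
  )
}

# Count how many dates fall in each yearly bucket.
table(bucket(dates, "years"))
```

Finer intervals produce more strata and therefore more rows in the result.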

conceptIdCounts

Logical; whether to summarise concept ID counts (TRUE) or not (FALSE).

...

Additional arguments passed to the OmopSketch functions used internally.

Value

A summarised_result object with the results.
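
In the result, estimate_value is stored as character and estimate_type records how each value should be read, as the glimpse() output in the Examples section shows. The base R sketch below rebuilds three of those rows in a plain data frame (a stand-in, not a real summarised_result object) and converts the values back to their declared types.

```r
# Three rows modelled on the glimpse() output in the Examples section.
res <- data.frame(
  estimate_name = c("snapshot_date", "person_count", "vocabulary_version"),
  estimate_type = c("date", "integer", "character"),
  estimate_value = c("2025-11-21", "100", "v5.0 18-JAN-19"),
  stringsAsFactors = FALSE
)

# estimate_value is always character; estimate_type says how to read it.
personCount <- as.integer(res$estimate_value[res$estimate_name == "person_count"])
snapshotDate <- as.Date(res$estimate_value[res$estimate_type == "date"])
```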

Examples

# \donttest{
library(OmopSketch)
library(omock)
library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
library(here)
#> here() starts at /home/runner/work/OmopSketch/OmopSketch

# Build a mock cdm from the GiBleed synthetic dataset, stored in duckdb.
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#>  Reading GiBleed tables.
#>  Adding drug_strength table.
#>  Creating local <cdm_reference> object.
#>  Inserting <cdm_reference> into duckdb.

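# Characterise two tables for a sample of 100 people, stratified by sex,
# age group and calendar year.
result <- databaseCharacteristics(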
  cdm = cdm,
  sample = 100,
  omopTableName = c("drug_exposure", "condition_occurrence"),
  sex = TRUE,
  ageGroup = list(c(0, 50), c(51, 100)),
  interval = "years",
  conceptIdCounts = FALSE
)
#> The characterisation will focus on the following OMOP tables: drug_exposure and
#> condition_occurrence
#> The cdm is sampled to 100
#> → Getting cdm snapshot
#> → Getting population characteristics
#>  Building new trimmed cohort
#> Adding demographics information
#> Creating initial cohort
#> Trim sex
#>  Cohort trimmed
#>  Building new trimmed cohort
#> Adding demographics information
#> Creating initial cohort
#> Trim sex
#> Trim age
#>  Cohort trimmed
#>  adding demographics columns
#>  summarising data
#>  summarising cohort general_population
#>  summarising cohort age_group_0_50
#>  summarising cohort age_group_51_100
#>  summariseCharacteristics finished!
#> → Summarising person table
#> → Summarising clinical records
#>  Adding variables of interest to drug_exposure.
#>  Summarising records per person in drug_exposure.
#>  Summarising subjects not in person table in drug_exposure.
#>  Summarising records in observation in drug_exposure.
#>  Summarising records with start before birth date in drug_exposure.
#>  Summarising records with end date before start date in drug_exposure.
#>  Summarising domains in drug_exposure.
#>  Summarising standard concepts in drug_exposure.
#>  Summarising source vocabularies in drug_exposure.
#>  Summarising concept types in drug_exposure.
#>  Summarising concept class in drug_exposure.
#>  Summarising missing data in drug_exposure.
#>  Adding variables of interest to condition_occurrence.
#>  Summarising records per person in condition_occurrence.
#>  Summarising subjects not in person table in condition_occurrence.
#>  Summarising records in observation in condition_occurrence.
#>  Summarising records with start before birth date in condition_occurrence.
#>  Summarising records with end date before start date in condition_occurrence.
#>  Summarising domains in condition_occurrence.
#>  Summarising standard concepts in condition_occurrence.
#>  Summarising source vocabularies in condition_occurrence.
#>  Summarising concept types in condition_occurrence.
#>  Summarising missing data in condition_occurrence.
#> → Summarising observation period
#> → Summarising trends: records, subjects, person-days, age and sex
#> → The number of person-days is not computed for event tables
#> ☺ Database characterisation finished. Code ran in 1 min and 4 sec
#>  1 table created: "person_sample".

result |>
  glimpse()
#> Rows: 73,945
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
#> $ cdm_name         <chr> "GiBleed", "GiBleed", "GiBleed", "GiBleed", "GiBleed"…
#> $ group_name       <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "general", "general", "general", "cdm", "cdm", "cdm",…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name    <chr> "snapshot_date", "person_count", "vocabulary_version"…
#> $ estimate_type    <chr> "date", "integer", "character", "character", "charact…
#> $ estimate_value   <chr> "2025-11-21", "100", "v5.0 18-JAN-19", "Synthea synth…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…

# Generate a Shiny app from the results in the project root.
shinyCharacteristics(result = result, directory = here())
#>  Creating shiny from provided results.
#> Warning: ! 2 packages are not installed: plotly and shinycssloaders.

cdmDisconnect(cdm = cdm)
# }