Summarise database characteristics • OmopSketch

Introduction

In this vignette, we explore how the OmopSketch functions databaseCharacteristics() and shinyCharacteristics() can be used to characterise databases containing electronic health records mapped to the OMOP Common Data Model.

Create a mock CDM

We begin by loading the necessary packages and creating a mock CDM using the R package omock:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(OmopSketch)
library(omock)

cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.

cdm
#> 
#> ── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

Database characteristics

Summarise Characteristics

The databaseCharacteristics() function provides a comprehensive overview of the Common Data Model (CDM). It returns a summarised result combining several characterisation components:

General database snapshot:
Generated using summariseOmopSnapshot(), this provides high-level metadata about the CDM, including size of person table, time span covered, source type, vocabulary version, etc.
Population characterisation:
Describes the demographics of the population under observation, built using the CohortConstructor and CohortCharacteristics packages.
Person table characterisation:
Produced using summarisePerson(), this component summarises the content and missingness of the person table.
Observation period characterisation:
Produced using summariseObservationPeriod(), this component summarises the content and missingness of the observation period table.
Temporal trends — including changes in the number of records and subjects, median age, sex distribution, and total person-days — are then derived using summariseTrend().
Clinical tables characterisation:
Produced using summariseClinicalRecords(), this component summarises the content and missingness across all clinical tables.
Temporal trends in the number of records and subjects, median age, and sex distribution are also computed using summariseTrend().
Concept counts:
Optionally, concept-level summaries can be included by computing concept counts with summariseConceptIdCounts().

Together, these outputs provide a holistic view of the CDM’s structure, data completeness, and temporal behaviour — supporting both data quality assessment and study feasibility evaluation.

result <- databaseCharacteristics(cdm = cdm)
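Because the returned object bundles several result types, it can help to check which components it actually contains. The sketch below assumes the result is a standard summarised_result as defined by omopgenerics, so that omopgenerics::settings() applies. The restriction to a single clinical table is only there to keep the run short.

```r
library(dplyr)
library(OmopSketch)
library(omock)

# Recreate the mock CDM so that the snippet runs on its own
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")

# Restrict to one clinical table to keep the example quick
result <- databaseCharacteristics(
  cdm = cdm,
  omopTableName = "condition_occurrence"
)

# One row per result type, with the package that produced it
resultTypes <- omopgenerics::settings(result) |>
  distinct(result_type, package_name)
resultTypes
```

Each row should correspond to one of the components listed above, which makes it easy to confirm that an optional component such as concept counts was included.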

Selecting tables to characterise

By default, the following OMOP tables are included in the characterisation: visit_occurrence, visit_detail, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, death.

You can customise which tables to include in the analysis by specifying them with the omopTableName argument.

result <- databaseCharacteristics(
  cdm = cdm, 
  omopTableName = c("drug_exposure", "condition_occurrence")
)

Stratifying by Sex

To stratify the characterisation results by sex, set the sex argument to TRUE:

result <- databaseCharacteristics(
  cdm = cdm,
  omopTableName = c("drug_exposure", "condition_occurrence"),
  sex = TRUE
)

Stratifying by Age Group

To stratify the characterisation by age group, pass a list defining the age groups to the ageGroup argument:

result <- databaseCharacteristics(
  cdm = cdm,
  omopTableName = c("drug_exposure", "condition_occurrence"),
  ageGroup = list(c(0, 50), c(51, 100))
)
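Writing the bounds by hand becomes error-prone with many groups. The following base R sketch uses a hypothetical helper, makeAgeGroups(), which is not part of OmopSketch, to derive contiguous, non-overlapping groups from their lower bounds. It assumes that ageGroup accepts a named list; if that does not hold for your version, unname() the result first.

```r
# Hypothetical helper (not part of OmopSketch): build contiguous,
# non-overlapping age groups from a vector of lower bounds.
makeAgeGroups <- function(lower, maxAge = 150) {
  lower <- sort(unique(lower))
  stopifnot(length(lower) > 0, all(lower >= 0), max(lower) <= maxAge)
  # Each group ends one year before the next group starts;
  # the last group ends at maxAge
  upper <- c(lower[-1] - 1, maxAge)
  groups <- Map(function(lo, hi) c(lo, hi), lower, upper)
  names(groups) <- paste(lower, "to", upper)
  groups
}

ageGroup <- makeAgeGroups(c(0, 18, 65))
names(ageGroup)
#> [1] "0 to 17"   "18 to 64"  "65 to 150"
```

The resulting list could then be supplied as ageGroup = ageGroup in the call to databaseCharacteristics().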

Filtering by date range and time interval

Use the dateRange argument to limit the analysis to a specific period. Combine it with the interval argument to stratify results by time. Valid values for interval include “overall” (default), “years”, “quarters”, and “months”:

result <- databaseCharacteristics(
  cdm = cdm,
  interval = "years",
  dateRange = as.Date(c("2010-01-01", "2018-12-31"))
)
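Since dateRange is a plain Date vector of length two, it can be checked before starting a long run. The sketch below uses base R only; the checks and the year count are illustrative and are not performed this way by OmopSketch itself.

```r
# Start and end of the study window, as in the call above
dateRange <- as.Date(c("2010-01-01", "2018-12-31"))

# Basic sanity checks: two valid dates, in chronological order
stopifnot(
  length(dateRange) == 2,
  !anyNA(dateRange),
  dateRange[1] <= dateRange[2]
)

# Calendar years touched by the window, i.e. the largest number of
# yearly strata that interval = "years" could produce
startYear <- as.integer(format(dateRange[1], "%Y"))
endYear <- as.integer(format(dateRange[2], "%Y"))
nYears <- endYear - startYear + 1L
nYears
#> [1] 9
```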

Sample the CDM

You can use the sample argument to limit the characterisation to a subset of the CDM.
This can be useful for quickly exploring large datasets or focusing on a specific cohort already included in the CDM.

The sample argument accepts either:

An integer, to randomly sample a specified number of people from the person table in the CDM.
A string, corresponding to the name of a cohort within the CDM to use for characterisation.

result <- databaseCharacteristics(
  cdm = cdm,
  sample = 1000L
)

# "my_cohort" must be the name of a cohort table that already exists in the cdm
result <- databaseCharacteristics(
  cdm = cdm,
  sample = "my_cohort"
)
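The cohort name in the second call is a placeholder: the cohort has to exist in the CDM before it can be used for sampling. A minimal sketch, assuming the CohortConstructor package is installed and using its demographicsCohort() function to build an illustrative cohort of women. The cohort definition is an arbitrary choice for demonstration, and the single clinical table only keeps the run short.

```r
library(OmopSketch)
library(omock)
library(CohortConstructor)

# Recreate the mock CDM so that the snippet runs on its own
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")

# Illustrative cohort: all women in the database
cdm$my_cohort <- demographicsCohort(
  cdm = cdm,
  name = "my_cohort",
  sex = "Female"
)

# Characterise only the people included in that cohort
result <- databaseCharacteristics(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  sample = "my_cohort"
)
```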

Including Concept Counts

To include concept counts in the characterisation, set conceptIdCounts = TRUE:

result <- databaseCharacteristics(
  cdm = cdm,
  conceptIdCounts = TRUE
)

Other arguments

Arguments of any of the underlying functions can also be passed to databaseCharacteristics() to customise the output. For example, to stratify trends and concept counts by whether records fall inside or outside an observation period, pass inObservation = TRUE:

result <- databaseCharacteristics(
  cdm = cdm,
  conceptIdCounts = TRUE, 
  inObservation = TRUE
)

Visualise the characterisation results

To explore the characterisation results interactively, you can use the shinyCharacteristics() function. This function generates a Shiny application in the specified directory, allowing you to browse, filter, and visualise the results through an intuitive user interface.

shinyCharacteristics(result = result, directory = "path/to/your/shiny")

Customise the Shiny App

You can customise the title, logo, theme, and background of the Shiny app by setting the corresponding arguments:

title: The title displayed at the top of the app
logo: Path to a custom logo (must be in SVG format)
theme: One of the available OmopViewer themes
background: A custom background panel for the Shiny app

shinyCharacteristics(
  result = result, 
  directory = "path/to/my/shiny",
  title = "Characterisation of my data",
  logo = "path/to/my/logo.svg",
  theme = "scarlet", 
  background = "path/to/my/background.md"
)

An example of the Shiny application generated by shinyCharacteristics() can be explored here, where the characterisation of several synthetic datasets is available.

Disconnect from CDM

Finally, disconnect from the mock CDM.

cdmDisconnect(cdm = cdm)