
Summarise database characteristics
Source: vignettes/database_characteristics.Rmd
Introduction
In this vignette, we explore how the OmopSketch functions
databaseCharacteristics() and
shinyCharacteristics() can serve as valuable tools for
characterising databases containing electronic health records mapped to
the OMOP Common Data Model.
Create a mock CDM
We begin by loading the necessary packages and creating a mock CDM using the R package omock:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(OmopSketch)
library(omock)
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.
cdm
#>
#> ── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -
Database characteristics
Summarise Characteristics
The databaseCharacteristics() function provides a
comprehensive overview of the Common Data Model (CDM). It returns a summarised
result combining several characterisation components:
- General database snapshot: Generated using summariseOmopSnapshot(), this provides high-level metadata about the CDM, including the size of the person table, the time span covered, the source type, and the vocabulary version.
- Population characterisation: Describes the demographics of the population under observation, built using the CohortConstructor and CohortCharacteristics packages.
- Person table characterisation: Produced using summarisePerson(), this component summarises the content and missingness of the person table.
- Observation period characterisation: Produced using summariseObservationPeriod(), this component summarises the content and missingness of the observation period table. Temporal trends, including changes in the number of records and subjects, median age, sex distribution, and total person-days, are then derived using summariseTrend().
- Clinical tables characterisation: Produced using summariseClinicalRecords(), this component summarises the content and missingness across all clinical tables. Temporal trends in the number of records and subjects, median age, and sex distribution are also computed using summariseTrend().
- Concept counts: Optionally, concept-level summaries can be included by computing concept counts with summariseConceptIdCounts().
Together, these outputs provide a holistic view of the CDM's structure, data completeness, and temporal behaviour, supporting both data quality assessment and study feasibility evaluation.
result <- databaseCharacteristics(cdm = cdm)
Selecting tables to characterise
By default, the following OMOP tables are included in the characterisation: visit_occurrence, visit_detail, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, death.
You can customise which tables to include in the analysis by
specifying them with the omopTableName argument.
result <- databaseCharacteristics(
  cdm = cdm,
  omopTableName = c("drug_exposure", "condition_occurrence")
)
Stratifying by Sex
To stratify the characterisation results by sex, set the
sex argument to TRUE:
result <- databaseCharacteristics(
  cdm = cdm,
  omopTableName = c("drug_exposure", "condition_occurrence"),
  sex = TRUE
)
Stratifying by Age Group
To stratify the characterisation results by age group, pass a list defining the age groups you want to use to the ageGroup argument:
result <- databaseCharacteristics(
  cdm = cdm,
  omopTableName = c("drug_exposure", "condition_occurrence"),
  ageGroup = list(c(0, 50), c(51, 100))
)
Filtering by date range and time interval
Use the dateRange argument to limit the analysis to a
specific period. Combine it with the interval argument to
stratify results by time. Valid values for interval include “overall”
(default), “years”, “quarters”, and “months”:
result <- databaseCharacteristics(
  cdm = cdm,
  interval = "years",
  dateRange = as.Date(c("2010-01-01", "2018-12-31"))
)
Sample the CDM
You can use the sample argument to limit the
characterisation to a subset of the CDM.
This can be useful for quickly exploring large datasets or focusing on a
specific cohort already included in the CDM.
The sample argument accepts either:
- An integer, to randomly sample a specified number of people from the person table in the CDM.
- A string, corresponding to the name of a cohort within the CDM to use for characterisation.
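The cohort form only works if a cohort table with that name already exists in the CDM, and the mock CDM created above has none. A minimal sketch of adding one first, assuming the CohortConstructor package is installed; the cohort name "my_cohort" and the choice of a female-only cohort are purely illustrative:

```r
library(OmopSketch)
library(omock)
library(CohortConstructor)

# Same mock CDM as used throughout this vignette
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")

# Hypothetical cohort for illustration: all females in the CDM
cdm$my_cohort <- demographicsCohort(
  cdm = cdm,
  name = "my_cohort",
  sex = "Female"
)

# Restrict the characterisation to people in that cohort
# (a single clinical table is used here only to keep the run short)
result <- databaseCharacteristics(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  sample = "my_cohort"
)
```

Any other way of adding a cohort table to the CDM should work equally well; what matters is that the string passed to sample matches the cohort table name.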
result <- databaseCharacteristics(
  cdm = cdm,
  sample = 1000L
)
result <- databaseCharacteristics(
  cdm = cdm,
  sample = "my_cohort"
)
Including Concept Counts
To include concept counts in the characterisation, set
conceptIdCounts = TRUE:
result <- databaseCharacteristics(
  cdm = cdm,
  conceptIdCounts = TRUE
)
Other arguments
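The arguments introduced in the previous sections are shown one at a time, but they are separate parameters of the same function. A minimal sketch combining them in a single call, reusing the values from the earlier examples and assuming the combination behaves as the individual arguments suggest:

```r
library(OmopSketch)
library(omock)

# Same mock CDM as used throughout this vignette
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")

result <- databaseCharacteristics(
  cdm = cdm,
  # restrict the clinical tables that are characterised
  omopTableName = c("drug_exposure", "condition_occurrence"),
  # stratify by sex and by age group
  sex = TRUE,
  ageGroup = list(c(0, 50), c(51, 100)),
  # yearly trends within a fixed study window
  interval = "years",
  dateRange = as.Date(c("2010-01-01", "2018-12-31"))
)
```

Each added stratification increases both the size of the result and the run time, so on a large database it may be worth starting with a narrow set of tables.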
It is possible to pass arguments from any of the underlying functions
to databaseCharacteristics() in order to customise the
output. For example, to stratify trends and concept counts by whether
records fall inside or outside an observation period, you can pass the argument
inObservation = TRUE:
result <- databaseCharacteristics(
  cdm = cdm,
  conceptIdCounts = TRUE,
  inObservation = TRUE
)
Visualise the characterisation results
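Because databaseCharacteristics() returns a summarised result, the output can also be inspected in the console before building an app. A minimal sketch, assuming the omopgenerics package (which defines the summarised result class) is available and that the settings table contains a result_type column:

```r
library(dplyr)
library(OmopSketch)
library(omock)

# Same mock CDM as used throughout this vignette
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")

# A single clinical table is used here only to keep the run short
result <- databaseCharacteristics(
  cdm = cdm,
  omopTableName = "condition_occurrence"
)

# One row per estimate, in the standard summarised result columns
glimpse(result)

# One row per result_id, describing the analysis that produced it
resultSettings <- omopgenerics::settings(result)

# Which characterisation components are present in the result
resultSettings |>
  count(result_type)
```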
To explore the characterisation results interactively, you can use
the shinyCharacteristics() function. This function
generates a Shiny application in the specified directory,
allowing you to browse, filter, and visualise the results through an
intuitive user interface.
shinyCharacteristics(result = result, directory = "path/to/your/shiny")
Customise the Shiny App
You can customise the title, logo, theme, and background of the Shiny app by setting the appropriate arguments:
- title: The title displayed at the top of the app.
- logo: Path to a custom logo (must be in SVG format).
- theme: One of the available OmopViewer themes.
- background: A custom background panel for the Shiny app.
shinyCharacteristics(
  result = result,
  directory = "path/to/my/shiny",
  title = "Characterisation of my data",
  logo = "path/to/my/logo.svg",
  theme = "scarlet",
  background = "path/to/my/background.md"
)
An example of the Shiny application generated by
shinyCharacteristics() can be explored here,
where the characterisation of several synthetic datasets is
available.