Introduction

This vignette explores the OmopSketch functions that summarise concept use in clinical OMOP tables: summariseConceptIdCounts() computes the counts, while tableConceptIdCounts() and tableTopConceptCounts() display them.

Create a mock cdm

To see these functions in action, we first load the required packages and create a mock CDM with the R package omock.

library(OmopSketch)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(omock)

cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#>  Loading bundled GiBleed tables from package data.
#>  Adding drug_strength table.
#>  Creating local <cdm_reference> object.
#>  Inserting <cdm_reference> into duckdb.

cdm
#> 
#> ── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

Summarise concept level counts

We now use the summariseConceptIdCounts() function from the OmopSketch package to retrieve counts for each standard concept ID and name, together with the associated source concept ID, source concept name, standard vocabulary, and source vocabulary.

summariseConceptIdCounts(cdm = cdm, omopTableName = "drug_exposure") |>
  select(group_level, variable_name, variable_level, estimate_name, estimate_value, additional_name, additional_level) |>
  glimpse()
#> Rows: 113
#> Columns: 7
#> $ group_level      <chr> "drug_exposure", "drug_exposure", "drug_exposure", "d…
#> $ variable_name    <chr> "Aspirin 81 MG Oral Tablet", "hepatitis A vaccine, ad…
#> $ variable_level   <chr> "19059056", "40213296", "40213227", "40173590", "1912…
#> $ estimate_name    <chr> "count_records", "count_records", "count_records", "c…
#> $ estimate_value   <chr> "4380", "3211", "7430", "129", "1056", "1719", "306",
#> $ additional_name  <chr> "source_concept_id &&& source_concept_name &&& standa…
#> $ additional_level <chr> "19059056 &&& Aspirin 81 MG Oral Tablet &&& RxNorm &&…
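
The additional_name and additional_level columns pack several fields into one string, separated by " &&& ". The base R sketch below shows how such a pair can be unpacked into named values. The two strings are shortened toy inputs holding only the first two fields visible in the output above; they are not the full values returned by the function. For a whole result object, omopgenerics::splitAdditional() is meant for this job, assuming the installed omopgenerics version exports it.

```r
# Toy inputs: only the first two fields of the real strings (assumption).
additional_name  <- "source_concept_id &&& source_concept_name"
additional_level <- "19059056 &&& Aspirin 81 MG Oral Tablet"

# Split names and levels on the literal separator used in the result.
keys   <- strsplit(additional_name,  " &&& ", fixed = TRUE)[[1]]
values <- strsplit(additional_level, " &&& ", fixed = TRUE)[[1]]

# Pair each field name with its value.
parsed <- setNames(as.list(values), keys)
parsed$source_concept_name
```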

By default, the function returns the number of records (estimate_name == "count_records") for each standard concept ID. To count distinct subjects instead, set countBy = "person". To return both record and subject counts, use countBy = c("record", "person").

summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "drug_exposure",
  countBy = c("record", "person")
) |>
  select(variable_name, estimate_name, estimate_value)
#> # A tibble: 226 × 3
#>    variable_name                              estimate_name  estimate_value
#>    <chr>                                      <chr>          <chr>         
#>  1 poliovirus vaccine, inactivated            count_records  7977          
#>  2 poliovirus vaccine, inactivated            count_subjects 2140          
#>  3 celecoxib                                  count_records  1844          
#>  4 celecoxib                                  count_subjects 1844          
#>  5 Penicillin G 375 MG/ML Injectable Solution count_records  1142          
#>  6 Penicillin G 375 MG/ML Injectable Solution count_subjects 831           
#>  7 Amoxicillin 250 MG Oral Capsule            count_records  205           
#>  8 Amoxicillin 250 MG Oral Capsule            count_subjects 157           
#>  9 Diclofenac                                 count_records  850           
#> 10 Diclofenac                                 count_subjects 850           
#> # ℹ 216 more rows
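
Because record and subject counts arrive as separate rows with character values, comparing them takes a small reshaping step. The following base R sketch rebuilds three concepts from the output above as a toy data frame, converts estimate_value to integer, and derives records per subject. The data frame and the records_per_subject column are illustrative names, not part of OmopSketch.

```r
# Toy copy of three concepts from the printed output above.
counts <- data.frame(
  variable_name = rep(
    c("poliovirus vaccine, inactivated", "celecoxib",
      "Penicillin G 375 MG/ML Injectable Solution"),
    each = 2
  ),
  estimate_name = rep(c("count_records", "count_subjects"), times = 3),
  estimate_value = c("7977", "2140", "1844", "1844", "1142", "831"),
  stringsAsFactors = FALSE
)

# estimate_value is stored as character, so convert before doing arithmetic.
counts$estimate_value <- as.integer(counts$estimate_value)

# One row per concept, one column per estimate.
wide <- reshape(counts, idvar = "variable_name", timevar = "estimate_name",
                direction = "wide")
names(wide) <- sub("^estimate_value\\.", "", names(wide))

# Hypothetical derived column: average number of records per subject.
wide$records_per_subject <- wide$count_records / wide$count_subjects
wide
```

A ratio of 1, as for celecoxib here, means every counted subject has exactly one record for that concept.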

Further stratification can be applied using the interval, sex, and ageGroup arguments. The interval argument supports "overall" (no time stratification), "years", "quarters", or "months". Age groups and time intervals are assigned using the clinical record start date.

summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  countBy = "person",
  interval = "years",
  sex = TRUE,
  ageGroup = list("<=50" = c(0, 50), ">50" = c(51, Inf))
) |>
  select(group_level, strata_level, variable_name, estimate_name, additional_level) |>
  glimpse()
#> Rows: 28,358
#> Columns: 5
#> $ group_level      <chr> "condition_occurrence", "condition_occurrence", "cond…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "Gastrointestinal hemorrhage", "Acute viral pharyngit…
#> $ estimate_name    <chr> "count_subjects", "count_subjects", "count_subjects",
#> $ additional_level <chr> "35208414 &&& Gastrointestinal hemorrhage, unspecifie…
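
To make the ageGroup specification concrete, here is a minimal base R sketch of how ages could be mapped to the two groups used above. It assumes both bounds of each group are inclusive; assignAgeGroup() is a hypothetical helper written for illustration and is not how OmopSketch implements the stratification internally.

```r
ageGroup <- list("<=50" = c(0, 50), ">50" = c(51, Inf))

# Hypothetical helper: label each age with the first group whose
# inclusive bounds contain it (assumption: bounds are inclusive).
assignAgeGroup <- function(age, ageGroup) {
  out <- rep(NA_character_, length(age))
  for (nm in names(ageGroup)) {
    bounds <- ageGroup[[nm]]
    out[is.na(out) & age >= bounds[1] & age <= bounds[2]] <- nm
  }
  out
}

groups <- assignAgeGroup(c(0, 50, 51, 90), ageGroup)
groups
```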

When inObservation = TRUE, results are stratified by whether each record occurred within the subject’s observation period.

summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  countBy = "record",
  inObservation = TRUE
) |>
  select(variable_name, strata_name, strata_level, estimate_name, estimate_value) |>
  glimpse()
#> Rows: 167
#> Columns: 5
#> $ variable_name  <chr> "Sprain of wrist", "Osteoarthritis", "Escherichia coli …
#> $ strata_name    <chr> "overall", "overall", "overall", "overall", "overall", 
#> $ strata_level   <chr> "overall", "overall", "overall", "overall", "overall", 
#> $ estimate_name  <chr> "count_records", "count_records", "count_records", "cou…
#> $ estimate_value <chr> "770", "2694", "482", "157", "464", "102", "322", "46",

We can also filter the clinical table to a specific time window by setting the dateRange argument. Only records with a start date inside the date range are included.

summarisedResult <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  dateRange = as.Date(c("1990-01-01", "2010-01-01"))
)
summarisedResult |>
  settings() |>
  glimpse()
#> Rows: 1
#> Columns: 10
#> $ result_id          <int> 1
#> $ result_type        <chr> "summarise_concept_id_counts"
#> $ package_name       <chr> "OmopSketch"
#> $ package_version    <chr> "1.1.0"
#> $ group              <chr> "omop_table"
#> $ strata             <chr> ""
#> $ additional         <chr> "source_concept_id &&& source_concept_name &&& stan…
#> $ min_cell_count     <chr> "0"
#> $ study_period_end   <chr> "2010-01-01"
#> $ study_period_start <chr> "1990-01-01"
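
As a rough illustration of what the date filter does, the base R sketch below keeps only toy records whose start date lies inside the range. The records and their concept IDs are invented for the example, and treating both ends of the range as inclusive is an assumption rather than documented behaviour.

```r
# Invented records for illustration only.
records <- data.frame(
  condition_concept_id = c(1L, 2L, 3L),
  condition_start_date = as.Date(c("1985-06-30", "1990-01-01", "2012-03-15"))
)

dateRange <- as.Date(c("1990-01-01", "2010-01-01"))

# Keep records that start within the range (assumed inclusive at both ends).
inRange <- records$condition_start_date >= dateRange[1] &
  records$condition_start_date <= dateRange[2]
kept <- records[inRange, ]
kept
```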

You can restrict concept counts to a subset of subjects with the sample argument. Provide an integer to randomly select that many person_ids from the person table, or provide the name of a cohort table in the CDM to restrict counts to its subject_ids.

summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  sample = 50
) |>
  select(group_level, variable_name, estimate_name) |>
  glimpse()
#> Rows: 62
#> Columns: 3
#> $ group_level   <chr> "condition_occurrence", "condition_occurrence", "conditi…
#> $ variable_name <chr> "Viral sinusitis", "Osteoarthritis", "Fracture subluxati…
#> $ estimate_name <chr> "count_records", "count_records", "count_records", "coun…

Display the results

Concept counts can be displayed using tableConceptIdCounts(). By default, it generates an interactive reactable table, but DT datatables are also supported.

result <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "measurement",
  countBy = "record"
)
tableConceptIdCounts(result = result, type = "reactable")
tableConceptIdCounts(result = result, type = "datatable")

The display argument in tableConceptIdCounts() controls which concept columns are shown. The default is display = "overall", which shows both standard and source concept information.

tableConceptIdCounts(result = result, display = "overall")

If display = "standard", the table shows only standard concept ID, standard concept name, and standard vocabulary.

tableConceptIdCounts(result = result, display = "standard")

If display = "source", the table shows only source concept ID, source concept name, and source vocabulary.

tableConceptIdCounts(result = result, display = "source")

If display = "missing source", the table shows standard concepts for records that are missing a corresponding source concept ID.

tableConceptIdCounts(result = result, display = "missing source")
#> Warning in max(dplyr::pull(dplyr::tally(dplyr::group_by(result,
#> dplyr::across(-c("estimate_value")))), : no non-missing arguments to max;
#> returning -Inf

If display = "missing standard", the table shows source concepts for records that are missing a mapped standard concept ID.

tableConceptIdCounts(result = result, display = "missing standard")
#> Warning in max(dplyr::pull(dplyr::tally(dplyr::group_by(result,
#> dplyr::across(-c("estimate_value")))), : no non-missing arguments to max;
#> returning -Inf

Display the most frequent concepts

You can use the tableTopConceptCounts() function to display the most frequent concepts in an OMOP CDM table. By default, the function returns a gt table, but other formats supported by visOmopResults::tableType() can also be used.

result <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "drug_exposure",
  countBy = "record"
)
tableTopConceptCounts(result = result, type = "gt")

Top 10 concepts in drug_exposure table ranked by record count
Data source: GiBleed

Top  drug_exposure
  1  Standard: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm)
     Source: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm)
     9365
  2  Standard: poliovirus vaccine, inactivated (40213160 - CVX)
     Source: poliovirus vaccine, inactivated (40213160 - CVX)
     7977
  3  Standard: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX)
     Source: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX)
     7430
  4  Standard: Aspirin 81 MG Oral Tablet (19059056 - RxNorm)
     Source: Aspirin 81 MG Oral Tablet (19059056 - RxNorm)
     4380
  5  Standard: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm)
     Source: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm)
     3851
  6  Standard: hepatitis A vaccine, adult dosage (40213296 - CVX)
     Source: hepatitis A vaccine, adult dosage (40213296 - CVX)
     3211
  7  Standard: Acetaminophen 160 MG Oral Tablet (1127078 - RxNorm)
     Source: Acetaminophen 160 MG Oral Tablet (1127078 - RxNorm)
     2158
  8  Standard: zoster vaccine, live (40213260 - CVX)
     Source: zoster vaccine, live (40213260 - CVX)
     2125
  9  Standard: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134 - RxNorm)
     Source: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134 - RxNorm)
     1993
 10  Standard: hepatitis B vaccine, adult dosage (40213306 - CVX)
     Source: hepatitis B vaccine, adult dosage (40213306 - CVX)
     1916

Customising the number of top concepts

By default, the function shows the top 10 concepts within each table and stratum. You can change this using the top argument:

tableTopConceptCounts(result = result, top = 5)

Top 5 concepts in drug_exposure table ranked by record count
Data source: GiBleed

Top  drug_exposure
  1  Standard: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm)
     Source: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm)
     9365
  2  Standard: poliovirus vaccine, inactivated (40213160 - CVX)
     Source: poliovirus vaccine, inactivated (40213160 - CVX)
     7977
  3  Standard: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX)
     Source: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX)
     7430
  4  Standard: Aspirin 81 MG Oral Tablet (19059056 - RxNorm)
     Source: Aspirin 81 MG Oral Tablet (19059056 - RxNorm)
     4380
  5  Standard: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm)
     Source: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm)
     3851
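
If you want to rank concepts by hand rather than through the table function, remember that estimate_value is stored as character. The base R sketch below uses four record counts that appear in this vignette to show the pitfall and one way around it; topN is an illustrative variable, not a package argument.

```r
# Four drug_exposure record counts taken from the outputs in this vignette.
counts <- data.frame(
  variable_name = c("Diclofenac", "Acetaminophen 325 MG Oral Tablet",
                    "Amoxicillin 250 MG Oral Capsule",
                    "poliovirus vaccine, inactivated"),
  estimate_value = c("850", "9365", "205", "7977"),
  stringsAsFactors = FALSE
)

# Pitfall: ordering the character column compares digit by digit,
# so "850" is placed ahead of "7977".
wrongOrder <- counts$variable_name[order(counts$estimate_value, decreasing = TRUE)]

# Convert to integer first, then keep the topN largest counts.
topN <- 2
counts$n <- as.integer(counts$estimate_value)
top <- head(counts[order(counts$n, decreasing = TRUE), ], topN)
top$variable_name
```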

Choosing the count type

If your summary includes both record and subject counts, you must specify which type to rank by using the countBy argument:

result <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "drug_exposure",
  countBy = c("record", "person")
)
tableTopConceptCounts(result = result, countBy = "person")

Top 10 concepts in drug_exposure table ranked by person count
Data source: GiBleed

Top  drug_exposure
  1  Standard: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX)
     Source: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX)
     2660
  2  Standard: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm)
     Source: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm)
     2580
  3  Standard: poliovirus vaccine, inactivated (40213160 - CVX)
     Source: poliovirus vaccine, inactivated (40213160 - CVX)
     2140
  4  Standard: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm)
     Source: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm)
     2021
  5  Standard: Aspirin 81 MG Oral Tablet (19059056 - RxNorm)
     Source: Aspirin 81 MG Oral Tablet (19059056 - RxNorm)
     1927
  6  Standard: celecoxib (1118084 - RxNorm)
     Source: celecoxib 200 MG Oral Capsule [Celebrex] (44923712 - NDC)
     1844
  7  Standard: hepatitis A vaccine, adult dosage (40213296 - CVX)
     Source: hepatitis A vaccine, adult dosage (40213296 - CVX)
     1737
  8  Standard: hepatitis B vaccine, adult dosage (40213306 - CVX)
     Source: hepatitis B vaccine, adult dosage (40213306 - CVX)
     1560
  9  Standard: Acetaminophen 160 MG Oral Tablet (1127078 - RxNorm)
     Source: Acetaminophen 160 MG Oral Tablet (1127078 - RxNorm)
     1428
 10  Standard: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134 - RxNorm)
     Source: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134 - RxNorm)
     1393

Disconnect from CDM

Finally, disconnect from the mock CDM.

cdmDisconnect(cdm = cdm)