Summarise concept id counts • OmopSketch

Introduction

In this vignette, we explore the OmopSketch functions that summarise concept use in clinical OMOP tables. There are three key functions:

summariseConceptIdCounts(): counts standard concept IDs and their associated source concept IDs in one or more clinical tables.
tableConceptIdCounts(): displays concept count results in a formatted table.
tableTopConceptCounts(): displays the most frequent concepts in a formatted table.

Create a mock cdm

To see these functions in action, we first load the required packages and create a mock CDM with the omock R package.

library(OmopSketch)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(omock)

cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#> ℹ Loading bundled GiBleed tables from package data.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.

cdm
#> 
#> ── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

Summarise concept level counts

We now use the summariseConceptIdCounts() function from the OmopSketch package to retrieve counts for each standard concept ID and name, together with the associated source concept ID, source concept name, standard vocabulary, and source vocabulary.

summariseConceptIdCounts(cdm = cdm, omopTableName = "drug_exposure") |>
  select(group_level, variable_name, variable_level, estimate_name, estimate_value, additional_name, additional_level) |>
  glimpse()
#> Rows: 113
#> Columns: 7
#> $ group_level      <chr> "drug_exposure", "drug_exposure", "drug_exposure", "d…
#> $ variable_name    <chr> "Aspirin 81 MG Oral Tablet", "hepatitis A vaccine, ad…
#> $ variable_level   <chr> "19059056", "40213296", "40213227", "40173590", "1912…
#> $ estimate_name    <chr> "count_records", "count_records", "count_records", "c…
#> $ estimate_value   <chr> "4380", "3211", "7430", "129", "1056", "1719", "306",…
#> $ additional_name  <chr> "source_concept_id &&& source_concept_name &&& standa…
#> $ additional_level <chr> "19059056 &&& Aspirin 81 MG Oral Tablet &&& RxNorm &&…
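
The additional_name and additional_level columns pack several fields into one string, separated by " &&& ". As a minimal base R sketch of how such a value can be unpacked, the snippet below splits one complete value of the same shape (the condition_occurrence value printed later in this vignette). The names packed and parts are illustrative only. For a whole summarised result, omopgenerics::splitAdditional() should be the more convenient route.

```r
# One packed value in the same shape as the additional_level column.
packed <- "372328 &&& Otitis media &&& SNOMED &&& SNOMED"

# Split on the literal " &&& " separator (fixed = TRUE avoids regex parsing).
parts <- strsplit(packed, " &&& ", fixed = TRUE)[[1]]

# Four fields, in the order described above: source concept ID,
# source concept name, standard vocabulary, source vocabulary.
print(parts)
```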

By default, the function returns the number of records (estimate_name == "count_records") for each standard concept ID. To count distinct subjects instead, set countBy = "person". To return both record and subject counts, use countBy = c("record", "person").

summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "drug_exposure",
  countBy = c("record", "person")
) |>
  select(variable_name, estimate_name, estimate_value)
#> # A tibble: 226 × 3
#>    variable_name                                    estimate_name estimate_value
#>    <chr>                                            <chr>         <chr>         
#>  1 Aspirin 81 MG Oral Tablet                        count_records 4380          
#>  2 Aspirin 81 MG Oral Tablet                        count_subjec… 1927          
#>  3 hepatitis A vaccine, adult dosage                count_records 3211          
#>  4 hepatitis A vaccine, adult dosage                count_subjec… 1737          
#>  5 tetanus and diphtheria toxoids, adsorbed, prese… count_records 7430          
#>  6 tetanus and diphtheria toxoids, adsorbed, prese… count_subjec… 2660          
#>  7 Alendronic acid 10 MG Oral Tablet                count_records 129           
#>  8 Alendronic acid 10 MG Oral Tablet                count_subjec… 129           
#>  9 {7 (Inert Ingredients 1 MG Oral Tablet) / 21 (M… count_records 1056          
#> 10 {7 (Inert Ingredients 1 MG Oral Tablet) / 21 (M… count_subjec… 605           
#> # ℹ 216 more rows
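
Because estimate_value is stored as character and each count type sits on its own row, comparing the two counts takes a little reshaping. The sketch below is one possible approach, not part of OmopSketch: it rebuilds three concepts from the output above as a small tibble, converts the values to numbers, and pivots them into one row per concept. It assumes tidyr is installed; counts, per_concept, and records_per_subject are names made up for this example.

```r
library(dplyr)
library(tidyr)

# Three concepts copied from the printed output above.
counts <- tibble(
  variable_name = rep(
    c(
      "Aspirin 81 MG Oral Tablet",
      "hepatitis A vaccine, adult dosage",
      "Alendronic acid 10 MG Oral Tablet"
    ),
    each = 2
  ),
  estimate_name = rep(c("count_records", "count_subjects"), times = 3),
  estimate_value = c("4380", "1927", "3211", "1737", "129", "129")
)

# Convert to numeric, then spread the two count types into columns.
per_concept <- counts |>
  mutate(estimate_value = as.numeric(estimate_value)) |>
  pivot_wider(names_from = estimate_name, values_from = estimate_value) |>
  mutate(records_per_subject = count_records / count_subjects)

per_concept
```

A ratio of 1 (as for Alendronic acid here) means every subject contributes exactly one record for that concept.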

Further stratification can be applied using the interval, sex, and ageGroup arguments. The interval argument supports "overall" (no time stratification), "years", "quarters", or "months". Age groups and time intervals are assigned using the clinical record start date.

summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  countBy = "person",
  interval = "years",
  sex = TRUE,
  ageGroup = list("<=50" = c(0, 50), ">50" = c(51, Inf))
) |>
  select(group_level, strata_level, variable_name, estimate_name, additional_level) |>
  glimpse()
#> Rows: 28,358
#> Columns: 5
#> $ group_level      <chr> "condition_occurrence", "condition_occurrence", "cond…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "Otitis media", "Acute viral pharyngitis", "Acute bac…
#> $ estimate_name    <chr> "count_subjects", "count_subjects", "count_subjects",…
#> $ additional_level <chr> "372328 &&& Otitis media &&& SNOMED &&& SNOMED", "411…
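
To make the ageGroup definition concrete, the following base R sketch shows how ages could be mapped onto the labels used above. It is an illustration under stated assumptions, not the package's implementation: it assumes age is measured in whole years and that both bounds of each group are inclusive. The helper assign_age_group() is hypothetical.

```r
# Same age group definition as in the call above.
age_group <- list("<=50" = c(0, 50), ">50" = c(51, Inf))

# Hypothetical helper: return the label of the first group containing each age.
assign_age_group <- function(age, groups) {
  vapply(age, function(a) {
    inside <- vapply(groups, function(g) a >= g[1] && a <= g[2], logical(1))
    hit <- names(groups)[inside]
    # Ages outside every group get NA.
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1))
}

assign_age_group(c(0, 50, 51, 87), age_group)
```

Defining the second group from 51 rather than 50 keeps the groups non-overlapping, so each age falls into exactly one stratum.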

When inObservation = TRUE, results are stratified by whether each record occurred within the subject’s observation period.

summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  countBy = "record",
  inObservation = TRUE
) |>
  select(variable_name, strata_name, strata_level, estimate_name, estimate_value) |>
  glimpse()
#> Rows: 167
#> Columns: 5
#> $ variable_name  <chr> "Streptococcal sore throat", "Chronic sinusitis", "Lace…
#> $ strata_name    <chr> "overall", "overall", "overall", "overall", "overall", …
#> $ strata_level   <chr> "overall", "overall", "overall", "overall", "overall", …
#> $ estimate_name  <chr> "count_records", "count_records", "count_records", "cou…
#> $ estimate_value <chr> "2656", "825", "500", "80", "229", "492", "31", "1001",…

We can also filter the clinical table to a specific time window by setting the dateRange argument. Only records with a start date inside the date range are included.

summarisedResult <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  dateRange = as.Date(c("1990-01-01", "2010-01-01"))
)
summarisedResult |>
  settings() |>
  glimpse()
#> Rows: 1
#> Columns: 10
#> $ result_id          <int> 1
#> $ result_type        <chr> "summarise_concept_id_counts"
#> $ package_name       <chr> "OmopSketch"
#> $ package_version    <chr> "1.1.0.900"
#> $ group              <chr> "omop_table"
#> $ strata             <chr> ""
#> $ additional         <chr> "source_concept_id &&& source_concept_name &&& stan…
#> $ min_cell_count     <chr> "0"
#> $ study_period_end   <chr> "2010-01-01"
#> $ study_period_start <chr> "1990-01-01"
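
The effect of dateRange can be pictured as a filter on the start date column of the clinical table. The base R sketch below illustrates that idea on four made-up condition records; it is not how OmopSketch performs the filtering internally. The vignette does not state how records exactly on a boundary date are handled, so the example deliberately uses dates well inside or outside the range.

```r
# Same date range as in the call above.
date_range <- as.Date(c("1990-01-01", "2010-01-01"))

# Made-up records for illustration only.
records <- data.frame(
  condition_occurrence_id = 1:4,
  condition_start_date = as.Date(
    c("1985-03-10", "1995-07-01", "2005-06-15", "2015-11-30")
  )
)

# Keep records whose start date falls between the two dates.
in_range <- records$condition_start_date >= date_range[1] &
  records$condition_start_date <= date_range[2]
kept <- records[in_range, ]

kept
```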

You can restrict concept counts to a subset of subjects with the sample argument. Provide an integer to randomly select that many person_ids from the person table, or provide the name of a cohort table in the CDM to restrict counts to its subject_ids.

summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  sample = 50
) |>
  select(group_level, variable_name, estimate_name) |>
  glimpse()
#> Rows: 60
#> Columns: 3
#> $ group_level   <chr> "condition_occurrence", "condition_occurrence", "conditi…
#> $ variable_name <chr> "Viral sinusitis", "Osteoarthritis", "Polyp of colon", "…
#> $ estimate_name <chr> "count_records", "count_records", "count_records", "coun…

Display the results

Concept counts can be displayed using tableConceptIdCounts(). By default, it generates an interactive reactable table, but DT datatables are also supported.

result <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "measurement",
  countBy = "record"
)
tableConceptIdCounts(result = result, type = "reactable")

tableConceptIdCounts(result = result, type = "datatable")

The display argument in tableConceptIdCounts() controls which concept columns are shown. The default is display = "overall", which shows both standard and source concept information.

tableConceptIdCounts(result = result, display = "overall")

If display = "standard", the table shows only standard concept ID, standard concept name, and standard vocabulary.

tableConceptIdCounts(result = result, display = "standard")

If display = "source", the table shows only source concept ID, source concept name, and source vocabulary.

tableConceptIdCounts(result = result, display = "source")

If display = "missing source", the table shows standard concepts for records that are missing a corresponding source concept ID.

tableConceptIdCounts(result = result, display = "missing source")
#> Warning in max(dplyr::pull(dplyr::tally(dplyr::group_by(result,
#> dplyr::across(-c("estimate_value")))), : no non-missing arguments to max;
#> returning -Inf

If display = "missing standard", the table shows source concepts for records that are missing a mapped standard concept ID.

tableConceptIdCounts(result = result, display = "missing standard")
#> Warning in max(dplyr::pull(dplyr::tally(dplyr::group_by(result,
#> dplyr::across(-c("estimate_value")))), : no non-missing arguments to max;
#> returning -Inf
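
The warning shown for the two "missing" displays is the message base R emits when max() is called on an empty vector. A plausible reading, offered here as an inference rather than a documented behaviour, is that no rows in the mock measurement data matched the filter, so there was nothing to count. The sketch below reproduces the base R behaviour in isolation; no_rows, caught, and value are illustrative names.

```r
# An empty vector stands in for "no matching rows".
no_rows <- integer(0)

# Capture the warning message instead of printing it.
caught <- NULL
value <- withCallingHandlers(
  max(no_rows),
  warning = function(w) {
    caught <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

value   # the result of max() on empty input
caught  # the captured warning text
```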

Display the most frequent concepts

You can use the tableTopConceptCounts() function to display the most frequent concepts in an OMOP CDM table. By default, the function returns a gt table, but other formats supported by visOmopResults::tableType() can also be used.

result <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "drug_exposure",
  countBy = "record"
)
tableTopConceptCounts(result = result, type = "gt")

Top 10 concepts in drug_exposure table ranked by record count
Top	Data source: GiBleed
drug_exposure
1	Standard: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm) Source: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm) 9365
2	Standard: poliovirus vaccine, inactivated (40213160 - CVX) Source: poliovirus vaccine, inactivated (40213160 - CVX) 7977
3	Standard: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX) Source: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX) 7430
4	Standard: Aspirin 81 MG Oral Tablet (19059056 - RxNorm) Source: Aspirin 81 MG Oral Tablet (19059056 - RxNorm) 4380
5	Standard: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm) Source: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm) 3851
6	Standard: hepatitis A vaccine, adult dosage (40213296 - CVX) Source: hepatitis A vaccine, adult dosage (40213296 - CVX) 3211
7	Standard: Acetaminophen 160 MG Oral Tablet (1127078 - RxNorm) Source: Acetaminophen 160 MG Oral Tablet (1127078 - RxNorm) 2158
8	Standard: zoster vaccine, live (40213260 - CVX) Source: zoster vaccine, live (40213260 - CVX) 2125
9	Standard: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134 - RxNorm) Source: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134 - RxNorm) 1993
10	Standard: hepatitis B vaccine, adult dosage (40213306 - CVX) Source: hepatitis B vaccine, adult dosage (40213306 - CVX) 1916

Customising the number of top concepts

By default, the function shows the top 10 concepts within each table and stratum. You can change this using the top argument:

tableTopConceptCounts(result = result, top = 5)

Top 5 concepts in drug_exposure table ranked by record count
Top	Data source: GiBleed
drug_exposure
1	Standard: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm) Source: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm) 9365
2	Standard: poliovirus vaccine, inactivated (40213160 - CVX) Source: poliovirus vaccine, inactivated (40213160 - CVX) 7977
3	Standard: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX) Source: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX) 7430
4	Standard: Aspirin 81 MG Oral Tablet (19059056 - RxNorm) Source: Aspirin 81 MG Oral Tablet (19059056 - RxNorm) 4380
5	Standard: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm) Source: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm) 3851

Choosing the count type

If your summary includes both record and subject counts, use the countBy argument to specify which count type the ranking is based on:

result <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "drug_exposure",
  countBy = c("record", "person")
)
tableTopConceptCounts(result = result, countBy = "person")

Top 10 concepts in drug_exposure table ranked by person count
Top	Data source: GiBleed
drug_exposure
1	Standard: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX) Source: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX) 2660
2	Standard: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm) Source: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm) 2580
3	Standard: poliovirus vaccine, inactivated (40213160 - CVX) Source: poliovirus vaccine, inactivated (40213160 - CVX) 2140
4	Standard: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm) Source: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm) 2021
5	Standard: Aspirin 81 MG Oral Tablet (19059056 - RxNorm) Source: Aspirin 81 MG Oral Tablet (19059056 - RxNorm) 1927
6	Standard: celecoxib (1118084 - RxNorm) Source: celecoxib 200 MG Oral Capsule [Celebrex] (44923712 - NDC) 1844
7	Standard: hepatitis A vaccine, adult dosage (40213296 - CVX) Source: hepatitis A vaccine, adult dosage (40213296 - CVX) 1737
8	Standard: hepatitis B vaccine, adult dosage (40213306 - CVX) Source: hepatitis B vaccine, adult dosage (40213306 - CVX) 1560
9	Standard: Acetaminophen 160 MG Oral Tablet (1127078 - RxNorm) Source: Acetaminophen 160 MG Oral Tablet (1127078 - RxNorm) 1428
10	Standard: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134 - RxNorm) Source: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134 - RxNorm) 1393
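
The ranking changes with the count type because a concept with many records is not necessarily recorded for many people. The dplyr sketch below illustrates this using three concepts copied from the tables above. It is a simplified stand-in for what a top-N ranking does, not the code behind tableTopConceptCounts(); the helper rank_top() and its arguments are hypothetical.

```r
library(dplyr)

# Record and person counts copied from the two tables above.
counts <- tibble(
  concept_name = c(
    "Acetaminophen 325 MG Oral Tablet",
    "poliovirus vaccine, inactivated",
    "tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use"
  ),
  count_records = c(9365, 7977, 7430),
  count_subjects = c(2580, 2140, 2660)
)

# Hypothetical helper: keep the n_top highest rows for one count column.
rank_top <- function(data, count_col, n_top = 10) {
  data |>
    slice_max(.data[[count_col]], n = n_top, with_ties = FALSE) |>
    mutate(rank = row_number()) |>
    select(rank, concept_name, all_of(count_col))
}

by_record <- rank_top(counts, "count_records", n_top = 2)
by_person <- rank_top(counts, "count_subjects", n_top = 2)

by_record
by_person
```

With these three concepts, Acetaminophen leads by record count, while the tetanus and diphtheria vaccine leads by person count, matching the order in the tables above.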

Disconnect from CDM

Finally, disconnect from the mock CDM.

cdmDisconnect(cdm = cdm)