Introduction

In this vignette, we explore the OmopSketch functions that count how often each concept appears in the clinical tables of an OMOP CDM. Two functions do the main work: summariseConceptIdCounts() and tableConceptIdCounts(). The former creates a summarised result with the number of counts for each concept in a clinical table; the latter displays that result as a table.

Create a mock cdm

Let’s see an example of the previous functions. To start with, we will load essential packages and create a mock cdm using mockOmopSketch().

library(duckdb)
#> Loading required package: DBI
library(OmopSketch)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union


cdm <- mockOmopSketch()

cdm
#> 
#> ── # OMOP CDM reference (duckdb) of mockOmopSketch ─────────────────────────────
#> • omop tables: cdm_source, concept, concept_ancestor, concept_relationship,
#> concept_synonym, condition_occurrence, death, device_exposure, drug_exposure,
#> drug_strength, measurement, observation, observation_period, person,
#> procedure_occurrence, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

Summarise concept id counts

We now use summariseConceptIdCounts() to retrieve counts for each standard concept id and name, together with the corresponding source concept id and name, in the chosen clinical table.

summariseConceptIdCounts(cdm, omopTableName = "drug_exposure") |>
  select(group_level, variable_name, variable_level, estimate_name, estimate_value, additional_name, additional_level) |>
  glimpse()
#> Rows: 216
#> Columns: 7
#> $ group_level      <chr> "drug_exposure", "drug_exposure", "drug_exposure", "d…
#> $ variable_name    <chr> "Midazolam", "Diclofenac Sodium 75 MG Delayed Release…
#> $ variable_level   <chr> "708298", "40162359", "40213227", "723013", "1150770"…
#> $ estimate_name    <chr> "count_records", "count_records", "count_records", "c…
#> $ estimate_value   <chr> "100", "100", "100", "100", "100", "100", "100", "100…
#> $ additional_name  <chr> "source_concept_id &&& source_concept_name", "source_…
#> $ additional_level <chr> "0 &&& No matching concept", "0 &&& No matching conce…
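
The additional_name and additional_level columns pack two fields into one string, joined by " &&& ". The sketch below unpacks them with base R on a single row copied from the output above. The data frame res and the other object names are illustrative and not part of OmopSketch. On a full summarised result, omopgenerics::splitAdditional() should do the same job.

```r
# Illustrative row copied from the glimpse() output above (not a full result)
res <- data.frame(
  variable_name    = "Midazolam",
  variable_level   = "708298",
  additional_name  = "source_concept_id &&& source_concept_name",
  additional_level = "0 &&& No matching concept",
  stringsAsFactors = FALSE
)

# The packed names and values share the same " &&& " separator
sep <- " &&& "
new_cols <- strsplit(res$additional_name[1], sep, fixed = TRUE)[[1]]

# One row of values per input row, one column per packed field
values <- do.call(rbind, strsplit(res$additional_level, sep, fixed = TRUE))
colnames(values) <- new_cols

# Bind the unpacked columns back onto the concept columns
unpacked <- cbind(
  res[c("variable_name", "variable_level")],
  values,
  stringsAsFactors = FALSE
)
unpacked
```

This assumes every row uses the same additional_name, which holds for the result shown here.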

By default, the function returns the number of records (estimate_name == "count_records") for each concept_id. To include counts by person, you can set the countBy argument to "person" or to c("record", "person") to obtain both record and person counts.

summariseConceptIdCounts(cdm,
  omopTableName = "drug_exposure",
  countBy = c("record", "person")
) |>
  select(variable_name, estimate_name, estimate_value)
#> # A tibble: 432 × 3
#>    variable_name                                    estimate_name estimate_value
#>    <chr>                                            <chr>         <chr>         
#>  1 pneumococcal polysaccharide vaccine, 23 valent   count_records 100           
#>  2 pneumococcal polysaccharide vaccine, 23 valent   count_subjec… 59            
#>  3 Alendronate                                      count_records 100           
#>  4 Alendronate                                      count_subjec… 61            
#>  5 poliovirus vaccine, inactivated                  count_records 100           
#>  6 poliovirus vaccine, inactivated                  count_subjec… 65            
#>  7 fluticasone                                      count_records 100           
#>  8 fluticasone                                      count_subjec… 63            
#>  9 diphtheria, tetanus toxoids and acellular pertu… count_records 100           
#> 10 diphtheria, tetanus toxoids and acellular pertu… count_subjec… 58            
#> # ℹ 422 more rows
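
Note that estimate_value is stored as character, so it has to be converted before doing arithmetic. As a minimal sketch, the base R code below takes four rows copied from the tibble above, converts the values to numeric, and puts record and person counts side by side. The object names counts and wide and the derived column records_per_subject are illustrative only.

```r
# Illustrative rows copied from the output above
counts <- data.frame(
  variable_name  = rep(c("Alendronate", "fluticasone"), each = 2),
  estimate_name  = rep(c("count_records", "count_subjects"), times = 2),
  estimate_value = c("100", "61", "100", "63"),
  stringsAsFactors = FALSE
)

# estimate_value is character in a summarised result; convert it first
counts$estimate_value <- as.numeric(counts$estimate_value)

# One row per concept, one column per estimate
wide <- reshape(
  counts,
  idvar = "variable_name",
  timevar = "estimate_name",
  direction = "wide"
)
names(wide) <- sub("^estimate_value\\.", "", names(wide))

# Average number of records per person with at least one record
wide$records_per_subject <- wide$count_records / wide$count_subjects
wide
```

With dplyr and tidyr loaded, mutate() and pivot_wider() would express the same steps more compactly.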

Further stratification can be applied using the interval, sex, and ageGroup arguments. The interval argument supports “overall” (no time stratification), “years”, “quarters”, or “months”.

summariseConceptIdCounts(cdm,
  omopTableName = "condition_occurrence",
  countBy = "person",
  interval = "years",
  sex = TRUE,
  ageGroup = list("<=50" = c(0, 50), ">50" = c(51, Inf))
) |>
  select(group_level, strata_level, variable_name, estimate_name, additional_level) |>
  glimpse()
#> Rows: 17,266
#> Columns: 5
#> $ group_level      <chr> "condition_occurrence", "condition_occurrence", "cond…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "Fracture of vertebral column without spinal cord inj…
#> $ estimate_name    <chr> "count_subjects", "count_subjects", "count_subjects",
#> $ additional_level <chr> "0 &&& No matching concept", "0 &&& No matching conce…
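
To make the ageGroup definition concrete, the sketch below shows how ages would map to the two bands used above, assuming both bounds of each band are inclusive. The helper assign_age_group() is hypothetical and is not how OmopSketch computes the strata internally; it only illustrates the meaning of the list.

```r
# Hypothetical helper, not part of OmopSketch: label each age with the first
# band whose inclusive [lower, upper] range contains it.
assign_age_group <- function(age, age_group) {
  labels <- rep(NA_character_, length(age))
  for (nm in names(age_group)) {
    bounds <- age_group[[nm]]
    hit <- is.na(labels) & age >= bounds[1] & age <= bounds[2]
    labels[hit] <- nm
  }
  labels
}

# Same definition as in the call above
age_group <- list("<=50" = c(0, 50), ">50" = c(51, Inf))

# Ages on either side of the boundary
assign_age_group(c(0, 50, 51, 87), age_group)
```

Ages that fall in no band stay NA, which is why the bands should cover the whole age range without gaps.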

We can also filter the clinical table to a specific time window by setting the dateRange argument.

summarisedResult <- summariseConceptIdCounts(cdm,
  omopTableName = "condition_occurrence",
  dateRange = as.Date(c("1990-01-01", "2010-01-01"))
)
summarisedResult |>
  omopgenerics::settings() |>
  glimpse()
#> Rows: 1
#> Columns: 10
#> $ result_id          <int> 1
#> $ result_type        <chr> "summarise_concept_id_counts"
#> $ package_name       <chr> "OmopSketch"
#> $ package_version    <chr> "0.5.1"
#> $ group              <chr> "omop_table"
#> $ strata             <chr> ""
#> $ additional         <chr> "source_concept_id &&& source_concept_name"
#> $ min_cell_count     <chr> "0"
#> $ study_period_end   <chr> "2010-01-01"
#> $ study_period_start <chr> "1990-01-01"

You can also summarise concept counts on a subset of records by specifying the sample argument.

summariseConceptIdCounts(cdm,
                         omopTableName = "condition_occurrence",
                         sample = 50) |>
  select(group_level, variable_name, estimate_name) |>
  glimpse()
#> Rows: 39
#> Columns: 3
#> $ group_level   <chr> "condition_occurrence", "condition_occurrence", "conditi…
#> $ variable_name <chr> "Fracture of rib", "Fracture of clavicle", "Polyp of col…
#> $ estimate_name <chr> "count_records", "count_records", "count_records", "coun…

Display the results

Finally, concept counts can be visualised using tableConceptIdCounts(). By default, it generates an interactive reactable table, but DT datatables are also supported.

result <- summariseConceptIdCounts(cdm,
  omopTableName = "measurement",
  countBy = "record"
) 
tableConceptIdCounts(result, type = "reactable")
tableConceptIdCounts(result, type = "datatable")

The display argument in tableConceptIdCounts() controls which concept counts are shown. The available options are "overall", "standard", "source", "missing source", and "missing standard". The default, display = "overall", shows both standard and source concept counts.

tableConceptIdCounts(result, display = "overall")

If display = "standard", the table shows only standard concept_id and concept_name counts.

tableConceptIdCounts(result, display = "standard")

If display = "source", the table shows only source concept_id and concept_name counts.

tableConceptIdCounts(result, display = "source")
#> Warning: Values from `estimate_value` are not uniquely identified; output will contain
#> list-cols.
#>  Use `values_fn = list` to suppress this warning.
#>  Use `values_fn = {summary_fun}` to summarise duplicates.
#>  Use the following dplyr code to identify duplicates.
#>   {data} |>
#>   dplyr::summarise(n = dplyr::n(), .by = c(cdm_name, group_level,
#>   source_concept_name, source_concept_id, result_id, group_name, estimate_type,
#>   estimate_name)) |>
#>   dplyr::filter(n > 1L)
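
The warning says that several rows share the same source concept key once the standard concept is dropped. A plausible reading, which is an assumption rather than something the output confirms, is that many standard concepts in the mock data carry source concept id 0. The base R sketch below reproduces the situation on two illustrative rows and counts rows per key, mirroring the dplyr code suggested in the warning. The object names rows, key, and dup_counts are illustrative.

```r
# Illustrative rows: two standard concepts that share source concept id 0
rows <- data.frame(
  variable_name     = c("Alendronate", "fluticasone"),
  source_concept_id = c("0", "0"),
  estimate_name     = "count_records",
  estimate_value    = c(100, 100),
  stringsAsFactors  = FALSE
)

# Key used when only source concepts are displayed (standard concept dropped)
key <- rows[c("source_concept_id", "estimate_name")]

# Count rows per key; more than one row means the value is not unique
dup_counts <- aggregate(
  n ~ source_concept_id + estimate_name,
  data = cbind(key, n = 1L),
  FUN = sum
)
dup_counts[dup_counts$n > 1L, ]
```

Here the single key ("0", "count_records") matches two rows, which is the kind of duplication the warning describes.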

If display = "missing source", the table shows only counts for concept ids that are missing a corresponding source concept id.

tableConceptIdCounts(result, display = "missing source")

If display = "missing standard", the table shows only counts for source concept ids that are missing a mapped standard concept id.

tableConceptIdCounts(result, display = "missing standard")
#> Warning: `result` does not contain any `summarise_concept_id_counts`
#> data.

Display the most frequent concepts

You can use the tableTopConceptCounts() function to display the most frequent concepts in an OMOP CDM table as a formatted table. By default, the function returns a gt table, but you can also choose from other output formats, including flextable, datatable, and reactable.

result <- summariseConceptIdCounts(cdm,
  omopTableName = "drug_exposure",
  countBy = "record"
) 
tableTopConceptCounts(result, type = "gt")
Cdm name: mockOmopSketch
Top  drug_exposure
1    Standard: pneumococcal polysaccharide vaccine, 23 valent (40213201)
     Source: No matching concept (0)
     100
2    Standard: Alendronate (1557272)
     Source: No matching concept (0)
     100
3    Standard: poliovirus vaccine, inactivated (40213160)
     Source: No matching concept (0)
     100
4    Standard: fluticasone (1149380)
     Source: No matching concept (0)
     100
5    Standard: diphtheria, tetanus toxoids and acellular pertussis vaccine (40213281)
     Source: No matching concept (0)
     100
6    Standard: Diclofenac (1124300)
     Source: No matching concept (0)
     100
7    Standard: hepatitis A vaccine, pediatric/adolescent dosage, 2 dose schedule (40213299)
     Source: No matching concept (0)
     100
8    Standard: fexofenadine (1153428)
     Source: No matching concept (0)
     100
9    Standard: Haemophilus influenzae type b vaccine, PRP-OMP conjugate (40213314)
     Source: No matching concept (0)
     100
10   Standard: Propofol (753626)
     Source: No matching concept (0)
     100

Customising the number of top concepts

By default, the function shows the top 10 concepts. You can change this using the top argument:

tableTopConceptCounts(result, top = 5)
Cdm name: mockOmopSketch
Top  drug_exposure
1    Standard: pneumococcal polysaccharide vaccine, 23 valent (40213201)
     Source: No matching concept (0)
     100
2    Standard: Alendronate (1557272)
     Source: No matching concept (0)
     100
3    Standard: poliovirus vaccine, inactivated (40213160)
     Source: No matching concept (0)
     100
4    Standard: fluticasone (1149380)
     Source: No matching concept (0)
     100
5    Standard: diphtheria, tetanus toxoids and acellular pertussis vaccine (40213281)
     Source: No matching concept (0)
     100

Choosing the count type

If your summary includes both record and person counts, you must specify which type to display using the countBy argument:

result <- summariseConceptIdCounts(cdm,
  omopTableName = "drug_exposure",
  countBy = c("record", "person")
) 
tableTopConceptCounts(result, countBy = "person")
Cdm name: mockOmopSketch
Top  drug_exposure
1    Standard: {28 (Norethindrone 0.35 MG Oral Tablet) } Pack [Camila 28 Day] (19127922)
     Source: No matching concept (0)
     73
2    Standard: 120 ACTUAT Fluticasone propionate 0.044 MG/ACTUAT Metered Dose Inhaler (40169216)
     Source: No matching concept (0)
     71
3    Standard: 1 ML medroxyprogesterone acetate 150 MG/ML Injection (40224805)
     Source: No matching concept (0)
     71
4    Standard: Chlorpheniramine Maleate 4 MG Oral Tablet (43012036)
     Source: No matching concept (0)
     71
5    Standard: Sodium Chloride (967823)
     Source: No matching concept (0)
     70
6    Standard: Tacrine (836654)
     Source: No matching concept (0)
     70
7    Standard: norgestimate (1515774)
     Source: No matching concept (0)
     70
8    Standard: Piperacillin (1746114)
     Source: No matching concept (0)
     70
9    Standard: Terfenadine (1150836)
     Source: No matching concept (0)
     69
10   Standard: Warfarin Sodium 5 MG Oral Tablet (40163554)
     Source: No matching concept (0)
     69