
Summarise concept id counts
Source: vignettes/articles/summarise_concept_id_counts.Rmd

Introduction
In this vignette, we explore the OmopSketch functions that summarise concept use in clinical OMOP tables. The key functions are:
summariseConceptIdCounts(): counts standard concept IDs and their associated source concept IDs in one or more clinical tables.
tableConceptIdCounts(): displays concept count results in a formatted table.
tableTopConceptCounts(): displays the most frequent concepts in an OMOP CDM table.
Create a mock CDM
Let’s see an example of these functions. To start with, we will load essential packages and create a mock CDM using the R package omock.
library(OmopSketch)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(omock)
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#> ℹ Loading bundled GiBleed tables from package data.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.
cdm
#>
#> ── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

Summarise concept level counts
We now use the summariseConceptIdCounts() function from
the OmopSketch package to retrieve counts for each
standard concept ID and name, together with the associated source
concept ID, source concept name, standard vocabulary, and source
vocabulary.
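The source concept details are returned packed into the additional_name and additional_level columns, with fields separated by " &&& " (visible in the glimpse() output that follows). The base R sketch below shows one way to unpack them. The one-row data frame is hypothetical and uses only the two field names that are fully visible in this vignette; the omopgenerics package also appears to offer helpers such as splitAdditional() for this, which are likely preferable in real code.

```r
# Hypothetical one-row result, modelled on the output format in this vignette.
# Only two of the packed fields are included, for illustration.
res <- data.frame(
  variable_level   = "19059056",
  additional_name  = "source_concept_id &&& source_concept_name",
  additional_level = "19059056 &&& Aspirin 81 MG Oral Tablet",
  stringsAsFactors = FALSE
)

# Split names and values on the " &&& " separator (fixed string, not a regex).
field_names  <- strsplit(res$additional_name[1], " &&& ", fixed = TRUE)[[1]]
field_values <- strsplit(res$additional_level[1], " &&& ", fixed = TRUE)[[1]]

# Pair each field name with its value.
unpacked <- setNames(as.list(field_values), field_names)
unpacked$source_concept_name
```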
summariseConceptIdCounts(cdm = cdm, omopTableName = "drug_exposure") |>
  select(
    group_level, variable_name, variable_level, estimate_name,
    estimate_value, additional_name, additional_level
  ) |>
  glimpse()
#> Rows: 113
#> Columns: 7
#> $ group_level <chr> "drug_exposure", "drug_exposure", "drug_exposure", "d…
#> $ variable_name <chr> "Aspirin 81 MG Oral Tablet", "hepatitis A vaccine, ad…
#> $ variable_level <chr> "19059056", "40213296", "40213227", "40173590", "1912…
#> $ estimate_name <chr> "count_records", "count_records", "count_records", "c…
#> $ estimate_value <chr> "4380", "3211", "7430", "129", "1056", "1719", "306",…
#> $ additional_name <chr> "source_concept_id &&& source_concept_name &&& standa…
#> $ additional_level <chr> "19059056 &&& Aspirin 81 MG Oral Tablet &&& RxNorm &&…

By default, the function returns the number of records
(estimate_name == "count_records") for each standard
concept ID. To count distinct subjects instead, set
countBy = "person". To return both record and subject
counts, use countBy = c("record", "person").
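To make the difference between the two count types concrete, here is a minimal base R sketch on a hypothetical toy table. It only illustrates the idea of counting rows versus distinct persons per concept and is not the package's implementation.

```r
# Hypothetical toy drug_exposure table: person 1 has two records of concept 10,
# person 3 has two records of concept 20 and one of concept 10.
drug_exposure <- data.frame(
  person_id       = c(1, 1, 2, 3, 3, 3),
  drug_concept_id = c(10, 10, 10, 20, 20, 10)
)

# Record counts: number of rows per concept.
records <- aggregate(person_id ~ drug_concept_id, drug_exposure, length)

# Subject counts: number of distinct persons per concept.
subjects <- aggregate(
  person_id ~ drug_concept_id, drug_exposure,
  function(x) length(unique(x))
)

# Both aggregates are ordered by drug_concept_id, so they align row by row.
counts <- data.frame(
  drug_concept_id = records$drug_concept_id,
  count_records   = records$person_id,
  count_subjects  = subjects$person_id
)
counts
```

A concept's subject count can never exceed its record count, which is a quick sanity check on real results.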
summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "drug_exposure",
  countBy = c("record", "person")
) |>
  select(variable_name, estimate_name, estimate_value)
#> # A tibble: 226 × 3
#> variable_name estimate_name estimate_value
#> <chr> <chr> <chr>
#> 1 poliovirus vaccine, inactivated count_records 7977
#> 2 poliovirus vaccine, inactivated count_subjects 2140
#> 3 celecoxib count_records 1844
#> 4 celecoxib count_subjects 1844
#> 5 Penicillin G 375 MG/ML Injectable Solution count_records 1142
#> 6 Penicillin G 375 MG/ML Injectable Solution count_subjects 831
#> 7 Amoxicillin 250 MG Oral Capsule count_records 205
#> 8 Amoxicillin 250 MG Oral Capsule count_subjects 157
#> 9 Diclofenac count_records 850
#> 10 Diclofenac count_subjects 850
#> # ℹ 216 more rows

Further stratification can be applied using the
interval, sex, and ageGroup
arguments. The interval argument supports
"overall" (no time stratification), "years",
"quarters", or "months". Age groups and time
intervals are assigned using the clinical record start date.
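As an illustration of what "assigned using the clinical record start date" means, the base R sketch below derives a yearly interval and an age group for a few hypothetical records. The column names, the age calculation (completed years at the start date), and the inclusive group bounds are assumptions made for this example, not a description of the package internals.

```r
# Hypothetical records with a birth date and a record start date.
records <- data.frame(
  birth_date = as.Date(c("1950-06-01", "1990-03-15", "1960-01-01")),
  start_date = as.Date(c("2005-02-10", "2005-07-01", "2011-12-31"))
)

# Age in completed years at the record start date.
b <- as.POSIXlt(records$birth_date)
s <- as.POSIXlt(records$start_date)
birthday_not_reached <- (s$mon < b$mon) | (s$mon == b$mon & s$mday < b$mday)
records$age <- s$year - b$year - birthday_not_reached

# Age groups matching list("<=50" = c(0, 50), ">50" = c(51, Inf)).
records$age_group <- ifelse(records$age <= 50, "<=50", ">50")

# Yearly interval taken from the record start date.
records$year <- format(records$start_date, "%Y")

records
```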
summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  countBy = "person",
  interval = "years",
  sex = TRUE,
  ageGroup = list("<=50" = c(0, 50), ">50" = c(51, Inf))
) |>
  select(group_level, strata_level, variable_name, estimate_name, additional_level) |>
  glimpse()
#> Rows: 28,358
#> Columns: 5
#> $ group_level <chr> "condition_occurrence", "condition_occurrence", "cond…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "Gastrointestinal hemorrhage", "Acute viral pharyngit…
#> $ estimate_name <chr> "count_subjects", "count_subjects", "count_subjects",…
#> $ additional_level <chr> "35208414 &&& Gastrointestinal hemorrhage, unspecifie…

When inObservation = TRUE, results are stratified by
whether each record occurred within the subject’s observation
period.
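The idea of an "in observation" flag can be sketched in base R as follows. The toy tables are hypothetical, each person is assumed to have a single observation period, and the bounds are assumed to be inclusive; the package may handle these details differently.

```r
# Hypothetical clinical records.
records <- data.frame(
  person_id  = c(1, 1, 2),
  start_date = as.Date(c("2000-05-01", "2012-01-01", "1995-03-03"))
)

# Hypothetical observation periods, one per person (an assumption).
observation_period <- data.frame(
  person_id = c(1, 2),
  observation_period_start_date = as.Date(c("1999-01-01", "1996-01-01")),
  observation_period_end_date   = as.Date(c("2010-12-31", "2005-12-31"))
)

# Attach each person's observation period to their records.
joined <- merge(records, observation_period, by = "person_id")

# Flag records whose start date falls inside the observation period.
joined$in_observation <-
  joined$start_date >= joined$observation_period_start_date &
  joined$start_date <= joined$observation_period_end_date

table(joined$in_observation)
```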
summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  countBy = "record",
  inObservation = TRUE
) |>
  select(variable_name, strata_name, strata_level, estimate_name, estimate_value) |>
  glimpse()
#> Rows: 167
#> Columns: 5
#> $ variable_name <chr> "Sprain of wrist", "Osteoarthritis", "Escherichia coli …
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall", …
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall", …
#> $ estimate_name <chr> "count_records", "count_records", "count_records", "cou…
#> $ estimate_value <chr> "770", "2694", "482", "157", "464", "102", "322", "46",…

We can also filter the clinical table to a specific time window by
setting the dateRange argument. Only records with a start
date inside the date range are included.
summarisedResult <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  dateRange = as.Date(c("1990-01-01", "2010-01-01"))
)
summarisedResult |>
  settings() |>
  glimpse()
#> Rows: 1
#> Columns: 10
#> $ result_id <int> 1
#> $ result_type <chr> "summarise_concept_id_counts"
#> $ package_name <chr> "OmopSketch"
#> $ package_version <chr> "1.1.0"
#> $ group <chr> "omop_table"
#> $ strata <chr> ""
#> $ additional <chr> "source_concept_id &&& source_concept_name &&& stan…
#> $ min_cell_count <chr> "0"
#> $ study_period_end <chr> "2010-01-01"
#> $ study_period_start <chr> "1990-01-01"

You can restrict concept counts to a subset of subjects with the
sample argument. Provide an integer to randomly select that
many person_ids from the person table, or
provide the name of a cohort table in the CDM to restrict counts to its
subject_ids.
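For the integer case, the effect of sampling can be pictured with the base R sketch below: draw a random set of person IDs, then keep only their records. The toy tables and the use of sample() are assumptions for illustration; how the package draws its sample is not specified here.

```r
set.seed(1)  # for a reproducible draw in this example only

# Hypothetical person table and clinical records.
person  <- data.frame(person_id = 1:10)
records <- data.frame(
  person_id  = c(1, 1, 2, 5, 7, 7, 9, 10),
  concept_id = c(10, 20, 10, 30, 10, 10, 20, 30)
)

# Randomly select 3 person IDs from the person table.
sampled_ids <- sample(person$person_id, size = 3)

# Keep only the records that belong to the sampled persons.
sampled_records <- records[records$person_id %in% sampled_ids, ]
sampled_records
```

Because the sample is random, counts from a sampled run will generally differ between runs unless the sampling is made reproducible.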
summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  sample = 50
) |>
  select(group_level, variable_name, estimate_name) |>
  glimpse()
#> Rows: 62
#> Columns: 3
#> $ group_level <chr> "condition_occurrence", "condition_occurrence", "conditi…
#> $ variable_name <chr> "Viral sinusitis", "Osteoarthritis", "Fracture subluxati…
#> $ estimate_name <chr> "count_records", "count_records", "count_records", "coun…

Display the results
Concept counts can be displayed using
tableConceptIdCounts(). By default, it generates an
interactive reactable
table, but DT datatables are
also supported.
result <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "measurement",
  countBy = "record"
)
tableConceptIdCounts(result = result, type = "reactable")
tableConceptIdCounts(result = result, type = "datatable")

The display argument in
tableConceptIdCounts() controls which concept columns are
shown. The default is display = "overall", which shows both
standard and source concept information.
tableConceptIdCounts(result = result, display = "overall")

If display = "standard", the table shows only standard
concept ID, standard concept name, and standard vocabulary.
tableConceptIdCounts(result = result, display = "standard")

If display = "source", the table shows only source
concept ID, source concept name, and source vocabulary.
tableConceptIdCounts(result = result, display = "source")

If display = "missing source", the table shows standard
concepts for records that are missing a corresponding source concept
ID.
tableConceptIdCounts(result = result, display = "missing source")
#> Warning in max(dplyr::pull(dplyr::tally(dplyr::group_by(result,
#> dplyr::across(-c("estimate_value")))), : no non-missing arguments to max;
#> returning -Inf

If display = "missing standard", the table shows source
concepts for records that are missing a mapped standard concept ID.
tableConceptIdCounts(result = result, display = "missing standard")
#> Warning in max(dplyr::pull(dplyr::tally(dplyr::group_by(result,
#> dplyr::across(-c("estimate_value")))), : no non-missing arguments to max;
#> returning -Inf

Display the most frequent concepts
You can use the tableTopConceptCounts() function to
display the most frequent concepts in an OMOP CDM table. By default, the
function returns a gt table, but
other formats supported by visOmopResults::tableType() can
also be used.
result <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "drug_exposure",
  countBy = "record"
)
tableTopConceptCounts(result = result, type = "gt")

| Top | Data source: GiBleed |
|---|---|
| drug_exposure | |
| 1 | Standard: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm) Source: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm) 9365 |
| 2 | Standard: poliovirus vaccine, inactivated (40213160 - CVX) Source: poliovirus vaccine, inactivated (40213160 - CVX) 7977 |
| 3 | Standard: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX) Source: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX) 7430 |
| 4 | Standard: Aspirin 81 MG Oral Tablet (19059056 - RxNorm) Source: Aspirin 81 MG Oral Tablet (19059056 - RxNorm) 4380 |
| 5 | Standard: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm) Source: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm) 3851 |
| 6 | Standard: hepatitis A vaccine, adult dosage (40213296 - CVX) Source: hepatitis A vaccine, adult dosage (40213296 - CVX) 3211 |
| 7 | Standard: Acetaminophen 160 MG Oral Tablet (1127078 - RxNorm) Source: Acetaminophen 160 MG Oral Tablet (1127078 - RxNorm) 2158 |
| 8 | Standard: zoster vaccine, live (40213260 - CVX) Source: zoster vaccine, live (40213260 - CVX) 2125 |
| 9 | Standard: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134 - RxNorm) Source: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134 - RxNorm) 1993 |
| 10 | Standard: hepatitis B vaccine, adult dosage (40213306 - CVX) Source: hepatitis B vaccine, adult dosage (40213306 - CVX) 1916 |
Customising the number of top concepts
By default, the function shows the top 10 concepts within each table
and stratum. You can change this using the top
argument:
tableTopConceptCounts(result = result, top = 5)

| Top | Data source: GiBleed |
|---|---|
| drug_exposure | |
| 1 | Standard: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm) Source: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm) 9365 |
| 2 | Standard: poliovirus vaccine, inactivated (40213160 - CVX) Source: poliovirus vaccine, inactivated (40213160 - CVX) 7977 |
| 3 | Standard: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX) Source: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX) 7430 |
| 4 | Standard: Aspirin 81 MG Oral Tablet (19059056 - RxNorm) Source: Aspirin 81 MG Oral Tablet (19059056 - RxNorm) 4380 |
| 5 | Standard: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm) Source: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm) 3851 |
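Conceptually, picking the top concepts is a sort on the counts followed by keeping the first few rows. The base R sketch below shows this on a hypothetical four-row extract whose record counts are taken from the outputs in this vignette. Note that estimate_value is stored as character in the summarised result, so it is converted to numeric before sorting; sorting the strings directly would order "850" above "4380".

```r
# Hypothetical extract of record counts (values as shown in this vignette).
counts <- data.frame(
  variable_name  = c(
    "Aspirin 81 MG Oral Tablet",
    "Acetaminophen 325 MG Oral Tablet",
    "poliovirus vaccine, inactivated",
    "celecoxib"
  ),
  estimate_value = c("4380", "9365", "7977", "1844"),
  stringsAsFactors = FALSE
)

# Convert the character counts to numbers before ranking.
counts$n <- as.numeric(counts$estimate_value)

# Keep the `top` most frequent concepts.
top <- 2
top_concepts <- head(counts[order(counts$n, decreasing = TRUE), ], top)
top_concepts$variable_name
```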
Choosing the count type
If your summary includes both record and subject counts, you must
specify which type to rank by using the countBy
argument:
result <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "drug_exposure",
  countBy = c("record", "person")
)
tableTopConceptCounts(result = result, countBy = "person")

| Top | Data source: GiBleed |
|---|---|
| drug_exposure | |
| 1 | Standard: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX) Source: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227 - CVX) 2660 |
| 2 | Standard: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm) Source: Acetaminophen 325 MG Oral Tablet (1127433 - RxNorm) 2580 |
| 3 | Standard: poliovirus vaccine, inactivated (40213160 - CVX) Source: poliovirus vaccine, inactivated (40213160 - CVX) 2140 |
| 4 | Standard: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm) Source: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671 - RxNorm) 2021 |
| 5 | Standard: Aspirin 81 MG Oral Tablet (19059056 - RxNorm) Source: Aspirin 81 MG Oral Tablet (19059056 - RxNorm) 1927 |
| 6 | Standard: celecoxib (1118084 - RxNorm) Source: celecoxib 200 MG Oral Capsule [Celebrex] (44923712 - NDC) 1844 |
| 7 | Standard: hepatitis A vaccine, adult dosage (40213296 - CVX) Source: hepatitis A vaccine, adult dosage (40213296 - CVX) 1737 |
| 8 | Standard: hepatitis B vaccine, adult dosage (40213306 - CVX) Source: hepatitis B vaccine, adult dosage (40213306 - CVX) 1560 |
| 9 | Standard: Acetaminophen 160 MG Oral Tablet (1127078 - RxNorm) Source: Acetaminophen 160 MG Oral Tablet (1127078 - RxNorm) 1428 |
| 10 | Standard: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134 - RxNorm) Source: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134 - RxNorm) 1393 |