
Summarise concept id counts
Introduction
In this vignette, we explore the OmopSketch functions that report the
number of counts of concepts in clinical tables. Two key functions
support this: summariseConceptIdCounts() and
tableConceptIdCounts(). The former creates a summarised
result with the number of counts for each concept in a clinical table,
and the latter displays that result as a formatted table.
Create a mock cdm
Let’s look at an example of these functions. First, we load the
required packages and create a mock CDM using
mockOmopSketch().
library(duckdb)
#> Loading required package: DBI
library(OmopSketch)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
cdm <- mockOmopSketch()
cdm
#>
#> ── # OMOP CDM reference (duckdb) of mockOmopSketch ─────────────────────────────
#> • omop tables: cdm_source, concept, concept_ancestor, concept_relationship,
#> concept_synonym, condition_occurrence, death, device_exposure, drug_exposure,
#> drug_strength, measurement, observation, observation_period, person,
#> procedure_occurrence, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -
Summarise concept id counts
We now use the summariseConceptIdCounts() function from
the OmopSketch package to retrieve counts for each concept id and name,
as well as for each source concept id and name, across the clinical
tables.
summariseConceptIdCounts(cdm, omopTableName = "drug_exposure") |>
select(group_level, variable_name, variable_level, estimate_name, estimate_value, additional_name, additional_level) |>
glimpse()
#> Rows: 216
#> Columns: 7
#> $ group_level <chr> "drug_exposure", "drug_exposure", "drug_exposure", "d…
#> $ variable_name <chr> "pneumococcal polysaccharide vaccine, 23 valent", "Al…
#> $ variable_level <chr> "40213201", "1557272", "40213160", "1149380", "402132…
#> $ estimate_name <chr> "count_records", "count_records", "count_records", "c…
#> $ estimate_value <chr> "100", "100", "100", "100", "100", "100", "100", "100…
#> $ additional_name <chr> "source_concept_id &&& source_concept_name", "source_…
#> $ additional_level <chr> "0 &&& No matching concept", "0 &&& No matching conce…
By default, the function returns the number of records
(estimate_name == "count_records") for each concept_id. To
include counts by person (reported as estimate_name == "count_subjects"),
set the countBy argument to "person", or to c("record", "person") to
obtain both record and person counts.
summariseConceptIdCounts(cdm,
omopTableName = "drug_exposure",
countBy = c("record", "person")
) |>
select(variable_name, estimate_name, estimate_value)
#> # A tibble: 432 × 3
#> variable_name estimate_name estimate_value
#> <chr> <chr> <chr>
#> 1 celecoxib 200 MG Oral Capsule [Celebrex] count_records 100
#> 2 celecoxib 200 MG Oral Capsule [Celebrex] count_subjec… 62
#> 3 Ampicillin 100 MG/ML Injectable Solution count_records 100
#> 4 Ampicillin 100 MG/ML Injectable Solution count_subjec… 62
#> 5 rotavirus, live, monovalent vaccine count_records 100
#> 6 rotavirus, live, monovalent vaccine count_subjec… 63
#> 7 Memantine count_records 100
#> 8 Memantine count_subjec… 61
#> 9 tetanus toxoid, reduced diphtheria toxoid, and … count_records 100
#> 10 tetanus toxoid, reduced diphtheria toxoid, and … count_subjec… 63
#> # ℹ 422 more rows
Further stratification can be applied using the
interval, sex, and ageGroup
arguments. The interval argument supports "overall" (no time
stratification), "years", "quarters", or "months".
summariseConceptIdCounts(cdm,
omopTableName = "condition_occurrence",
countBy = "person",
interval = "years",
sex = TRUE,
ageGroup = list("<=50" = c(0, 50), ">50" = c(51, Inf))
) |>
select(group_level, strata_level, variable_name, estimate_name, additional_level) |>
glimpse()
#> Rows: 17,266
#> Columns: 5
#> $ group_level <chr> "condition_occurrence", "condition_occurrence", "cond…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "Escherichia coli urinary tract infection", "Childhoo…
#> $ estimate_name <chr> "count_subjects", "count_subjects", "count_subjects",…
#> $ additional_level <chr> "0 &&& No matching concept", "0 &&& No matching conce…
We can also restrict the clinical table to a specific time window by setting the dateRange argument. The window used is recorded in the result settings as study_period_start and study_period_end.
summarisedResult <- summariseConceptIdCounts(cdm,
omopTableName = "condition_occurrence",
dateRange = as.Date(c("1990-01-01", "2010-01-01"))
)
summarisedResult |>
omopgenerics::settings() |>
glimpse()
#> Rows: 1
#> Columns: 10
#> $ result_id <int> 1
#> $ result_type <chr> "summarise_concept_id_counts"
#> $ package_name <chr> "OmopSketch"
#> $ package_version <chr> "0.5.1.900"
#> $ group <chr> "omop_table"
#> $ strata <chr> ""
#> $ additional <chr> "source_concept_id &&& source_concept_name"
#> $ min_cell_count <chr> "0"
#> $ study_period_end <chr> "2010-01-01"
#> $ study_period_start <chr> "1990-01-01"
You can also restrict concept counts to a subset of subjects via
the sample argument: provide an integer to randomly select
that many person_ids from the person table, or
a character string naming a cohort table to limit counts to
its subject_ids.
summariseConceptIdCounts(cdm,
omopTableName = "condition_occurrence",
sample = 50
) |>
select(group_level, variable_name, estimate_name) |>
glimpse()
#> Rows: 84
#> Columns: 3
#> $ group_level <chr> "condition_occurrence", "condition_occurrence", "conditi…
#> $ variable_name <chr> "Escherichia coli urinary tract infection", "Childhood a…
#> $ estimate_name <chr> "count_records", "count_records", "count_records", "coun…
Display the results
Concept counts can be visualised using
tableConceptIdCounts(). By default, it generates an
interactive reactable
table, but DT datatables are
also supported.
result <- summariseConceptIdCounts(cdm,
omopTableName = "measurement",
countBy = "record"
)
tableConceptIdCounts(result, type = "reactable")
tableConceptIdCounts(result, type = "datatable")
The display argument in tableConceptIdCounts() controls
which concept counts are shown. The default,
display = "overall", shows both standard and source concept counts.
tableConceptIdCounts(result, display = "overall")
If display = "standard", the table shows only
standard concept_id and concept_name counts.
tableConceptIdCounts(result, display = "standard")
If display = "source", the table shows only
source concept_id and concept_name counts.
tableConceptIdCounts(result, display = "source")
#> Warning: Values from `estimate_value` are not uniquely identified; output will contain
#> list-cols.
#> • Use `values_fn = list` to suppress this warning.
#> • Use `values_fn = {summary_fun}` to summarise duplicates.
#> • Use the following dplyr code to identify duplicates.
#> {data} |>
#> dplyr::summarise(n = dplyr::n(), .by = c(cdm_name, group_level,
#> source_concept_name, source_concept_id, result_id, group_name, estimate_type,
#> estimate_name)) |>
#> dplyr::filter(n > 1L)
If display = "missing source", the table shows only
counts for standard concept ids that have no corresponding source
concept id.
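To see which rows this option is aimed at, you can also inspect the summarised result directly. In the outputs above, the source concept id and name are stored together in additional_level, joined by " &&& ", and an unmapped source concept appears as "0 &&& No matching concept". The sketch below relies on that encoding and keeps rows whose source concept id is 0. It is an approximation for inspection, not necessarily the exact filter that tableConceptIdCounts() applies; the object names measurementCounts and missingSource are illustrative only.

```r
library(OmopSketch)
library(dplyr)

cdm <- mockOmopSketch()

# Record counts for the measurement table, as in the display examples.
measurementCounts <- summariseConceptIdCounts(cdm,
  omopTableName = "measurement",
  countBy = "record"
)

# additional_level holds "source_concept_id &&& source_concept_name".
# Keep rows whose source concept id is 0 (no matching source concept).
missingSource <- measurementCounts |>
  filter(startsWith(additional_level, "0 &&& ")) |>
  select(variable_name, variable_level, additional_level, estimate_value)

missingSource
```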
tableConceptIdCounts(result, display = "missing source")
If display = "missing standard", the table shows only
counts for source concept ids that have no mapped standard concept
id.
tableConceptIdCounts(result, display = "missing standard")
#> Warning: `result` does not contain any `summarise_concept_id_counts`
#> data.
Display the most frequent concepts
You can use the tableTopConceptCounts() function to
display the most frequent concepts in an OMOP CDM table as a formatted
table. By default, the function returns a gt table, but you can also choose
from other output formats, including flextable, datatable, and reactable.
result <- summariseConceptIdCounts(cdm,
omopTableName = "drug_exposure",
countBy = "record"
)
tableTopConceptCounts(result, type = "gt")

| Top | mockOmopSketch (drug_exposure) |
|---|---|
| 1 | Standard: pneumococcal polysaccharide vaccine, 23 valent (40213201); Source: No matching concept (0); 100 |
| 2 | Standard: Alendronate (1557272); Source: No matching concept (0); 100 |
| 3 | Standard: poliovirus vaccine, inactivated (40213160); Source: No matching concept (0); 100 |
| 4 | Standard: fluticasone (1149380); Source: No matching concept (0); 100 |
| 5 | Standard: diphtheria, tetanus toxoids and acellular pertussis vaccine (40213281); Source: No matching concept (0); 100 |
| 6 | Standard: Diclofenac (1124300); Source: No matching concept (0); 100 |
| 7 | Standard: hepatitis A vaccine, pediatric/adolescent dosage, 2 dose schedule (40213299); Source: No matching concept (0); 100 |
| 8 | Standard: fexofenadine (1153428); Source: No matching concept (0); 100 |
| 9 | Standard: Haemophilus influenzae type b vaccine, PRP-OMP conjugate (40213314); Source: No matching concept (0); 100 |
| 10 | Standard: Propofol (753626); Source: No matching concept (0); 100 |
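The same top-concept table can be produced in the other output formats mentioned above. This is a minimal sketch, assuming that tableTopConceptCounts() accepts the type values "flextable", "datatable" and "reactable" (the last two are the values used with tableConceptIdCounts() earlier) and that the flextable, DT and reactable packages are installed. The object names are illustrative.

```r
library(OmopSketch)

cdm <- mockOmopSketch()
result <- summariseConceptIdCounts(cdm,
  omopTableName = "drug_exposure",
  countBy = "record"
)

# Same top-10 table rendered with alternative backends.
# Assumes the flextable, DT and reactable packages are installed.
topFlextable <- tableTopConceptCounts(result, type = "flextable")
topDatatable <- tableTopConceptCounts(result, type = "datatable")
topReactable <- tableTopConceptCounts(result, type = "reactable")

topFlextable
```

flextable output suits static reports such as Word documents. The datatable and reactable versions are interactive HTML widgets.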
Customising the number of top concepts
By default, the function shows the top 10 concepts. You can change
this using the top argument:
tableTopConceptCounts(result, top = 5)

| Top | mockOmopSketch (drug_exposure) |
|---|---|
| 1 | Standard: pneumococcal polysaccharide vaccine, 23 valent (40213201); Source: No matching concept (0); 100 |
| 2 | Standard: Alendronate (1557272); Source: No matching concept (0); 100 |
| 3 | Standard: poliovirus vaccine, inactivated (40213160); Source: No matching concept (0); 100 |
| 4 | Standard: fluticasone (1149380); Source: No matching concept (0); 100 |
| 5 | Standard: diphtheria, tetanus toxoids and acellular pertussis vaccine (40213281); Source: No matching concept (0); 100 |
Choosing the count type
If your summary includes both record and person counts, you must
specify which type to display using the countBy
argument:
result <- summariseConceptIdCounts(cdm,
omopTableName = "drug_exposure",
countBy = c("record", "person")
)
tableTopConceptCounts(result, countBy = "person")

| Top | mockOmopSketch (drug_exposure) |
|---|---|
| 1 | Standard: {28 (Norethindrone 0.35 MG Oral Tablet) } Pack [Camila 28 Day] (19127922); Source: No matching concept (0); 73 |
| 2 | Standard: 120 ACTUAT Fluticasone propionate 0.044 MG/ACTUAT Metered Dose Inhaler (40169216); Source: No matching concept (0); 71 |
| 3 | Standard: 1 ML medroxyprogesterone acetate 150 MG/ML Injection (40224805); Source: No matching concept (0); 71 |
| 4 | Standard: Chlorpheniramine Maleate 4 MG Oral Tablet (43012036); Source: No matching concept (0); 71 |
| 5 | Standard: Sodium Chloride (967823); Source: No matching concept (0); 70 |
| 6 | Standard: Tacrine (836654); Source: No matching concept (0); 70 |
| 7 | Standard: norgestimate (1515774); Source: No matching concept (0); 70 |
| 8 | Standard: Piperacillin (1746114); Source: No matching concept (0); 70 |
| 9 | Standard: Warfarin Sodium 5 MG Oral Tablet (40163554); Source: No matching concept (0); 69 |
| 10 | Standard: Terfenadine (1150836); Source: No matching concept (0); 69 |
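To see what the person-based ranking is built from, you can also sort the underlying summarised result by hand with dplyr. This is a sketch, not how tableTopConceptCounts() is implemented. It assumes that person counts are stored with estimate_name == "count_subjects" (as in the stratified example earlier) and that estimate_value holds each count as text. Concepts with tied counts may appear in a different order than in the formatted table. The name topPersons is illustrative.

```r
library(OmopSketch)
library(dplyr)

cdm <- mockOmopSketch()
result <- summariseConceptIdCounts(cdm,
  omopTableName = "drug_exposure",
  countBy = c("record", "person")
)

# Keep person counts only; estimate_value is character, so convert it
# to numeric before sorting in decreasing order.
topPersons <- result |>
  filter(estimate_name == "count_subjects") |>
  mutate(count = as.numeric(estimate_value)) |>
  arrange(desc(count)) |>
  select(variable_name, variable_level, additional_level, count) |>
  head(3)

topPersons

# Formatted equivalent, combining the countBy and top arguments.
tableTopConceptCounts(result, countBy = "person", top = 3)
```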