Summarise concept counts

Introduction

In this vignette, we explore the OmopSketch functions that report how often specific concepts occur in the data. Two functions do this: summariseConceptSetCounts() and plotConceptSetCounts(). The former creates a summarised result with the counts for each concept, and the latter plots those counts as a histogram.

Create a mock cdm

Let’s see an example of these functions. To start, we load the essential packages and create a mock cdm using the Eunomia database.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(CDMConnector)
library(DBI)
library(duckdb)
library(OmopSketch)
library(CodelistGenerator)

# Connect to Eunomia database
con <- DBI::dbConnect(duckdb::duckdb(), CDMConnector::eunomiaDir())
cdm <- CDMConnector::cdmFromCon(
  con = con, cdmSchema = "main", writeSchema = "main"
)
#> Note: method with signature 'DBIConnection#Id' chosen for function 'dbExistsTable',
#>  target signature 'duckdb_connection#Id'.
#>  "duckdb_connection#ANY" would also be valid
#> ! cdm name not specified and could not be inferred from the cdm source table

cdm 
#> 
#> ── # OMOP CDM reference (duckdb) of An OMOP CDM database ───────────────────────
#> • omop tables: person, observation_period, visit_occurrence, visit_detail,
#> condition_occurrence, drug_exposure, procedure_occurrence, device_exposure,
#> measurement, observation, death, note, note_nlp, specimen, fact_relationship,
#> location, care_site, provider, payer_plan_period, cost, drug_era, dose_era,
#> condition_era, metadata, cdm_source, concept, vocabulary, domain,
#> concept_class, concept_relationship, relationship, concept_synonym,
#> concept_ancestor, source_to_concept_map, drug_strength
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

First, let’s generate lists of codes for the concepts acetaminophen and sinusitis using the CodelistGenerator package.

acetaminophen <- getCandidateCodes(
  cdm = cdm,
  keywords = "acetaminophen",
  domains = "Drug",
  includeDescendants = TRUE
) |>
  dplyr::pull("concept_id")
#> Limiting to domains of interest
#> Getting concepts to include
#> Adding descendants
#> Search completed. Finishing up.
#> ✔ 7 candidate concepts identified
#> 
#> Time taken: 0 minutes and 0 seconds

sinusitis <- getCandidateCodes(
  cdm = cdm,
  keywords = "sinusitis",
  domains = "Condition",
  includeDescendants = TRUE
) |>
  dplyr::pull("concept_id")
#> Limiting to domains of interest
#> Getting concepts to include
#> Adding descendants
#> Search completed. Finishing up.
#> ✔ 4 candidate concepts identified
#> 
#> Time taken: 0 minutes and 0 seconds

Now we want to explore the occurrence of these concepts within the database. For that, we can use summariseConceptSetCounts() from OmopSketch:

summariseConceptSetCounts(cdm,
                       conceptSet = list("acetaminophen" = acetaminophen,
                                        "sinusitis" = sinusitis)) |>
  select(group_level, variable_name, variable_level, estimate_name, estimate_value) |>
  glimpse()
#> ℹ Searching concepts from domain drug in drug_exposure.
#> ℹ Searching concepts from domain condition in condition_occurrence.
#> ℹ Counting concepts
#> Rows: 24
#> Columns: 5
#> $ group_level    <chr> "sinusitis", "sinusitis", "acetaminophen", "acetaminoph…
#> $ variable_name  <chr> "Number records", "Number subjects", "Number records", …
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ estimate_name  <chr> "count", "count", "count", "count", "count", "count", "…
#> $ estimate_value <chr> "20033", "2689", "14205", "2679", "9365", "2580", "1993…
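Note that estimate_value is returned as a character column. The sketch below shows why it is worth converting before sorting or summing. It uses a small hand-typed data frame (not a real query result) whose values are copied from the first three entries of the output above, and base R only, so it runs without a database connection.

```r
# Hand-typed toy rows shaped like the selected columns above;
# this is an illustration, not a real query result.
toy <- data.frame(
  group_level    = c("sinusitis", "sinusitis", "acetaminophen"),
  variable_name  = c("Number records", "Number subjects", "Number records"),
  estimate_value = c("20033", "2689", "14205"),
  stringsAsFactors = FALSE
)

# Sorting the character column is lexicographic: "2689" sorts after "20033"
chr_sorted <- sort(toy$estimate_value)

# Convert to numeric before sorting, comparing, or summing
toy$estimate_value <- as.numeric(toy$estimate_value)
num_sorted <- sort(toy$estimate_value)
```

After the conversion, num_sorted is in true numeric order, whereas chr_sorted is ordered character by character.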

By default, the function reports both the number of records (variable_name == "Number records") and the number of people (variable_name == "Number subjects") for each concept. In both cases estimate_name is "count". This is equivalent to setting countBy = c("record", "person"):

summariseConceptSetCounts(cdm, 
                       conceptSet = list("acetaminophen" = acetaminophen, 
                                        "sinusitis" = sinusitis), 
                       countBy = c("record","person")) |>
  select(group_level, variable_name, estimate_name) |>
  distinct() |>
  arrange(group_level, variable_name)
#> ℹ Searching concepts from domain drug in drug_exposure.
#> ℹ Searching concepts from domain condition in condition_occurrence.
#> ℹ Counting concepts
#> # A tibble: 4 × 3
#>   group_level   variable_name   estimate_name
#>   <chr>         <chr>           <chr>        
#> 1 acetaminophen Number records  count        
#> 2 acetaminophen Number subjects count        
#> 3 sinusitis     Number records  count        
#> 4 sinusitis     Number subjects count
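To make the difference between the two counts concrete, here is a minimal sketch on invented data. It assumes that "Number records" counts rows and "Number subjects" counts distinct person_id values; the toy table and its column values are hypothetical and are not taken from the Eunomia database.

```r
# Hypothetical drug_exposure-like rows, invented for illustration
toy_records <- data.frame(
  person_id  = c(1, 1, 2, 3, 3, 3),
  concept_id = c(10, 10, 10, 20, 20, 10)
)

# Overall: every row is a record, each person is counted once
number_records  <- nrow(toy_records)
number_subjects <- length(unique(toy_records$person_id))

# Per concept: rows per concept_id versus distinct persons per concept_id
records_by_concept  <- table(toy_records$concept_id)
subjects_by_concept <- tapply(
  toy_records$person_id,
  toy_records$concept_id,
  function(x) length(unique(x))
)
```

A person with repeated records for the same concept adds to the record count each time but to the subject count only once, so the record count is never smaller than the subject count.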

However, we can restrict the output to the count of interest using the countBy argument:

summariseConceptSetCounts(cdm, 
                       conceptSet = list("acetaminophen" = acetaminophen,
                                        "sinusitis" = sinusitis),
                       countBy = "record") |>
  select(group_level, variable_name, estimate_name) |>
  distinct() |>
  arrange(group_level, variable_name) 
#> ℹ Searching concepts from domain drug in drug_exposure.
#> ℹ Searching concepts from domain condition in condition_occurrence.
#> ℹ Counting concepts
#> # A tibble: 2 × 3
#>   group_level   variable_name  estimate_name
#>   <chr>         <chr>          <chr>        
#> 1 acetaminophen Number records count        
#> 2 sinusitis     Number records count

One can further stratify by year, sex or age group using the interval, sex, and ageGroup arguments.

summariseConceptSetCounts(cdm,
                       conceptSet = list("acetaminophen" = acetaminophen,
                                        "sinusitis" = sinusitis),
                       countBy = "person",
                       interval = "years",
                       sex = TRUE,
                       ageGroup = list("<=50" = c(0,50), ">50" = c(51,Inf))) |>
  select(group_level, strata_level, variable_name, estimate_name) |>
  glimpse()
#> ℹ Searching concepts from domain drug in drug_exposure.
#> ℹ Searching concepts from domain condition in condition_occurrence.
#> ℹ Counting concepts
#> Rows: 7,545
#> Columns: 4
#> $ group_level   <chr> "acetaminophen", "sinusitis", "acetaminophen", "sinusiti…
#> $ strata_level  <chr> "overall", "overall", "<=50", "<=50", ">50", ">50", "Fem…
#> $ variable_name <chr> "Number subjects", "Number subjects", "Number subjects",…
#> $ estimate_name <chr> "count", "count", "count", "count", "count", "count", "c…
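The ageGroup argument takes a named list of lower and upper bounds. As a rough illustration of how such bounds partition ages, the sketch below labels a few ages with the group that contains them. label_age() is a hypothetical helper written for this example and is not part of OmopSketch, and treating both bounds as inclusive is an assumption.

```r
# Same bounds as the ageGroup argument above
age_group <- list("<=50" = c(0, 50), ">50" = c(51, Inf))

# Hypothetical helper (not part of OmopSketch): label each age with the first
# group whose bounds contain it, treating both bounds as inclusive (assumption).
label_age <- function(age, groups) {
  vapply(age, function(a) {
    inside <- vapply(groups, function(b) a >= b[1] && a <= b[2], logical(1))
    hit <- names(groups)[inside]
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1), USE.NAMES = FALSE)
}

labels <- label_age(c(0, 50, 51, 87), age_group)
```

Ages outside every group (for example a negative age) come back as NA in this sketch, which makes gaps between bounds easy to spot when defining your own groups.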

Visualise the results

Finally, we can visualise the concept counts using plotConceptSetCounts().

summariseConceptSetCounts(cdm, 
                       conceptSet = list("sinusitis" = sinusitis), 
                       countBy = "person") |> 
  plotConceptSetCounts()
#> ℹ Searching concepts from domain condition in condition_occurrence.
#> ℹ Counting concepts

Notice that only one of person counts or record counts can be plotted at a time. If both have been included in the summarised result, you will have to filter it to include only one variable:

summariseConceptSetCounts(cdm, 
                       conceptSet = list("sinusitis" = sinusitis),
                       countBy = c("person","record")) |>
  filter(variable_name == "Number subjects") |>
  plotConceptSetCounts()
#> ℹ Searching concepts from domain condition in condition_occurrence.
#> ℹ Counting concepts

Additionally, if results were stratified by year, sex or age group, we can use the facet or colour arguments to distinguish the strata in the plot. To identify the variables we can colour or facet by, we can use tidyColumns() from the visOmopResults package.

summariseConceptSetCounts(cdm, 
                       conceptSet = list("sinusitis" = sinusitis),
                       countBy = c("person"),
                       sex = TRUE, 
                       ageGroup = list("<=50" = c(0,50), ">50" = c(51, Inf))) |>
  visOmopResults::tidyColumns()
#> ℹ Searching concepts from domain condition in condition_occurrence.
#> ℹ Counting concepts
#>  [1] "cdm_name"              "codelist_name"         "age_group"            
#>  [4] "sex"                   "variable_name"         "variable_level"       
#>  [7] "count"                 "standard_concept_name" "standard_concept_id"  
#> [10] "source_concept_name"   "source_concept_id"     "domain_id"

summariseConceptSetCounts(cdm, 
                       conceptSet = list("sinusitis" = sinusitis),
                       countBy = c("person"),
                       sex = TRUE, 
                       ageGroup = list("<=50" = c(0,50), ">50" = c(51, Inf))) |>
  plotConceptSetCounts(facet = "sex", colour = "age_group")
#> ℹ Searching concepts from domain condition in condition_occurrence.
#> ℹ Counting concepts