Building base cohorts • CohortConstructor

Introduction

Let’s first create a cdm reference to the Eunomia synthetic data.

library(CDMConnector)
library(CodelistGenerator)
library(PatientProfiles)
library(CohortConstructor)
library(dplyr)

con <- DBI::dbConnect(duckdb::duckdb(), 
                      dbdir = eunomia_dir())
cdm <- cdm_from_con(con, cdm_schema = "main", 
                    write_schema = c(prefix = "my_study_", schema = "main"))

Concept based cohort creation

One way of defining base cohorts is to identify clinical records with codes from a pre-specified list. Here, for example, we’ll first find codes for five drug ingredients: acetaminophen, amoxicillin, diclofenac, simvastatin, and warfarin.

drug_codes <- getDrugIngredientCodes(cdm, 
                                     name = c("acetaminophen",
                                              "amoxicillin", 
                                              "diclofenac", 
                                              "simvastatin",
                                              "warfarin"))

drug_codes
#> 
#> - 11289_warfarin (2 codes)
#> - 161_acetaminophen (7 codes)
#> - 3355_diclofenac (1 codes)
#> - 36567_simvastatin (2 codes)
#> - 723_amoxicillin (4 codes)
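
The printed summary above lists one entry per ingredient together with the number of codes it holds. As a rough sketch of that shape, a codelist can be thought of as a named list of concept ID vectors. The `toy_codes` object and its IDs below are placeholders invented for illustration; they are not real OMOP concepts and not the contents of `drug_codes`.

```r
# Placeholder concept IDs (NOT real OMOP concepts); they only show the shape
# of a named list with one vector of IDs per ingredient.
toy_codes <- list(
  "161_acetaminophen" = c(201L, 202L, 203L),
  "3355_diclofenac"   = c(101L)
)

# Number of codes held under each name.
code_counts <- lengths(toy_codes)
print(code_counts)
```

Applying the same `lengths()` call to a real codelist would be one way to check how many codes sit behind each cohort before building it.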

Now that we have our codes of interest, we’ll make a cohort for each ingredient, with cohort exit defined as the event end date (which for these records is the drug exposure end date).

cdm$drugs <- conceptCohort(cdm, 
                           conceptSet = drug_codes,
                           exit = "event_end_date",
                           name = "drugs")

settings(cdm$drugs)
#> # A tibble: 5 × 4
#>   cohort_definition_id cohort_name       cdm_version vocabulary_version
#>                  <int> <chr>             <chr>       <chr>             
#> 1                    1 11289_warfarin    5.3         v5.0 18-JAN-19    
#> 2                    2 161_acetaminophen 5.3         v5.0 18-JAN-19    
#> 3                    3 3355_diclofenac   5.3         v5.0 18-JAN-19    
#> 4                    4 36567_simvastatin 5.3         v5.0 18-JAN-19    
#> 5                    5 723_amoxicillin   5.3         v5.0 18-JAN-19
cohortCount(cdm$drugs)
#> # A tibble: 5 × 3
#>   cohort_definition_id number_records number_subjects
#>                  <int>          <int>           <int>
#> 1                    1            137             137
#> 2                    2          13908            2679
#> 3                    3            830             830
#> 4                    4            182             182
#> 5                    5           4307            2130
attrition(cdm$drugs)
#> # A tibble: 20 × 7
#>    cohort_definition_id number_records number_subjects reason_id reason         
#>                   <int>          <int>           <int>     <int> <chr>          
#>  1                    1            137             137         1 Initial qualif…
#>  2                    1            137             137         2 Record start <…
#>  3                    1            137             137         3 Record in obse…
#>  4                    1            137             137         4 Merge overlapp…
#>  5                    2          14205            2679         1 Initial qualif…
#>  6                    2          14205            2679         2 Record start <…
#>  7                    2          14205            2679         3 Record in obse…
#>  8                    2          13908            2679         4 Merge overlapp…
#>  9                    3            850             850         1 Initial qualif…
#> 10                    3            850             850         2 Record start <…
#> 11                    3            830             830         3 Record in obse…
#> 12                    3            830             830         4 Merge overlapp…
#> 13                    4            182             182         1 Initial qualif…
#> 14                    4            182             182         2 Record start <…
#> 15                    4            182             182         3 Record in obse…
#> 16                    4            182             182         4 Merge overlapp…
#> 17                    5           4309            2130         1 Initial qualif…
#> 18                    5           4309            2130         2 Record start <…
#> 19                    5           4309            2130         3 Record in obse…
#> 20                    5           4307            2130         4 Merge overlapp…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
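
In the attrition output above, the acetaminophen cohort (cohort_definition_id 2) goes from 14205 to 13908 records at the final step, labelled "Merge overlapp…" in the truncated printout, while the number of subjects stays at 2679. The base R sketch below shows one way such a merge could work on toy data. It assumes that overlapping records for the same person are collapsed into a single cohort entry; it is not the package's actual implementation. The `records` data frame, its dates, and the `merge_overlaps()` helper are hypothetical.

```r
# Toy exposure records; subjects and dates are made up for illustration.
records <- data.frame(
  subject_id = c(1, 1, 1, 2),
  start = as.Date(c("2020-01-01", "2020-01-05", "2020-03-01", "2020-02-01")),
  end   = as.Date(c("2020-01-10", "2020-01-20", "2020-03-05", "2020-02-10"))
)

# Hypothetical helper: collapse overlapping records for ONE subject.
merge_overlaps <- function(df) {
  df <- df[order(df$start, df$end), ]
  start_num <- as.numeric(df$start)
  end_num <- as.numeric(df$end)
  # Latest end date among all earlier records (-Inf for the first record).
  prev_max_end <- c(-Inf, head(cummax(end_num), -1))
  # A record opens a new group when it starts after all earlier records ended.
  group <- cumsum(start_num > prev_max_end)
  data.frame(
    subject_id = df$subject_id[1],
    cohort_start_date = as.Date(as.numeric(tapply(start_num, group, min)),
                                origin = "1970-01-01"),
    cohort_end_date = as.Date(as.numeric(tapply(end_num, group, max)),
                              origin = "1970-01-01")
  )
}

# Apply the helper per subject and stack the results.
merged <- do.call(rbind, lapply(split(records, records$subject_id), merge_overlaps))
rownames(merged) <- NULL
print(merged)
```

In this toy data the first two records of subject 1 overlap and become one entry, so four records reduce to three while the number of subjects is unchanged, the same pattern seen in the attrition table.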

Demographic based cohort creation

One base cohort we can create is based on patient demographics. Here, for example, we create a cohort where people enter on their 18th birthday and leave on the day before their 66th birthday.

cdm$working_age_cohort <- demographicsCohort(cdm = cdm, 
                                             ageRange = c(18, 65), 
                                             name = "working_age_cohort")
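
To make the entry and exit rule described above concrete, here is a small base R sketch of the date arithmetic for a single person. It only illustrates the rule as stated in the text; the `add_years()` helper and the birth date are hypothetical. The real cohort may also be limited by each person's available observation time, which this sketch does not model.

```r
# Hypothetical helper: shift a single date forward by a whole number of years.
add_years <- function(date, years) {
  seq(date, by = paste(years, "years"), length.out = 2)[2]
}

# Made-up birth date for illustration.
birth_date <- as.Date("1980-06-15")
age_range <- c(18, 65)

# Enter on the birthday for the lower age bound.
entry_date <- add_years(birth_date, age_range[1])
# Leave on the day before turning (upper bound + 1).
exit_date <- add_years(birth_date, age_range[2] + 1) - 1

print(entry_date)
print(exit_date)
```

With this reading, `ageRange = c(18, 65)` covers every day on which the person is aged 18 through 65 inclusive.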