
For this example we’ll use the Eunomia synthetic data from the CDMConnector package.

library(CDMConnector)
library(CohortConstructor)
library(CohortCharacteristics)
library(dplyr)
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())
cdm <- cdm_from_con(con, cdm_schema = "main", write_schema = c(prefix = "my_study_", schema = "main"))

Let’s start by creating two drug cohorts, one for users of diclofenac and another for users of acetaminophen.

cdm$medications <- conceptCohort(cdm = cdm, 
                                 conceptSet = list("diclofenac" = 1124300,
                                                   "acetaminophen" = 1127433), 
                                 name = "medications")

settings(cdm$medications)
#> # A tibble: 2 × 4
#>   cohort_definition_id cohort_name   cdm_version vocabulary_version
#>                  <int> <chr>         <chr>       <chr>             
#> 1                    1 acetaminophen 5.3         v5.0 18-JAN-19    
#> 2                    2 diclofenac    5.3         v5.0 18-JAN-19
cohortCount(cdm$medications)
#> # A tibble: 2 × 3
#>   cohort_definition_id number_records number_subjects
#>                  <int>          <int>           <int>
#> 1                    1           9365            2580
#> 2                    2            830             830

Cohort 1 contains users of acetaminophen and has 2580 subjects and 9365 records. Cohort 2 contains users of diclofenac and has 830 subjects and 830 records.
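The two counts differ because one person can enter the same cohort more than once. The short base R sketch below shows what "records" and "subjects" mean on a tiny cohort table. The data are hypothetical, only a subset of the cohort table columns is used, and this is not how CDMConnector computes cohortCount().

```r
# Hypothetical toy cohort table with a subset of the OMOP cohort columns
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L, 1L, 2L),
  subject_id = c(10L, 10L, 10L, 20L, 30L),
  cohort_start_date = as.Date(c("2011-03-01", "2009-05-01", "2014-07-01",
                                "2012-01-15", "2016-02-01"))
)

# For each cohort: rows (records) and distinct people (subjects)
counts <- do.call(rbind, lapply(split(cohort, cohort$cohort_definition_id), function(d) {
  data.frame(cohort_definition_id = d$cohort_definition_id[1],
             number_records = nrow(d),
             number_subjects = length(unique(d$subject_id)))
}))
print(counts)
```

Person 10 has three records in cohort 1, so that cohort has 4 records but only 2 subjects.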

Keep only the first record per person

Individuals can contribute multiple records to each cohort. Here we keep only each individual's earliest cohort entry using requireIsFirstEntry() from CohortConstructor.

cdm$medications <- cdm$medications %>% 
  requireIsFirstEntry()

summary_attrition <- summariseCohortAttrition(cdm$medications)
plotCohortAttrition(summary_attrition, cohortId = 1)

The flow chart above illustrates changes to cohort 1 (acetaminophen users) when restricted to only the first record for each individual. While the number of individuals remains unchanged, 6,785 records are excluded.
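The idea behind this restriction can be sketched in base R on a hypothetical toy table. It assumes that the "first" entry means the earliest cohort_start_date within each cohort and person. CohortConstructor works on database tables and does not necessarily implement it this way.

```r
# Hypothetical toy cohort table
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L, 1L, 2L),
  subject_id = c(10L, 10L, 10L, 20L, 30L),
  cohort_start_date = as.Date(c("2011-03-01", "2009-05-01", "2014-07-01",
                                "2012-01-15", "2016-02-01"))
)

# Sort so each person's earliest record comes first within each cohort
cohort <- cohort[order(cohort$cohort_definition_id, cohort$subject_id,
                       cohort$cohort_start_date), ]

# Keep the first row for every (cohort, person) pair, i.e. the earliest entry
first_entry <- cohort[!duplicated(cohort[, c("cohort_definition_id", "subject_id")]), ]
print(first_entry)
cat("Records excluded:", nrow(cohort) - nrow(first_entry), "\n")
```

Person 10 keeps only the 2009 record. Two records are dropped and no person is lost, which is the same pattern shown in the flow chart.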

Keep only a specific range of records

We can also keep a specific range of entries for each individual using requireIsEntry() from CohortConstructor. Here we keep each individual's first to fifth entries.

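# Recreate the cohort so this restriction is applied to all original records
cdm$medications <- conceptCohort(cdm = cdm, name = "medications", conceptSet = list(
  "diclofenac" = 1124300, "acetaminophen" = 1127433)) %>%
  requireIsEntry(c(1, 5))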

summary_attrition <- summariseCohortAttrition(cdm$medications)
plotCohortAttrition(summary_attrition, cohortId = 1)

The flow chart above illustrates the changes to cohort 1 when restricted to the first five records for each individual. The number of individuals remains unchanged, while any records after an individual's fifth entry are excluded.
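A base R sketch of an entry-range restriction on hypothetical toy data is shown below. It assumes that entries are numbered by cohort_start_date within each cohort and person, and that both ends of the range are inclusive. A range of c(1, 2) is used so that the effect is visible on such a small table.

```r
# Hypothetical toy cohort table
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L, 1L, 2L),
  subject_id = c(10L, 10L, 10L, 20L, 30L),
  cohort_start_date = as.Date(c("2011-03-01", "2009-05-01", "2014-07-01",
                                "2012-01-15", "2016-02-01"))
)
cohort <- cohort[order(cohort$cohort_definition_id, cohort$subject_id,
                       cohort$cohort_start_date), ]

# Number each person's records 1, 2, 3, ... within each cohort, in date order
cohort$entry <- ave(seq_len(nrow(cohort)), cohort$cohort_definition_id,
                    cohort$subject_id, FUN = seq_along)

# Keep entries inside the inclusive range
entry_range <- c(1, 2)
kept <- cohort[cohort$entry >= entry_range[1] & cohort$entry <= entry_range[2], ]
print(kept)
```

Only person 10's third record (2014) falls outside the range. Every person still contributes at least their first entry.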

Keep only the last record per person

It is also possible to only include the last record for each individual using requireIsLastEntry() from CohortConstructor.

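# Recreate the cohort so this restriction is applied to all original records
cdm$medications <- conceptCohort(cdm = cdm, name = "medications", conceptSet = list(
  "diclofenac" = 1124300, "acetaminophen" = 1127433)) %>%
  requireIsLastEntry()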

summary_attrition <- summariseCohortAttrition(cdm$medications)
plotCohortAttrition(summary_attrition, cohortId = 1)

The flow chart above illustrates changes to cohort 1 when restricted to only the last record for each individual. While the number of individuals remains unchanged, 6,785 records are excluded.
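The last-entry rule mirrors the first-entry rule. In the base R sketch below, the rows are sorted by date and the last row per cohort and person is kept. The data are hypothetical, and "last" is assumed to mean the latest cohort_start_date.

```r
# Hypothetical toy cohort table
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L, 1L, 2L),
  subject_id = c(10L, 10L, 10L, 20L, 30L),
  cohort_start_date = as.Date(c("2011-03-01", "2009-05-01", "2014-07-01",
                                "2012-01-15", "2016-02-01"))
)
cohort <- cohort[order(cohort$cohort_definition_id, cohort$subject_id,
                       cohort$cohort_start_date), ]

# fromLast = TRUE marks earlier duplicates, so the latest row per pair survives
last_entry <- cohort[!duplicated(cohort[, c("cohort_definition_id", "subject_id")],
                                 fromLast = TRUE), ]
print(last_entry)
```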

Keep only records within a date range

Individuals may contribute multiple records over extended periods. We can define the study's start and end dates and filter out records that fall outside that date range using requireInDateRange() from CohortConstructor.

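# Recreate the cohort so this restriction is applied to all original records
cdm$medications <- conceptCohort(cdm = cdm, name = "medications", conceptSet = list(
  "diclofenac" = 1124300, "acetaminophen" = 1127433)) %>%
  requireInDateRange(dateRange = as.Date(c("2010-01-01", "2015-01-01")))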

summary_attrition <- summariseCohortAttrition(cdm$medications)
plotCohortAttrition(summary_attrition, cohortId = 1)

The flow chart above illustrates the changes to cohort 1 when restricted to a specified date range. 1,948 individuals and 8,660 records are excluded.
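Unlike the entry restrictions, a date range can remove people entirely, because everyone whose records all fall outside the range disappears. The base R sketch below runs on hypothetical toy data. It assumes that the date checked is cohort_start_date and that both bounds are inclusive.

```r
# Hypothetical toy cohort table
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L, 1L, 2L),
  subject_id = c(10L, 10L, 10L, 20L, 30L),
  cohort_start_date = as.Date(c("2011-03-01", "2009-05-01", "2014-07-01",
                                "2012-01-15", "2016-02-01"))
)

# Keep records whose start date lies within the (inclusive) study period
date_range <- as.Date(c("2010-01-01", "2015-01-01"))
in_range <- cohort[cohort$cohort_start_date >= date_range[1] &
                     cohort$cohort_start_date <= date_range[2], ]
print(in_range)
```

Person 30's only record starts in 2016, so that person leaves the data altogether. Person 10 loses the 2009 record but stays in the cohort.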

Keep only records from cohorts with a minimum number of individuals

Some studies might require a minimum cohort size. We can set this minimum and filter out records from cohorts with fewer individuals than required using requireMinCohortCount() from CohortConstructor.

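# Recreate the cohort so this restriction is applied to all original records
cdm$medications <- conceptCohort(cdm = cdm, name = "medications", conceptSet = list(
  "diclofenac" = 1124300, "acetaminophen" = 1127433)) %>%
  requireMinCohortCount(minCohortCount = 1000)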

summary_attrition <- summariseCohortAttrition(cdm$medications)
plotCohortAttrition(summary_attrition, cohortId = 1)

Cohort 1 includes 2,580 individuals, so none were excluded due to the minimum cohort size restriction of 1,000.
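A minimum-size rule acts on whole cohorts rather than on individual people. In the base R sketch below, the toy data are hypothetical and cohort size is assumed to mean the number of distinct subjects. Records from undersized cohorts are simply dropped here. CohortConstructor may instead keep such a cohort defined but empty.

```r
# Hypothetical toy cohort table
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L, 1L, 2L),
  subject_id = c(10L, 10L, 10L, 20L, 30L),
  cohort_start_date = as.Date(c("2011-03-01", "2009-05-01", "2014-07-01",
                                "2012-01-15", "2016-02-01"))
)

min_cohort_count <- 2
# Distinct people per cohort
subjects_per_cohort <- tapply(cohort$subject_id, cohort$cohort_definition_id,
                              function(s) length(unique(s)))
# Cohorts that meet the minimum, then only their records
keep_ids <- as.integer(names(subjects_per_cohort)[subjects_per_cohort >= min_cohort_count])
kept <- cohort[cohort$cohort_definition_id %in% keep_ids, ]
print(kept)
```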

Running multiple requirements

Multiple restrictions can be applied to a cohort, but care is needed to apply them in the correct order, because each restriction only acts on the records left by the previous ones. For example, it is recommended to apply the minimum size restriction last.

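# Recreate the cohort so both restrictions start from all original records
cdm$medications <- conceptCohort(cdm = cdm, name = "medications", conceptSet = list(
  "diclofenac" = 1124300, "acetaminophen" = 1127433)) %>%
  requireIsFirstEntry() %>%
  requireInDateRange(dateRange = as.Date(c("2010-01-01", "2016-01-01")))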

summary_attrition <- summariseCohortAttrition(cdm$medications)
plotCohortAttrition(summary_attrition, cohortId = 1)

The flow chart above illustrates the changes to cohort 1 when restricted to only include the first record of each individual over a specified date range. 2,529 individuals and 9,314 records are excluded.
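Order matters because the two restrictions do not commute. The base R sketch below uses hypothetical toy data and simplified stand-ins for the two requirements, assuming first entry means earliest cohort_start_date and that date bounds are inclusive. It shows that "first entry, then date range" keeps fewer people than "date range, then first entry".

```r
# Hypothetical toy cohort table
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L, 1L, 2L),
  subject_id = c(10L, 10L, 10L, 20L, 30L),
  cohort_start_date = as.Date(c("2011-03-01", "2009-05-01", "2014-07-01",
                                "2012-01-15", "2016-02-01"))
)

# Simplified stand-in: earliest record per (cohort, person)
first_entry <- function(x) {
  x <- x[order(x$cohort_definition_id, x$subject_id, x$cohort_start_date), ]
  x[!duplicated(x[, c("cohort_definition_id", "subject_id")]), ]
}
# Simplified stand-in: records starting within [from, to]
in_date_range <- function(x, from, to) {
  x[x$cohort_start_date >= from & x$cohort_start_date <= to, ]
}

from <- as.Date("2010-01-01")
to <- as.Date("2016-01-01")
a <- in_date_range(first_entry(cohort), from, to)  # first entry, then date range
b <- first_entry(in_date_range(cohort, from, to))  # date range, then first entry
print(a)
print(b)
```

With the first ordering, person 10's first entry (2009) falls before the study period, so that person is lost. With the second ordering, the person keeps their first entry inside the period (2011).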