Generate a matched age and sex cohort
a03_age_sex_matching.Rmd
Introduction
The CohortConstructor package includes a function to obtain an age- and
sex-matched cohort: matchCohorts(). In this vignette, we will explore how
to use this function.
Create mock data
We will first use the mockCohortConstructor() function from the
CohortConstructor package to create mock data.
library(CohortConstructor)
library(dplyr)
cdm <- mockCohortConstructor(nPerson = 1000)
As we will use cohort1 to explore matchCohorts(), let us first use
cohort_set() from the CDMConnector package to explore this cohort:
CDMConnector::cohort_set(cdm$cohort1)
Use matchCohorts() to create an age-sex matched cohort
Let us first see an example of how this function works. To use it, we
need to provide cohort, which is the cohort table in the cdm object
containing the cohort of interest, and name, which is the name of the
newly generated table containing the cohort and the matched cohort. We
will also use the cohortId argument to specify that we only want a
matched cohort for cohort_definition_id = 1.
cdm$matched_cohort1 <- matchCohorts(
cohort = cdm$cohort1,
cohortId = 1,
name = "matched_cohort1")
CDMConnector::cohort_set(cdm$matched_cohort1)
Notice that the generated tibble contains two cohorts:
cohort_definition_id = 1 (original cohort) and
cohort_definition_id = 4 (matched cohort).
The target_cohort_name column indicates which is the original cohort.
match_sex and match_year_of_birth take boolean values (TRUE/FALSE)
indicating whether we have matched on sex and age. match_status indicates
whether it is the original cohort (target) or the matched cohort
(matched). target_cohort_id indicates the cohort_definition_id of the
original cohort.
Check the exclusion criteria applied to generate the new cohorts by
using cohort_attrition() from the CDMConnector package:
# Original cohort
CDMConnector::cohort_attrition(cdm$matched_cohort1) %>% filter(cohort_definition_id == 1)
# Matched cohort
CDMConnector::cohort_attrition(cdm$matched_cohort1) %>% filter(cohort_definition_id == 4)
Briefly, from the original cohort we first exclude individuals that do not have a match, and then individuals whose matching pair is not in observation on the assigned cohort_start_date. For the matched cohort, we start from the whole database and first exclude individuals that are in the original cohort. We then exclude individuals that do not have a match, then individuals that are not in observation on the assigned cohort_start_date, and finally we remove as many individuals as required to fulfil the ratio.
Notice that matching pairs are randomly assigned, so the generated
cohorts will probably change every time you execute this function.
Use set.seed() to avoid this.
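To make the role of the seed concrete, here is a minimal base R sketch. It does not call CohortConstructor; the candidates vector is a hypothetical set of person identifiers, and the example only shows that fixing the seed before a random draw reproduces the same draw.

```r
# Toy illustration in base R: the same seed gives the same random draw.
# `candidates` is a hypothetical vector of person ids, not package output.
candidates <- c(101, 102, 103, 104, 105)

set.seed(123)
first_draw <- sample(candidates, size = 2)

set.seed(123)
second_draw <- sample(candidates, size = 2)

# Both draws pick the same two candidates.
identical(first_draw, second_draw)
```

In the same spirit, the seed would be set immediately before the matching call so that repeated runs start from the same random state.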
matchSex parameter
matchSex is a boolean parameter (TRUE/FALSE) indicating whether we want
to match by sex (TRUE) or not (FALSE).
matchYearOfBirth parameter
matchYearOfBirth is another boolean parameter (TRUE/FALSE) indicating
whether we want to match by age (TRUE) or not (FALSE).
Notice that if matchSex = FALSE and matchYearOfBirth = FALSE, we will
obtain an unmatched comparator cohort.
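The effect of the two matching options described above can be sketched with plain data frames. This is an illustration of exact matching in base R under assumed toy data, not the package's implementation; the target and candidates tables and their column names are hypothetical.

```r
# Hypothetical toy data: these tables are NOT produced by CohortConstructor.
target <- data.frame(
  target_id = c(1, 2),
  sex = c("Female", "Male"),
  year_of_birth = c(1980, 1975)
)
candidates <- data.frame(
  candidate_id = c(10, 11, 12, 13),
  sex = c("Female", "Female", "Male", "Male"),
  year_of_birth = c(1980, 1975, 1975, 1990)
)

# Exact matching on both sex and year of birth.
both <- merge(target, candidates, by = c("sex", "year_of_birth"))

# Matching on year of birth only: sex is ignored, so more pairs qualify.
year_only <- merge(
  target, candidates,
  by = "year_of_birth",
  suffixes = c("_target", "_candidate")
)

nrow(both)
nrow(year_only)
```

Dropping a matching variable can only widen the pool of eligible pairs, which is why turning both options off leaves a comparator cohort that is not matched on anything.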
ratio parameter
The default matching ratio is 1:1 (ratio = 1). Use cohort_count() from
CDMConnector to check whether the matching has been done as desired.
CDMConnector::cohort_count(cdm$matched_cohort1)
You can modify the ratio parameter to tailor your matched cohort.
ratio can take values from 1 to Inf.
cdm$matched_cohort2 <- matchCohorts(
cohort = cdm$cohort1,
cohortId = 1,
name = "matched_cohort2",
ratio = Inf)
CDMConnector::cohort_count(cdm$matched_cohort2)
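As a rough illustration of what a ratio cap does, the base R sketch below keeps at most ratio candidates per target person. The pairs table and the keep_ratio() helper are hypothetical and exist only for this example; they are not part of CohortConstructor, and the package may select matches differently.

```r
# Hypothetical matched pairs: target person 1 has three candidate matches,
# target person 2 has one.
pairs <- data.frame(
  target_id = c(1, 1, 1, 2),
  candidate_id = c(10, 11, 12, 20)
)

# Hypothetical helper: keep at most `ratio` candidates per target person.
keep_ratio <- function(pairs, ratio) {
  # Position of each candidate within its target person (1, 2, 3, ...).
  position <- ave(seq_len(nrow(pairs)), pairs$target_id, FUN = seq_along)
  pairs[position <= ratio, ]
}

one_to_one <- keep_ratio(pairs, ratio = 1)
one_to_two <- keep_ratio(pairs, ratio = 2)
one_to_all <- keep_ratio(pairs, ratio = Inf)

c(nrow(one_to_one), nrow(one_to_two), nrow(one_to_all))
```

With ratio = Inf no pair is discarded, so every available match is retained.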
Generate matched cohorts simultaneously across multiple cohorts
All these functionalities can be applied across multiple cohorts
simultaneously. Use the cohortId parameter to specify the cohorts of
interest. If it is set to NULL, all the cohorts present in cohort will
be matched.
cdm$matched_cohort3 <- matchCohorts(
cohort = cdm$cohort1,
cohortId = c(1,3),
name = "matched_cohort3",
ratio = 2)
CDMConnector::cohort_set(cdm$matched_cohort3) %>% arrange(cohort_definition_id)
CDMConnector::cohort_count(cdm$matched_cohort3) %>% arrange(cohort_definition_id)
Notice that each cohort has its own matched cohort, independent of the other cohorts.