matchCohorts()
Generates a new cohort matched to individuals in an
existing cohort. Individuals can be matched on year of birth and sex.
Matching is done at the record level, so individuals with multiple
cohort entries can be matched to different individuals for each of their
records.
Matching creates two new cohorts. The first contains the cohort entries that were matched ("_sampled" is appended to the original cohort name). The second contains the matches found in the database population ("_matched" is appended to the original cohort name).
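To make record-level matching concrete, here is a minimal base R sketch on toy data. It is not the package's implementation: the data frames, the column names, and the greedy "first unused candidate" rule are all assumptions made for illustration, and the sketch skips the observation-period checks the function performs.

```r
# Toy target cohort: subject 1 has two records (hypothetical data).
target <- data.frame(
  subject_id = c(1, 1, 2),
  cohort_start_date = as.Date(c("2010-01-01", "2015-06-01", "2012-03-15")),
  sex = c("F", "F", "M"),
  year_of_birth = c(1980, 1980, 1975)
)

# Toy pool of candidate controls from the wider population (hypothetical data).
pool <- data.frame(
  subject_id = c(10, 11, 12, 13),
  sex = c("F", "F", "M", "M"),
  year_of_birth = c(1980, 1980, 1975, 1990)
)

ratio <- 1          # maximum number of controls per target record
used <- numeric(0)  # controls already assigned (assumption: no reuse)
matched <- list()

for (i in seq_len(nrow(target))) {
  rec <- target[i, ]
  # Candidates share sex and year of birth and have not been used yet.
  candidates <- pool$subject_id[
    pool$sex == rec$sex &
      pool$year_of_birth == rec$year_of_birth &
      !(pool$subject_id %in% used)
  ]
  chosen <- head(candidates, ratio)
  used <- c(used, chosen)
  if (length(chosen) > 0) {
    matched[[length(matched) + 1]] <- data.frame(
      target_id = rec$subject_id,
      cohort_start_date = rec$cohort_start_date,
      control_id = chosen
    )
  }
}

matched <- do.call(rbind, matched)
print(matched)
```

In this toy example the two records of subject 1 end up paired with two different controls, which is the behaviour the description refers to when it says matching is done per record rather than per person.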
Usage
matchCohorts(
cohort,
cohortId = NULL,
matchSex = TRUE,
matchYearOfBirth = TRUE,
ratio = 1,
keepOriginalCohorts = FALSE,
name = tableName(cohort)
)
Arguments
- cohort
A cohort table in a cdm reference.
- cohortId
Vector identifying which cohorts to include (cohort_definition_id or cohort_name). Cohorts not included will be removed from the cohort set.
- matchSex
Whether to match on sex.
- matchYearOfBirth
Whether to match on year of birth.
- ratio
Number of allowed matches per individual in the target cohort.
- keepOriginalCohorts
If TRUE, the original cohorts will be returned together with the new ones. If FALSE, only the new cohorts will be returned.
- name
Name of the new cohort table created in the cdm object.
Examples
# \donttest{
library(CohortConstructor)
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
cdm <- mockCohortConstructor(nPerson = 200)
cdm$new_matched_cohort <- cdm$cohort2 |>
  matchCohorts(
    name = "new_matched_cohort",
    cohortId = 2,
    matchSex = TRUE,
    matchYearOfBirth = TRUE,
    ratio = 1
  )
#> Starting matching
#> ℹ Creating copy of target cohort.
#> • 1 cohort to be matched.
#> ℹ Creating controls cohorts.
#> ℹ Excluding cases from controls
#> • Matching by gender_concept_id and year_of_birth
#> • Removing controls that were not in observation at index date
#> • Excluding target records whose pair is not in observation
#> • Adjusting ratio
#> Binding cohorts
#> ✔ Done
cdm$new_matched_cohort
#> # Source:   table<main.new_matched_cohort> [?? x 5]
#> # Database: DuckDB v1.1.3 [unknown@Linux 6.8.0-1020-azure:R 4.4.2/:memory:]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date cluster_id
#>                   <int>      <int> <date>            <date>               <dbl>
#>  1                    1         89 2007-10-04        2011-10-31              54
#>  2                    1        150 2008-01-10        2008-11-21             113
#>  3                    1        110 2005-06-30        2005-09-30              99
#>  4                    1         19 2015-04-24        2015-09-01             108
#>  5                    1        166 2017-05-16        2017-09-25              59
#>  6                    1         33 1993-05-10        1997-04-01              14
#>  7                    1         39 2002-07-11        2004-08-07              45
#>  8                    1         16 2007-05-18        2007-10-08             102
#>  9                    1         54 2015-03-29        2016-03-31              18
#> 10                    1         91 1995-05-16        2002-02-02              42
#> # ℹ more rows
# }
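The example above uses the defaults for ratio and keepOriginalCohorts. As a further sketch, the call below asks for up to two matches per target record and keeps the original cohort alongside the new ones. The table name "matched_keep_original" is arbitrary, and calling settings() and cohortCount() from omopgenerics to inspect the result is an assumption about how the returned cohort table is typically examined; no output is shown because it depends on the mock data.

```r
library(CohortConstructor)

# Mock cdm reference, as in the example above.
cdm <- mockCohortConstructor(nPerson = 200)

# Up to two matches per target record; keep the original cohort as well.
cdm$matched_keep_original <- cdm$cohort2 |>
  matchCohorts(
    cohortId = 2,
    matchSex = TRUE,
    matchYearOfBirth = TRUE,
    ratio = 2,
    keepOriginalCohorts = TRUE,
    name = "matched_keep_original"
  )

# Cohort set: expected to list the original cohort plus the cohorts whose
# names end in "_sampled" and "_matched".
cohort_set <- omopgenerics::settings(cdm$matched_keep_original)
cohort_set

# Records and subjects per cohort.
omopgenerics::cohortCount(cdm$matched_keep_original)
```

Comparing the counts of the "_sampled" and "_matched" cohorts is a quick way to see how many target records found a match and how many controls were retained per record.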