matchCohorts()
Generates a new cohort matched to individuals in an
existing cohort. Individuals can be matched on year of birth and sex.
Usage
matchCohorts(
  cohort,
  cohortId = NULL,
  matchSex = TRUE,
  matchYearOfBirth = TRUE,
  ratio = 1,
  keepOriginalCohorts = FALSE,
  name = tableName(cohort)
)
Arguments
- cohort
A cohort table in a cdm reference.
- cohortId
Vector identifying which cohorts to include (cohort_definition_id or cohort_name). Cohorts not included will be removed from the cohort set.
- matchSex
Whether to match on sex.
- matchYearOfBirth
Whether to match on year of birth.
- ratio
Maximum number of matches allowed per individual in the target cohort.
- keepOriginalCohorts
If TRUE, the original cohorts will be returned together with the new ones. If FALSE, only the new cohorts will be returned.
- name
Name of the new cohort table created in the cdm object.
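To make matchSex, matchYearOfBirth and ratio more concrete, the following base R sketch performs exact matching on a small toy data frame. It only illustrates the idea and is not the package's implementation: the data, the column names and the rule of keeping the lowest-numbered controls are all assumptions, and observation periods are ignored.

```r
# Toy population (hypothetical data, not taken from a cdm)
people <- data.frame(
  subject_id    = 1:12,
  sex           = c("F", "F", "F", "M", "M", "M", "F", "F", "M", "M", "F", "M"),
  year_of_birth = c(1980, 1980, 1980, 1975, 1975, 1975,
                    1980, 1990, 1975, 1960, 1980, 1975)
)
target_ids <- c(1, 4, 8)  # individuals in the target cohort
ratio <- 2                # keep at most this many controls per target

# Split into targets and candidate controls (cases are excluded from controls)
targets  <- people[people$subject_id %in% target_ids, ]
controls <- people[!people$subject_id %in% target_ids, ]

# Exact match on sex and year of birth: one row per target-control pair
pairs <- merge(
  targets, controls,
  by = c("sex", "year_of_birth"),
  suffixes = c("_target", "_control")
)

# Rank controls within each target and keep at most `ratio` of them
pairs <- pairs[order(pairs$subject_id_target, pairs$subject_id_control), ]
pairs$rank <- ave(pairs$subject_id_control, pairs$subject_id_target,
                  FUN = seq_along)
matched <- pairs[pairs$rank <= ratio, ]

matched[, c("subject_id_target", "subject_id_control", "sex", "year_of_birth")]
```

In this toy data, subject 8 has no candidate control with the same sex and year of birth, so it does not appear in the matched result. Each remaining target keeps at most two controls.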
Examples
# \donttest{
library(CohortConstructor)
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#>
#>     intersect, setdiff, setequal, union
cdm <- mockCohortConstructor(nPerson = 200)
cdm$new_matched_cohort <- cdm$cohort2 |>
  matchCohorts(
    name = "new_matched_cohort",
    cohortId = 2,
    matchSex = TRUE,
    matchYearOfBirth = TRUE,
    ratio = 1
  )
#> Starting matching
#> Warning: Multiple records per person detected. The matchCohorts() function is designed
#> to operate under the assumption that there is only one record per person within
#> each cohort. If this assumption is not met, each record will be treated
#> independently. As a result, the same individual may be matched multiple times,
#> leading to inconsistent and potentially misleading results.
#> ℹ Creating copy of target cohort.
#> • 1 cohort to be matched.
#> ℹ Creating controls cohorts.
#> ℹ Excluding cases from controls
#> • Matching by gender_concept_id and year_of_birth
#> • Removing controls that were not in observation at index date
#> • Excluding target records whose pair is not in observation
#> • Adjusting ratio
#> Binding cohorts
#> ✔ Done
cdm$new_matched_cohort
#> # Source:   table<main.new_matched_cohort> [?? x 5]
#> # Database: DuckDB v1.1.2 [unknown@Linux 6.5.0-1025-azure:R 4.4.2/:memory:]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date cluster_id
#>                   <int>      <int> <date>            <date>               <dbl>
#>  1                    1        110 2005-10-01        2006-06-12              99
#>  2                    1         19 2015-04-24        2015-09-01             108
#>  3                    1        166 2017-05-16        2017-09-25              59
#>  4                    1         33 1993-05-10        1997-04-01              14
#>  5                    1         16 2007-05-18        2007-10-08             102
#>  6                    1         54 2015-03-29        2016-03-31              18
#>  7                    1         91 1995-05-16        2002-02-02              42
#>  8                    1         30 2008-04-20        2010-01-04              27
#>  9                    1         47 1994-03-23        1997-08-27              89
#> 10                    1         62 2008-04-08        2008-07-26              45
#> # ℹ more rows
# }
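The warning in the example above is raised because the cohort holds more than one record per person. One possible way to avoid it, sketched here under assumptions, is to restrict the cohort to each person's first entry before matching. The sketch uses requireIsFirstEntry() from CohortConstructor and settings() from omopgenerics. The table names cohort2_first and matched_first_entry are hypothetical, and ratio = 2 with keepOriginalCohorts = TRUE is chosen only to show those arguments.

```r
library(CohortConstructor)

cdm <- mockCohortConstructor(nPerson = 200)

# Keep only the first record per person in each cohort
# (the table name "cohort2_first" is a hypothetical choice)
cdm$cohort2_first <- cdm$cohort2 |>
  requireIsFirstEntry(name = "cohort2_first")

# Match up to two individuals per target individual and keep the
# original cohorts alongside the newly created ones
cdm$matched_first_entry <- cdm$cohort2_first |>
  matchCohorts(
    name = "matched_first_entry",
    cohortId = 2,
    matchSex = TRUE,
    matchYearOfBirth = TRUE,
    ratio = 2,
    keepOriginalCohorts = TRUE
  )

# Inspect which cohorts the resulting table contains
omopgenerics::settings(cdm$matched_first_entry)
```

Whether restricting to the first entry is appropriate depends on the study design. It is shown here only as one way to satisfy the one-record-per-person assumption described in the warning.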