Generate a new matched cohort — matchCohorts • CohortConstructor

matchCohorts() generates a new cohort matched to individuals in an existing cohort. Individuals can be matched on year of birth and sex. Matching is done at the record level, so an individual with multiple cohort entries can be matched to a different individual for each of their records.

Two new cohorts are created when matching. The first contains the cohort entries that were matched ("_sampled" is appended to the original cohort name). The second contains the matches found in the database population ("_matched" is appended to the original cohort name).
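
To make the record-level behaviour concrete, the following base R sketch matches toy records exactly on sex and year of birth. It is only an illustration, not the package's implementation: the data frames, the match_records() helper, and the rule that a control is used at most once are all assumptions made for this example, and the observation-period checks that matchCohorts() reports in its log are left out.

```r
# Hypothetical toy data: three target records and four candidate controls.
target <- data.frame(
  record_id     = 1:3,
  subject_id    = c(1, 1, 2),          # subject 1 has two cohort entries
  sex           = c("F", "F", "M"),
  year_of_birth = c(1980, 1980, 1975)
)
population <- data.frame(
  subject_id    = c(10, 11, 12, 13),
  sex           = c("F", "F", "M", "F"),
  year_of_birth = c(1980, 1980, 1975, 1990)
)

# Illustrative helper (not part of CohortConstructor): for each target record,
# pick up to `ratio` unused controls with the same sex and year of birth.
match_records <- function(target, population, ratio = 1) {
  used <- c()
  out <- list()
  for (i in seq_len(nrow(target))) {
    eligible <- population$sex == target$sex[i] &
      population$year_of_birth == target$year_of_birth[i] &
      !(population$subject_id %in% used)   # assumption: no control is reused
    picked <- head(population$subject_id[eligible], ratio)
    used <- c(used, picked)
    out[[i]] <- data.frame(
      record_id          = rep(target$record_id[i], length(picked)),
      target_subject_id  = rep(target$subject_id[i], length(picked)),
      control_subject_id = picked
    )
  }
  do.call(rbind, out)
}

matched <- match_records(target, population, ratio = 1)
matched
```

In this toy data subject 1 has two cohort entries, and each entry is paired with a different control, which is what matching at the record level means.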

Usage

matchCohorts(
  cohort,
  cohortId = NULL,
  matchSex = TRUE,
  matchYearOfBirth = TRUE,
  ratio = 1,
  keepOriginalCohorts = FALSE,
  name = tableName(cohort)
)

Arguments

cohort: A cohort table in a cdm reference.
cohortId: Vector identifying which cohorts to include (cohort_definition_id or cohort_name). Cohorts not included will be removed from the cohort set.
matchSex: Whether to match on sex.
matchYearOfBirth: Whether to match on year of birth.
ratio: Number of allowed matches per individual in the target cohort.
keepOriginalCohorts: If TRUE the original cohorts will be returned together with the new ones. If FALSE only the new cohorts will be returned.
name: Name of the new cohort table created in the cdm object.

Value

A cohort table.
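
The cohort names described above can be checked on the returned table. The sketch below assumes the omopgenerics package is installed and uses its settings() and cohortCount() functions; the table name "my_matched" is a hypothetical choice for this example.

```r
library(CohortConstructor)

# Mock cdm reference shipped with the package (same call as in the Examples).
cdm <- mockCohortConstructor(nPerson = 200)

# "my_matched" is a hypothetical table name chosen for this sketch.
cdm$my_matched <- matchCohorts(
  cohort = cdm$cohort2,
  cohortId = 2,
  ratio = 1,
  name = "my_matched"
)

# One row per cohort in the returned table; the cohort_name column should
# show the "_sampled" and "_matched" suffixes.
cohort_settings <- omopgenerics::settings(cdm$my_matched)
cohort_settings

# Number of records and subjects in each cohort.
cohort_counts <- omopgenerics::cohortCount(cdm$my_matched)
cohort_counts
```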

Examples

# \donttest{
library(CohortConstructor)
library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
cdm <- mockCohortConstructor(nPerson = 200)
cdm$new_matched_cohort <- cdm$cohort2 |>
  matchCohorts(
    name = "new_matched_cohort",
    cohortId = 2,
    matchSex = TRUE,
    matchYearOfBirth = TRUE,
    ratio = 1)
#> Starting matching
#> ℹ Creating copy of target cohort.
#> • 1 cohort to be matched.
#> ℹ Creating controls cohorts.
#> ℹ Excluding cases from controls
#> • Matching by gender_concept_id and year_of_birth
#> • Removing controls that were not in observation at index date
#> • Excluding target records whose pair is not in observation
#> • Adjusting ratio
#> Binding cohorts
#> ✔ Done
cdm$new_matched_cohort
#> # Source:   table<new_matched_cohort> [?? x 5]
#> # Database: DuckDB v1.2.0 [unknown@Linux 6.8.0-1021-azure:R 4.4.3/:memory:]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date cluster_id
#>                   <int>      <int> <date>            <date>               <dbl>
#>  1                    1         89 2002-04-17        2005-07-29              54
#>  2                    1        150 2007-10-23        2008-01-09             113
#>  3                    1         19 2015-04-24        2015-09-01             108
#>  4                    1         33 1993-05-10        1997-04-01              14
#>  5                    1         39 2004-08-08        2006-11-19              44
#>  6                    1        103 1982-05-27        1987-10-22              21
#>  7                    1         30 2008-04-20        2010-01-04              27
#>  8                    1         16 2007-05-18        2007-10-08             101
#>  9                    1         16 2004-09-11        2006-10-01             102
#> 10                    1         80 2000-04-04        2000-12-21              18
#> # ℹ more rows
# }