
Updating cohort start and end dates
Source:vignettes/a05_update_cohort_start_end.Rmd
a05_update_cohort_start_end.Rmd
Introduction
Accurately defining cohort entry and exit dates is crucial in
observational research to ensure the validity of study findings. The
CohortConstructor
package provides several functions to
adjust these dates based on specific criteria, and this vignette
demonstrates how to use them.
Functions to update cohort dates can be categorized into four groups:
Exit at Specific Date Functions: Adjust the cohort end date to predefined events (observation end and death date).
Cohort Entry or Exit Based on Other Date Columns: Modify cohort start or end dates to the earliest or latests from a set of date columns.
Trim Dates Functions: Restrict cohort entries based on demographic criteria or specific date ranges.
Pad Dates Functions: Adjust cohort start or end dates by adding or subtracting a specified number of days.
We’ll explore each category in the following sections.
First, we’ll connect to the Eunomia synthetic data and create a mock cohort of women in the database to use as example in the vignette.
library(CohortConstructor)
library(CohortCharacteristics)
library(PatientProfiles)
library(CDMConnector)
if (Sys.getenv("EUNOMIA_DATA_FOLDER") == ""){
Sys.setenv("EUNOMIA_DATA_FOLDER" = file.path(tempdir(), "eunomia"))}
if (!dir.exists(Sys.getenv("EUNOMIA_DATA_FOLDER"))){ dir.create(Sys.getenv("EUNOMIA_DATA_FOLDER"))
downloadEunomiaData()
}
#>
#> Download completed!
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = CDMConnector::eunomiaDir())
#> Creating CDM database /tmp/RtmpEw3j9w/eunomia/GiBleed_5.3.zip
cdm <- CDMConnector::cdmFromCon(con, cdmSchema = "main",
writeSchema = "main", writePrefix = "my_study_")
cdm$cohort <- demographicsCohort(cdm = cdm, name = "cohort", sex = "Female")
#> ℹ Building new trimmed cohort
#> Adding demographics information
#> Creating initial cohort
#> Trim sex
#> ✔ Cohort trimmed
Exit at Specific Date
exitAtObservationEnd()
The exitAtObservationEnd()
function updates the cohort
end date to the end of the observation period for each subject. This
ensures that the cohort exit does not extend beyond the period during
which data is available for the subject.
cdm$cohort_observation_end <- cdm$cohort |>
exitAtObservationEnd(name = "cohort_observation_end")
As cohort entries cannot overlap, updating the end date to the observation end may result in overlapping records. In such cases, overlapping records are collapsed into a single entry (starting at the earliest entry and ending at the end of observation).
This function has an argument limitToCurrentPeriod
to
consider cases when a subject may have more than one observation period.
If limitToCurrentPeriod = TRUE
(default) end date will be
set to the end of the observation period where the record is registered.
If limitToCurrentPeriod = FALSE
, in addition to updating
the cohort end to the allocated observation end, cohort entries are
created for each of the subsequent observation periods.
exitAtDeath()
The exitAtDeath()
function sets the cohort end date to
the recorded death date of the subject.
By default, it keeps the end date of subjects who do not have a death
record unmodified; however, these can be dropped with the argument
requireDeath.
cdm$cohort_death <- cdm$cohort |>
exitAtDeath(requireDeath = TRUE, name = "cohort_death")
Cohort Entry or Exit Based on Other Date Columns
entryAtFirstDate()
The entryAtFirstDate()
function updates the cohort start
date to the earliest date among specified columns.
Next we want to set the entry date to the first of: diclofenac or acetaminophen prescriptions after cohort start, or cohort end date.
# create cohort with of drugs diclofenac and acetaminophen
cdm$medications <- conceptCohort(
cdm = cdm, name = "medications",
conceptSet = list("diclofenac" = 1124300, "acetaminophen" = 1127433)
)
#> Warning: ! `codelist` casted to integers.
#> ℹ Subsetting table drug_exposure using 2 concepts with domain: drug.
#> ℹ Combining tables.
#> ℹ Creating cohort attributes.
#> ℹ Applying cohort requirements.
#> ℹ Merging overlapping records.
#> ✔ Cohort medications created.
# add date first ocurrence of these drugs from index date
cdm$cohort_dates <- cdm$cohort |>
addCohortIntersectDate(
targetCohortTable = "medications",
nameStyle = "{cohort_name}",
name = "cohort_dates"
)
# set cohort start at the first ocurrence of one of the drugs, or the end date
cdm$cohort_entry_first <- cdm$cohort_dates |>
entryAtFirstDate(
dateColumns = c("diclofenac", "acetaminophen", "cohort_end_date"),
name = "cohort_entry_first"
)
#> Joining with `by = join_by(cohort_definition_id, subject_id, cohort_end_date)`
cdm$cohort_entry_first
#> # Source: table<my_study_cohort_entry_first> [?? x 7]
#> # Database: DuckDB v1.2.1 [unknown@Linux 6.8.0-1021-azure:R 4.4.3//tmp/RtmpEw3j9w/file27dd410eaee7.duckdb]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 349 1994-01-21 2019-05-04
#> 2 1 986 1932-02-09 1996-08-31
#> 3 1 1642 1951-06-22 2018-10-23
#> 4 1 2771 1913-07-30 2018-09-07
#> 5 1 3430 2019-05-25 2019-05-25
#> 6 1 3920 1989-01-29 2018-09-16
#> 7 1 5146 1947-03-04 2018-11-15
#> 8 1 135 1988-12-10 2018-11-14
#> 9 1 449 1975-02-09 2018-09-13
#> 10 1 1046 1988-06-13 2019-06-02
#> # ℹ more rows
#> # ℹ 3 more variables: entry_reason <chr>, diclofenac <date>,
#> # acetaminophen <date>
entryAtLastDate()
The entryAtLastDate()
function works similarly to
entryAtFirstDate()
, however now the selected column is the
latest date among specified columns.
cdm$cohort_entry_last <- cdm$cohort_dates |>
entryAtLastDate(
dateColumns = c("diclofenac", "acetaminophen", "cohort_end_date"),
keepDateColumns = FALSE,
name = "cohort_entry_last"
)
cdm$cohort_entry_last
#> # Source: table<my_study_cohort_entry_last> [?? x 5]
#> # Database: DuckDB v1.2.1 [unknown@Linux 6.8.0-1021-azure:R 4.4.3//tmp/RtmpEw3j9w/file27dd410eaee7.duckdb]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 372 2019-06-29 2019-06-29
#> 2 1 2004 2019-06-25 2019-06-25
#> 3 1 2217 2019-02-09 2019-02-09
#> 4 1 3283 2018-11-11 2018-11-11
#> 5 1 135 2018-11-14 2018-11-14
#> 6 1 814 2018-05-13 2018-05-13
#> 7 1 2534 2017-08-08 2017-08-08
#> 8 1 5302 2018-09-08 2018-09-08
#> 9 1 587 2018-05-30 2018-05-30
#> 10 1 639 2019-05-18 2019-05-18
#> # ℹ more rows
#> # ℹ 1 more variable: entry_reason <chr>
In this example, we set keepDateColumns
to FALSE, which
drops columns in dateColumns
.
exitAtFirstDate()
The exitAtFirstDate()
function updates the cohort end
date to the earliest date among specified columns.
For instance, next we want the exit to be observation end, except if there is a record of diclofenac or acetaminophen, in which case that would be the end:
cdm$cohort_exit_first <- cdm$cohort_dates |>
addFutureObservation(futureObservationType = "date", name = "cohort_exit_first") |>
exitAtFirstDate(
dateColumns = c("future_observation", "acetaminophen", "diclofenac"),
keepDateColumns = FALSE
)
cdm$cohort_exit_first
#> # Source: table<my_study_cohort_exit_first> [?? x 5]
#> # Database: DuckDB v1.2.1 [unknown@Linux 6.8.0-1021-azure:R 4.4.3//tmp/RtmpEw3j9w/file27dd410eaee7.duckdb]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date exit_reason
#> <int> <int> <date> <date> <chr>
#> 1 1 153 1943-06-10 1944-06-06 acetaminop…
#> 2 1 715 1983-02-15 1992-11-26 acetaminop…
#> 3 1 743 1975-11-02 1991-11-08 acetaminop…
#> 4 1 828 1952-07-22 2000-01-13 acetaminop…
#> 5 1 3248 1945-07-13 1968-09-29 acetaminop…
#> 6 1 4133 1976-07-31 1985-04-15 acetaminop…
#> 7 1 4347 1966-04-11 2001-05-07 acetaminop…
#> 8 1 4685 1939-09-04 1957-12-10 acetaminop…
#> 9 1 5145 1955-09-11 1968-04-04 acetaminop…
#> 10 1 5199 1962-01-16 1963-10-09 acetaminop…
#> # ℹ more rows
exitAtLastDate()
Similarly, the exitAtLastDate()
function sets the cohort
end date to the latest date among specified columns.
cdm$cohort_exit_last <- cdm$cohort_dates |>
exitAtLastDate(
dateColumns = c("cohort_end_date", "acetaminophen", "diclofenac"),
returnReason = FALSE,
keepDateColumns = FALSE,
name = "cohort_exit_last"
)
cdm$cohort_exit_last
#> # Source: table<my_study_cohort_exit_last> [?? x 4]
#> # Database: DuckDB v1.2.1 [unknown@Linux 6.8.0-1021-azure:R 4.4.3//tmp/RtmpEw3j9w/file27dd410eaee7.duckdb]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
#> 1 1 4447 1951-08-22 2019-03-10
#> 2 1 4755 1924-11-07 2019-02-01
#> 3 1 2518 1953-08-21 2019-02-08
#> 4 1 4321 1962-12-30 2018-12-09
#> 5 1 148 1978-10-15 2018-10-21
#> 6 1 1267 1952-05-08 2019-02-14
#> 7 1 1905 1966-02-18 2019-02-08
#> 8 1 2004 1971-06-01 2019-06-25
#> 9 1 2291 1960-10-14 2019-03-22
#> 10 1 2925 1960-01-11 2019-04-08
#> # ℹ more rows
In this last example, the return cohort doesn’t have the specified
date columns, neither the “reason” column indicating which date was used
for entry/exit. These was set with the keepDateColumns
and
returnReason
arguments, common throughout the functions in
this category.
Trim Dates Functions
trimDemographics()
The trimDemographics()
function restricts the cohort
based on patient demographics. This means that cohort start and end
dates are moved (within the original cohort entry dates) to ensure that
individuals meet specific demographic criteria throughout their cohort
participation. If individuals do not satisfy the criteria at any point
during their cohort period, their records are excluded.
For instance, if we trim using an age range from 18 to 65, individuals will only contribute in the cohort form the day they are 18 or older, up to the day before turning 66 (or before if they leave the database).
cdm$cohort_trim <- cdm$cohort |>
trimDemographics(ageRange = c(18, 65), name = "cohort_trim")
#> ℹ Building new trimmed cohort
#> Adding demographics information
#> Creating initial cohort
#> Trim age
#> ✔ Cohort trimmed
trimToDateRange()
The trimToDateRange()
function confines cohort entry and
exit dates within a specified date range, ensuring that cohort periods
align with the defined timeframe. If only the start or end of a range is
required, the other can be set to NA
.
For example, to restrict cohort dates to be on or after January 1st, 2015:
# Trim cohort dates to be within the year 2000
cdm$cohort_trim <- cdm$cohort_trim |>
trimToDateRange(dateRange = as.Date(c("2015-01-01", NA)))
Pad Dates Functions
padCohortStart()
The padCohortStart()
function adds (or subtracts) a
specified number of days to the cohort start date.
For example, to subtract 50 days from the cohort start date:
# Substract 50 days to cohort start
cdm$cohort <- cdm$cohort |> padCohortStart(days = -50, collapse = FALSE)
When subtracting days, it may result in cohort start dates preceding
the observation period start. By default, such entries are corrected to
the observation period start. To drop these entries instead, set the
padObservation
argument to FALSE.
Additionally, adjusting cohort start dates may lead to overlapping
entries for the same subject. The collapse
argument manages
this: if TRUE, merges overlapping entries into a single record with the
earliest start and latest end date (default), if FALSE retains only the
first of the overlapping entries.
padCohortEnd()
Similarly, the padCohortEnd()
function adjusts the
cohort end date by adding (or subtracting) a specified number of
days.
The example below adds 1000 days to cohort end date, while dropping records that are outside of observation after adding days.
cdm$cohort_pad <- cdm$cohort |>
padCohortEnd(days = 1000, padObservation = FALSE, name = "cohort_pad")
Additionally, days to add can also be specified with a numeric column in the cohort, which allows to add a specific number of days for each record:
cdm$cohort <- cdm$cohort %>%
dplyr::mutate(days_to_add = !!CDMConnector::datediff("cohort_start_date", "cohort_end_date")) |>
padCohortEnd(days = "days_to_add", padObservation = FALSE)
padCohortDate()
The padCohortDate()
function provides a more flexible
approach by allowing adjustments to either the cohort start or end date
based on specified parameters. You can define which date to adjust
(cohortDate
), the reference date for the adjustment
(indexDate
), and the number of days to add or subtract.
For example, to set the cohort end date to be 365 days after the cohort start date:
cdm$cohort <- cdm$cohort |>
padCohortDate(days = 365, cohortDate = "cohort_end_date", indexDate = "cohort_start_date")
Cohort ID argument
For all these functions, the cohortId argument specifies which cohorts to modify. This allows for targeted adjustments without altering other cohorts. For instance, to add 10 days to the end date of the acetaminophen cohort and 20 days to the diclofenac cohort we can do the following:
cdm$medications <- cdm$medications |>
padCohortDate(days = 10, cohortId = "acetaminophen") |>
padCohortDate(days = 20, cohortId = "diclofenac")