9 Working with cohorts
9.1 Adding intersection variables
Studies using the OMOP CDM often start with creating patient cohorts. Typically we begin by identifying a target (exposure) cohort. We can then create various other cohorts, for example, to characterise the comorbidities of individuals at time of entry into the target cohort or to summarise the occurrence of a health outcome after entering the target cohort. These intersections between our target cohort and other cohorts (or even other tables in the OMOP CDM) can take many forms and will typically require temporal logic. The PatientProfiles R package addresses these challenges by providing a suite of flexible functions to support the calculation of intersections between our target cohorts and other cohorts, concept sets, or other OMOP CDM tables.
9.1.1 Intersections between cohorts
Suppose we are interested in studying patients with gastrointestinal (GI) bleeding and describing their use of different medicines. We will first create one cohort for patients with GI bleeding (our target cohort). Next we can create another cohort for patients with exposure to acetaminophen, celecoxib, and diclofenac. When creating these medication cohorts we will only create them for individuals in our GI bleeding cohort. Below we create these cohorts using the GiBleed synthetic database (a characterisation of this dataset can be found here).
First we will load libraries and create a cdm reference for the dataset.
library(omock)
library(dplyr)
library(CodelistGenerator)
library(PatientProfiles)
library(CohortCharacteristics)
library(CohortConstructor)
library(omopgenerics)
cdm <- mockCdmFromDataset(
  datasetName = "GiBleed",
  source = "duckdb"
)
cdm
── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
• omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
concept_relationship, concept_synonym, condition_era, condition_occurrence,
cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
drug_strength, fact_relationship, location, measurement, metadata, note,
note_nlp, observation, observation_period, payer_plan_period, person,
procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
visit_detail, visit_occurrence, vocabulary
• cohort tables: -
• achilles tables: -
• other tables: -
Next we will define the codes used to identify our cohorts. With our small synthetic dataset we will only have a few relevant codes, whereas in a real study this would be a stage where time would be spent to ensure we were using the correct codes.
gi_bleed_codes <- list("gi_bleed" = 192671L) |> newCodelist()
gi_bleed_codes
- gi_bleed (1 codes)
medication_codes <- getDrugIngredientCodes(
  cdm,
  name = c("acetaminophen", "celecoxib", "diclofenac"),
  nameStyle = "{concept_name}"
)
medication_codes
- acetaminophen (7 codes)
- celecoxib (1 codes)
- diclofenac (1 codes)
Now we will create our cohorts. For our GI bleed cohort we will only include the first occurrence per person. For our medicines cohort we will include all events, collapsing records up to a week apart.
cdm$gi_bleed <- conceptCohort(
  cdm = cdm,
  conceptSet = gi_bleed_codes,
  name = "gi_bleed",
  exit = "event_start_date"
) |>
  requireIsFirstEntry()
cdm$medicines <- conceptCohort(
  cdm = cdm,
  conceptSet = medication_codes,
  name = "medicines",
  exit = "event_end_date",
  subsetCohort = "gi_bleed"
) |>
  collapseCohorts(gap = 7)
Now that we have these two cohort tables, we can use functions from PatientProfiles to add variables summarising the intersection between them as either a flag, count, date, or number of days.
9.1.1.1 Flag
To get a binary indicator showing the presence of an intersection between the cohorts within a given time window, we can use addCohortIntersectFlag().
cdm$gi_bleed <- cdm$gi_bleed |>
  addCohortIntersectFlag(
    targetCohortTable = "medicines",
    window = list(
      "flag_prior" = c(-Inf, -1),
      "flag_on_index" = c(0, 0),
      "flag_post" = c(1, Inf)
    )
  )
cdm$gi_bleed |>
  glimpse()
Rows: ??
Columns: 13
Database: DuckDB 1.4.1 [unknown@Linux 6.11.0-1018-azure:R 4.4.1//tmp/RtmpLKMGhY/file421c3f456b83.duckdb]
$ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ subject_id <int> 2591, 3115, 3833, 320, 3958, 2966, 2088, 3…
$ cohort_start_date <date> 1974-09-16, 2006-05-28, 2002-01-27, 1995-…
$ cohort_end_date <date> 1974-09-16, 2006-05-28, 2002-01-27, 1995-…
$ acetaminophen_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_prior <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, …
$ acetaminophen_flag_post <dbl> 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, …
$ acetaminophen_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, …
$ celecoxib_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
Windows work very similarly to age groups that we have seen before. If a name is not provided, an automatic name will be obtained from the values of the window limits:
cdm$gi_bleed |>
  addCohortIntersectFlag(
    targetCohortTable = "medicines",
    window = list(c(-Inf, -1), c(0, 0), c(1, Inf))
  ) |>
  glimpse()
Rows: ??
Columns: 22
Database: DuckDB 1.4.1 [unknown@Linux 6.11.0-1018-azure:R 4.4.1//tmp/RtmpLKMGhY/file421c3f456b83.duckdb]
$ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ subject_id <int> 2591, 3115, 3833, 320, 3958, 2966, 2088, 3…
$ cohort_start_date <date> 1974-09-16, 2006-05-28, 2002-01-27, 1995-…
$ cohort_end_date <date> 1974-09-16, 2006-05-28, 2002-01-27, 1995-…
$ acetaminophen_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_prior <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, …
$ acetaminophen_flag_post <dbl> 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, …
$ acetaminophen_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, …
$ celecoxib_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_minf_to_m1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, …
$ acetaminophen_minf_to_m1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_minf_to_m1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, …
$ acetaminophen_0_to_0 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_1_to_inf <dbl> 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, …
$ celecoxib_0_to_0 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_0_to_0 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_1_to_inf <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_1_to_inf <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
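The automatic suffixes in the output above (minf_to_m1, 0_to_0, 1_to_inf) follow a simple pattern. As a rough illustration only, the base R sketch below reproduces that pattern with a hypothetical helper, windowLabel(). The helper is not part of PatientProfiles and may not match how the package builds names internally.

```r
# Hypothetical helper, written only to illustrate the naming pattern;
# it is not a PatientProfiles function.
windowLabel <- function(window) {
  formatLimit <- function(x) {
    # Infinite limits become "inf"; finite limits keep their absolute value
    label <- if (is.infinite(x)) {
      "inf"
    } else {
      format(abs(x), scientific = FALSE, trim = TRUE)
    }
    # A leading "m" stands in for the minus sign
    if (x < 0) paste0("m", label) else label
  }
  paste0(formatLimit(window[1]), "_to_", formatLimit(window[2]))
}

windows <- list(c(-Inf, -1), c(0, 0), c(1, Inf))
windowLabels <- vapply(windows, windowLabel, character(1))
print(windowLabels)
```

Each label would then be combined with the cohort name to give columns such as acetaminophen_minf_to_m1.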
Note that to avoid conflicts with column naming, all names will be lower case, spaces are not allowed, and the - symbol for negative values is replaced by m. That’s why it is usually nice to provide your own custom names:
cdm$gi_bleed |>
  addCohortIntersectFlag(
    targetCohortTable = "medicines",
    window = list("prior" = c(-Inf, -1), "on_index" = c(0, 0), "post" = c(1, Inf))
  ) |>
  glimpse()
Rows: ??
Columns: 22
Database: DuckDB 1.4.1 [unknown@Linux 6.11.0-1018-azure:R 4.4.1//tmp/RtmpLKMGhY/file421c3f456b83.duckdb]
$ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ subject_id <int> 2591, 3115, 3833, 320, 3958, 2966, 2088, 3…
$ cohort_start_date <date> 1974-09-16, 2006-05-28, 2002-01-27, 1995-…
$ cohort_end_date <date> 1974-09-16, 2006-05-28, 2002-01-27, 1995-…
$ acetaminophen_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_prior <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, …
$ acetaminophen_flag_post <dbl> 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, …
$ acetaminophen_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, …
$ celecoxib_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_prior <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, …
$ acetaminophen_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, …
$ acetaminophen_post <dbl> 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, …
$ celecoxib_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
By default, the name of new columns is ‘{cohort_name}_{window_name}’ as we have seen in the prior examples. In some cases, you only have one variable to add and you might want to rename the column to something simpler. Or you just want some custom naming style. If this is the case, you can use the nameStyle argument to change the naming of the columns:
cdm$gi_bleed |>
  addCohortIntersectFlag(
    targetCohortTable = "medicines",
    window = list("prior" = c(-Inf, -1), "index" = c(0, 0), "post" = c(1, Inf)),
    nameStyle = "var_{window_name}_for_{cohort_name}"
  ) |>
  glimpse()
Rows: ??
Columns: 22
Database: DuckDB 1.4.1 [unknown@Linux 6.11.0-1018-azure:R 4.4.1//tmp/RtmpLKMGhY/file421c3f456b83.duckdb]
$ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ subject_id <int> 2591, 3115, 3833, 320, 3958, 2966, 2088, 3…
$ cohort_start_date <date> 1974-09-16, 2006-05-28, 2002-01-27, 1995-…
$ cohort_end_date <date> 1974-09-16, 2006-05-28, 2002-01-27, 1995-…
$ acetaminophen_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_prior <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, …
$ acetaminophen_flag_post <dbl> 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, …
$ acetaminophen_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, …
$ celecoxib_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ var_prior_for_diclofenac <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, …
$ var_post_for_acetaminophen <dbl> 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, …
$ var_prior_for_acetaminophen <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ var_prior_for_celecoxib <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, …
$ var_index_for_acetaminophen <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ var_index_for_celecoxib <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ var_index_for_diclofenac <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ var_post_for_celecoxib <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ var_post_for_diclofenac <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
If multiple windows are provided but ‘{window_name}’ is not included in nameStyle, an error is raised:
cdm$gi_bleed |>
  addCohortIntersectFlag(
    targetCohortTable = "medicines",
    window = list("prior" = c(-Inf, -1), "index" = c(0, 0), "post" = c(1, Inf)),
    nameStyle = "my_new_column_{cohort_name}"
  ) |>
  glimpse()
Error in `.addIntersect()`:
! The following elements are not present in nameStyle:
• {window_name}
Many functions that create new columns (usually functions that start with add*()) have this nameStyle functionality that allows you to control the naming of the new columns created.
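To see why ‘{window_name}’ is required when several windows are supplied, it can help to expand a nameStyle template by hand. The sketch below uses a hypothetical base R helper, expandNameStyle(), which is not a PatientProfiles function and only imitates the idea of filling the template once per cohort and window combination.

```r
# Hypothetical helper that fills a nameStyle template for every
# cohort/window combination; it is not a PatientProfiles function.
expandNameStyle <- function(nameStyle, cohortNames, windowNames) {
  combos <- expand.grid(
    cohort_name = cohortNames,
    window_name = windowNames,
    stringsAsFactors = FALSE
  )
  mapply(
    function(cohort, window) {
      out <- gsub("{cohort_name}", cohort, nameStyle, fixed = TRUE)
      gsub("{window_name}", window, out, fixed = TRUE)
    },
    combos$cohort_name,
    combos$window_name,
    USE.NAMES = FALSE
  )
}

cohorts <- c("acetaminophen", "celecoxib", "diclofenac")
windows <- c("prior", "index", "post")

# Both placeholders present: nine distinct column names
good <- expandNameStyle("var_{window_name}_for_{cohort_name}", cohorts, windows)

# Window placeholder missing: the nine names collapse to three
bad <- expandNameStyle("my_new_column_{cohort_name}", cohorts, windows)

print(anyDuplicated(good) == 0)
print(unique(bad))
```

Without the window placeholder, the three windows for each medicine would map to the same column name, which is the clash the error message guards against.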
9.1.1.2 Count
To get the count of occurrences of intersection between two cohorts, we can use addCohortIntersectCount():
cdm$gi_bleed <- cdm$gi_bleed |>
  addCohortIntersectCount(
    targetCohortTable = "medicines",
    window = list(
      "count_prior" = c(-Inf, -1),
      "count_index" = c(0, 0),
      "count_post" = c(1, Inf)
    )
  )
cdm$gi_bleed |>
  glimpse()
Rows: ??
Columns: 22
Database: DuckDB 1.4.1 [unknown@Linux 6.11.0-1018-azure:R 4.4.1//tmp/RtmpLKMGhY/file421c3f456b83.duckdb]
$ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ subject_id <int> 2265, 2332, 371, 4223, 3686, 4855, 280, 35…
$ cohort_start_date <date> 2002-06-16, 1996-12-04, 2008-05-27, 1989-…
$ cohort_end_date <date> 2002-06-16, 1996-12-04, 2008-05-27, 1989-…
$ acetaminophen_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_prior <dbl> 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, …
$ acetaminophen_flag_post <dbl> 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, …
$ acetaminophen_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_flag_prior <dbl> 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, …
$ celecoxib_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_prior <dbl> 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, …
$ acetaminophen_count_prior <dbl> 5, 7, 2, 8, 4, 2, 2, 3, 5, 3, 4, 6, 1, 3, …
$ celecoxib_count_prior <dbl> 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, …
$ acetaminophen_count_post <dbl> 0, 1, 0, 2, 1, 0, 8, 5, 2, 3, 0, 0, 2, 1, …
$ celecoxib_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
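The relationship between the flag and count columns can be illustrated with a small base R sketch. It assumes, as a simplification, that a target cohort entry intersects a window whenever the entry's interval overlaps the window, with both expressed in days relative to the index date. The data and the helper countInWindow() are made up for illustration and are not part of PatientProfiles.

```r
# Made-up target cohort entries for one person, in days relative to index
entries <- data.frame(
  start_days = c(-400, -30, 10, 90),
  end_days   = c(-390, -20, 25, 95)
)

# Hypothetical helper: an entry counts when its interval overlaps the window
countInWindow <- function(entries, window) {
  sum(entries$start_days <= window[2] & entries$end_days >= window[1])
}

windows <- list(prior = c(-Inf, -1), index = c(0, 0), post = c(1, Inf))

# Count of overlapping entries per window
counts <- vapply(windows, countInWindow, integer(1), entries = entries)

# A flag is simply whether the count is greater than zero
flags <- as.numeric(counts > 0)
names(flags) <- names(counts)

print(counts)
print(flags)
```

In this toy example the person has two entries before the index date, none on it, and two after it, so the flags are 1, 0, and 1.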
Note that, by default, only intersections in the current observation period are considered.
The new count and flag columns can also contain NA values, meaning that the individual was not in observation during the window of interest. For example, individual 2070 has 3748 days of future observation:
cdm$gi_bleed |>
  filter(subject_id == 2070) |>
  addFutureObservation() |>
  glimpse()
Rows: ??
Columns: 23
Database: DuckDB 1.4.1 [unknown@Linux 6.11.0-1018-azure:R 4.4.1//tmp/RtmpLKMGhY/file421c3f456b83.duckdb]
$ cohort_definition_id <int> 1
$ subject_id <int> 2070
$ cohort_start_date <date> 2008-08-15
$ cohort_end_date <date> 2008-08-15
$ acetaminophen_flag_on_index <dbl> 0
$ diclofenac_flag_prior <dbl> 0
$ acetaminophen_flag_post <dbl> 0
$ acetaminophen_flag_prior <dbl> 1
$ celecoxib_flag_prior <dbl> 1
$ celecoxib_flag_on_index <dbl> 0
$ diclofenac_flag_on_index <dbl> 0
$ celecoxib_flag_post <dbl> 0
$ diclofenac_flag_post <dbl> 0
$ acetaminophen_count_index <dbl> 0
$ diclofenac_count_prior <dbl> 0
$ acetaminophen_count_prior <dbl> 5
$ celecoxib_count_prior <dbl> 1
$ acetaminophen_count_post <dbl> 0
$ celecoxib_count_index <dbl> 0
$ diclofenac_count_index <dbl> 0
$ celecoxib_count_post <dbl> 0
$ diclofenac_count_post <dbl> 0
$ future_observation <int> 3748
Now we will perform the intersection with the following windows of interest: c(2000, 3000), c(3000, 4000), c(4000, 5000).
cdm$gi_bleed |>
  filter(subject_id == 2070) |>
  addCohortIntersectCount(
    targetCohortTable = "medicines",
    window = list(c(2000, 3000), c(3000, 4000), c(4000, 5000))
  ) |>
  glimpse()
Rows: ??
Columns: 31
Database: DuckDB 1.4.1 [unknown@Linux 6.11.0-1018-azure:R 4.4.1//tmp/RtmpLKMGhY/file421c3f456b83.duckdb]
$ cohort_definition_id <int> 1
$ subject_id <int> 2070
$ cohort_start_date <date> 2008-08-15
$ cohort_end_date <date> 2008-08-15
$ acetaminophen_flag_on_index <dbl> 0
$ diclofenac_flag_prior <dbl> 0
$ acetaminophen_flag_post <dbl> 0
$ acetaminophen_flag_prior <dbl> 1
$ celecoxib_flag_prior <dbl> 1
$ celecoxib_flag_on_index <dbl> 0
$ diclofenac_flag_on_index <dbl> 0
$ celecoxib_flag_post <dbl> 0
$ diclofenac_flag_post <dbl> 0
$ acetaminophen_count_index <dbl> 0
$ diclofenac_count_prior <dbl> 0
$ acetaminophen_count_prior <dbl> 5
$ celecoxib_count_prior <dbl> 1
$ acetaminophen_count_post <dbl> 0
$ celecoxib_count_index <dbl> 0
$ diclofenac_count_index <dbl> 0
$ celecoxib_count_post <dbl> 0
$ diclofenac_count_post <dbl> 0
$ acetaminophen_2000_to_3000 <dbl> 0
$ celecoxib_2000_to_3000 <dbl> 0
$ diclofenac_2000_to_3000 <dbl> 0
$ acetaminophen_3000_to_4000 <dbl> 0
$ celecoxib_3000_to_4000 <dbl> 0
$ diclofenac_3000_to_4000 <dbl> 0
$ acetaminophen_4000_to_5000 <dbl> NA
$ celecoxib_4000_to_5000 <dbl> NA
$ diclofenac_4000_to_5000 <dbl> NA
For the window 2000 to 3000, where the individual is still in observation, a 0 is reported. The same happens for the window 3000 to 4000, even though the individual does not have complete observation in that window. For the last window, however, the individual is not in observation at any point, and so NA is reported.
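The rule behind these results can be sketched in a few lines of base R. This is a simplified illustration under the assumption that a window is evaluated whenever it overlaps the individual's observation time at all; the helper windowObserved() is hypothetical and is not how PatientProfiles is necessarily implemented.

```r
# Hypothetical helper: TRUE when the window overlaps the time in observation,
# with everything expressed in days relative to the index date
windowObserved <- function(window, priorObservation, futureObservation) {
  window[1] <= futureObservation && window[2] >= -priorObservation
}

futureObservation <- 3748 # value reported above for individual 2070
priorObservation <- 0     # assumed for illustration; not shown in the text

windows <- list(c(2000, 3000), c(3000, 4000), c(4000, 5000))
observed <- vapply(
  windows,
  windowObserved,
  logical(1),
  priorObservation = priorObservation,
  futureObservation = futureObservation
)

# This individual has no medicine entries in these windows, so the count
# is 0 where the window is observed and NA where it is not
counts <- ifelse(observed, 0, NA)
print(counts)
```

The third window starts at day 4000, after the 3748 days of future observation have ended, which is why it is the only one reported as NA.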
9.1.1.3 Date and times
To get the date of the intersection with a cohort within a given time window, we can use addCohortIntersectDate(). To get the number of days between the index date and intersection, we can use addCohortIntersectDays().
Both functions allow the order argument to specify which value to return:
• first returns the first date/days that satisfy the window
• last returns the last date/days that satisfy the window
cdm$gi_bleed <- cdm$gi_bleed |>
  addCohortIntersectDate(
    targetCohortTable = "medicines",
    window = list("date_post" = c(1, Inf)),
    order = "first"
  )
cdm$gi_bleed |>
  glimpse()
Rows: ??
Columns: 25
Database: DuckDB 1.4.1 [unknown@Linux 6.11.0-1018-azure:R 4.4.1//tmp/RtmpLKMGhY/file421c3f456b83.duckdb]
$ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ subject_id <int> 2332, 4223, 3686, 280, 3573, 951, 2540, 25…
$ cohort_start_date <date> 1996-12-04, 1989-11-01, 2009-12-16, 1975-…
$ cohort_end_date <date> 1996-12-04, 1989-11-01, 2009-12-16, 1975-…
$ acetaminophen_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
$ acetaminophen_flag_post <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ acetaminophen_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_flag_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, …
$ celecoxib_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
$ acetaminophen_count_prior <dbl> 7, 8, 4, 2, 3, 5, 3, 1, 3, 1, 5, 5, 5, 3, …
$ celecoxib_count_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, …
$ acetaminophen_count_post <dbl> 1, 2, 1, 8, 5, 2, 3, 2, 1, 2, 1, 3, 2, 2, …
$ celecoxib_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_date_post <date> 1997-03-05, 1997-05-16, 2013-02-15, 1981-…
$ celecoxib_date_post <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ diclofenac_date_post <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
cdm$gi_bleed <- cdm$gi_bleed |>
  addCohortIntersectDays(
    targetCohortTable = "medicines",
    window = list("days_prior" = c(-Inf, -1)),
    order = "last"
  )
cdm$gi_bleed |>
  glimpse()
Rows: ??
Columns: 28
Database: DuckDB 1.4.1 [unknown@Linux 6.11.0-1018-azure:R 4.4.1//tmp/RtmpLKMGhY/file421c3f456b83.duckdb]
$ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ subject_id <int> 2332, 4223, 3686, 280, 3573, 951, 2540, 25…
$ cohort_start_date <date> 1996-12-04, 1989-11-01, 2009-12-16, 1975-…
$ cohort_end_date <date> 1996-12-04, 1989-11-01, 2009-12-16, 1975-…
$ acetaminophen_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
$ acetaminophen_flag_post <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ acetaminophen_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_flag_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, …
$ celecoxib_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
$ acetaminophen_count_prior <dbl> 7, 8, 4, 2, 3, 5, 3, 1, 3, 1, 5, 5, 5, 3, …
$ celecoxib_count_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, …
$ acetaminophen_count_post <dbl> 1, 2, 1, 8, 5, 2, 3, 2, 1, 2, 1, 3, 2, 2, …
$ celecoxib_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_date_post <date> 1997-03-05, 1997-05-16, 2013-02-15, 1981-…
$ celecoxib_date_post <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ diclofenac_date_post <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ diclofenac_days_prior <dbl> NA, NA, NA, -26, NA, NA, -27, NA, NA, NA, …
$ acetaminophen_days_prior <dbl> -705, -3822, -3277, -4712, -2353, -657, -7…
$ celecoxib_days_prior <dbl> -38, -88, -47, NA, -82, -23, NA, -85, -16,…
Note that for the window in the future, we used order = "first" and for the window in the past, we used order = "last" as in both cases we wanted to get the intersection that was closer to the index date. Individuals with no intersection will have NA values in the newly created columns.
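The effect of the order argument can be illustrated with made-up dates in base R. The helper intersectDays() below is hypothetical (it is not a PatientProfiles function) and only considers the start date of each event, which is a simplification.

```r
# Made-up example: one index date and four event start dates
indexDate <- as.Date("2020-01-15")
eventDates <- as.Date(c("2020-01-05", "2020-01-12", "2020-01-20", "2020-01-30"))

# Days from index to each event (negative means before the index date)
days <- as.integer(eventDates - indexDate)

# Hypothetical helper: keep the days inside the window, then take the
# earliest ("first") or latest ("last"); NA when nothing falls in the window
intersectDays <- function(days, window, order = c("first", "last")) {
  order <- match.arg(order)
  inWindow <- days[days >= window[1] & days <= window[2]]
  if (length(inWindow) == 0) {
    return(NA_integer_)
  }
  if (order == "first") min(inWindow) else max(inWindow)
}

# Closest event after index: first event in the future window
daysPost <- intersectDays(days, c(1, Inf), order = "first")
# Closest event before index: last event in the past window
daysPrior <- intersectDays(days, c(-Inf, -1), order = "last")
# The matching date is the index date shifted by the number of days
dateFirstPost <- indexDate + daysPost
# A window with no events gives NA
noMatch <- intersectDays(days, c(100, 200), order = "first")

print(c(daysPost = daysPost, daysPrior = daysPrior, noMatch = noMatch))
print(dateFirstPost)
```

Swapping the two order values would instead return the events furthest from the index date in each window.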
9.1.2 Intersections between cohorts and concept sets
Rather than creating medication cohorts, PatientProfiles also allows us to get intersections based directly on patient records using the medication concepts themselves. Here, for example, we add flag variables using the concepts for each medication. In this example we allow the intersection to use records outside of observation, which would not have been possible when using cohorts (as cohort entries must be, by definition, within observation). One thing to note, though, is that we no longer have any logic for collapsing medication records within a week of each other (which was done above when creating the medication cohorts).
cdm$gi_bleed <- cdm$gi_bleed |>
  addConceptIntersectFlag(
    conceptSet = medication_codes,
    window = list(
      "cs_flag_prior" = c(-Inf, -1),
      "cs_flag_on_index" = c(0, 0),
      "cs_flag_post" = c(1, Inf)
    ),
    inObservation = FALSE
  )
cdm$gi_bleed |>
  glimpse()
Rows: ??
Columns: 37
Database: DuckDB 1.4.1 [unknown@Linux 6.11.0-1018-azure:R 4.4.1//tmp/RtmpLKMGhY/file421c3f456b83.duckdb]
$ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ subject_id <int> 2332, 4223, 3686, 280, 3573, 951, 2540,…
$ cohort_start_date <date> 1996-12-04, 1989-11-01, 2009-12-16, 19…
$ cohort_end_date <date> 1996-12-04, 1989-11-01, 2009-12-16, 19…
$ acetaminophen_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, …
$ acetaminophen_flag_post <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ acetaminophen_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_flag_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, …
$ celecoxib_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, …
$ acetaminophen_count_prior <dbl> 7, 8, 4, 2, 3, 5, 3, 1, 3, 1, 5, 5, 5, …
$ celecoxib_count_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, …
$ acetaminophen_count_post <dbl> 1, 2, 1, 8, 5, 2, 3, 2, 1, 2, 1, 3, 2, …
$ celecoxib_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_date_post <date> 1997-03-05, 1997-05-16, 2013-02-15, 19…
$ celecoxib_date_post <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ diclofenac_date_post <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ diclofenac_days_prior <dbl> NA, NA, NA, -26, NA, NA, -27, NA, NA, N…
$ acetaminophen_days_prior <dbl> -705, -3822, -3277, -4712, -2353, -657,…
$ celecoxib_days_prior <dbl> -38, -88, -47, NA, -82, -23, NA, -85, -…
$ acetaminophen_cs_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_cs_flag_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, …
$ acetaminophen_cs_flag_post <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ acetaminophen_cs_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_cs_flag_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, …
$ celecoxib_cs_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_cs_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_cs_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_cs_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
We can also get counts based on concept sets. Here our counts will be of records rather than cohort entries and we should note that any overlapping or duplicate records will have been combined when making a cohort, and so counts of records can lead to quite a different result.
cdm$gi_bleed <- cdm$gi_bleed |>
  addConceptIntersectCount(
    conceptSet = medication_codes,
    window = list(
      "cs_count_prior" = c(-Inf, -1),
      "cs_count_index" = c(0, 0),
      "cs_count_post" = c(1, Inf)
    )
  )
cdm$gi_bleed |>
  glimpse()
Rows: ??
Columns: 46
Database: DuckDB 1.4.1 [unknown@Linux 6.11.0-1018-azure:R 4.4.1//tmp/RtmpLKMGhY/file421c3f456b83.duckdb]
$ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ subject_id <int> 2332, 4223, 3686, 280, 3573, 951, 2540,…
$ cohort_start_date <date> 1996-12-04, 1989-11-01, 2009-12-16, 19…
$ cohort_end_date <date> 1996-12-04, 1989-11-01, 2009-12-16, 19…
$ acetaminophen_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, …
$ acetaminophen_flag_post <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ acetaminophen_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_flag_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, …
$ celecoxib_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, …
$ acetaminophen_count_prior <dbl> 7, 8, 4, 2, 3, 5, 3, 1, 3, 1, 5, 5, 5, …
$ celecoxib_count_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, …
$ acetaminophen_count_post <dbl> 1, 2, 1, 8, 5, 2, 3, 2, 1, 2, 1, 3, 2, …
$ celecoxib_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_date_post <date> 1997-03-05, 1997-05-16, 2013-02-15, 19…
$ celecoxib_date_post <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ diclofenac_date_post <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ diclofenac_days_prior <dbl> NA, NA, NA, -26, NA, NA, -27, NA, NA, N…
$ acetaminophen_days_prior <dbl> -705, -3822, -3277, -4712, -2353, -657,…
$ celecoxib_days_prior <dbl> -38, -88, -47, NA, -82, -23, NA, -85, -…
$ acetaminophen_cs_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_cs_flag_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, …
$ acetaminophen_cs_flag_post <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ acetaminophen_cs_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_cs_flag_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, …
$ celecoxib_cs_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_cs_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_cs_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_cs_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_cs_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_cs_count_prior <dbl> 7, 8, 4, 2, 3, 5, 3, 1, 3, 1, 5, 5, 5, …
$ celecoxib_cs_count_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, …
$ acetaminophen_cs_count_post <dbl> 1, 2, 1, 8, 5, 2, 3, 2, 1, 2, 1, 3, 3, …
$ diclofenac_cs_count_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, …
$ celecoxib_cs_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_cs_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_cs_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_cs_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
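To see why counts of records and counts of cohort entries can differ, consider what collapsing does to a handful of records. The base R sketch below uses made-up exposure records and a hypothetical helper, collapseIntervals(), that merges records separated by at most a given gap. It only imitates the idea behind collapseCohorts(); the exact treatment of boundary cases in CohortConstructor is not covered here.

```r
# Hypothetical helper: merge intervals whose gap to the previous interval
# is at most `gap` days; not a CohortConstructor function
collapseIntervals <- function(start, end, gap = 0) {
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  outStart <- start[1]
  outEnd <- end[1]
  for (i in seq_along(start)[-1]) {
    n <- length(outStart)
    if (as.integer(start[i] - outEnd[n]) <= gap) {
      # Close enough to the current entry: extend it if needed
      if (end[i] > outEnd[n]) outEnd[n] <- end[i]
    } else {
      # Too far apart: start a new entry
      outStart <- c(outStart, start[i])
      outEnd <- c(outEnd, end[i])
    }
  }
  data.frame(start = outStart, end = outEnd)
}

# Made-up records: the first two are 3 days apart, the third is much later
records <- data.frame(
  start = as.Date(c("2020-01-01", "2020-01-13", "2020-03-20")),
  end   = as.Date(c("2020-01-10", "2020-01-20", "2020-03-25"))
)

collapsed <- collapseIntervals(records$start, records$end, gap = 7)

print(nrow(records))   # number of records
print(nrow(collapsed)) # number of collapsed entries
```

Here three records become two entries, so a record-based count and a cohort-based count over the same period would not agree.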
9.1.3 Intersections between cohorts and clinical tables
Sometimes we might also want to get the intersection between a cohort and another OMOP CDM table. PatientProfiles also includes several addTableIntersect* functions to obtain intersection flags, counts, days, or dates between a cohort and clinical tables. These are analogous to the ones we’ve seen above for intersections with cohort tables.
As an example, say we want the number of visit occurrence records for individuals in the cohort. We can look for an intersection with the visit_occurrence table:
cdm$gi_bleed <- cdm$gi_bleed |>
  addTableIntersectCount(
    tableName = "visit_occurrence",
    window = list(c(-Inf, -1)),
    nameStyle = "count_prior_visits"
  )
cdm$gi_bleed |>
  glimpse()
Rows: ??
Columns: 47
Database: DuckDB 1.4.1 [unknown@Linux 6.11.0-1018-azure:R 4.4.1//tmp/RtmpLKMGhY/file421c3f456b83.duckdb]
$ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ subject_id <int> 2332, 4223, 3686, 280, 3573, 951, 2540,…
$ cohort_start_date <date> 1996-12-04, 1989-11-01, 2009-12-16, 19…
$ cohort_end_date <date> 1996-12-04, 1989-11-01, 2009-12-16, 19…
$ acetaminophen_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, …
$ acetaminophen_flag_post <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ acetaminophen_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_flag_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
$ celecoxib_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, …
$ acetaminophen_count_prior <dbl> 7, 8, 4, 2, 3, 5, 3, 1, 3, 1, 5, 5, 3, …
$ celecoxib_count_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
$ acetaminophen_count_post <dbl> 1, 2, 1, 8, 5, 2, 3, 2, 1, 2, 1, 2, 2, …
$ celecoxib_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_date_post <date> 1997-03-05, 1997-05-16, 2013-02-15, 19…
$ celecoxib_date_post <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ diclofenac_date_post <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ diclofenac_days_prior <dbl> NA, NA, NA, -26, NA, NA, -27, NA, NA, N…
$ acetaminophen_days_prior <dbl> -705, -3822, -3277, -4712, -2353, -657,…
$ celecoxib_days_prior <dbl> -38, -88, -47, NA, -82, -23, NA, -85, -…
$ acetaminophen_cs_flag_prior <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ celecoxib_cs_flag_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
$ acetaminophen_cs_flag_post <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ acetaminophen_cs_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_cs_flag_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, …
$ celecoxib_cs_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_cs_flag_on_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_cs_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_cs_flag_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_cs_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ acetaminophen_cs_count_prior <dbl> 7, 8, 4, 2, 3, 5, 3, 1, 3, 1, 5, 5, 3, …
$ celecoxib_cs_count_prior <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, …
$ acetaminophen_cs_count_post <dbl> 1, 2, 1, 8, 5, 2, 3, 2, 1, 2, 1, 3, 2, …
$ diclofenac_cs_count_prior <dbl> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, …
$ celecoxib_cs_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_cs_count_index <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ celecoxib_cs_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ diclofenac_cs_count_post <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ count_piror_visits <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
Using these PatientProfiles functions we have added a range of variables to our cohort table. We can now work with this single tidy table in our subsequent analyses.
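To make the window logic concrete, here is a minimal base R sketch of what a count, flag, days, and date intersection would compute for a window of c(-Inf, -1). The toy `cohort` and `visits` data frames and the helper `intersect_prior()` are hypothetical and are not part of PatientProfiles. The sketch also assumes that only each record's start date matters, whereas the package works against database tables and applies further rules (for example around record end dates and observation periods).

```r
# Toy data: one cohort entry per person and a handful of visit records.
cohort <- data.frame(
  subject_id = c(1, 2, 3),
  cohort_start_date = as.Date(c("2000-06-01", "2005-01-15", "2010-03-20"))
)
visits <- data.frame(
  person_id = c(1, 1, 1, 2, 3),
  visit_start_date = as.Date(c(
    "1999-01-10", "2000-05-20", "2000-06-01", "2006-02-01", "2010-03-19"
  ))
)

# Hypothetical helper: summarise records that fall in the window c(-Inf, -1),
# i.e. records starting at least one day before the index date.
intersect_prior <- function(cohort, records) {
  out <- cohort
  out$count_prior <- 0
  out$flag_prior <- 0
  out$days_prior <- NA_real_
  out$date_prior <- as.Date(NA)
  for (i in seq_len(nrow(cohort))) {
    own <- records$visit_start_date[records$person_id == cohort$subject_id[i]]
    # Days from the index date to each record (negative means before index).
    diff_days <- as.numeric(own - cohort$cohort_start_date[i])
    in_window <- diff_days <= -1
    out$count_prior[i] <- sum(in_window)
    out$flag_prior[i] <- as.numeric(any(in_window))
    if (any(in_window)) {
      # Keep the record closest to the index date, comparable to
      # order = "last" for a prior window.
      closest <- which(in_window)[which.max(diff_days[in_window])]
      out$days_prior[i] <- diff_days[closest]
      out$date_prior[i] <- own[closest]
    }
  }
  out
}

result <- intersect_prior(cohort, visits)
print(result)
```

In this toy example subject 1 has two visits before the index date (the visit on the index date itself falls outside the window), subject 2 has none, so their days and date are missing, and subject 3 has a single visit one day before index.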
9.2 Cohort summaries
The PatientProfiles package provides maximum flexibility, allowing us to add variables of interest for us to use in further analyses. If we simply want a summary of these variables we can use the CohortCharacteristics package, which uses PatientProfiles behind the scenes to add these variables and then calculates various summary statistics for them. CohortCharacteristics also provides functions to visualise these results.
Below we generate a summary table of the characteristics of patients in our study cohort, along with summary statistics for our intersections of interest.
chars <- cdm$gi_bleed |>
  summariseCharacteristics(
    demographics = TRUE,
    cohortIntersectFlag = list(
      "flag_prior" = list(
        targetCohortTable = "medicines",
        window = c(-Inf, -1)
      ),
      "flag_on_index" = list(
        targetCohortTable = "medicines",
        window = c(0, 0)
      ),
      "flag_post" = list(
        targetCohortTable = "medicines",
        window = c(1, Inf)
      )
    ),
    cohortIntersectCount = list(
      "count_prior" = list(
        targetCohortTable = "medicines",
        window = c(-Inf, -1)
      ),
      "count_on_index" = list(
        targetCohortTable = "medicines",
        window = c(0, 0)
      ),
      "count_post" = list(
        targetCohortTable = "medicines",
        window = c(1, Inf)
      )
    ),
    cohortIntersectDays = list(
      "days_prior" = list(
        targetCohortTable = "medicines",
        window = c(-Inf, -1),
        order = "last"
      ),
      "days_post" = list(
        targetCohortTable = "medicines",
        window = c(1, Inf),
        order = "first"
      )
    ),
    tableIntersectCount = list(
      "prior_visits" = list(
        tableName = "visit_occurrence",
        window = list(c(-Inf, -1))
      )
    )
  )
tableCharacteristics(chars)
CDM name: GiBleed
| Variable name | Variable level | Estimate name | Cohort name: gi_bleed |
|---|---|---|---|
| Number records | - | N | 479 |
| Number subjects | - | N | 479 |
| Cohort start date | - | Median [Q25 - Q75] | 2000-08-07 [1988-03-27 - 2009-11-12] |
| | | Range | 1944-01-20 to 2019-05-25 |
| Cohort end date | - | Median [Q25 - Q75] | 2000-08-07 [1988-03-27 - 2009-11-12] |
| | | Range | 1944-01-20 to 2019-05-25 |
| Age | - | Median [Q25 - Q75] | 38 [36 - 41] |
| | | Mean (SD) | 38.41 (3.29) |
| | | Range | 31 to 46 |
| Sex | Female | N (%) | 242 (50.52%) |
| | Male | N (%) | 237 (49.48%) |
| Prior observation | - | Median [Q25 - Q75] | 14,201 [13,249 - 15,128] |
| | | Mean (SD) | 14,215.89 (1,191.31) |
| | | Range | 11,572 to 16,833 |
| Future observation | - | Median [Q25 - Q75] | 6,571 [3,257 - 10,248] |
| | | Mean (SD) | 7,395.70 (5,470.03) |
| | | Range | 1 to 27,318 |
| Days in cohort | - | Median [Q25 - Q75] | 1 [1 - 1] |
| | | Mean (SD) | 1.00 (0.00) |
| | | Range | 1 to 1 |
| Prior visits | - | Median [Q25 - Q75] | 0.00 [0.00 - 0.00] |
| | | Mean (SD) | 0.11 (0.32) |
| | | Range | 0.00 to 2.00 |
| Flag prior | Celecoxib | N (%) | 355 (74.11%) |
| | Acetaminophen | N (%) | 467 (97.49%) |
| | Diclofenac | N (%) | 124 (25.89%) |
| Flag on index | Acetaminophen | N (%) | 1 (0.21%) |
| | Celecoxib | N (%) | 0 (0.00%) |
| | Diclofenac | N (%) | 0 (0.00%) |
| Flag post | Acetaminophen | N (%) | 315 (65.76%) |
| | Celecoxib | N (%) | 0 (0.00%) |
| | Diclofenac | N (%) | 0 (0.00%) |
| Count prior | Diclofenac | Median [Q25 - Q75] | 0.00 [0.00 - 1.00] |
| | | Mean (SD) | 0.26 (0.44) |
| | | Range | 0.00 to 1.00 |
| | Acetaminophen | Median [Q25 - Q75] | 3.00 [2.00 - 5.00] |
| | | Mean (SD) | 3.49 (1.86) |
| | | Range | 0.00 to 9.00 |
| | Celecoxib | Median [Q25 - Q75] | 1.00 [0.00 - 1.00] |
| | | Mean (SD) | 0.74 (0.44) |
| | | Range | 0.00 to 1.00 |
| Count on index | Acetaminophen | Median [Q25 - Q75] | 0.00 [0.00 - 0.00] |
| | | Mean (SD) | 0.00 (0.05) |
| | | Range | 0.00 to 1.00 |
| | Celecoxib | Median [Q25 - Q75] | 0.00 [0.00 - 0.00] |
| | | Mean (SD) | 0.00 (0.00) |
| | | Range | 0.00 to 0.00 |
| | Diclofenac | Median [Q25 - Q75] | 0.00 [0.00 - 0.00] |
| | | Mean (SD) | 0.00 (0.00) |
| | | Range | 0.00 to 0.00 |
| Count post | Acetaminophen | Median [Q25 - Q75] | 1.00 [0.00 - 2.00] |
| | | Mean (SD) | 1.58 (1.85) |
| | | Range | 0.00 to 12.00 |
| | Celecoxib | Median [Q25 - Q75] | 0.00 [0.00 - 0.00] |
| | | Mean (SD) | 0.00 (0.00) |
| | | Range | 0.00 to 0.00 |
| | Diclofenac | Median [Q25 - Q75] | 0.00 [0.00 - 0.00] |
| | | Mean (SD) | 0.00 (0.00) |
| | | Range | 0.00 to 0.00 |
| Days prior | Celecoxib | Median [Q25 - Q75] | -48.00 [-69.50 - -25.00] |
| | | Mean (SD) | -47.92 (24.66) |
| | | Range | -89.00 to -5.00 |
| | Diclofenac | Median [Q25 - Q75] | -45.50 [-67.00 - -20.75] |
| | | Mean (SD) | -44.27 (25.46) |
| | | Range | -89.00 to -5.00 |
| | Acetaminophen | Median [Q25 - Q75] | -3,159.00 [-6,533.00 - -1,481.00] |
| | | Mean (SD) | -4,299.63 (3,548.06) |
| | | Range | -15,070.00 to -13.00 |
| Days post | Acetaminophen | Median [Q25 - Q75] | 2,336.00 [1,094.50 - 4,364.50] |
| | | Mean (SD) | 3,024.38 (2,614.80) |
| | | Range | 10.00 to 13,716.00 |
| | Celecoxib | Median [Q25 - Q75] | - |
| | | Mean (SD) | - |
| | | Range | - |
| | Diclofenac | Median [Q25 - Q75] | - |
| | | Mean (SD) | - |
| | | Range | - |
With a relatively small amount of code we have generated a detailed summary of our GI bleed cohort and its use of the medicines of interest.
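As an aid to reading the table, the N (%) estimates for the intersection flags are the number of cohort records with the flag set, expressed as a share of the 479 records in the cohort. The following minimal base R sketch reproduces the "Flag prior" entries from the counts reported above. The helper `format_n_pct()` is hypothetical and is not part of CohortCharacteristics, which does its own formatting internally.

```r
# Counts taken from the "Flag prior" rows of the summary table.
n_records <- 479
flag_prior <- c(celecoxib = 355, acetaminophen = 467, diclofenac = 124)

# Hypothetical helper: format a count as "N (%)" with two decimal places.
format_n_pct <- function(n, total) {
  sprintf("%d (%.2f%%)", as.integer(n), 100 * n / total)
}

formatted <- format_n_pct(flag_prior, n_records)
names(formatted) <- names(flag_prior)
print(formatted)
```

For example, 355 of 479 records is 74.11%, which matches the celecoxib entry under "Flag prior".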
9.3 Disconnecting
Once we have finished our analysis we can close our connection to the database behind our cdm reference.
cdmDisconnect(cdm)
9.4 Further reading
Català M, Guo Y, Du M, Lopez-Guell K, Burn E, Mercade-Besora N (2025). PatientProfiles: Identify Characteristics of Patients in the OMOP Common Data Model. R package version 1.4.4, https://darwin-eu.github.io/PatientProfiles.
Català M, Guo Y, Lopez-Guell K, Burn E, Mercade-Besora N, Alcalde M (2025). CohortCharacteristics: Summarise and Visualise Characteristics of Patients in the OMOP CDM. R package version 1.0.0, https://darwin-eu.github.io/CohortCharacteristics.