Introduction
Most of the functionalities of omock are used to
build specific mock tables (e.g. mockPerson()
,
mockObservationPeriod()
, …), this allows the user to create
mock cdm objects combining all those functions with some room for
customisation. There are times where the user will want to create a mock
CDM reference from its own bespoke tables. The mockCdmFromTables()
function is designed to facilitates the creation of mock CDM reference
from bespoke tables. This functionality will be useful to create a mock
CDM from a cohort_table
or a drug_exposure
table, or with incomplete data (e.g. missing columns).
Create a mock cdm from a cohort table
For example if you want to create a CDM reference based on below bespoke cohorts. You can do it simple using the mockCdmFromTable() functions in a few lines of code.
# Define a list of user-defined cohort tables
cohortTables <- list(
cohort1 = tibble(
subject_id = 1:10L,
cohort_definition_id = rep(1L, 10),
cohort_start_date = as.Date("2020-01-01") + 1:10,
cohort_end_date = as.Date("2020-01-01") + 11:20
),
cohort2 = tibble(
subject_id = 11:20L,
cohort_definition_id = rep(2L, 10),
cohort_start_date = as.Date("2020-02-01") + 1:10,
cohort_end_date = as.Date("2020-02-01") + 11:20
)
)
# Create a mock CDM object from the user-defined tables
cdm <- mockCdmFromTables(tables = cohortTables)
cdm
#>
#> ── # OMOP CDM reference (local) of mock database ───────────────────────────────
#> • omop tables: cdm_source, concept, concept_ancestor, concept_relationship,
#> concept_synonym, drug_strength, observation_period, person, vocabulary
#> • cohort tables: cohort1, cohort2
#> • achilles tables: -
#> • other tables: -
The generated CDM object will build the person
,
observation_period
and vocabulary tables so that all the
cohorts are in observation:
cdm$cohort1 |>
addInObservation()
#> # A tibble: 10 × 5
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> * <int> <int> <date> <date>
#> 1 1 1 2020-01-02 2020-01-12
#> 2 1 2 2020-01-03 2020-01-13
#> 3 1 3 2020-01-04 2020-01-14
#> 4 1 4 2020-01-05 2020-01-15
#> 5 1 5 2020-01-06 2020-01-16
#> 6 1 6 2020-01-07 2020-01-17
#> 7 1 7 2020-01-08 2020-01-18
#> 8 1 8 2020-01-09 2020-01-19
#> 9 1 9 2020-01-10 2020-01-20
#> 10 1 10 2020-01-11 2020-01-21
#> # ℹ 1 more variable: in_observation <int>
cdm$observation_period
#> # A tibble: 20 × 5
#> person_id observation_period_s…¹ observation_period_e…² observation_period_id
#> * <int> <date> <date> <int>
#> 1 1 2019-06-19 2020-03-25 1
#> 2 2 2018-02-05 2020-10-09 2
#> 3 3 2016-06-11 2020-11-20 3
#> 4 4 2019-11-20 2021-01-03 4
#> 5 5 2018-05-07 2021-06-20 5
#> 6 6 2015-10-10 2020-02-24 6
#> 7 7 2017-10-20 2021-10-23 7
#> 8 8 2018-09-21 2023-11-03 8
#> 9 9 2018-06-13 2020-09-05 9
#> 10 10 2019-10-01 2021-05-13 10
#> 11 11 2018-01-07 2021-10-16 11
#> 12 12 2019-09-15 2020-02-19 12
#> 13 13 2016-04-05 2020-06-10 13
#> 14 14 2019-05-07 2020-02-24 14
#> 15 15 2018-06-26 2020-10-18 15
#> 16 16 2018-09-05 2020-07-18 16
#> 17 17 2018-09-10 2020-02-21 17
#> 18 18 2017-09-15 2020-05-20 18
#> 19 19 2019-11-15 2020-05-28 19
#> 20 20 2019-07-31 2020-04-25 20
#> # ℹ abbreviated names: ¹observation_period_start_date,
#> # ²observation_period_end_date
#> # ℹ 1 more variable: period_type_concept_id <int>
Create a mock CDM from drug_exposure
Now we will create a CDM around a drug_exposure
table,
this functionality is quite useful to obtain mock datasets for testing
purposes only specifying part of the information. In this case we will
partially define person
table to impose all individuals are
women:
person <- tibble(person_id = 1:5L, gender_concept_id = 8532L)
and we will also create the records of the drug_exposure
table:
drugExposure <- tibble(
person_id = rep(1:5L, 2),
drug_concept_id = 19073188L,
drug_exposure_start_date = rep(as.Date(c("2000-01-01", "2000-06-1")), each = 5),
drug_exposure_end_date = drug_exposure_start_date + c(10L, 20L, 100L, 140L, 30L, 50L, 30L, 20L, 45L, 35L)
)
Then mockCdmFromTables()
will populate the missing
columns with interpolated data and add all the tables necessary to
create a minimum viable CDM (it will contain at least
person
, observation_period
and the vocabulary
tables):
cdm <- mockCdmFromTables(tables = list(person = person, drug_exposure = drugExposure))
cdm
As before all the records of drug_exposure
will be in
observation:
cdm$drug_exposure |>
addInObservation() |>
group_by(in_observation) |>
tllay()