
This function takes an existing CDM reference (which can be empty) and a named list of additional tables, and builds a more complete mock CDM object. It ensures that all provided records fall within their respective observation periods and that every individual referenced in those records has a matching entry in the person table. This is useful for creating consistent, realistic mock healthcare data for development and testing within the OMOP CDM framework.

Usage

mockCdmFromTables(cdm = mockCdmReference(), tables = list(), seed = NULL)

Arguments

cdm

A `cdm_reference` object that serves as the base structure into which all additional tables are integrated. It must already be initialized and may contain existing standard OMOP tables or cohort tables. Defaults to `mockCdmReference()`.

tables

A named list of data frames to be integrated into the CDM. These can be standard OMOP tables such as `drug_exposure` or `condition_occurrence`, or cohort tables that are not part of the standard OMOP model but are needed for a specific analysis. Each list element must be named with the table name it should have in the CDM.

seed

An optional integer used as the seed for the random number generation that creates the mock data entries. Setting a seed makes the generated mock data reproducible across runs. If `NULL` (the default), no seed is set and the generated data can differ between runs.
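As an illustration of the `seed` argument, the following is a minimal sketch that calls the function twice with the same seed and compares one generated column. The two-row cohort table and the seed value `42L` are made up for this illustration. That the generated `person` tables match is the expectation stated in the argument description, not something verified here for every package version.

```r
library(omock)
library(dplyr)

# Hypothetical two-row cohort table, used only for this illustration
cohort <- tibble(
  cohort_definition_id = c(1L, 1L),
  subject_id = c(1L, 2L),
  cohort_start_date = as.Date(c("2020-04-01", "2021-06-01")),
  cohort_end_date = cohort_start_date
)

# Two independent calls with the same (arbitrary) seed
cdm_a <- mockCdmFromTables(tables = list(cohort = cohort), seed = 42L)
cdm_b <- mockCdmFromTables(tables = list(cohort = cohort), seed = 42L)

# Compare a randomly generated column of the person table across the two runs
same_birth_years <- identical(
  pull(cdm_a$person, year_of_birth),
  pull(cdm_b$person, year_of_birth)
)
same_birth_years
```

Leaving `seed = NULL` in both calls would be the way to see the non-reproducible behaviour for comparison.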

Value

Returns the updated `cdm` object with all new tables added and integrated, with the added records consistent with the observation periods and the person entries.
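The consistency described above can be checked by hand. The sketch below is one possible way to do so, assuming the returned CDM holds local tables named `person` and `observation_period` with the standard OMOP column names. The cohort table and the seed are invented for the illustration.

```r
library(omock)
library(dplyr)

# Hypothetical cohort table, used only for this illustration
cohort <- tibble(
  cohort_definition_id = c(1L, 1L, 2L),
  subject_id = c(1L, 2L, 3L),
  cohort_start_date = as.Date(c("2020-04-01", "2021-06-01", "2019-08-01")),
  cohort_end_date = cohort_start_date
)

cdm <- mockCdmFromTables(tables = list(cohort = cohort), seed = 1L)

# Every cohort subject should have a row in the person table
subjects_in_person <- all(
  pull(cdm$cohort, subject_id) %in% pull(cdm$person, person_id)
)

# Every cohort entry should fall inside an observation period of its subject
dates_in_observation <- cdm$cohort |>
  inner_join(cdm$observation_period, by = c("subject_id" = "person_id")) |>
  summarise(all_inside = all(
    cohort_start_date >= observation_period_start_date &
      cohort_end_date <= observation_period_end_date
  )) |>
  pull(all_inside)

subjects_in_person
dates_in_observation
```

The second check assumes one observation period per person. With several periods per person, the join would need to test whether any period contains the entry instead.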

Examples

# \donttest{
library(omock)
library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union

# Create a mock cohort table
cohort <- tibble(
  cohort_definition_id = c(1, 1, 2, 2, 1, 3, 3, 3, 1, 3),
  subject_id = c(1, 4, 2, 3, 5, 5, 4, 3, 3, 1),
  cohort_start_date = as.Date(c(
    "2020-04-01", "2021-06-01", "2022-05-22", "2010-01-01", "2019-08-01",
    "2019-04-07", "2021-01-01", "2008-02-02", "2009-09-09", "2021-01-01"
  )),
  cohort_end_date = cohort_start_date
)

# Generate a mock CDM from preexisting CDM structure and cohort table
cdm <- mockCdmFromTables(cdm = mockCdmReference(), tables = list(cohort = cohort))
#> Warning: ! 7 column in person do not match expected column type:
#>  `person_id` is numeric but expected integer
#>  `gender_concept_id` is numeric but expected integer
#>  `year_of_birth` is numeric but expected integer
#>  `month_of_birth` is numeric but expected integer
#>  `race_concept_id` is numeric but expected integer
#>  `ethnicity_concept_id` is numeric but expected integer
#>  `location_id` is numeric but expected integer
#> Warning: ! 2 column in observation_period do not match expected column type:
#>  `person_id` is numeric but expected integer
#>  `period_type_concept_id` is numeric but expected integer
#> Warning: ! 9 column in cdm_source do not match expected column type:
#>  `cdm_source_abbreviation` is logical but expected character
#>  `cdm_holder` is logical but expected character
#>  `source_description` is logical but expected character
#>  `source_documentation_reference` is logical but expected character
#>  `cdm_etl_reference` is logical but expected character
#>  `source_release_date` is logical but expected date
#>  `cdm_release_date` is logical but expected date
#>  `cdm_version` is numeric but expected character
#>  `vocabulary_version` is logical but expected character
#> Warning: ! 3 column in concept do not match expected column type:
#>  `concept_id` is numeric but expected integer
#>  `valid_start_date` is character but expected date
#>  `valid_end_date` is character but expected date
#> Warning: ! 1 column in vocabulary do not match expected column type:
#>  `vocabulary_concept_id` is numeric but expected integer
#> Warning: ! 5 column in concept_relationship do not match expected column type:
#>  `concept_id_1` is numeric but expected integer
#>  `concept_id_2` is numeric but expected integer
#>  `valid_start_date` is logical but expected date
#>  `valid_end_date` is logical but expected date
#>  `invalid_reason` is logical but expected character
#> Warning: ! 2 column in concept_synonym do not match expected column type:
#>  `concept_id` is numeric but expected integer
#>  `language_concept_id` is numeric but expected integer
#> Warning: ! 4 column in concept_ancestor do not match expected column type:
#>  `ancestor_concept_id` is numeric but expected integer
#>  `descendant_concept_id` is numeric but expected integer
#>  `min_levels_of_separation` is numeric but expected integer
#>  `max_levels_of_separation` is numeric but expected integer
#> Warning: ! 9 column in drug_strength do not match expected column type:
#>  `drug_concept_id` is numeric but expected integer
#>  `ingredient_concept_id` is numeric but expected integer
#>  `amount_value` is logical but expected numeric
#>  `amount_unit_concept_id` is numeric but expected integer
#>  `numerator_unit_concept_id` is numeric but expected integer
#>  `denominator_unit_concept_id` is numeric but expected integer
#>  `box_size` is logical but expected integer
#>  `valid_start_date` is logical but expected date
#>  `valid_end_date` is logical but expected date
#> Warning: ! 2 column in cohort do not match expected column type:
#>  `cohort_definition_id` is numeric but expected integer
#>  `subject_id` is numeric but expected integer

# Access the newly integrated cohort table and the standard person table in the CDM
print(cdm$cohort)
#> # A tibble: 10 × 4
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                   <dbl>      <dbl> <date>            <date>         
#>  1                    1          1 2020-04-01        2020-04-01     
#>  2                    1          4 2021-06-01        2021-06-01     
#>  3                    2          2 2022-05-22        2022-05-22     
#>  4                    2          3 2010-01-01        2010-01-01     
#>  5                    1          5 2019-08-01        2019-08-01     
#>  6                    3          5 2019-04-07        2019-04-07     
#>  7                    3          4 2021-01-01        2021-01-01     
#>  8                    3          3 2008-02-02        2008-02-02     
#>  9                    1          3 2009-09-09        2009-09-09     
#> 10                    3          1 2021-01-01        2021-01-01     
print(cdm$person)
#> # A tibble: 5 × 18
#>   person_id gender_concept_id year_of_birth month_of_birth day_of_birth
#> *     <dbl>             <dbl>         <dbl>          <dbl>        <int>
#> 1         1              8507          2011             11           18
#> 2         2              8532          2010             11            8
#> 3         3              8507          1989              7           24
#> 4         4              8507          2013              8           21
#> 5         5              8532          2012              5            9
#> # ℹ 13 more variables: birth_datetime <date>, race_concept_id <dbl>,
#> #   ethnicity_concept_id <dbl>, location_id <dbl>, person_source_value <chr>,
#> #   gender_source_value <chr>, gender_source_concept_id <int>,
#> #   race_source_value <chr>, race_source_concept_id <int>,
#> #   ethnicity_source_value <chr>, ethnicity_source_concept_id <int>,
#> #   provider_id <int>, care_site_id <int>
# }
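The last warning in the example output reports that `cohort_definition_id` and `subject_id` in the cohort table are numeric where integer is expected. A minimal sketch of one way to address this for the user-supplied table, assuming the check looks only at the column type, is to create those columns as integers. The name `cohort_int` is introduced here for illustration. The warnings about tables created by `mockCdmReference()` itself are not expected to change.

```r
library(omock)
library(dplyr)

# Same cohort as in the example above, with the ID columns stored as integers
cohort_int <- tibble(
  cohort_definition_id = as.integer(c(1, 1, 2, 2, 1, 3, 3, 3, 1, 3)),
  subject_id = as.integer(c(1, 4, 2, 3, 5, 5, 4, 3, 3, 1)),
  cohort_start_date = as.Date(c(
    "2020-04-01", "2021-06-01", "2022-05-22", "2010-01-01", "2019-08-01",
    "2019-04-07", "2021-01-01", "2008-02-02", "2009-09-09", "2021-01-01"
  )),
  cohort_end_date = cohort_start_date
)

# Build the mock CDM from the integer-typed cohort table
cdm_int <- mockCdmFromTables(
  cdm = mockCdmReference(),
  tables = list(cohort = cohort_int)
)
```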