
This function runs all of the diagnostics offered by this package, namely:

  • A diagnostic on the OMOP CDM dataset as a whole via databaseDiagnostics.

  • A diagnostic on the codelists associated with cohorts via codelistDiagnostics.

  • A diagnostic on the cohort itself via cohortDiagnostics.

  • A diagnostic on the frequency of the cohort in the dataset population via populationDiagnostics.

Usage

phenotypeDiagnostics(
  cohort,
  databaseDiagnostics = list(),
  codelistDiagnostics = list(),
  cohortDiagnostics = list(),
  populationDiagnostics = list(),
  stagingDirectory = NULL
)

Arguments

cohort

Cohort table in a cdm reference

databaseDiagnostics

A list of arguments passed to `databaseDiagnostics`. If the list is empty, the default values are used. Example: in the following call, all diagnostics are run except the person table summary from databaseDiagnostics: `databaseDiagnostics = list("personTableSummary" = FALSE)`

codelistDiagnostics

A list of arguments passed to `codelistDiagnostics`. If the list is empty, the default values are used. Example: in the following call, all diagnostics are run, with a subsample of 1,000 participants used for the measurement diagnostics and a separate, independent subsample of 500 participants used for the drug diagnostics: `codelistDiagnostics = list("measurementDiagnosticsSample" = 1000, "drugDiagnosticsSample" = 500)`

cohortDiagnostics

A list of arguments passed to `cohortDiagnostics`. If the list is empty, the default values are used. Example: the following call also requests the cohort survival diagnostic: `cohortDiagnostics = list("cohortSurvival" = TRUE)`

populationDiagnostics

A list of arguments passed to `populationDiagnostics`. If the list is empty, the default values are used. Example: in the following call, all diagnostics are run and a subsample of 100,000 participants is used for the population diagnostics: `populationDiagnostics = list("populationSample" = 100000)`

stagingDirectory

Path to a folder in which incremental results and the log file are saved.
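The per-argument examples above can be combined in a single call. The sketch below reuses the mock GiBleed setup from the Examples section and only the option names documented above; the staging folder name (`phenotyper_staging`) is an arbitrary choice for illustration, and the folder is created up front in case the function expects it to exist. Expect it to take a couple of minutes, as in the Examples output.

```r
library(omock)
library(CohortConstructor)
library(PhenotypeR)

# Mock OMOP CDM (GiBleed) in duckdb, as in the Examples section
cdm <- mockCdmFromDataset(source = "duckdb")

# Warfarin concept cohort, built as in the Examples section
cdm$warfarin <- conceptCohort(
  cdm,
  conceptSet = list(warfarin = c(1310149L, 40163554L)),
  name = "warfarin"
)

# Folder for incremental results and the log file (name is hypothetical)
stagingDir <- file.path(tempdir(), "phenotyper_staging")
dir.create(stagingDir, showWarnings = FALSE)

result <- phenotypeDiagnostics(
  cdm$warfarin,
  # skip the person table summary; other database diagnostics use defaults
  databaseDiagnostics = list("personTableSummary" = FALSE),
  # independent subsamples for measurement and drug diagnostics
  codelistDiagnostics = list(
    "measurementDiagnosticsSample" = 1000,
    "drugDiagnosticsSample" = 500
  ),
  # empty list: default cohort diagnostics
  cohortDiagnostics = list(),
  # subsample of the person table used for population diagnostics
  populationDiagnostics = list("populationSample" = 100000),
  stagingDirectory = stagingDir
)
```

Leaving any of the four lists empty keeps that component's defaults, so only the options you want to change need to be listed.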

Value

A summarised result
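Assuming the returned object follows the omopgenerics summarised result format (as the settings-casting messages in the Examples output suggest), one way to see which analyses were produced is to tabulate the `result_type` column of its settings. This is a minimal sketch reusing the mock setup from the Examples section; `omopgenerics::settings()` is the generic accessor for summarised result settings, not a PhenotypeR-specific function.

```r
library(omock)
library(CohortConstructor)
library(PhenotypeR)

# Mock CDM and warfarin cohort, as in the Examples section
cdm <- mockCdmFromDataset(source = "duckdb")
cdm$warfarin <- conceptCohort(
  cdm,
  conceptSet = list(warfarin = c(1310149L, 40163554L)),
  name = "warfarin"
)

result <- phenotypeDiagnostics(cdm$warfarin)

# One settings row per analysis; result_type names the analysis that
# produced the corresponding rows of the result
analyses <- omopgenerics::settings(result)
table(analyses$result_type)
```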

Examples

# \donttest{
library(omock)
library(CohortConstructor)
library(PhenotypeR)

cdm <- mockCdmFromDataset(source = "duckdb")
#>  Loading bundled GiBleed tables from package data.
#>  Adding drug_strength table.
#>  Creating local <cdm_reference> object.
#>  Inserting <cdm_reference> into duckdb.
cdm$warfarin <- conceptCohort(cdm,
                              conceptSet = list(warfarin = c(1310149L,
                                                             40163554L)),
                              name = "warfarin")
#>  Subsetting table drug_exposure using 2 concepts with domain: drug.
#>  Combining tables.
#>  Creating cohort attributes.
#>  Applying cohort requirements.
#>  Merging overlapping records.
#>  Cohort warfarin created.
result <- phenotypeDiagnostics(cdm$warfarin)
#> Logging PhenotypeR progress in
#> /tmp/RtmpnSJwH7/phenotypeDiagnostics_log_{date}_{time}1b4e77f4f51e.txt
#>  Creating log file:
#>   /tmp/RtmpnSJwH7/phenotypeDiagnostics_log_2026_04_29_12_48_361b4e77f4f51e.txt.
#> [2026-04-29 12:48:36] - Log file created
#> [2026-04-29 12:48:36] - Database diagnostics - getting CDM Snapshot
#> [2026-04-29 12:48:36] - Database diagnostics - summarising person table
#>  The following estimates will be calculated:
#>  date_of_birth: density
#> ! Table is collected to memory as not all requested estimates are supported on
#>   the database side
#> → Start summary of data, at 2026-04-29 12:48:40.173976
#>  Summary finished, at 2026-04-29 12:48:40.218652
#> [2026-04-29 12:48:40] - Database diagnostics - summarising observation period
#>  retrieving cdm object from cdm_table.
#> Warning: ! There are 2649 individuals not included in the person table.
#>  The following estimates will be calculated:
#>  observation_period_start_date: density
#>  observation_period_end_date: density
#> ! Table is collected to memory as not all requested estimates are supported on
#>   the database side
#> → Start summary of data, at 2026-04-29 12:48:43.070694
#>  Summary finished, at 2026-04-29 12:48:43.123774
#> [2026-04-29 12:48:43] - Codelist diagnostics - index event breakdown
#> Getting counts of warfarin codes for cohort warfarin
#> Warning: The CDM reference containing the cohort must also contain achilles tables.
#> Returning only index event breakdown.
#> [2026-04-29 12:48:45] - Cohort diagnostics - cohort attrition
#> [2026-04-29 12:48:46] - Cohort diagnostics - cohort count
#>  summarising data
#>  summarising cohort warfarin
#>  summariseCharacteristics finished!
#> → Skipping cohort sampling as all cohorts have less than 20000 individuals.
#> [2026-04-29 12:48:46] - Cohort diagnostics - matched cohorts
#> → Sampling cohort `tmp_022_sampled`
#> Returning entry cohort as the size of the cohorts to be sampled is equal or
#> smaller than `n`.
#>  Generating an age and sex matched cohort for warfarin
#> Starting matching
#>  Creating copy of target cohort.
#>  1 cohort to be matched.
#>  Creating controls cohorts.
#>  Excluding cases from controls
#>  Matching by gender_concept_id and year_of_birth
#>  Removing controls that were not in observation at index date
#>  Excluding target records whose pair is not in observation
#>  Adjusting ratio
#> Binding cohorts
#>  Done
#> → Getting cohorts and indexes
#> [2026-04-29 12:49:01] - Cohort diagnostics - cohort characteristics
#>  adding demographics columns
#>  adding tableIntersectCount 1/1
#> window names casted to snake_case:
#>  `-365 to -1` -> `365_to_1`
#>  summarising data
#>  summarising cohort warfarin
#>  summarising cohort warfarin_sampled
#>  summarising cohort warfarin_matched
#>  summariseCharacteristics finished!
#> [2026-04-29 12:49:06] - Cohort diagnostics - age density
#>  The following estimates will be calculated:
#>  age: density
#> ! Table is collected to memory as not all requested estimates are supported on
#>   the database side
#> → Start summary of data, at 2026-04-29 12:49:06.778898
#>  Summary finished, at 2026-04-29 12:49:06.872811
#> Using defaults for windows for large scale characteristics: c(-365, -31),
#> c(-30, -1), c(0, 0), c(1, 30), and c(31, 365). These can be changed via passing
#> alternative windows as a global option
#> `PhenotypeR_summariseLargeScaleCharacteristics_window`
#> Using defaults for event tables for large scale characteristics:
#> condition_occurrence, visit_occurrence, measurement, procedure_occurrence,
#> device_exposure, and observation. These can be changed via passing alternative
#> windows as a global option
#> `PhenotypeR_summariseLargeScaleCharacteristics_eventInWindow`
#> Using defaults for episode tables for large scale characteristics:
#> drug_exposure and drug_era. These can be changed via passing alternative
#> windows as a global option
#> `PhenotypeR_summariseLargeScaleCharacteristics_episodeInWindow`
#> [2026-04-29 12:49:07] - Cohort diagnostics - large scale characteristics
#>  Summarising large scale characteristics 
#>  - getting characteristics from table condition_occurrence (1 of 7)
#>  - getting characteristics from table condition_occurrence (1 of 7) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 7) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 7) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 7) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 7) for time wi…
#>  - getting characteristics from table visit_occurrence (2 of 7)
#>  - getting characteristics from table visit_occurrence (2 of 7) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 7) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 7) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 7) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 7) for time window…
#>  - getting characteristics from table measurement (3 of 7)
#>  - getting characteristics from table measurement (3 of 7) for time window -365…
#>  - getting characteristics from table measurement (3 of 7) for time window -30 …
#>  - getting characteristics from table measurement (3 of 7) for time window 0 an…
#>  - getting characteristics from table measurement (3 of 7) for time window 1 an…
#>  - getting characteristics from table measurement (3 of 7) for time window 31 a…
#>  - getting characteristics from table procedure_occurrence (4 of 7)
#>  - getting characteristics from table procedure_occurrence (4 of 7) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 7) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 7) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 7) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 7) for time wi…
#>  - getting characteristics from table observation (5 of 7)
#>  - getting characteristics from table observation (5 of 7) for time window -365…
#>  - getting characteristics from table observation (5 of 7) for time window -30 …
#>  - getting characteristics from table observation (5 of 7) for time window 0 an…
#>  - getting characteristics from table observation (5 of 7) for time window 1 an…
#>  - getting characteristics from table observation (5 of 7) for time window 31 a…
#>  - getting characteristics from table drug_exposure (6 of 7)
#>  - getting characteristics from table drug_exposure (6 of 7) for time window -3…
#>  - getting characteristics from table drug_exposure (6 of 7) for time window -3…
#>  - getting characteristics from table drug_exposure (6 of 7) for time window 0 …
#>  - getting characteristics from table drug_exposure (6 of 7) for time window 1 …
#>  - getting characteristics from table drug_exposure (6 of 7) for time window 31…
#>  - getting characteristics from table drug_era (7 of 7)
#>  - getting characteristics from table drug_era (7 of 7) for time window -365 an…
#>  - getting characteristics from table drug_era (7 of 7) for time window -30 and…
#>  - getting characteristics from table drug_era (7 of 7) for time window 0 and 0
#>  - getting characteristics from table drug_era (7 of 7) for time window 1 and 30
#>  - getting characteristics from table drug_era (7 of 7) for time window 31 and …
#> Formatting result
#> 236 estimates dropped as frequency less than 1%
#>  Summarising large scale characteristics
#>  Summarising large scale characteristics 
#>  - getting characteristics from table condition_occurrence (1 of 7)
#>  - getting characteristics from table condition_occurrence (1 of 7) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 7) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 7) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 7) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 7) for time wi…
#>  - getting characteristics from table visit_occurrence (2 of 7)
#>  - getting characteristics from table visit_occurrence (2 of 7) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 7) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 7) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 7) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 7) for time window…
#>  - getting characteristics from table measurement (3 of 7)
#>  - getting characteristics from table measurement (3 of 7) for time window -365…
#>  - getting characteristics from table measurement (3 of 7) for time window -30 …
#>  - getting characteristics from table measurement (3 of 7) for time window 0 an…
#>  - getting characteristics from table measurement (3 of 7) for time window 1 an…
#>  - getting characteristics from table measurement (3 of 7) for time window 31 a…
#>  - getting characteristics from table procedure_occurrence (4 of 7)
#>  - getting characteristics from table procedure_occurrence (4 of 7) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 7) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 7) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 7) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 7) for time wi…
#>  - getting characteristics from table observation (5 of 7)
#>  - getting characteristics from table observation (5 of 7) for time window -365…
#>  - getting characteristics from table observation (5 of 7) for time window -30 …
#>  - getting characteristics from table observation (5 of 7) for time window 0 an…
#>  - getting characteristics from table observation (5 of 7) for time window 1 an…
#>  - getting characteristics from table observation (5 of 7) for time window 31 a…
#>  - getting characteristics from table drug_exposure (6 of 7)
#>  - getting characteristics from table drug_exposure (6 of 7) for time window -3…
#>  - getting characteristics from table drug_exposure (6 of 7) for time window -3…
#>  - getting characteristics from table drug_exposure (6 of 7) for time window 0 …
#>  - getting characteristics from table drug_exposure (6 of 7) for time window 1 …
#>  - getting characteristics from table drug_exposure (6 of 7) for time window 31…
#>  - getting characteristics from table drug_era (7 of 7)
#>  - getting characteristics from table drug_era (7 of 7) for time window -365 an…
#>  - getting characteristics from table drug_era (7 of 7) for time window -30 and…
#>  - getting characteristics from table drug_era (7 of 7) for time window 0 and 0
#>  - getting characteristics from table drug_era (7 of 7) for time window 1 and 30
#>  - getting characteristics from table drug_era (7 of 7) for time window 31 and …
#> Formatting result
#> 236 estimates dropped as frequency less than 1%
#>  Summarising large scale characteristics
#> `cohort_sample` and `matched_sample` casted to character.
#> [2026-04-29 12:50:24] - Population diagnosics - denominator cohort
#> [2026-04-29 12:50:24] - Population diagnosics - sampling person table to 1e+05
#> people
#>  Creating denominator cohorts
#>  Cohorts created in 0 min and 6 sec
#> [2026-04-29 12:50:30] - Population diagnosics - incidence
#>  Getting incidence for analysis 1 of 7
#>  Getting incidence for analysis 2 of 7
#>  Getting incidence for analysis 3 of 7
#>  Getting incidence for analysis 4 of 7
#>  Getting incidence for analysis 5 of 7
#>  Getting incidence for analysis 6 of 7
#>  Getting incidence for analysis 7 of 7
#>  Overall time taken: 0 mins and 9 secs
#> [2026-04-29 12:50:40] - Population diagnosics - prevalence
#>  Getting prevalence for analysis 1 of 7
#>  Getting prevalence for analysis 2 of 7
#>  Getting prevalence for analysis 3 of 7
#>  Getting prevalence for analysis 4 of 7
#>  Getting prevalence for analysis 5 of 7
#>  Getting prevalence for analysis 6 of 7
#>  Getting prevalence for analysis 7 of 7
#>  Time taken: 0 mins and 4 secs
#> `populationDateStart`, `populationDateEnd`, and `populationSample` casted to
#> character.
#> `populationDateStart` and `populationDateEnd` eliminated from settings as all
#> elements are NA.
#> [2026-04-29 12:50:45] - Exporting log file

# }