Runs the diagnostics offered by this package. By default all four are run:

* Database diagnostics, via `databaseDiagnostics`.
* Diagnostics on the cohort_codelist attribute of the cohort, via `codelistDiagnostics`.
* Cohort diagnostics, via `cohortDiagnostics`.
* Population diagnostics, via `populationDiagnostics`.

Usage

phenotypeDiagnostics(
  cohort,
  diagnostics = c("databaseDiagnostics", "codelistDiagnostics", "cohortDiagnostics",
    "populationDiagnostics"),
  measurementSample = 20000,
  survival = FALSE,
  cohortSample = 20000,
  matchedSample = 1000,
  populationSample = 1e+06,
  populationDateRange = as.Date(c(NA, NA))
)

Arguments

cohort

A cohort table in a cdm reference.

diagnostics

Vector indicating which diagnostics to perform. Options include: `databaseDiagnostics`, `codelistDiagnostics`, `cohortDiagnostics`, and `populationDiagnostics`.

measurementSample

Number of people to randomly sample for measurement diagnostics. If `measurementSample = NULL`, no sampling is performed. If `measurementSample = 0`, measurement diagnostics are not run.

survival

Logical. Whether to conduct survival analysis (`TRUE`) or not (`FALSE`).

cohortSample

Number of people to randomly sample for `cohortDiagnostics`. If `cohortSample = NULL`, no sampling is performed.

matchedSample

Number of people to randomly sample for matching. If `matchedSample = NULL`, no sampling is performed. If `matchedSample = 0`, no matched cohorts are created.

populationSample

Number of people to sample from the cdm. If `NULL`, no sampling is performed. The sample is drawn within `populationDateRange` if it is specified.

populationDateRange

A vector of two dates. The first is the earliest cohort start date and the second is the latest possible cohort end date. If `NULL`, or if the first date is missing, the earliest observation_start_date in the observation_period table is used for the former. If `NULL`, or if the second date is missing, the latest observation_end_date in the observation_period table is used for the latter.
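
The arguments above can be combined to narrow a run. The following is a minimal sketch, not a recommended configuration: the chosen subset of diagnostics and the start date `2010-01-01` are illustrative assumptions, and the call is only attempted if a `cdm` object with a `warfarin` cohort already exists, as created in the Examples section.

```r
# Diagnostic names as listed in the `diagnostics` default in Usage.
allDiagnostics <- c("databaseDiagnostics", "codelistDiagnostics",
                    "cohortDiagnostics", "populationDiagnostics")

# Illustrative choice: run only the cohort- and population-level diagnostics.
selected <- c("cohortDiagnostics", "populationDiagnostics")
stopifnot(all(selected %in% allDiagnostics))

# Illustrative date range: fixed start date, missing end date. Per the
# argument description, a missing second date falls back to the latest
# end date in the observation_period table.
dateRange <- as.Date(c("2010-01-01", NA))

# Assumes `cdm$warfarin` was built as in the Examples section.
if (exists("cdm")) {
  result <- PhenotypeR::phenotypeDiagnostics(
    cdm$warfarin,
    diagnostics = selected,
    cohortSample = NULL,      # no sampling for cohortDiagnostics
    matchedSample = 0,        # do not create matched cohorts
    populationSample = NULL,  # no sampling of the cdm population
    populationDateRange = dateRange
  )
}
```

Setting the sampling arguments to `NULL` trades run time for completeness, so on a large database the defaults may be the more practical starting point.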

Value

A summarised result
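
One way to see what the returned object contains is to look at its settings table. The sketch below assumes the result is a summarised result as handled by the omopgenerics package, whose `settings()` function returns a table with a `result_type` column. The helper `listResultTypes` is hypothetical and not part of PhenotypeR.

```r
# Hypothetical helper (not part of PhenotypeR): list the distinct result
# types recorded in a settings table with a `result_type` column.
listResultTypes <- function(settingsTable) {
  sort(unique(settingsTable$result_type))
}

# Assumes `result` was produced by phenotypeDiagnostics(), as in the
# Examples section, and that omopgenerics is installed.
if (exists("result") && requireNamespace("omopgenerics", quietly = TRUE)) {
  print(listResultTypes(omopgenerics::settings(result)))
}
```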

Examples

# \donttest{
library(omock)
library(CohortConstructor)
library(PhenotypeR)

cdm <- mockCdmFromDataset(source = "duckdb")
#>  Reading GiBleed tables.
#>  Adding drug_strength table.
#>  Creating local <cdm_reference> object.
#>  Inserting <cdm_reference> into duckdb.
cdm$warfarin <- conceptCohort(cdm,
                              conceptSet =  list(warfarin = c(1310149L,
                                                              40163554L)),
                              name = "warfarin")
#>  Subsetting table drug_exposure using 2 concepts with domain: drug.
#>  Combining tables.
#>  Creating cohort attributes.
#>  Applying cohort requirements.
#>  Merging overlapping records.
#>  Cohort warfarin created.

result <- phenotypeDiagnostics(cdm$warfarin)
#>  Creating log file:
#>   /tmp/Rtmp1awHhK/phenotypeDiagnostics_log_2026_01_20_10_15_041eec52da1a5e.txt.
#> [2026-01-20 10:15:04] - Log file created
#> [2026-01-20 10:15:04] - Phenotype diagnostics - input validation
#> [2026-01-20 10:15:04] - Database diagnostics - input validation
#> [2026-01-20 10:15:04] - Database diagnostics - getting CDM Snapshot
#> [2026-01-20 10:15:04] - Database diagnostics - summarising person table
#>  The following estimates will be computed:
#>  date_of_birth: density
#> ! Table is collected to memory as not all requested estimates are supported on
#>   the database side
#> → Start summary of data, at 2026-01-20 10:15:09.178558
#>  Summary finished, at 2026-01-20 10:15:09.255892
#> [2026-01-20 10:15:09] - Database diagnostics - summarising observation period
#>  retrieving cdm object from cdm_table.
#> Warning: ! There are 2649 individuals not included in the person table.
#>  The following estimates will be computed:
#>  observation_period_start_date: density
#>  observation_period_end_date: density
#> ! Table is collected to memory as not all requested estimates are supported on
#>   the database side
#> → Start summary of data, at 2026-01-20 10:15:12.722002
#>  Summary finished, at 2026-01-20 10:15:12.819899
#> [2026-01-20 10:15:13] - Database diagnostics - summarising clinical tables -
#> summary
#>  Adding variables of interest to drug_exposure.
#>  Summarising records per person in drug_exposure.
#>  Summarising subjects not in person table in drug_exposure.
#>  Summarising records in observation in drug_exposure.
#>  Summarising records with start before birth date in drug_exposure.
#>  Summarising records with end date before start date in drug_exposure.
#>  Summarising domains in drug_exposure.
#>  Summarising standard concepts in drug_exposure.
#>  Summarising source vocabularies in drug_exposure.
#>  Summarising concept types in drug_exposure.
#>  Summarising concept class in drug_exposure.
#>  Summarising missing data in drug_exposure.
#> [2026-01-20 10:15:17] - Database diagnostics - summarising clinical tables -
#> trends
#> [2026-01-20 10:15:17] - Codelist diagnostics - input validation
#> [2026-01-20 10:15:18] - Codelist diagnostics - index event breakdown
#> Getting counts of warfarin codes for cohort warfarin
#> Warning: The CDM reference containing the cohort must also contain achilles tables.
#> Returning only index event breakdown.
#> [2026-01-20 10:15:19] - Cohort diagnostics - input validation
#> [2026-01-20 10:15:19] - Cohort diagnostics - cohort attrition
#> [2026-01-20 10:15:20] - Cohort diagnostics - cohort count
#>  summarising data
#>  summarising cohort warfarin
#>  summariseCharacteristics finished!
#> → Skipping cohort sampling as all cohorts have less than 20000 individuals.
#> [2026-01-20 10:15:20] - Cohort diagnostics - matched cohorts
#> → Sampling cohort `tmp_024_sampled`
#> Returning entry cohort as the size of the cohorts to be sampled is equal or
#> smaller than `n`.
#>  Generating an age and sex matched cohort for warfarin
#> Starting matching
#>  Creating copy of target cohort.
#>  1 cohort to be matched.
#>  Creating controls cohorts.
#>  Excluding cases from controls
#>  Matching by gender_concept_id and year_of_birth
#>  Removing controls that were not in observation at index date
#>  Excluding target records whose pair is not in observation
#>  Adjusting ratio
#> Binding cohorts
#>  Done
#> → Getting cohorts and indexes
#> [2026-01-20 10:15:32] - Cohort diagnostics - cohort characteristics
#>  adding demographics columns
#>  adding tableIntersectCount 1/1
#> window names casted to snake_case:
#>  `-365 to -1` -> `365_to_1`
#>  summarising data
#>  summarising cohort warfarin
#>  summarising cohort warfarin_sampled
#>  summarising cohort warfarin_matched
#>  summariseCharacteristics finished!
#> [2026-01-20 10:15:37] - Cohort diagnostics - age density
#>  The following estimates will be computed:
#>  age: density
#> ! Table is collected to memory as not all requested estimates are supported on
#>   the database side
#> → Start summary of data, at 2026-01-20 10:15:38.291645
#>  Summary finished, at 2026-01-20 10:15:38.4354
#> [2026-01-20 10:15:38] - Cohort diagnostics - large scale characteristics
#>  Summarising large scale characteristics 
#>  - getting characteristics from table condition_occurrence (1 of 8)
#>  - getting characteristics from table condition_occurrence (1 of 8) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 8) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 8) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 8) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 8) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 8) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 8) for time wi…
#>  - getting characteristics from table visit_occurrence (2 of 8)
#>  - getting characteristics from table visit_occurrence (2 of 8) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 8) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 8) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 8) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 8) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 8) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 8) for time window…
#>  - getting characteristics from table measurement (3 of 8)
#>  - getting characteristics from table measurement (3 of 8) for time window -Inf…
#>  - getting characteristics from table measurement (3 of 8) for time window -365…
#>  - getting characteristics from table measurement (3 of 8) for time window -30 …
#>  - getting characteristics from table measurement (3 of 8) for time window 0 an…
#>  - getting characteristics from table measurement (3 of 8) for time window 1 an…
#>  - getting characteristics from table measurement (3 of 8) for time window 31 a…
#>  - getting characteristics from table measurement (3 of 8) for time window 366 …
#>  - getting characteristics from table procedure_occurrence (4 of 8)
#>  - getting characteristics from table procedure_occurrence (4 of 8) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 8) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 8) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 8) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 8) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 8) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 8) for time wi…
#>  - getting characteristics from table device_exposure (5 of 8)
#>  - getting characteristics from table device_exposure (5 of 8) for time window …
#>  - getting characteristics from table device_exposure (5 of 8) for time window …
#>  - getting characteristics from table device_exposure (5 of 8) for time window …
#>  - getting characteristics from table device_exposure (5 of 8) for time window …
#>  - getting characteristics from table device_exposure (5 of 8) for time window …
#>  - getting characteristics from table device_exposure (5 of 8) for time window …
#>  - getting characteristics from table device_exposure (5 of 8) for time window …
#>  - getting characteristics from table observation (6 of 8)
#>  - getting characteristics from table observation (6 of 8) for time window -Inf…
#>  - getting characteristics from table observation (6 of 8) for time window -365…
#>  - getting characteristics from table observation (6 of 8) for time window -30 …
#>  - getting characteristics from table observation (6 of 8) for time window 0 an…
#>  - getting characteristics from table observation (6 of 8) for time window 1 an…
#>  - getting characteristics from table observation (6 of 8) for time window 31 a…
#>  - getting characteristics from table observation (6 of 8) for time window 366 …
#>  - getting characteristics from table drug_exposure (7 of 8)
#>  - getting characteristics from table drug_exposure (7 of 8) for time window -I…
#>  - getting characteristics from table drug_exposure (7 of 8) for time window -3…
#>  - getting characteristics from table drug_exposure (7 of 8) for time window -3…
#>  - getting characteristics from table drug_exposure (7 of 8) for time window 0 …
#>  - getting characteristics from table drug_exposure (7 of 8) for time window 1 …
#>  - getting characteristics from table drug_exposure (7 of 8) for time window 31…
#>  - getting characteristics from table drug_exposure (7 of 8) for time window 36…
#>  - getting characteristics from table drug_era (8 of 8)
#>  - getting characteristics from table drug_era (8 of 8) for time window -Inf an…
#>  - getting characteristics from table drug_era (8 of 8) for time window -365 an…
#>  - getting characteristics from table drug_era (8 of 8) for time window -30 and…
#>  - getting characteristics from table drug_era (8 of 8) for time window 0 and 0
#>  - getting characteristics from table drug_era (8 of 8) for time window 1 and 30
#>  - getting characteristics from table drug_era (8 of 8) for time window 31 and …
#>  - getting characteristics from table drug_era (8 of 8) for time window 366 and…
#> Formatting result
#> 415 estimates dropped as frequency less than 1%
#>  Summarising large scale characteristics
#>  Summarising large scale characteristics 
#>  - getting characteristics from table condition_occurrence (1 of 8)
#>  - getting characteristics from table condition_occurrence (1 of 8) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 8) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 8) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 8) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 8) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 8) for time wi…
#>  - getting characteristics from table condition_occurrence (1 of 8) for time wi…
#>  - getting characteristics from table visit_occurrence (2 of 8)
#>  - getting characteristics from table visit_occurrence (2 of 8) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 8) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 8) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 8) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 8) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 8) for time window…
#>  - getting characteristics from table visit_occurrence (2 of 8) for time window…
#>  - getting characteristics from table measurement (3 of 8)
#>  - getting characteristics from table measurement (3 of 8) for time window -Inf…
#>  - getting characteristics from table measurement (3 of 8) for time window -365…
#>  - getting characteristics from table measurement (3 of 8) for time window -30 …
#>  - getting characteristics from table measurement (3 of 8) for time window 0 an…
#>  - getting characteristics from table measurement (3 of 8) for time window 1 an…
#>  - getting characteristics from table measurement (3 of 8) for time window 31 a…
#>  - getting characteristics from table measurement (3 of 8) for time window 366 …
#>  - getting characteristics from table procedure_occurrence (4 of 8)
#>  - getting characteristics from table procedure_occurrence (4 of 8) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 8) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 8) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 8) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 8) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 8) for time wi…
#>  - getting characteristics from table procedure_occurrence (4 of 8) for time wi…
#>  - getting characteristics from table device_exposure (5 of 8)
#>  - getting characteristics from table device_exposure (5 of 8) for time window …
#>  - getting characteristics from table device_exposure (5 of 8) for time window …
#>  - getting characteristics from table device_exposure (5 of 8) for time window …
#>  - getting characteristics from table device_exposure (5 of 8) for time window …
#>  - getting characteristics from table device_exposure (5 of 8) for time window …
#>  - getting characteristics from table device_exposure (5 of 8) for time window …
#>  - getting characteristics from table device_exposure (5 of 8) for time window …
#>  - getting characteristics from table observation (6 of 8)
#>  - getting characteristics from table observation (6 of 8) for time window -Inf…
#>  - getting characteristics from table observation (6 of 8) for time window -365…
#>  - getting characteristics from table observation (6 of 8) for time window -30 …
#>  - getting characteristics from table observation (6 of 8) for time window 0 an…
#>  - getting characteristics from table observation (6 of 8) for time window 1 an…
#>  - getting characteristics from table observation (6 of 8) for time window 31 a…
#>  - getting characteristics from table observation (6 of 8) for time window 366 …
#>  - getting characteristics from table drug_exposure (7 of 8)
#>  - getting characteristics from table drug_exposure (7 of 8) for time window -I…
#>  - getting characteristics from table drug_exposure (7 of 8) for time window -3…
#>  - getting characteristics from table drug_exposure (7 of 8) for time window -3…
#>  - getting characteristics from table drug_exposure (7 of 8) for time window 0 …
#>  - getting characteristics from table drug_exposure (7 of 8) for time window 1 …
#>  - getting characteristics from table drug_exposure (7 of 8) for time window 31…
#>  - getting characteristics from table drug_exposure (7 of 8) for time window 36…
#>  - getting characteristics from table drug_era (8 of 8)
#>  - getting characteristics from table drug_era (8 of 8) for time window -Inf an…
#>  - getting characteristics from table drug_era (8 of 8) for time window -365 an…
#>  - getting characteristics from table drug_era (8 of 8) for time window -30 and…
#>  - getting characteristics from table drug_era (8 of 8) for time window 0 and 0
#>  - getting characteristics from table drug_era (8 of 8) for time window 1 and 30
#>  - getting characteristics from table drug_era (8 of 8) for time window 31 and …
#>  - getting characteristics from table drug_era (8 of 8) for time window 366 and…
#> Formatting result
#> 415 estimates dropped as frequency less than 1%
#>  Summarising large scale characteristics
#> `cohort_sample` and `matched_sample` casted to character.
#> [2026-01-20 10:16:49] - Population diagnosics - input validation
#> [2026-01-20 10:16:49] - Population diagnosics - denominator cohort
#> [2026-01-20 10:16:49] - Population diagnosics - sampling person table to1e+06
#>  Creating denominator cohorts
#>  Cohorts created in 0 min and 6 sec
#> [2026-01-20 10:16:55] - Population diagnosics - incidence
#>  Getting incidence for analysis 1 of 7
#>  Getting incidence for analysis 2 of 7
#>  Getting incidence for analysis 3 of 7
#>  Getting incidence for analysis 4 of 7
#>  Getting incidence for analysis 5 of 7
#>  Getting incidence for analysis 6 of 7
#>  Getting incidence for analysis 7 of 7
#>  Overall time taken: 0 mins and 10 secs
#> [2026-01-20 10:17:06] - Population diagnosics - prevalence
#>  Getting prevalence for analysis 1 of 7
#>  Getting prevalence for analysis 2 of 7
#>  Getting prevalence for analysis 3 of 7
#>  Getting prevalence for analysis 4 of 7
#>  Getting prevalence for analysis 5 of 7
#>  Getting prevalence for analysis 6 of 7
#>  Getting prevalence for analysis 7 of 7
#>  Time taken: 0 mins and 6 secs
#> `populationDateStart`, `populationDateEnd`, and `populationSample` casted to
#> character.
#> `populationDateStart` and `populationDateEnd` eliminated from settings as all
#> elements are NA.
#> [2026-01-20 10:17:13] - Phenotype diagnostics - exporting results
#> [2026-01-20 10:17:13] - Exporting log file
# }