
Summarise missing data in OMOP tables

Usage

summariseMissingData(
  cdm,
  omopTableName,
  col = NULL,
  sex = FALSE,
  interval = "overall",
  ageGroup = NULL,
  sample = 1e+05,
  dateRange = NULL,
  year = lifecycle::deprecated()
)

Arguments

cdm

A cdm_reference object. Use CDMConnector to create a reference to a database or omock to create a reference to synthetic data.

omopTableName

A character vector of the names of the tables to summarise in the cdm object. Run clinicalTables() to check the available options.

col

A character vector of column names to check for missing values. If NULL, all columns in the specified tables are checked. Default is NULL.

sex

Logical; whether to stratify results by sex (TRUE) or not (FALSE). Default is FALSE.

interval

Time interval to stratify by. One of "overall" (the default), "years", "quarters", or "months".

ageGroup

A list of age groups to stratify the results by. Each element represents a specific age range. You can give them specific names, e.g. ageGroup = list(children = c(0, 17), adult = c(18, Inf)).

sample

An integer, a character string, or NULL.

  • If an integer (n > 0), the function will first sample n distinct person_ids from the person table and then subset the input tables to those subjects.

  • If a character string, it must be the name of a cohort in the cdm; in this case, the input tables are subset to subjects (subject_id) belonging to that cohort.

  • Use NULL to disable subsetting. The default shown in Usage is 1e+05, i.e. a sample of 100,000 subjects.
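The integer case can be pictured with a small base R sketch. This is only an illustration on hypothetical toy data frames, not the package's implementation: the helper `sample_ids()` and the toy tables are assumptions made for the example. It also mirrors the behaviour seen in the example output below, where sampling is skipped when the person table has no more than n subjects.

```r
# Hypothetical toy tables standing in for the person table and a clinical table.
person <- data.frame(person_id = 1:10)
condition_occurrence <- data.frame(
  condition_occurrence_id = 1:20,
  person_id = rep(1:10, each = 2)
)

# Hypothetical helper: draw n distinct person_ids, or keep everyone when the
# person table has n subjects or fewer.
sample_ids <- function(person, n) {
  ids <- unique(person$person_id)
  if (length(ids) <= n) {
    return(ids)
  }
  ids[sample.int(length(ids), n)]
}

set.seed(1)
ids <- sample_ids(person, n = 4)

# Subset the clinical table to the sampled subjects.
condition_sample <- condition_occurrence[condition_occurrence$person_id %in% ids, ]
```

With a cohort name instead of an integer, the same subsetting step would use the cohort's subject_id values in place of the sampled ids.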

dateRange

A vector of two dates defining the desired study period. Only the start_date column of the OMOP table is checked to ensure it falls within this range. If dateRange is NULL, no restriction is applied.
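As a rough illustration of what restricting on the start date means, the base R sketch below filters a hypothetical toy table. The table contents are invented for the example, and treating both ends of the range as inclusive is an assumption, not something this page states.

```r
# Hypothetical toy table; only the start date column is compared with the range.
visit_occurrence <- data.frame(
  visit_occurrence_id = 1:4,
  visit_start_date = as.Date(c("2009-12-31", "2010-01-01", "2015-06-30", "2019-01-01")),
  visit_end_date = as.Date(c("2010-01-05", "2010-01-02", "2015-07-02", "2019-01-03"))
)
dateRange <- as.Date(c("2010-01-01", "2018-12-31"))

# Assumption: both ends of the range are inclusive.
in_range <- visit_occurrence$visit_start_date >= dateRange[1] &
  visit_occurrence$visit_start_date <= dateRange[2]
visit_subset <- visit_occurrence[in_range, ]
```

The first visit ends inside the range but starts before it, so it is dropped; the end date plays no role in the check.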

year

Deprecated.

Value

A summarised_result object with the results.

Examples

# \donttest{
library(OmopSketch)
library(omock)

cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#>  Reading GiBleed tables.
#>  Adding drug_strength table.
#>  Creating local <cdm_reference> object.
#>  Inserting <cdm_reference> into duckdb.

result <- summariseMissingData(
  cdm = cdm,
  omopTableName = c("condition_occurrence", "visit_occurrence"),
  sample = 10000
)
#> The person table has ≤ 10000 subjects; skipping sampling of the CDM.

tableMissingData(result = result)
Summary of missingness in condition_occurrence, visit_occurrence tables

                                                      Database name
Column name                      Estimate name        GiBleed
visit_occurrence
  admitting_source_concept_id    N missing data (%)   0 (0.00%)
                                 N zeros (%)          1,037 (100.00%)
  admitting_source_value         N missing data (%)   1,037 (100.00%)
  care_site_id                   N missing data (%)   1,037 (100.00%)
                                 N zeros (%)          0 (0.00%)
  discharge_to_concept_id        N missing data (%)   0 (0.00%)
                                 N zeros (%)          1,037 (100.00%)
  discharge_to_source_value      N missing data (%)   1,037 (100.00%)
  person_id                      N missing data (%)   0 (0.00%)
                                 N zeros (%)          0 (0.00%)
  preceding_visit_occurrence_id  N missing data (%)   0 (0.00%)
                                 N zeros (%)          0 (0.00%)
  provider_id                    N missing data (%)   1,037 (100.00%)
                                 N zeros (%)          0 (0.00%)
  visit_concept_id               N missing data (%)   0 (0.00%)
                                 N zeros (%)          0 (0.00%)
  visit_end_date                 N missing data (%)   0 (0.00%)
  visit_end_datetime             N missing data (%)   0 (0.00%)
  visit_occurrence_id            N missing data (%)   0 (0.00%)
                                 N zeros (%)          0 (0.00%)
  visit_source_concept_id        N missing data (%)   0 (0.00%)
                                 N zeros (%)          1,037 (100.00%)
  visit_source_value             N missing data (%)   0 (0.00%)
  visit_start_date               N missing data (%)   0 (0.00%)
  visit_start_datetime           N missing data (%)   0 (0.00%)
  visit_type_concept_id          N missing data (%)   0 (0.00%)
                                 N zeros (%)          0 (0.00%)
condition_occurrence
  condition_concept_id           N missing data (%)   0 (0.00%)
                                 N zeros (%)          0 (0.00%)
  condition_end_date             N missing data (%)   8,652 (13.24%)
  condition_end_datetime         N missing data (%)   8,652 (13.24%)
  condition_occurrence_id        N missing data (%)   0 (0.00%)
                                 N zeros (%)          0 (0.00%)
  condition_source_concept_id    N missing data (%)   0 (0.00%)
                                 N zeros (%)          0 (0.00%)
  condition_source_value         N missing data (%)   0 (0.00%)
  condition_start_date           N missing data (%)   0 (0.00%)
  condition_start_datetime       N missing data (%)   0 (0.00%)
  condition_status_concept_id    N missing data (%)   0 (0.00%)
                                 N zeros (%)          65,332 (100.00%)
  condition_status_source_value  N missing data (%)   65,332 (100.00%)
  condition_type_concept_id      N missing data (%)   0 (0.00%)
                                 N zeros (%)          0 (0.00%)
  person_id                      N missing data (%)   0 (0.00%)
                                 N zeros (%)          0 (0.00%)
  provider_id                    N missing data (%)   65,332 (100.00%)
                                 N zeros (%)          0 (0.00%)
  stop_reason                    N missing data (%)   65,332 (100.00%)
  visit_detail_id                N missing data (%)   0 (0.00%)
                                 N zeros (%)          65,332 (100.00%)
  visit_occurrence_id            N missing data (%)   64 (0.10%)
                                 N zeros (%)          0 (0.00%)

cdmDisconnect(cdm = cdm) # }
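
To make the two estimates in the table above more concrete, the base R sketch below computes them for one hypothetical column. It assumes that "N missing data" counts NA values, that "N zeros" counts values equal to 0, and that both percentages are taken over the number of rows; the helper `format_estimate()` and the toy vector are illustrative and not part of OmopSketch.

```r
# Hypothetical toy column: two missing values, one zero, one valid id.
provider_id <- c(NA, NA, 3L, 0L)

# Hypothetical helper: count and percentage formatted like "1,037 (100.00%)".
format_estimate <- function(count, total) {
  sprintf("%s (%.2f%%)", format(count, big.mark = ","), 100 * count / total)
}

# Assumed definitions of the two estimates.
n_missing <- sum(is.na(provider_id))
n_zeros <- sum(provider_id == 0, na.rm = TRUE)

missing_label <- format_estimate(n_missing, length(provider_id))
zeros_label <- format_estimate(n_zeros, length(provider_id))
```

Under these assumptions, a concept id column that is fully populated with 0 shows 0 (0.00%) missing and 100.00% zeros, which is the pattern seen for condition_status_concept_id above.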