Summarises one or more OMOP clinical tables. The result reports the number of records, the number of subjects, whether records fall within observation, the domains and concepts present, missing data, and inconsistencies between start and end dates.

Usage

summariseClinicalRecords(
  cdm,
  omopTableName,
  recordsPerPerson = c("mean", "sd", "median", "q25", "q75", "min", "max"),
  conceptSummary = TRUE,
  missingData = TRUE,
  quality = TRUE,
  sex = FALSE,
  ageGroup = NULL,
  dateRange = NULL,
  inObservation = lifecycle::deprecated(),
  standardConcept = lifecycle::deprecated(),
  sourceVocabulary = lifecycle::deprecated(),
  domainId = lifecycle::deprecated(),
  typeConcept = lifecycle::deprecated()
)

Arguments

cdm

A cdm_reference object. Use CDMConnector to create a reference to a database or omock to create a reference to synthetic data.

omopTableName

A character vector of the names of the tables to summarise in the cdm object. Run clinicalTables() to check the available options.

recordsPerPerson

A character vector naming the summary statistics to compute for the number of records per person, e.g. c("mean", "sd"). Set to NULL to skip these statistics.
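To make the default estimate names concrete, the sketch below computes the same seven statistics in base R on a toy vector of person identifiers. It is only an illustration of what each name stands for: the `person_id` values are invented, and the package computes these estimates against the database, where quantile conventions and the handling of people without records may differ.

```r
# Toy data (hypothetical): one element per record, holding the owning person.
person_id <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4)

# Count the records belonging to each person.
n_records <- as.integer(table(person_id))

# One value per default estimate name of `recordsPerPerson`.
records_per_person <- c(
  mean   = mean(n_records),
  sd     = sd(n_records),
  median = median(n_records),
  q25    = unname(quantile(n_records, 0.25)),
  q75    = unname(quantile(n_records, 0.75)),
  min    = min(n_records),
  max    = max(n_records)
)
print(records_per_person)
```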

conceptSummary

Logical. If TRUE, includes summaries of concept-level information, including:

  • Domain ID of standard concepts.

  • Type concept ID.

  • Standard vs non-standard concepts.

  • Source vocabulary usage.

missingData

Logical. If TRUE, includes a summary of missing data for relevant fields.

quality

Logical. If TRUE, performs basic data quality checks, including:

  • Percentage of records within the observation period.

  • Number of records with end date before start date.

  • Number of records with start date before the person's birth date.
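The three checks listed above can be illustrated on toy data in base R. This is a sketch under simplifying assumptions, not the package's implementation: the data frames and column names (`birth_date`, `obs_start`, `obs_end`) are hypothetical, each person has a single observation period, and a record is assumed to count as "in observation" when its start date lies inside that period.

```r
# Hypothetical records of a clinical table.
records <- data.frame(
  person_id  = c(1, 1, 2, 3),
  start_date = as.Date(c("2001-05-01", "2010-03-15", "1979-12-31", "2015-07-01")),
  end_date   = as.Date(c("2001-05-10", "2010-03-01", "1980-01-05", NA))
)

# Hypothetical person-level data: birth date and one observation period each.
persons <- data.frame(
  person_id  = c(1, 2, 3),
  birth_date = as.Date(c("1970-01-01", "1980-01-01", "1990-06-30")),
  obs_start  = as.Date(c("2000-01-01", "1980-01-01", "2010-01-01")),
  obs_end    = as.Date(c("2005-12-31", "2020-12-31", "2020-12-31"))
)

x <- merge(records, persons, by = "person_id")

# Percentage of records whose start date falls inside the observation period.
in_observation <- x$start_date >= x$obs_start & x$start_date <= x$obs_end
pct_in_obs <- 100 * mean(in_observation)

# Records that end before they start (missing end dates are ignored here).
n_end_before_start <- sum(x$end_date < x$start_date, na.rm = TRUE)

# Records that start before the person's birth date.
n_start_before_birth <- sum(x$start_date < x$birth_date)

print(c(pct_in_obs, n_end_before_start, n_start_before_birth))
```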

sex

Logical; whether to stratify results by sex (TRUE) or not (FALSE).

ageGroup

A list of age groups to stratify the results by. Each element represents a specific age range. You can give them specific names, e.g. ageGroup = list(children = c(0, 17), adult = c(18, Inf)).
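As a rough illustration of how such a list describes age ranges, the base R sketch below assigns a label to a few ages. The helper `label_age()` is hypothetical and not part of the package, and it assumes both bounds of each range are inclusive, as the example values (0 to 17, 18 and over) suggest.

```r
# Named list in the shape expected by `ageGroup`: one c(lower, upper) per group.
ageGroup <- list(children = c(0, 17), adult = c(18, Inf))

# Hypothetical helper: return the name of the first group containing `age`,
# treating both bounds as inclusive (an assumption for this sketch).
label_age <- function(age, groups) {
  for (nm in names(groups)) {
    bounds <- groups[[nm]]
    if (age >= bounds[1] && age <= bounds[2]) {
      return(nm)
    }
  }
  NA_character_
}

labels <- vapply(c(5, 17, 18, 90), label_age, character(1), groups = ageGroup)
print(labels)
```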

dateRange

A vector of two dates defining the study period. Only the start date column of the OMOP table is compared against this range; end dates are not checked. If dateRange is NULL, no restriction is applied.
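The stratification and date arguments can be combined in a single call. The sketch below reuses the GiBleed mock dataset from the Examples section; the age bounds and the study period are arbitrary values chosen for illustration, and the other summaries are switched off only to keep the call small.

```r
library(OmopSketch)
library(omock)

# Synthetic CDM, as in the Examples section.
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")

result <- summariseClinicalRecords(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  recordsPerPerson = NULL,  # skip records-per-person statistics
  conceptSummary = FALSE,
  missingData = FALSE,
  quality = FALSE,
  sex = TRUE,               # stratify by sex
  ageGroup = list(children = c(0, 17), adult = c(18, Inf)),
  dateRange = as.Date(c("2000-01-01", "2015-12-31"))  # arbitrary study period
)

# Inspect which strata are present in the result.
unique(result$strata_name)

cdmDisconnect(cdm = cdm)
```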

inObservation

Deprecated. Use quality = TRUE instead.

standardConcept

Deprecated. Use conceptSummary = TRUE instead.

sourceVocabulary

Deprecated. Use conceptSummary = TRUE instead.

domainId

Deprecated. Use conceptSummary = TRUE instead.

typeConcept

Deprecated. Use conceptSummary = TRUE instead.

Value

A summarised_result object with the results.
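A summarised_result is a long-format table, so it can be inspected with ordinary data-frame tools before it is formatted. The sketch below is one possible way to look at it, assuming dplyr is installed; it relies only on the standard summarised_result columns (`variable_name`, `estimate_name`, `estimate_value`) and makes no claim about which variable names a given package version emits.

```r
library(OmopSketch)
library(omock)

cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")

# A small call: only record counts and records-per-person statistics.
result <- summariseClinicalRecords(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  recordsPerPerson = c("mean", "sd"),
  conceptSummary = FALSE,
  missingData = FALSE,
  quality = FALSE
)

# List the variable and estimate combinations present in the result.
overview <- dplyr::distinct(
  dplyr::as_tibble(result),
  variable_name, estimate_name
)
print(overview)

cdmDisconnect(cdm = cdm)
```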

Examples

# \donttest{
library(OmopSketch)
library(omock)

cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#>  Loading bundled GiBleed tables from package data.
#>  Adding drug_strength table.
#>  Creating local <cdm_reference> object.
#>  Inserting <cdm_reference> into duckdb.

result <- summariseClinicalRecords(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  recordsPerPerson = c("mean", "sd"),
  quality = TRUE,
  conceptSummary = TRUE,
  missingData = TRUE
)
#>  Adding variables of interest to condition_occurrence.
#>  Summarising records per person in condition_occurrence.
#>  The following estimates will be calculated:
#>  duration: mean, sd
#> → Start summary of data, at 2026-05-19 14:09:40.173314
#>  Summary finished, at 2026-05-19 14:09:40.321956
#>  Summarising subjects not in person table in condition_occurrence.
#>  Summarising records in observation in condition_occurrence.
#>  Summarising records with start before birth date in condition_occurrence.
#>  Summarising records with end date before start date in condition_occurrence.
#>  Summarising domains in condition_occurrence.
#>  Summarising standard concepts in condition_occurrence.
#>  Summarising source vocabularies in condition_occurrence.
#>  Summarising concept types in condition_occurrence.
#>  Summarising missing data in condition_occurrence.

tableClinicalRecords(result = result)
Summary of condition_occurrence table

Variable name                 Variable level                 Is required  Estimate name       Database name: GiBleed
condition_occurrence
Number records                                               overall      N                   65,332
Number subjects                                              overall      N (%)               2,694 (100.00%)
Subjects not in person table                                 overall      N (%)               0 (0.00%)
Records per person                                           overall      Mean (SD)           24.25 (7.41)
Duration of records                                          overall      Mean (SD)           53.17 (423.43)
In observation                No                             overall      N (%)               450 (0.69%)
                              Yes                            overall      N (%)               64,882 (99.31%)
Domain                        Condition                      overall      N (%)               65,332 (100.00%)
Standard vocabulary           Snomed                         overall      N (%)               65,332 (100.00%)
Source vocabulary             Icd10cm                        overall      N (%)               479 (0.73%)
                              No matching concept            overall      N (%)               27 (0.04%)
                              Snomed                         overall      N (%)               64,826 (99.23%)
Standard concept              Standard                       overall      N (%)               65,332 (100.00%)
Type concept id               Ehr encounter diagnosis        overall      N (%)               65,332 (100.00%)
Start date before birth date                                 overall      N (%)               0 (0.00%)
End date before start date                                   overall      N (%)               0 (0.00%)
Column name                   Condition concept id           TRUE         N missing data (%)  0 (0.00%)
                                                                          N zeros (%)         0 (0.00%)
                              Condition end date             FALSE        N missing data (%)  8,652 (13.24%)
                              Condition end datetime         FALSE        N missing data (%)  8,652 (13.24%)
                              Condition occurrence id        TRUE         N missing data (%)  0 (0.00%)
                                                                          N zeros (%)         0 (0.00%)
                              Condition source concept id    FALSE        N missing data (%)  0 (0.00%)
                                                                          N zeros (%)         0 (0.00%)
                              Condition source value         FALSE        N missing data (%)  0 (0.00%)
                              Condition start date           TRUE         N missing data (%)  0 (0.00%)
                              Condition start datetime       FALSE        N missing data (%)  0 (0.00%)
                              Condition status concept id    FALSE        N missing data (%)  0 (0.00%)
                                                                          N zeros (%)         65,332 (100.00%)
                              Condition status source value  FALSE        N missing data (%)  65,332 (100.00%)
                              Condition type concept id      TRUE         N missing data (%)  0 (0.00%)
                                                                          N zeros (%)         0 (0.00%)
                              Person id                      TRUE         N missing data (%)  0 (0.00%)
                                                                          N zeros (%)         0 (0.00%)
                              Provider id                    FALSE        N missing data (%)  65,332 (100.00%)
                                                                          N zeros (%)         0 (0.00%)
                              Stop reason                    FALSE        N missing data (%)  65,332 (100.00%)
                              Visit detail id                FALSE        N missing data (%)  0 (0.00%)
                                                                          N zeros (%)         65,332 (100.00%)
                              Visit occurrence id            FALSE        N missing data (%)  64 (0.10%)
                                                                          N zeros (%)         0 (0.00%)

cdmDisconnect(cdm = cdm) # }