Introduction

In this example we use the Eunomia synthetic data, specifically the "synpuf-1k" dataset.

library(CDMConnector)
library(OmopSketch)
library(PhenotypeR)
library(dplyr)
library(ggplot2)

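# Connect to a DuckDB copy of the Eunomia "synpuf-1k" dataset (CDM version 5.3)
con <- DBI::dbConnect(duckdb::duckdb(),
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))

# Create the cdm reference; the CDM, write and Achilles schemas are all "main"
cdm <- CDMConnector::cdmFromCon(con = con,
                                cdmName = "Eunomia Synpuf",
                                cdmSchema = "main",
                                writeSchema = "main",
                                achillesSchema = "main")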

Database diagnostics

Even once we have created our study cohort, making analytic decisions and interpreting results requires an understanding of the dataset from which the cohort was derived. The databaseDiagnostics() function helps us better understand a data source.

To run database diagnostics we just need to provide our cdm reference to the function.

db_diagnostics <- databaseDiagnostics(cdm)

Database diagnostics builds on the OmopSketch package to perform the following analyses:

  • Snapshot: summarises the metadata of a CDM object using summariseOmopSnapshot().
  • Observation periods: summarises the observation period table using summariseObservationPeriod(). This allows us to see whether there are individuals with multiple, non-overlapping observation periods and how long each observation period lasts on average.
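
As a rough cross-check of the "multiple observation periods" question, the observation period table can also be queried directly with dplyr. This is only a sketch, not what summariseObservationPeriod() does internally: it assumes the `cdm` object created in the introduction, and the names `periods_per_person`, `n_periods` and `n_subjects` are illustrative. It counts periods per person but does not check for overlaps.

```r
library(dplyr)

# Assumes `cdm` is the cdm reference created in the introduction.
# Count observation periods per person, then tabulate how many people
# have 1, 2, ... periods. The query runs in the database until collect().
periods_per_person <- cdm$observation_period |>
  count(person_id, name = "n_periods") |>
  count(n_periods, name = "n_subjects") |>
  arrange(n_periods) |>
  collect()

periods_per_person
```

Any row with `n_periods` greater than 1 indicates individuals with more than one observation period.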

The output of databaseDiagnostics() is a summarised result object.
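
Before tabulating, it can help to look at the raw object. The sketch below assumes `db_diagnostics` was created as above and that the omopgenerics package (which defines the summarised result class) is installed. It uses only generic tools, so the exact rows shown will depend on the package versions in use.

```r
library(dplyr)

# Assumes `db_diagnostics` was created with databaseDiagnostics(cdm).

# A summarised result is a tibble in long format: one row per estimate.
glimpse(db_diagnostics)

# The settings link each result_id to the analysis that produced it.
omopgenerics::settings(db_diagnostics)

# List which variables were summarised under each result_id.
db_diagnostics |>
  distinct(result_id, variable_name)
```

Because both analyses are stored in the same object, the table functions used in the next section can each be given `db_diagnostics` directly.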

Visualise the results

We can use functions from the OmopSketch package to visualise the results.

Snapshot

tableOmopSnapshot(db_diagnostics)
Snapshot of the cdm Eunomia Synpuf

Database name: Eunomia Synpuf
Diagnostic: databaseDiagnostics
Phenotyper version: 0.2.0.900

                           Estimate
General
  Snapshot date            2025-12-03
  Person count             1,000
  Vocabulary version       v5.0 06-AUG-21
Cdm
  Source name              Synpuf
  Version                  5.3
  Holder name              ohdsi
  Release date             2018-03-15
  Description
  Documentation reference
Observation period
  N                        1,000
  Start date               2008-01-01
  End date                 2010-12-31
Cdm source
  Type                     duckdb
  Package                  CDMConnector
  Write schema             main

Observation periods

tableObservationPeriod(db_diagnostics)
Summary of observation_period table

CDM name: Eunomia Synpuf
Diagnostic: databaseDiagnostics
Phenotyper version: 0.2.0.900

Variable name                    Variable level                      Estimate name       Estimate

Observation period ordinal: all
Number records                                                       N                   1,048
Number subjects                                                      N                   1,000
Subjects not in person table                                         N (%)               0 (0.00%)
Records per person                                                   Mean (SD)           1.05 (0.21)
                                                                     Median [Q25 - Q75]  1 [1 - 1]
                                                                     Range [min to max]  [1 to 2]
Duration in days                                                     Mean (SD)           979.71 (262.79)
                                                                     Median [Q25 - Q75]  1,096 [1,096 - 1,096]
                                                                     Range [min to max]  [1 to 1,096]
Days to next observation period                                      Mean (SD)           172.17 (108.35)
                                                                     Median [Q25 - Q75]  138 [93 - 254]
                                                                     Range [min to max]  [32 to 366]
Type concept id                  Period while enrolled in insurance  N (%)               1,048 (100.00%)
Start date before birth date                                         N (%)               0 (0.00%)
End date before start date                                           N (%)               0 (0.00%)
Column name                      Observation period end date         N missing data (%)  0 (0.00%)
                                 Observation period id               N missing data (%)  0 (0.00%)
                                                                     N zeros (%)         0 (0.00%)
                                 Observation period start date       N missing data (%)  0 (0.00%)
                                 Period type concept id              N missing data (%)  0 (0.00%)
                                                                     N zeros (%)         0 (0.00%)
                                 Person id                           N missing data (%)  0 (0.00%)
                                                                     N zeros (%)         1 (0.10%)

Observation period ordinal: 1st
Number subjects                                                      N                   1,000
Duration in days                                                     Mean (SD)           994.16 (257.95)
                                                                     Median [Q25 - Q75]  1,096 [1,096 - 1,096]
                                                                     Range [min to max]  [1 to 1,096]
Days to next observation period                                      Mean (SD)           172.17 (108.35)
                                                                     Median [Q25 - Q75]  138 [93 - 254]
                                                                     Range [min to max]  [32 to 366]

Observation period ordinal: 2nd
Number subjects                                                      N                   48
Duration in days                                                     Mean (SD)           678.60 (164.50)
                                                                     Median [Q25 - Q75]  730 [730 - 730]
                                                                     Range [min to max]  [31 to 730]
Days to next observation period                                      Mean (SD)
                                                                     Median [Q25 - Q75]
                                                                     Range [min to max]
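
The overall ("all") rows can be related back to the per-ordinal rows with a little arithmetic, which is a useful sanity check when reading this table. The sketch below uses only numbers copied from the output above; the variable names are illustrative. Since nobody has more than two observation periods (records per person range from 1 to 2), "Days to next observation period" is only defined after a 1st period. That is why the "all" and "1st" rows for that variable are identical and the "2nd" rows are empty.

```r
# Values copied from the observation period table above
n_first     <- 1000    # subjects with a 1st observation period
n_second    <- 48      # subjects with a 2nd observation period
mean_first  <- 994.16  # mean duration in days of 1st periods
mean_second <- 678.60  # mean duration in days of 2nd periods

# Total records is the sum of records across ordinals
n_records <- n_first + n_second

# Records per person: total records divided by distinct subjects
records_per_person <- n_records / n_first

# Overall mean duration: record-weighted mean of the per-ordinal means
mean_all <- (n_first * mean_first + n_second * mean_second) / n_records

n_records                      # 1048
round(records_per_person, 2)   # 1.05
round(mean_all, 2)             # 979.71
```

These reproduce the reported number of records (1,048), records per person (1.05) and overall mean duration (979.71 days).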