
Database diagnostics
Introduction
In this example we use the synthetic Synpuf data (1,000 people) distributed through Eunomia.
library(CDMConnector)
library(OmopSketch)
library(PhenotypeR)
library(dplyr)
library(ggplot2)
con <- DBI::dbConnect(duckdb::duckdb(),
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))
cdm <- CDMConnector::cdmFromCon(con = con,
                                cdmName = "Eunomia Synpuf",
                                cdmSchema = "main",
                                writeSchema = "main",
                                achillesSchema = "main")
Database diagnostics
Even once we have created our study cohort, making analytic decisions
and interpreting results requires an understanding of the dataset from
which the cohort was derived. The databaseDiagnostics() function helps us
better understand a data source.
To run database diagnostics we just need to provide our cdm reference to the function.
db_diagnostics <- databaseDiagnostics(cdm)
Database diagnostics builds on the OmopSketch package to perform the following analyses:
- Snapshot: Summarises the metadata of a CDM object by using summariseOmopSnapshot().
- Observation periods: Summarises the observation period table by using summariseObservationPeriod(). This will allow us to see if there are individuals with multiple, non-overlapping, observation periods and how long each observation period lasts on average.
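To make the observation period quantities concrete, here is a minimal sketch that computes them by hand with dplyr on a small, hypothetical table. The toy data and the helper column names (ordinal, duration_days, days_to_next) are illustrative assumptions, not OmopSketch output. Duration is counted inclusive of both the start and end date, which matches the 1,096 days reported for 2008-01-01 to 2010-12-31 on this dataset. The gap is assumed here to be the next start date minus the current end date; summariseObservationPeriod() may define it slightly differently.

```r
library(dplyr)

# Hypothetical toy data: person 1 has two non-overlapping observation periods,
# person 2 has a single one. Column names follow the OMOP observation_period table.
observation_period <- tibble(
  person_id = c(1L, 1L, 2L),
  observation_period_start_date = as.Date(c("2008-01-01", "2009-06-01", "2008-01-01")),
  observation_period_end_date = as.Date(c("2008-12-31", "2010-05-31", "2010-12-31"))
)

per_period <- observation_period |>
  group_by(person_id) |>
  arrange(observation_period_start_date, .by_group = TRUE) |>
  mutate(
    # 1 = first observation period of the person, 2 = second, ...
    ordinal = row_number(),
    # Inclusive of both start and end date
    duration_days = as.integer(
      observation_period_end_date - observation_period_start_date
    ) + 1L,
    # Assumed definition: next start date minus current end date (NA if no next period)
    days_to_next = as.integer(
      lead(observation_period_start_date) - observation_period_end_date
    )
  ) |>
  ungroup()

# Number of observation period records per person
records_per_person <- count(per_period, person_id, name = "n_records")

per_period
records_per_person
```

In this toy example, person 1 contributes two records with a gap between them, while person 2 contributes one record and a missing gap, which is the pattern the observation period summary is designed to surface.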
The output is a summarised result object.
Visualise the results
We can use functions from the OmopSketch package to visualise the results.
Snapshot
tableOmopSnapshot(db_diagnostics)
| Estimate | Database name: Eunomia Synpuf |
|---|---|
| General | |
| Snapshot date | 2025-07-22 |
| Person count | 1,000 |
| Vocabulary version | v5.0 06-AUG-21 |
| Observation period | |
| N | 1,048 |
| Start date | 2008-01-01 |
| End date | 2010-12-31 |
| Cdm | |
| Source name | Synpuf |
| Version | v5.3.1 |
| Holder name | ohdsi |
| Release date | 2018-03-15 |
| Description | |
| Documentation reference | |
| Source type | duckdb |
Observation periods
tableObservationPeriod(db_diagnostics)
| Observation period ordinal | Variable name | Estimate name | CDM name: Eunomia Synpuf |
|---|---|---|---|
| all | Number records | N | 1,048 |
| all | Number subjects | N | 1,000 |
| all | Records per person | mean (sd) | 1.05 (0.21) |
| all | Records per person | median [Q25 - Q75] | 1 [1 - 1] |
| all | Duration in days | mean (sd) | 979.71 (262.79) |
| all | Duration in days | median [Q25 - Q75] | 1,096 [1,096 - 1,096] |
| all | Days to next observation period | mean (sd) | 172.17 (108.35) |
| all | Days to next observation period | median [Q25 - Q75] | 138 [93 - 254] |
| 1st | Number subjects | N | 1,000 |
| 1st | Duration in days | mean (sd) | 994.16 (257.95) |
| 1st | Duration in days | median [Q25 - Q75] | 1,096 [1,096 - 1,096] |
| 1st | Days to next observation period | mean (sd) | 172.17 (108.35) |
| 1st | Days to next observation period | median [Q25 - Q75] | 138 [93 - 254] |
| 2nd | Number subjects | N | 48 |
| 2nd | Duration in days | mean (sd) | 678.60 (164.50) |
| 2nd | Duration in days | median [Q25 - Q75] | 730 [730 - 730] |
| 2nd | Days to next observation period | mean (sd) | - |
| 2nd | Days to next observation period | median [Q25 - Q75] | - |
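As a quick sanity check on how the rows of this table relate to each other, the "all" estimates can be recovered from the per-ordinal rows. The sketch below simply re-does that arithmetic with values copied from the table above. It assumes nobody has a third observation period, which is consistent with only 1st and 2nd ordinals being reported. The variable names are illustrative.

```r
# Values copied from the observation period table above
n_first <- 1000       # subjects with a 1st observation period
n_second <- 48        # subjects with a 2nd observation period
mean_first <- 994.16  # mean duration (days) of 1st periods
mean_second <- 678.60 # mean duration (days) of 2nd periods

# Every record is either a 1st or a 2nd period (assumption: no 3rd periods)
n_records <- n_first + n_second

# Mean number of records per person
records_per_person <- n_records / n_first

# Overall mean duration is the record-weighted mean of the per-ordinal means
pooled_mean_duration <-
  (n_first * mean_first + n_second * mean_second) / n_records

n_records                       # 1048
round(records_per_person, 2)    # 1.05
round(pooled_mean_duration, 2)  # 979.71
```

The three results match the "all" rows for number of records, records per person, and mean duration. Note too that the "Days to next observation period" estimates are identical for "all" and "1st", since only a first period can be followed by another one in this dataset.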