
Database diagnostics
Introduction
In this example we’re going to be using the Eunomia synthetic data.
library(CDMConnector)
library(OmopSketch)
library(PhenotypeR)
library(dplyr)
library(ggplot2)
con <- DBI::dbConnect(duckdb::duckdb(),
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))
cdm <- CDMConnector::cdmFromCon(con = con,
                                cdmName = "Eunomia Synpuf",
                                cdmSchema = "main",
                                writeSchema = "main",
                                achillesSchema = "main")
Database diagnostics
Even once we have created our study cohort, informing analytic
decisions and interpreting results requires an understanding of the
dataset from which the cohort was derived. The
databaseDiagnostics() function helps us better
understand a data source.
To run database diagnostics we just need to provide our cdm reference to the function.
db_diagnostics <- databaseDiagnostics(cdm)
Database diagnostics builds on the OmopSketch package to perform the following analyses:
- Snapshot: Summarises the metadata of a CDM object using summariseOmopSnapshot().
- Observation periods: Summarises the observation period table using summariseObservationPeriod(). This shows whether there are individuals with multiple, non-overlapping observation periods and how long each observation period lasts on average.
The output is a summarised result object.
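As a rough illustration of what that means in practice: a summarised result is a long-format table with one row per estimate. The sketch below does not call PhenotypeR. It builds a small stand-in data frame by hand, using a subset of the column names from the omopgenerics summarised result format and three counts taken from the observation period table shown later in this vignette. The `group_name`, `variable_name` and `estimate_name` labels are assumptions made for illustration, and the labels stored in a real result may differ.

```r
# Hand-built stand-in for a summarised result (NOT produced by databaseDiagnostics()).
# Column names follow the summarised result layout; the labels are illustrative.
toy_result <- data.frame(
  result_id      = c(1L, 1L, 1L),
  cdm_name       = "Eunomia Synpuf",
  group_name     = "observation_period_ordinal",  # assumed label
  group_level    = c("all", "all", "2nd"),
  variable_name  = c("Number records", "Number subjects", "Number subjects"),
  estimate_name  = "count",                       # assumed label
  estimate_type  = "integer",
  estimate_value = c("1048", "1000", "48"),       # estimates are stored as text
  stringsAsFactors = FALSE
)

# Keep the subject counts and convert the text estimates back to numbers.
subjects <- subset(toy_result, variable_name == "Number subjects")
subjects$estimate_value <- as.integer(subjects$estimate_value)
subjects[, c("group_level", "variable_name", "estimate_value")]
```

With a real result, the same kind of filtering would be applied to db_diagnostics directly, for example with dplyr::filter().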
Visualise the results
We can use OmopSketch package functions to visualise the results obtained.
Snapshot
tableOmopSnapshot(db_diagnostics)

Database name: Eunomia Synpuf; Diagnostic: databaseDiagnostics; Phenotyper version: 0.2.0.900

| Estimate | Value |
|---|---|
| General | |
| Snapshot date | 2025-12-03 |
| Person count | 1,000 |
| Vocabulary version | v5.0 06-AUG-21 |
| Cdm | |
| Source name | Synpuf |
| Version | 5.3 |
| Holder name | ohdsi |
| Release date | 2018-03-15 |
| Description | |
| Documentation reference | |
| Observation period | |
| N | 1,000 |
| Start date | 2008-01-01 |
| End date | 2010-12-31 |
| Cdm source | |
| Type | duckdb |
| Package | CDMConnector |
| Write schema | main |

Observation periods
tableObservationPeriod(db_diagnostics)

CDM name: Eunomia Synpuf; Diagnostic: databaseDiagnostics; Phenotyper version: 0.2.0.900

| Observation period ordinal | Variable name | Variable level | Estimate name | Value |
|---|---|---|---|---|
| all | Number records | – | N | 1,048 |
| | Number subjects | – | N | 1,000 |
| | Subjects not in person table | – | N (%) | 0 (0.00%) |
| | Records per person | – | Mean (SD) | 1.05 (0.21) |
| | | | Median [Q25 - Q75] | 1 [1 - 1] |
| | | | Range [min to max] | [1 to 2] |
| | Duration in days | – | Mean (SD) | 979.71 (262.79) |
| | | | Median [Q25 - Q75] | 1,096 [1,096 - 1,096] |
| | | | Range [min to max] | [1 to 1,096] |
| | Days to next observation period | – | Mean (SD) | 172.17 (108.35) |
| | | | Median [Q25 - Q75] | 138 [93 - 254] |
| | | | Range [min to max] | [32 to 366] |
| | Type concept id | Period while enrolled in insurance | N (%) | 1,048 (100.00%) |
| | Start date before birth date | – | N (%) | 0 (0.00%) |
| | End date before start date | – | N (%) | 0 (0.00%) |
| | Column name | Observation period end date | N missing data (%) | 0 (0.00%) |
| | | Observation period id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Observation period start date | N missing data (%) | 0 (0.00%) |
| | | Period type concept id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Person id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 1 (0.10%) |
| 1st | Number subjects | – | N | 1,000 |
| | Duration in days | – | Mean (SD) | 994.16 (257.95) |
| | | | Median [Q25 - Q75] | 1,096 [1,096 - 1,096] |
| | | | Range [min to max] | [1 to 1,096] |
| | Days to next observation period | – | Mean (SD) | 172.17 (108.35) |
| | | | Median [Q25 - Q75] | 138 [93 - 254] |
| | | | Range [min to max] | [32 to 366] |
| 2nd | Number subjects | – | N | 48 |
| | Duration in days | – | Mean (SD) | 678.60 (164.50) |
| | | | Median [Q25 - Q75] | 730 [730 - 730] |
| | | | Range [min to max] | [31 to 730] |
| | Days to next observation period | – | Mean (SD) | – |
| | | | Median [Q25 - Q75] | – |
| | | | Range [min to max] | – |
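
The "all" rows can be cross-checked against the per-ordinal rows with a little arithmetic. The sketch below uses only base R and figures copied from the table above. It assumes that 952 people have one observation period and 48 have two, which follows from the 1,000 subjects, the 48 subjects with a 2nd period, and the maximum of 2 records per person. Because the per-ordinal means are already rounded, the pooled duration is an approximation.

```r
# Records per person: 952 people with 1 period, 48 people with 2 (assumed split).
records_per_person <- c(rep(1, 952), rep(2, 48))
sum(records_per_person)              # total number of records
round(mean(records_per_person), 2)   # compare with the "Records per person" mean
round(sd(records_per_person), 2)     # compare with the "Records per person" SD

# Duration in days for "all" as a record-weighted mean of the 1st and 2nd periods.
n_records     <- c(first = 1000, second = 48)
mean_duration <- c(first = 994.16, second = 678.60)
pooled_duration <- weighted.mean(mean_duration, w = n_records)
round(pooled_duration, 2)            # compare with the "Duration in days" mean for "all"
```

These calculations reproduce the 1,048 records, the 1.05 (0.21) records per person, and the 979.71 mean duration in days reported for "all", which suggests the "all" rows pool every observation period record rather than one record per person.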