
Database diagnostics
DatabaseDiagnostics.RmdIntroduction
In this example we’re going to be using the Eunomia synthetic data.
library(CDMConnector)
library(OmopSketch)
library(PhenotypeR)
library(dplyr)
library(CohortConstructor)
con <- DBI::dbConnect(duckdb::duckdb(),
CDMConnector::eunomiaDir("synpuf-1k", "5.3"))
cdm <- CDMConnector::cdmFromCon(con = con,
cdmName = "Eunomia Synpuf",
cdmSchema = "main",
writeSchema = "main",
achillesSchema = "main")
cdm$injuries <- conceptCohort(cdm = cdm,
conceptSet = list(
"ankle_sprain" = 81151L,
"ankle_fracture" = 4059173L,
"forearm_fracture" = 4278672L,
"hip_fracture" = 4230399L
),
name = "injuries")Database diagnostics
Although we may have created our study cohort, to inform analytic
decisions and interpretation of results requires an understanding of the
dataset from which it has been derived. The
databaseDiagnostics() function will help us better
understand a data source.
To run database diagnostics we just need to provide our cdm reference to the function.
db_diagnostics <- databaseDiagnostics(cdm$injuries)Database diagnostics builds on OmopSketch package to perform the following analyses:
- Snapshot: Summarises the meta data of a CDM object by using summariseOmopSnapshot()
- Observation periods: Summarises the observation period table by using summariseObservationPeriod(). This will allow us to see if there are individuals with multiple, non-overlapping, observation periods and how long each observation period lasts on average.
- Person table: Summarises the person table by using summarisePerson(). This provides demographic information including sex, race, ethnicity, year/month/day of birth distributions, and location/provider/care site information.
- Clinical Records: The diagnostics will detect which domains appears to the codelist associated to your cohort (i.e., Drug), and use summariseClinicalRecords() to summarise the clinical table associated (i.e., “drug_exposure”).
The output is a summarised result object.
Visualise the results
We can use OmopSketch package functions to visualise the results obtained.
Snapshot
tableOmopSnapshot(db_diagnostics)|
Database name
|
|
|---|---|
|
Eunomia Synpuf
|
|
|
Diagnostic
|
|
|
databaseDiagnostics
|
|
| Estimate |
Phenotyper version
|
| 0.3.9000 | |
| General | |
| Snapshot date | 2026-01-20 |
| Person count | 1,000 |
| Vocabulary version | v5.0 06-AUG-21 |
| Cdm | |
| Source name | Synpuf |
| Version | 5.3 |
| Holder name | ohdsi |
| Release date | 2018-03-15 |
| Description | |
| Documentation reference | |
| Observation period | |
| N | 1,000 |
| Start date | 2008-01-01 |
| End date | 2010-12-31 |
| Cdm source | |
| Type | duckdb |
| Package | CDMConnector |
| Write schema | main |
Observation Periods
tableObservationPeriod(db_diagnostics)|
CDM name
|
||||
|---|---|---|---|---|
|
Eunomia Synpuf
|
||||
|
Diagnostic
|
||||
|
databaseDiagnostics
|
||||
| Observation period ordinal | Variable name | Variable level | Estimate name |
Phenotyper version
|
| 0.3.9000 | ||||
| all | Number records | – | N | 1,048 |
| Number subjects | – | N | 1,000 | |
| Subjects not in person table | – | N (%) | 0 (0.00%) | |
| Records per person | – | Mean (SD) | 1.05 (0.21) | |
| Median [Q25 - Q75] | 1 [1 - 1] | |||
| Range [min to max] | [1 to 2] | |||
| Duration in days | – | Mean (SD) | 979.71 (262.79) | |
| Median [Q25 - Q75] | 1,096 [1,096 - 1,096] | |||
| Range [min to max] | [1 to 1,096] | |||
| Days to next observation period | – | Mean (SD) | 172.17 (108.35) | |
| Median [Q25 - Q75] | 138 [93 - 254] | |||
| Range [min to max] | [32 to 366] | |||
| Type concept id | Period while enrolled in insurance | N (%) | 1,048 (100.00%) | |
| Start date before birth date | – | N (%) | 0 (0.00%) | |
| End date before start date | – | N (%) | 0 (0.00%) | |
| Column name | Observation period end date | N missing data (%) | 0 (0.00%) | |
| Observation period id | N missing data (%) | 0 (0.00%) | ||
| N zeros (%) | 0 (0.00%) | |||
| Observation period start date | N missing data (%) | 0 (0.00%) | ||
| Period type concept id | N missing data (%) | 0 (0.00%) | ||
| N zeros (%) | 0 (0.00%) | |||
| Person id | N missing data (%) | 0 (0.00%) | ||
| N zeros (%) | 1 (0.10%) | |||
| 1st | Number subjects | – | N | 1,000 |
| Duration in days | – | Mean (SD) | 994.16 (257.95) | |
| Median [Q25 - Q75] | 1,096 [1,096 - 1,096] | |||
| Range [min to max] | [1 to 1,096] | |||
| Days to next observation period | – | Mean (SD) | 172.17 (108.35) | |
| Median [Q25 - Q75] | 138 [93 - 254] | |||
| Range [min to max] | [32 to 366] | |||
| 2nd | Number subjects | – | N | 48 |
| Duration in days | – | Mean (SD) | 678.60 (164.50) | |
| Median [Q25 - Q75] | 730 [730 - 730] | |||
| Range [min to max] | [31 to 730] | |||
| Days to next observation period | – | Mean (SD) | – | |
| Median [Q25 - Q75] | – | |||
| Range [min to max] | – | |||
Person Table
tablePerson(db_diagnostics)|
CDM name
|
|||
|---|---|---|---|
|
Eunomia Synpuf
|
|||
|
Diagnostic
|
|||
|
databaseDiagnostics
|
|||
| Variable name | Variable level | Estimate name |
Phenotyper version
|
| 0.3.9000 | |||
| Number subjects | – | N | 1,000 |
| Number subjects not in observation | – | N (%) | 0 (0.00%) |
| Sex | Female | N (%) | 498 (49.80%) |
| Male | N (%) | 502 (50.20%) | |
| None | N (%) | 0 (0.00%) | |
| Sex source | Female | N (%) | 498 (49.80%) |
| Male | N (%) | 502 (50.20%) | |
| Race | Black or African American | N (%) | 114 (11.40%) |
| No matching concept | N (%) | 67 (6.70%) | |
| White | N (%) | 819 (81.90%) | |
| Race source | Black | N (%) | 114 (11.40%) |
| Hispanic | N (%) | 17 (1.70%) | |
| Others | N (%) | 50 (5.00%) | |
| White | N (%) | 819 (81.90%) | |
| Ethnicity | Hispanic or Latino | N (%) | 17 (1.70%) |
| No matching concept | N (%) | 50 (5.00%) | |
| Not Hispanic or Latino | N (%) | 933 (93.30%) | |
| Ethnicity source | Hispanic or Latino | N (%) | 17 (1.70%) |
| Not Hispanic or Latino | N (%) | 933 (93.30%) | |
| Others | N (%) | 50 (5.00%) | |
| Year of birth | – | Missing (%) | 0 (0.00%) |
| Median [Q25 - Q75] | 1,935 [1,928 - 1,942] | ||
| 90% Range [Q05 to Q95] | 1,919 to 1,962 | ||
| Range [min to max] | 1,909 to 1,983 | ||
| Month of birth | – | Missing (%) | 0 (0.00%) |
| Median [Q25 - Q75] | 7 [3 - 10] | ||
| 90% Range [Q05 to Q95] | 1 to 12 | ||
| Range [min to max] | 1 to 12 | ||
| Day of birth | – | Missing (%) | 0 (0.00%) |
| Median [Q25 - Q75] | 1 [1 - 1] | ||
| 90% Range [Q05 to Q95] | 1 to 1 | ||
| Range [min to max] | 1 to 1 | ||
| Location | – | Missing (%) | 0 (0.00%) |
| Zero count (%) | 1 (0.10%) | ||
| Distinct values | 626 | ||
| Provider | – | Missing (%) | 1,000 (100.00%) |
| Zero count (%) | 0 (0.00%) | ||
| Distinct values | 1 | ||
| Care site | – | Missing (%) | 1,000 (100.00%) |
| Zero count (%) | 0 (0.00%) | ||
| Distinct values | 1 | ||
Clinical Records
tableClinicalRecords(db_diagnostics)|
Database name
|
|||
|---|---|---|---|
|
Eunomia Synpuf
|
|||
|
Diagnostic
|
|||
|
databaseDiagnostics
|
|||
| Variable name | Variable level | Estimate name |
Phenotyper version
|
| 0.3.9000 | |||
| condition_occurrence | |||
| Number records | – | N | 160,322 |
| Number subjects | – | N (%) | 836 (100.00%) |
| Subjects not in person table | – | N (%) | 0 (0.00%) |
| Records per person | – | Mean (SD) | 191.77 (148.24) |
| Median [Q25 - Q75] | 160 [76 - 278] | ||
| Range [min to max] | [1 to 765] | ||
| In observation | Yes | N (%) | 160,322 (100.00%) |
| Domain | Condition | N (%) | 159,651 (99.58%) |
| Observation | N (%) | 666 (0.42%) | |
| Procedure | N (%) | 5 (0.00%) | |
| Source vocabulary | Icd9cm | N (%) | 160,322 (100.00%) |
| Standard concept | S | N (%) | 153,665 (95.85%) |
| – | N (%) | 6,657 (4.15%) | |
| Type concept id | Carrier claim detail - 10th position | N (%) | 308 (0.19%) |
| Carrier claim detail - 11th position | N (%) | 210 (0.13%) | |
| Carrier claim detail - 12th position | N (%) | 170 (0.11%) | |
| Carrier claim detail - 13th position | N (%) | 122 (0.08%) | |
| Carrier claim detail - 1st position | N (%) | 34,738 (21.67%) | |
| Carrier claim detail - 2nd position | N (%) | 14,405 (8.99%) | |
| Carrier claim detail - 3rd position | N (%) | 7,484 (4.67%) | |
| Carrier claim detail - 4th position | N (%) | 4,235 (2.64%) | |
| Carrier claim detail - 5th position | N (%) | 2,598 (1.62%) | |
| Carrier claim detail - 6th position | N (%) | 1,595 (0.99%) | |
| Carrier claim detail - 7th position | N (%) | 876 (0.55%) | |
| Carrier claim detail - 8th position | N (%) | 599 (0.37%) | |
| Carrier claim detail - 9th position | N (%) | 420 (0.26%) | |
| Carrier claim header - 1st position | N (%) | 34,477 (21.50%) | |
| Carrier claim header - 2nd position | N (%) | 19,670 (12.27%) | |
| Carrier claim header - 3rd position | N (%) | 11,387 (7.10%) | |
| Carrier claim header - 4th position | N (%) | 6,436 (4.01%) | |
| Carrier claim header - 5th position | N (%) | 828 (0.52%) | |
| Carrier claim header - 6th position | N (%) | 468 (0.29%) | |
| Carrier claim header - 7th position | N (%) | 238 (0.15%) | |
| Carrier claim header - 8th position | N (%) | 177 (0.11%) | |
| Inpatient header - 10th position | N (%) | 31 (0.02%) | |
| Inpatient header - 11th position | N (%) | 476 (0.30%) | |
| Inpatient header - 1st position | N (%) | 515 (0.32%) | |
| Inpatient header - 2nd position | N (%) | 491 (0.31%) | |
| Inpatient header - 3rd position | N (%) | 490 (0.31%) | |
| Inpatient header - 4th position | N (%) | 466 (0.29%) | |
| Inpatient header - 5th position | N (%) | 450 (0.28%) | |
| Inpatient header - 6th position | N (%) | 424 (0.26%) | |
| Inpatient header - 7th position | N (%) | 397 (0.25%) | |
| Inpatient header - 8th position | N (%) | 371 (0.23%) | |
| Inpatient header - 9th position | N (%) | 335 (0.21%) | |
| Outpatient header - 10th position | N (%) | 8 (0.00%) | |
| Outpatient header - 11th position | N (%) | 1,225 (0.76%) | |
| Outpatient header - 1st position | N (%) | 4,977 (3.10%) | |
| Outpatient header - 2nd position | N (%) | 3,161 (1.97%) | |
| Outpatient header - 3rd position | N (%) | 2,004 (1.25%) | |
| Outpatient header - 4th position | N (%) | 1,202 (0.75%) | |
| Outpatient header - 5th position | N (%) | 715 (0.45%) | |
| Outpatient header - 6th position | N (%) | 460 (0.29%) | |
| Outpatient header - 7th position | N (%) | 315 (0.20%) | |
| Outpatient header - 8th position | N (%) | 213 (0.13%) | |
| Outpatient header - 9th position | N (%) | 155 (0.10%) | |
| Start date before birth date | – | N (%) | 0 (0.00%) |
| End date before start date | – | N (%) | 0 (0.00%) |
| Column name | Condition concept id | N missing data (%) | 0 (0.00%) |
| N zeros (%) | 0 (0.00%) | ||
| Condition end date | N missing data (%) | 0 (0.00%) | |
| Condition end datetime | N missing data (%) | 160,322 (100.00%) | |
| Condition occurrence id | N missing data (%) | 0 (0.00%) | |
| N zeros (%) | 1 (0.00%) | ||
| Condition source concept id | N missing data (%) | 0 (0.00%) | |
| N zeros (%) | 0 (0.00%) | ||
| Condition source value | N missing data (%) | 0 (0.00%) | |
| Condition start date | N missing data (%) | 0 (0.00%) | |
| Condition start datetime | N missing data (%) | 160,322 (100.00%) | |
| Condition status concept id | N missing data (%) | 160,322 (100.00%) | |
| N zeros (%) | 0 (0.00%) | ||
| Condition status source value | N missing data (%) | 160,322 (100.00%) | |
| Condition type concept id | N missing data (%) | 0 (0.00%) | |
| N zeros (%) | 0 (0.00%) | ||
| Person id | N missing data (%) | 0 (0.00%) | |
| N zeros (%) | 191 (0.12%) | ||
| Provider id | N missing data (%) | 0 (0.00%) | |
| N zeros (%) | 73,682 (45.96%) | ||
| Stop reason | N missing data (%) | 160,322 (100.00%) | |
| Visit detail id | N missing data (%) | 160,322 (100.00%) | |
| N zeros (%) | 0 (0.00%) | ||
| Visit occurrence id | N missing data (%) | 0 (0.00%) | |
| N zeros (%) | 10 (0.01%) | ||