result <- phenotypeDiagnostics(
  cohort,
  databaseDiagnostics = list(),
  codelistDiagnostics = list(),
  cohortDiagnostics = list(),
  populationDiagnostics = list()
)

Review codelists and cohorts in OMOP CDM

Database diagnostics
- databaseDiagnostics()
Codelist diagnostics
- codelistDiagnostics()
Cohort diagnostics
- cohortDiagnostics()
Population diagnostics
- populationDiagnostics()

db_diagnostics <- databaseDiagnostics(
  cohort,
  cohortId = NULL,
  snapshot = TRUE,
  personTableSummary = TRUE,
  observationPeriodsSummary = TRUE,
  clinicalRecordsSummary = TRUE
)
# Modify databaseDiagnostics in phenotypeDiagnostics:
result <- phenotypeDiagnostics(
  cohort,
  databaseDiagnostics = list(
    "cohortId" = c(1, 2),
    "snapshot" = FALSE
  )
)

cl_diagnostics <- codelistDiagnostics(
  cohort,
  cohortId = NULL,
  achillesCodeUse = TRUE,
  orphanCodeUse = TRUE,
  cohortCodeUse = TRUE,
  drugDiagnostics = TRUE,
  measurementDiagnostics = TRUE,
  measurementDiagnosticsSample = 20000,
  drugDiagnosticsSample = 20000
)
# Modify codelistDiagnostics in phenotypeDiagnostics:
result <- phenotypeDiagnostics(
  cohort,
  codelistDiagnostics = list(
    "cohortId" = c(1, 2),
    "achillesCodeUse" = FALSE
  )
)

c_diagnostics <- cohortDiagnostics(
  cohort,
  cohortId = NULL,
  cohortCount = TRUE,
  cohortCharacteristics = TRUE,
  largeScaleCharacteristics = TRUE,
  compareCohorts = TRUE,
  cohortSurvival = FALSE, # Note: cohortSurvival is NOT run by default!
  cohortSample = 20000,
  matchedSample = 1000
)
# Modify cohortDiagnostics in phenotypeDiagnostics:
result <- phenotypeDiagnostics(
  cohort,
  cohortDiagnostics = list(
    "cohortSample" = 1000
  )
)

p_diagnostics <- populationDiagnostics(
  cohort,
  cohortId = NULL,
  incidence = TRUE,
  periodPrevalence = TRUE,
  populationSample = 1e+05,
  populationDateRange = as.Date(c(NA, NA))
)
# Modify populationDiagnostics in phenotypeDiagnostics:
result <- phenotypeDiagnostics(
  cohort,
  populationDiagnostics = list(
    "populationSample" = 10000
  )
)

Database diagnostics
- databaseDiagnostics()
Codelist diagnostics
- codelistDiagnostics()
Cohort diagnostics
- cohortDiagnostics()
Population diagnostics
- populationDiagnostics()
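Each of the four list arguments of phenotypeDiagnostics() only needs the settings that differ from the defaults of the matching function, and an empty list() appears to keep every default. The sketch below illustrates that override idea with base R's utils::modifyList(), using the cohortDiagnostics() defaults shown earlier. It is an illustration only: how PhenotypeR combines the lists internally is an assumption, and the names cohort_defaults, overrides and merged are hypothetical.

```r
# Defaults copied from the cohortDiagnostics() call shown earlier
# (cohortId is left out to keep the sketch short).
cohort_defaults <- list(
  cohortCount = TRUE,
  cohortCharacteristics = TRUE,
  largeScaleCharacteristics = TRUE,
  compareCohorts = TRUE,
  cohortSurvival = FALSE,
  cohortSample = 20000,
  matchedSample = 1000
)

# Only the settings that should change are listed.
overrides <- list("cohortSurvival" = TRUE, "cohortSample" = NULL)

# ASSUMPTION: this mimics the override idea; it is not PhenotypeR's own code.
# keep.null = TRUE stores cohortSample as NULL instead of dropping the entry.
merged <- utils::modifyList(cohort_defaults, overrides, keep.null = TRUE)

str(merged)
```

Without keep.null = TRUE, modifyList() would delete the cohortSample entry rather than set it to NULL, which is worth keeping in mind whenever NULL is a meaningful setting.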
library(dplyr)
library(PhenotypeR)
exp <- tibble(
  "cohort_name" = "type_2_diabetes",
  "estimate" = c("Median age of incident cases",
                 "Survival at five years"),
  "value" = c("45 to 65",
              "85% to 95%"),
  "diagnostics" = c("cohort_characteristics",
                    "cohort_survival"),
  "source" = "Marta"
)
tableCohortExpectations(exp)

See the results in the shiny app
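Before passing an expectations table on, it can help to confirm that it has the same columns as the example above. The following is a minimal base R sketch: the helper missing_expectation_cols() is hypothetical, the row values are placeholders, and treating these five columns as required is an assumption rather than documented PhenotypeR behaviour.

```r
# Column names taken from the expectations example above.
# ASSUMPTION: treating them as required is a guess, not documented behaviour.
expected_cols <- c("cohort_name", "estimate", "value", "diagnostics", "source")

# Hypothetical helper: returns the expected columns that are missing from x.
missing_expectation_cols <- function(x) {
  setdiff(expected_cols, names(x))
}

# Placeholder row; the values are illustrative, not clinical claims.
expectations <- data.frame(
  cohort_name = "hypertension",
  estimate = "Example estimate",
  value = "Example range",
  diagnostics = "cohort_characteristics",
  source = "Illustrative example"
)

# Complete table: nothing is missing.
missing_expectation_cols(expectations)

# Incomplete table: the helper lists what still has to be added.
missing_expectation_cols(expectations[, c("cohort_name", "value")])
```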
We will now run PhenotypeR for three cohorts: people with hypertension, warfarin users, and people with a measurement of prostate specific antigen level.
Let’s start by loading the required packages and using the omock package (https://ohdsi.github.io/omock/) to create a mock CDM.
# Install all the packages:
install.packages(c("omock", "here", "OmopConstructor", "CohortConstructor",
                   "CohortSurvival", "omopgenerics", "readr", "duckdb", "PhenotypeR"))
# Load all the packages
library(omock)
library(here)
library(PhenotypeR)
library(OmopConstructor)
library(CohortConstructor)
library(CohortSurvival)
library(omopgenerics)
library(readr)
# Create mock CDM
cdm <- mockCdmFromDataset(datasetName = "synpuf-1k_5.3",
                          source = "duckdb")
cdm <- cdm |> buildAchillesTables()

# Define code list for your cohort
codes <- list(
  "hypertension" = c(320128L),
  "users_of_warfarin" = c(1310149L, 40163554L),
  "measurement_of_prostate_specific_antigen_level" = c(2617206L)
)
# Instantiate your cohort
cdm[["study_cohorts"]] <- conceptCohort(cdm,
                                        conceptSet = codes,
                                        name = "study_cohorts")

To create a database description, follow these instructions:
- cdmName(cdm)
- downloadDatabaseDescriptionTemplate(). Remember that the docx files MUST have the same name as the database!
Help: You can check the arguments of the function using ??downloadDatabaseDescriptionTemplate or on the PhenotypeR website.
Information source:
"OHDSI/Eunomia: An R package that facilitates access to a variety of OMOP CDM sample data sets."
Description:
"synpuf-1k_5.3 is a synthetic dataset designed for testing OHDSI tools. It is based on a subsample of Medicare claims data standardised to the OMOP Common Data Model (CDM) version 5.3. It contains records for approximately 1k participants."

To create clinical descriptions for the previous cohorts, follow these instructions:
- getCohortName(cdm)
- downloadClinicalDescriptionTemplate(). Remember that the docx files MUST have the same name as the cohorts!
Help: You can check the arguments of the function using ??downloadClinicalDescriptionTemplate or on the PhenotypeR website.
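The naming rule above can be checked before the descriptions are used. This is a minimal base R sketch, assuming each description is saved as "<cohort name>.docx" in a single folder. The folder description_dir is hypothetical, and the cohort names are typed by hand from the code list defined earlier.

```r
# Cohort names as defined in the code list earlier.
cohort_names <- c(
  "hypertension",
  "users_of_warfarin",
  "measurement_of_prostate_specific_antigen_level"
)

# ASSUMPTION: one file per cohort, named exactly "<cohort name>.docx".
expected_files <- paste0(cohort_names, ".docx")

# Hypothetical folder holding the filled-in templates.
description_dir <- file.path(tempdir(), "clinical_descriptions")
dir.create(description_dir, showWarnings = FALSE)

# Files that still need to be created or renamed.
missing_files <- expected_files[
  !file.exists(file.path(description_dir, expected_files))
]
missing_files
```

An empty result means every cohort has a file with a matching name in that folder.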
# Hypertension
Information source:
"DynaMed (https://www.dynamed.com/)"
Introduction:
"Hypertension is a sustained elevation of systemic arterial blood pressure, most commonly defined as a systolic blood pressure (BP) ≥ 140 mm Hg or diastolic BP ≥ 90 mm Hg, but definitions vary by professional organization and cardiovascular risk.
Other names include: primary hypertension, essential hypertension, idiopathic hypertension, sustained hypertension."
Complications:
"Hypertension is a risk factor for: Coronary artery disease (CAD), Heart failure, Chronic kidney disease, Stroke, Intracerebral hemorrhage, Transient ischemic attack (TIA), Peripheral artery disease (PAD), Aortic regurgitation, Atrial flutter, Mild cognitive impairment (MCI)."
Phenotyping plan:
"Inclusion criteria: At least one record of a diagnosis code for essential hypertension (ConceptId = 320128L).
Index date: Date of the first occurrence of the essential hypertension diagnosis code.
Exit criteria: As it is considered a chronic condition, once a patient enters the cohort they remain in it until the end of their observation period in the database."

# Warfarin users
Information source:
"Gemini (https://gemini.google.com/)"
Introduction:
"Warfarin is an oral anticoagulant that interferes with the hepatic synthesis of Vitamin K-dependent clotting factors (II, VII, IX, and X). It is primarily indicated for the prophylaxis and treatment of venous thrombosis, pulmonary embolism, and thromboembolic complications associated with atrial fibrillation (AFib) or cardiac valve replacement."
Phenotyping plan:
"Inclusion criteria: At least one record in the drug_exposure table of warfarin prescription. Multiple records per person are allowed.
Index date: Date of the recorded drug exposure.
Washout period: No washout period is used."

# Measurement of prostate specific antigen level
Information source:
"Dynamed (https://www.dynamed.com/)"
Introduction:
"Measurement of prostate specific antigen in serum for the detection and management of benign prostatic hyperplasia and prostate cancer.
Other names include: PSA measurement, PSA - Prostate-specific antigen level, PSA - Serum prostate specific antigen level, tPSA measurement - Total prostate specific antigen measurement"
Phenotyping plan:
"Inclusion criteria: A record in the measurement table where the measurement_concept_id is 2617206L. Multiple records per person are allowed.
Index date: Measurement date of the recorded PSA test."

Run the following bit of code to download mock expectations for your cohorts.
We’ll now run phenotypeDiagnostics() with the following specifications:
- measurementDiagnosticsSample = 1000
- drugDiagnosticsSample = 1000
- matchedSample = 1000
- populationSample = 10000
Do you want to check your answer? Go to the following slide!
Are you 100% sure that you’re ready to see the answer?
result <- phenotypeDiagnostics(
  cohort = cdm[["study_cohorts"]],
  databaseDiagnostics = list(),
  codelistDiagnostics = list(
    "measurementDiagnosticsSample" = 1000,
    "drugDiagnosticsSample" = 1000
  ),
  cohortDiagnostics = list(
    "cohortSurvival" = TRUE,
    "cohortSample" = NULL,
    "matchedSample" = 1000
  ),
  populationDiagnostics = list(
    "populationSample" = 10000
  )
)

Let us now create the shiny app using shinyDiagnostics()! To do that, complete the following spaces:
Help: You can check the arguments of the function using ??shinyDiagnostics or on the PhenotypeR website.
If you were not able to create the shiny app, you can find it here
Use the Shiny App to answer the following questions. The next slide lists questions 1 to 20. After that, you’ll find the same questions again with hints to guide you, followed by the answers!
1. Check that the database descriptions and clinical descriptions have been uploaded correctly.
2. When does the database observation period start and end?
3. How many females and males are in the database?
4. What is the average number of days in the first observation period?
5. What is the average number of records per person in the drug_exposure table?
6. According to the ACHILLES tables, how many records of essential hypertension (concept ID = 320128) are in the database?
7. Which orphan code for the hypertension cohort has the fewest records in the database?
8. How many people have the concept Prostate cancer screening; prostate specific antigen test (psa) (concept ID = 2617206) in our cohort measurement of prostate specific antigen level?
9. For our cohort measurement of prostate specific antigen level, how many days are there between measurements on average?
10. How many records are excluded after merging overlapping records in the hypertension cohort?
11. How many females are in the hypertension cohort?
12. How many people get a prescription of lovastatin 10 MG Oral Tablet (concept ID = 19019115) after 30 days of having a hypertension diagnosis?
13. Which condition shows the greatest SMD between the matched cohort and the sampled cohort within 1–30 days after warfarin initiation (users_of_warfarin cohort)?
14. How many people are in both the hypertension and users of warfarin cohorts?
15. What is the average number of days between people entering the hypertension cohort and then entering the users of warfarin cohort?
16. How many people die in the hypertension cohort? Check whether this is aligned with the cohort expectations and explore the survival plot.
17. What is the incidence of hypertension between 1/01/2009 and 31/12/2009 in our subsample of 10,000?
18. What is the prevalence of hypertension between 1/01/2009 and 31/12/2009 in our subsample of 10,000?
19. What is the prevalence of hypertension between 1/01/2009 and 31/12/2009 in our subsample of 10,000, only among females?
20. Save the prevalence plot as a PNG image.
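The incidence and prevalence questions rely on two simple ratios. As a rough sketch of what those estimates mean, assuming incidence is reported per 100,000 person-years and period prevalence as a proportion, the arithmetic looks like this. The function names and all input numbers are made up for illustration and are not results from the mock database.

```r
# Incidence rate: new cases divided by person-time at risk,
# scaled to 100,000 person-years.
incidence_per_100k_py <- function(new_cases, person_years) {
  stopifnot(person_years > 0)
  new_cases / person_years * 1e5
}

# Period prevalence: people with the condition at any time in the period,
# divided by people contributing to the denominator in that period.
period_prevalence <- function(cases, denominator) {
  stopifnot(denominator > 0, cases <= denominator)
  cases / denominator
}

# Illustrative inputs only.
incidence_per_100k_py(new_cases = 40, person_years = 8000)
period_prevalence(cases = 1200, denominator = 10000)
```

When reading the app, it is worth checking which denominator and time unit each plot reports before comparing numbers against the cohort expectations.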