vignettes/EvaluatingPhenotypeAlgorithms.rmd
The PheValuator package enables evaluating the performance characteristics of phenotype algorithms (PAs) using data from databases that are translated into the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM).
This vignette describes how to run the PheValuator process from start to end using the PheValuator package.
There are several steps in performing a PA evaluation:
Each of these steps is described in detail below. For this vignette, we will describe the evaluation of PAs for type 2 diabetes mellitus (T2DM).
The extremely specific (xSpec), extremely sensitive (xSens), prevalence, and evaluation cohorts are developed using the ATLAS tool. The xSpec cohort contains subjects who are very likely to be positive for the health outcome of interest (HOI). This may be achieved by requiring that subjects have multiple condition codes for the HOI in their patient record. An example of this for T2DM is included in the OHDSI ATLAS repository. In this example, each subject has at least 2 diagnosis codes for T2DM in days 21 to 1 prior to an index clinical visit, the first of which is the first diagnosis in the patient's history. The algorithm also excludes subjects with type 1 DM (T1DM) at any time in their record. This makes the algorithm very specific for T2DM: subjects in this cohort have a very high probability of having the condition. This PA also requires subjects to have at least 365 days of prior and post-index observation in their patient record.
Quick Tip: When building the xSpec cohort, adjust the window before index in which you look for 2 or more diagnosis codes to suit the condition. For very rare conditions you may need to expand it to 60 to 1 days prior to index. For acute conditions such as myocardial infarction, it is best to use a very short interval such as 1 to 1 day prior to index (i.e., the day prior to index).
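The xSpec criteria described above can be sketched in base R on a toy table of diagnosis records. This is only an illustration of the selection logic, not how ATLAS builds the cohort: the data frame, its column names, and the helper `selectXSpec()` are hypothetical, and the sketch leaves out the first-diagnosis and 365-day observation requirements.

```r
# Toy diagnosis records; every value here is invented for illustration.
diagnoses <- data.frame(
  personId = c(1, 1, 2, 3, 3, 3),
  condition = c("T2DM", "T2DM", "T2DM", "T2DM", "T2DM", "T1DM"),
  daysBeforeIndex = c(20, 5, 10, 15, 3, 200),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the person IDs with at least `minCodes` codes
# for `hoi` between `windowStart` and `windowEnd` days before index, and with
# no code for `excluded` at any time in the record.
selectXSpec <- function(dx, hoi, excluded,
                        windowStart = 21, windowEnd = 1, minCodes = 2) {
  inWindow <- dx$condition == hoi &
    dx$daysBeforeIndex <= windowStart &
    dx$daysBeforeIndex >= windowEnd
  codeCounts <- table(dx$personId[inWindow])
  candidates <- as.numeric(names(codeCounts)[as.vector(codeCounts) >= minCodes])
  everExcluded <- unique(dx$personId[dx$condition == excluded])
  setdiff(candidates, everExcluded)
}

# Persons 1 and 3 have two T2DM codes in the window; person 3 also has T1DM.
xSpecIds <- selectXSpec(diagnoses, hoi = "T2DM", excluded = "T1DM")
print(xSpecIds)
```

Changing `windowStart` and `windowEnd` (for example to 60 and 1 for a rare condition, or 1 and 1 for an acute one) mirrors the adjustment suggested in the Quick Tip.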
An xSens cohort is created by developing a PA that is very sensitive for the HOI. The system uses the xSens cohort to create a set of “noisy” negative subjects, i.e., subjects with a high likelihood of not having the HOI. This group of subjects will be used in the model building process and is described in detail below. An example of an xSens cohort for T2DM is also in the OHDSI ATLAS repository.
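One way to picture the “noisy” negatives is as the subjects who fall outside the xSens cohort. The sketch below assumes exactly that reading; the ID vectors are hypothetical, and the package's actual selection may apply further criteria.

```r
# Hypothetical person IDs; none of these come from a real database.
populationIds <- 1:10
xSensIds <- c(2, 3, 5, 7)

# Assumption of this sketch: anyone not captured by the very sensitive PA is
# unlikely to have the HOI and can serve as a "noisy" negative.
noisyNegativeIds <- setdiff(populationIds, xSensIds)
print(noisyNegativeIds)
```

The more sensitive the xSens PA is, the fewer true cases remain among the subjects it leaves out, which is why sensitivity matters more than specificity for this cohort.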
The system uses the prevalence cohort to provide a reasonable approximation of the prevalence of the HOI in the population. This improves the calibration of the predictive model. This group of subjects will be used in the model building process and is described in detail below. An example of a prevalence cohort for T2DM is also in the OHDSI ATLAS repository.
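As a rough illustration of why an approximate prevalence helps, the sketch below estimates prevalence from cohort counts and uses it to split a model-building sample into positives and negatives. All counts are hypothetical, and the proportional split is an assumption of this sketch rather than a description of the package's internal sampling.

```r
# Hypothetical counts; replace with counts from your own database.
populationSize <- 100000
prevalenceCohortSize <- 8500

# Approximate prevalence of the HOI in the population.
prevalence <- prevalenceCohortSize / populationSize

# Assumed sampling rule: keep the share of positives in the model-building
# sample equal to the estimated prevalence.
sampleSize <- 2000
nPositive <- round(sampleSize * prevalence)
nNegative <- sampleSize - nPositive

print(c(prevalence = prevalence, nPositive = nPositive, nNegative = nNegative))
```

A model trained on data whose outcome proportion matches the population tends to produce predicted probabilities on a realistic scale, which is the calibration benefit mentioned above.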
PheValuator uses the evaluation cohort to define a specific set of subjects to test for the outcome of interest and uses these subjects to evaluate the PAs. An example of an evaluation cohort for T2DM is also in the OHDSI ATLAS repository.
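To make "performance characteristics" concrete, the sketch below computes sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for one PA. It assumes that each subject in the evaluation cohort carries a model-predicted probability of having the HOI and that these probabilities are summed into a probabilistic confusion matrix. That assumption, the toy data, and the helper `computePerformance()` are illustrative and are not taken from the package code.

```r
# Hypothetical evaluation cohort: predicted probability of the HOI for each
# subject, and whether the phenotype algorithm under test selected them.
evaluation <- data.frame(
  probability = c(0.95, 0.90, 0.20, 0.10, 0.80, 0.05, 0.02, 0.03),
  inPhenotype = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Hypothetical helper: each subject contributes their probability to the
# "positive" cells and the complement to the "negative" cells.
computePerformance <- function(probability, inPhenotype) {
  tp <- sum(probability[inPhenotype])       # selected and likely a case
  fp <- sum(1 - probability[inPhenotype])   # selected but likely not a case
  fn <- sum(probability[!inPhenotype])      # missed but likely a case
  tn <- sum(1 - probability[!inPhenotype])  # not selected, likely not a case
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv = tp / (tp + fp),
    npv = tn / (tn + fn))
}

performance <- computePerformance(evaluation$probability, evaluation$inPhenotype)
print(round(performance, 3))
```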
Quick Tip: You can build your required cohorts easily by exporting the JSON in the examples above and copying them into your repository. The only changes you should need to make are changing the concept sets for the condition and the days prior to index for searching for multiple diagnosis codes in the xSpec cohort.
For the example below, two more phenotype algorithms were created:
The basic steps in the process are as follows:
For example:
library(PheValuator)

options(andromedaTempFolder = "c:/temp2/ff") # place to store large temporary files

# Covariate settings with three feature extraction windows (in days)
CovSettings <- createDefaultCovariateSettings(excludedCovariateConceptIds = c(201254),
                                              addDescendantsToExclude = TRUE,
                                              startDayWindow1 = 0,
                                              endDayWindow1 = 30,
                                              startDayWindow2 = 31,
                                              endDayWindow2 = 180,
                                              startDayWindow3 = 181,
                                              endDayWindow3 = 365)
Quick Tip: The feature extraction windows above are useful for chronic conditions. For acute conditions, such as myocardial infarction, you should use:
options(andromedaTempFolder = "c:/temp2/ff") # place to store large temporary files

# Shorter feature extraction windows (in days) for acute conditions
CovSettings <- createDefaultCovariateSettings(startDayWindow1 = 0,
                                              endDayWindow1 = 10,
                                              startDayWindow2 = 11,
                                              endDayWindow2 = 20,
                                              startDayWindow3 = 21,
                                              endDayWindow3 = 30)
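If you evaluate both chronic and acute conditions, it may be convenient to keep the two sets of windows in one place. The helper `covariateWindows()` below is hypothetical and not part of PheValuator; it only returns the window values shown above as a named list.

```r
# Hypothetical helper: return the feature extraction windows (in days)
# suggested in this vignette for chronic or acute conditions.
covariateWindows <- function(conditionType = c("chronic", "acute")) {
  conditionType <- match.arg(conditionType)
  if (conditionType == "chronic") {
    list(startDayWindow1 = 0, endDayWindow1 = 30,
         startDayWindow2 = 31, endDayWindow2 = 180,
         startDayWindow3 = 181, endDayWindow3 = 365)
  } else {
    list(startDayWindow1 = 0, endDayWindow1 = 10,
         startDayWindow2 = 11, endDayWindow2 = 20,
         startDayWindow3 = 21, endDayWindow3 = 30)
  }
}

windows <- covariateWindows("acute")
str(windows)

# With PheValuator loaded, the list could then be passed on, for example:
# CovSettings <- do.call(createDefaultCovariateSettings, windows)
```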
For example:
# Cohort IDs refer to the xSpec, xSens, prevalence, and evaluation cohorts
CohortArgs <- createCreateEvaluationCohortArgs(xSpecCohortId = 1769699,
                                               xSensCohortId = 1770120,
                                               prevalenceCohortId = 1770119,
                                               evaluationPopulationCohortId = 1778258,
                                               covariateSettings = CovSettings)
For example:
#First phenotype algorithm to test
conditionAlg1TestArgs <- createTestPhenotypeAlgorithmArgs(phenotypeCohortId = 1778259)
For example:
analysis1 <- createPheValuatorAnalysis(analysisId = 1,
                                       description = "[PheValuator] Type 2 Diabetes Mellitus (prevalent)",
                                       createEvaluationCohortArgs = CohortArgs,
                                       testPhenotypeAlgorithmArgs = conditionAlg1TestArgs)
Create as many analyses as needed:
For example:
# Second phenotype algorithm to test
conditionAlg2TestArgs <- createTestPhenotypeAlgorithmArgs(phenotypeCohortId = 1778260,
                                                          washoutPeriod = 0)
analysis2 <- createPheValuatorAnalysis(analysisId = 2,
                                       description = "[PheValuator] Type 2 diabetes mellitus with second code 31-365 days after index",
                                       createEvaluationCohortArgs = CohortArgs,
                                       testPhenotypeAlgorithmArgs = conditionAlg2TestArgs)
# Collect the analyses into a list
pheValuatorAnalysisList <- list(analysis1, analysis2)
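When many PAs share the same evaluation cohort settings, the analysis list can be generated from a small table instead of being written out by hand. The sketch below is one possible approach: `buildAnalysisList()` and the `paSpecs` table are hypothetical, the two constructor arguments default to the PheValuator functions used above, and per-algorithm options such as `washoutPeriod` are left out for brevity.

```r
# Table of phenotype algorithms to evaluate (IDs taken from the examples above).
paSpecs <- data.frame(
  analysisId = c(1, 2),
  phenotypeCohortId = c(1778259, 1778260),
  description = c(
    "[PheValuator] Type 2 Diabetes Mellitus (prevalent)",
    "[PheValuator] Type 2 diabetes mellitus with second code 31-365 days after index"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one analysis per row of `specs`. The defaults are
# only evaluated when used, so PheValuator is needed only if no replacement
# constructors are supplied.
buildAnalysisList <- function(specs, cohortArgs,
                              makeTestArgs = PheValuator::createTestPhenotypeAlgorithmArgs,
                              makeAnalysis = PheValuator::createPheValuatorAnalysis) {
  lapply(seq_len(nrow(specs)), function(i) {
    testArgs <- makeTestArgs(phenotypeCohortId = specs$phenotypeCohortId[i])
    makeAnalysis(analysisId = specs$analysisId[i],
                 description = specs$description[i],
                 createEvaluationCohortArgs = cohortArgs,
                 testPhenotypeAlgorithmArgs = testArgs)
  })
}

# With PheValuator installed and CohortArgs defined as above:
# pheValuatorAnalysisList <- buildAnalysisList(paSpecs, CohortArgs)
```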
For example:
# Create database connection details (createConnectionDetails() is provided by DatabaseConnector)
connectionDetails <- DatabaseConnector::createConnectionDetails(dbms = "postgresql",
                                                                server = "localhost/ohdsi",
                                                                user = "joe",
                                                                password = "supersecret")
# Run the PheValuator process
referenceTable <- runPheValuatorAnalyses(connectionDetails = connectionDetails,
                                         cdmDatabaseSchema = "yourCDMSchema",
                                         cohortDatabaseSchema = "yourCohortSchema",
                                         cohortTable = "yourCohortTableSchema",
                                         workDatabaseSchema = "yourWritableSchema",
                                         outputFolder = "yourOutputFolderSchema",
                                         pheValuatorAnalysisList = pheValuatorAnalysisList)
For example:
# Summarize, view, and save the results of the phenotype evaluation
phenotypeResults <- summarizePheValuatorAnalyses(referenceTable, "yourOutputFolderSchema")
View(phenotypeResults)
write.csv(phenotypeResults, "c:/phenotyping/diabetes_results.csv", row.names = FALSE)
The results from above will look like:
The runPheValuatorAnalyses() function will produce the following artifacts: