Using Observational Health Data Sciences and Informatics (OHDSI) Report Generator

Using the OhdsiReportGenerator Package

This package enables users to extract characterization, estimation and prediction results from the OHDSI results (see result model) generated via R packages such as PatientLevelPrediction, Characterization, CohortMethod and SelfControlledCaseSeries.

Example Result Database

This package provides users with an example OHDSI result database. This can be accessed via getExampleConnectionDetails() and contains the results of running a demo analysis using the Eunomia R package sample OMOP CDM data.

library(OhdsiReportGenerator)
# create a connection details object to 
connectionDetails <- getExampleConnectionDetails()

# create a connection handler to the results
library(ResultModelManager)

## Loading required package: R6

## Loading required package: DatabaseConnector

ConnectionHandler <- ResultModelManager::ConnectionHandler$new(connectionDetails)

## Connecting using SQLite driver

Extracting Cohorts

The cohort details can be extracted using the function getCohortDefinitions that requires the user to input the connection handler, schema, table prefix used by CohortGenerator (default is ‘cg_’) and target cohort ids to restrict to.

The example results are in a sqlite database and use the schema ‘main’.

cohorts <- getCohortDefinitions(
    connectionHandler = ConnectionHandler,
    schema = 'main',
    cgTablePrefix = 'cg_',
    targetIds = NULL
)

knitr::kable(
  x = cohorts %>% 
    dplyr::select(-"json", -"sqlCommand"), 
  caption = 'The data.frame extracted containing the cohort details minus the json and sqlCommand columns.'
  )

The data.frame extracted containing the cohort details minus the json and sqlCommand columns.
cohortDefinitionId	cohortName	description	subsetParent	isSubset	subsetDefinitionId
1	Celecoxib	NA	1	NA	NA
2	Diclofenac	NA	2	NA	NA
3	GI bleed	NA	3	NA	NA
1001	Celecoxib - age 18 to 64 Age 18 to 64	NA	1	NA	1
2001	Diclofenac - age 18 to 64 Age 18 to 64	NA	2	NA	1
1002	Celecoxib - first event with 365 prior obs first event with 365 prior obs	NA	1	NA	2
2002	Diclofenac - first event with 365 prior obs first event with 365 prior obs	NA	2	NA	2
1003	Celecoxib - age 18 to 64 and first event with 365 prior obs Age 18 to 64, first event with 365 prior obs	NA	1	NA	3
2003	Diclofenac - age 18 to 64 and first event with 365 prior obs Age 18 to 64, first event with 365 prior obs	NA	2	NA	3

You can process the cohorts definitions to extract the parent cohorts and the children for each parent cohort.

if(nrow(cohorts) > 0){
processedCohorts <- processCohorts(cohorts)

knitr::kable(
  x = data.frame(
    parentId = processedCohorts$parents,
    parentName= names(processedCohorts$parents)
    ), 
  caption = 'The parent cohorts.'
  )

knitr::kable(
  x = processedCohorts$cohortList[[1]] %>% 
    dplyr::select(-"json", -"sqlCommand"),
  caption = 'The children/subset cohorts for the first parent cohort.'
  )
}

The children/subset cohorts for the first parent cohort.
cohortDefinitionId	cohortName	description	subsetParent	isSubset	subsetDefinitionId
1	Celecoxib	NA	1	NA	NA
1001	Celecoxib - age 18 to 64 Age 18 to 64	NA	1	NA	1
1002	Celecoxib - first event with 365 prior obs first event with 365 prior obs	NA	1	NA	2
1003	Celecoxib - age 18 to 64 and first event with 365 prior obs Age 18 to 64, first event with 365 prior obs	NA	1	NA	3

For example, if you created a cohort corresponding to all patients exposed to drug A with an id of 1 and then created ‘children’ subset cohorts corresponding to patients exposed to drug A for indication X with an id 1001 and patients exposed to drug A aged 60+ with a cohort id 1002, then getCohortDefinition() would extract all three of these cohorts and processCohorts() would identify cohort 1 as the parent and cohorts 1,1001,1002 as the children/subset of cohort 1.

You can also extract the subset logic for a subset id:

subsets <- getCohortSubsetDefinitions(
    connectionHandler = ConnectionHandler,
    schema = 'main',
    cgTablePrefix = 'cg_'
)

knitr::kable(
  x = subsets, 
  caption = 'The subset cohort logic used in the analysis.'
  )

The subset cohort logic used in the analysis.
subsetDefinitionId	json
1	{

“name”: “age 18 to 64”, “definitionId”: 1, “subsetOperators”: [ { “name”: “Age 18 to 64”, “subsetType”: “DemographicSubsetOperator”, “ageMin”: 18, “ageMax”: 64 } ], “packageVersion”: “0.11.2”, “identifierExpression”: “targetId * 1000 + definitionId”, “operatorNameConcatString”: “,”, “subsetCohortNameTemplate”: “@baseCohortName - @subsetDefinitionName @operatorNames” } | | 2|{ “name”: “first event with 365 prior obs”, “definitionId”: 2, “subsetOperators”: [ { “name”: “first event with 365 prior obs”, “subsetType”: “LimitSubsetOperator”, “priorTime”: 365, “followUpTime”: 0, “limitTo”: “firstEver” } ], “packageVersion”: “0.11.2”, “identifierExpression”: “targetId * 1000 + definitionId”, “operatorNameConcatString”: “,”, “subsetCohortNameTemplate”: “@baseCohortName - @subsetDefinitionName @operatorNames” } | | 3|{ “name”: “age 18 to 64 and first event with 365 prior obs”, “definitionId”: 3, “subsetOperators”: [ { “name”: “Age 18 to 64”, “subsetType”: “DemographicSubsetOperator”, “ageMin”: 18, “ageMax”: 64 }, { “name”: “first event with 365 prior obs”, “subsetType”: “LimitSubsetOperator”, “priorTime”: 365, “followUpTime”: 0, “limitTo”: “firstEver” } ], “packageVersion”: “0.11.2”, “identifierExpression”: “targetId * 1000 + definitionId”, “operatorNameConcatString”: “,”, “subsetCohortNameTemplate”: “@baseCohortName - @subsetDefinitionName @operatorNames” } |

Extracting Characterization Results

The characterization analysis contains time to event results, dechallenge-rechallenge, risk factor and cohort incidence results.

The time-to-event results tell you how often a outcome occurs relative to some target cohort index date on a 1-day, 30-day and 365-day scale.

tte <- getTimeToEvent(
    connectionHandler = ConnectionHandler,
    schema = 'main'
)

knitr::kable(
  x = tte %>% dplyr::filter(.data$timeScale == 'per 365-day'), 
  caption = 'Example time-to-event results for the 365-day scale.'
  )

Example time-to-event results for the 365-day scale.
databaseName	targetName	targetId	outcomeName	outcomeId	outcomeType	targetOutcomeType	timeToEvent	numEvents	timeScale
Synthea	Celecoxib	1	GI bleed	3	first	After last target end	365	327	per 365-day
Synthea	Celecoxib	1	GI bleed	3	first	Before first target start	365	28	per 365-day
Synthea	Diclofenac	2	GI bleed	3	first	After last target end	365	114	per 365-day
Synthea	Diclofenac	2	GI bleed	3	first	Before first target start	365	10	per 365-day
Synthea	Celecoxib - age 18 to 64 Age 18 to 64	1001	GI bleed	3	first	After last target end	365	327	per 365-day
Synthea	Celecoxib - age 18 to 64 Age 18 to 64	1001	GI bleed	3	first	Before first target start	365	28	per 365-day
Synthea	Diclofenac - age 18 to 64 Age 18 to 64	2001	GI bleed	3	first	After last target end	365	114	per 365-day
Synthea	Diclofenac - age 18 to 64 Age 18 to 64	2001	GI bleed	3	first	Before first target start	365	10	per 365-day

The dechallenge-rechallenge analysis shows you how often an outcome occurs during some target exposure and then how often the target exposure is stopped shortly after the outcome and whether the outcome re-occurs when the target exposure restarts:

drc <- getDechallengeRechallenge(
    connectionHandler = ConnectionHandler,
    schema = 'main'
)

knitr::kable(
  x = drc, 
  caption = 'Example dechallenge-rechallenge results.'
  )

Example dechallenge-rechallenge results.
databaseName	targetName	targetId	outcomeName	outcomeId	dechallengeStopInterval	dechallengeEvaluationWindow	numExposureEras	numPersonsExposed	numCases	dechallengeAttempt	dechallengeFail	dechallengeSuccess	rechallengeAttempt	rechallengeFail	rechallengeSuccess	pctDechallengeAttempt	pctDechallengeFail	pctDechallengeSuccess	pctRechallengeAttempt	pctRechallengeFail	pctRechallengeSuccess
Synthea	Celecoxib	1	GI bleed	3	30	30	100	80	20	15	10	5	5	1	4	0.75	0.67	0.33	1	0.2	0.8

It is also possible to find the incidence rates (where we restrict to all ages, genders and start years):

ir <- getIncidenceRates(
    connectionHandler = ConnectionHandler,
    schema = 'main', 
    targetIds = 1
)

knitr::kable(
  x = ir %>% dplyr::filter(
    subgroupName == 'All' &
    ageGroupName == 'Any' & 
    genderName == 'Any' &   
    startYear == 'Any'
  ), 
  caption = 'Example incidence rate results.'
  )

Example incidence rate results.
databaseName	targetName	targetId	outcomeName	outcomeId	cleanWindow	subgroupName	ageGroupName	genderName	startYear	tarStartWith	tarStartOffset	tarEndWith	tarEndOffset	personsAtRiskPe	personsAtRisk	personDaysPe	personDays	personOutcomesPe	personOutcomes	outcomesPe	outcomes	incidenceProportionP100p	incidenceRateP100py
Synthea	Celecoxib	1	GI bleed	3	9999	All	Any	Any	Any	start	0	end	0	1800	1800	1800	1800	0	0	0	0	0	0
Synthea	Celecoxib	1	GI bleed	3	9999	All	Any	Any	Any	start	0	start	365	1800	1800	648214	540789	0	0	0	0	0	0

Finally, it is possible to get risk factors (associations between features and the occurrence of the outcome during a time-at-risk) using getBinaryRiskFactors to identify the binary features:

rf <- getBinaryRiskFactors(
    connectionHandler = ConnectionHandler,
    schema = 'main', 
    targetId = 1, 
    outcomeId = 3
)

knitr::kable(
  x = rf, 
  caption = 'Example risk factors for binary features results.'
  )

Example risk factors for binary features results.
databaseName	targetName	targetCohortId	outcomeName	outcomeCohortId	minPriorObservation	outcomeWashoutDays	riskWindowStart	riskWindowEnd	startAnchor	endAnchor	covariateName	caseCount	caseAverage	nonCaseCount	nonCaseAverage	SMD	absSMD
Synthea	Celecoxib	1	GI bleed	3	365	365	1	365	cohort start	cohort end	age group: 30 - 34	35	0.0985916	171	0.1183391	-0.0635357	0.0635357
Synthea	Celecoxib	1	GI bleed	3	365	365	1	365	cohort start	cohort end	age group: 35 - 39	177	0.4985915	685	0.4740484	0.0789649	0.0789649
Synthea	Celecoxib	1	GI bleed	3	365	365	1	365	cohort start	cohort end	age group: 40 - 44	129	0.3633803	531	0.3674740	-0.0131713	0.0131713
Synthea	Celecoxib	1	GI bleed	3	365	365	1	365	cohort start	cohort end	age group: 45 - 49	14	0.0394366	58	0.0401384	-0.0022579	0.0022579

and getContinuousRiskFactors for the continuous features:

rf <- getContinuousRiskFactors(
    connectionHandler = ConnectionHandler,
    schema = 'main', 
    targetId = 1, 
    outcomeId = 3
)

knitr::kable(
  x = rf, 
  caption = 'Example risk factors for continuous features results.'
  )

Example risk factors for continuous features results.
databaseName	targetName	targetCohortId	minPriorObservation	covariateName	covariateId	targetCountValue	targetMinValue	targetMaxValue	targetAverageValue	targetStandardDeviation	targetMedianValue	targetP10Value	targetP25Value	targetP75Value	targetP90Value	outcomeName	outcomeCohortId	outcomeWashoutDays	caseCountValue	caseMinValue	caseMaxValue	caseAverageValue	caseStandardDeviation	caseMedianValue	caseP10Value	caseP25Value	caseP75Value	caseP90Value	SMD	absSMD
Synthea	Celecoxib	1	365	age in years	1002	1800	31	47	38.645	3.321244	39	34	36	41	43	GI bleed	3	365	355	32	46	38.77465	3.274612	39	35	36	41	43	0.0393116	0.0393116
Synthea	Celecoxib	1	365	observation time (days) after index	1009	1800	10	28328	7588.082	5352.386801	6676	1336	3442	10649	14808	GI bleed	3	365	355	10	27390	7524.37465	5611.529121	6708	996	3236	10415	14884	-0.0116180	0.0116180
Synthea	Celecoxib	1	365	observation time (days) prior to index	1008	1800	11369	17044	14114.684	1195.708067	14100	12544	13160	15040	15744	GI bleed	3	365	355	11550	16805	14159.86761	1176.652643	14155	12644	13250	15075	15840	0.0380906	0.0380906
Synthea	Celecoxib	1	365	time (days) between cohort start and end	1010	1800	0	0	0.000	0.000000	0	0	0	0	0	GI bleed	3	365	355	0	0	0.00000	0.000000	0	0	0	0	0	NaN	NaN
Synthea	Celecoxib	1	365	visit_occurrence concept count during day -365 through 0 concept_count relative to index: Inpatient Visit	9201923	-5	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA

Extracting Prediction Results

The model designs used to develop prediction models in an analysis can be extracted using getPredictionModelDesigns:

modelDesigns <- getPredictionModelDesigns(
    connectionHandler = ConnectionHandler,
    schema = 'main', 
    targetIds = 1002, 
    outcomeIds = 3
)

knitr::kable(
  x = modelDesigns %>%
    dplyr::select(
      "modelDesignId",  
      "modelType",
      "developmentTargetId",
      "developmentTargetName",
      "developmentOutcomeId",   
      "developmentOutcomeName", 
      "timeAtRisk",
      "meanAuroc", 
      "noDevelopmentDatabases"
    ), 
  caption = 'Example model designs for the prediction results.'
  )

Example model designs for the prediction results.
modelDesignId	modelType	developmentTargetId	developmentTargetName	developmentOutcomeId	developmentOutcomeName	timeAtRisk	meanAuroc	noDevelopmentDatabases
1	logistic	1002	Celecoxib - first event with 365 prior obs first event with 365 prior obs	3	GI bleed	( cohort start+1 ) - ( cohort start+365 )	0.6885703	1

Once you know the modelDesignId of interest, you can then extract the model performance:

perform <- getPredictionPerformances(
    connectionHandler = ConnectionHandler,
    schema = 'main', 
    modelDesignId = 1
)

knitr::kable(
  x = perform, 
  caption = 'Example performance for the prediction results.'
  )

Example performance for the prediction results.
performanceId	modelDesignId	developmentDatabaseId	validationDatabaseId	developmentTargetId	developmentTargetName	developmentOutcomeId	developmentOutcomeName	developmentDatabase	validationDatabase	validationTargetName	validationOutcomeName	timeStamp	auroc	auroc95lb	auroc95ub	calibrationInLarge	eStatistic	brierScore	auprc	populationSize	outcomeCount	evalPercent	outcomePercent	validationTimeAtRisk	predictionResultType
1	1	1	1	1002	Celecoxib - first event with 365 prior obs first event with 365 prior obs	1002	Celecoxib - first event with 365 prior obs first event with 365 prior obs	Synthea	Synthea	Celecoxib - first event with 365 prior obs first event with 365 prior obs	GI bleed	2025-02-13	0.6885703	0.6264985	0.7506421	0.0976443	0.0318938	0.0720927	0.3085693	1800	355	24.94444	19.72222	( cohort start+1 ) - ( cohort start+365 )	Development

As well as the top predictors, in this case we will return the top 5:

top5 <- getPredictionTopPredictors(
    connectionHandler = ConnectionHandler,
    schema = 'main', 
    targetIds = 1002, 
    outcomeIds = 3,
    numberPredictors = 5
)

knitr::kable(
  x = top5 , 
  caption = 'Example top five predictors.'
  )

Example top five predictors.
databaseName	tarStartDay	tarStartAnchor	tarEndDay	tarEndAnchor	performanceId	covariateId	covariateName	conceptId	covariateValue	covariateCount	covariateMean	covariateStDev	withNoOutcomeCovariateCount	withNoOutcomeCovariateMean	withNoOutcomeCovariateStDev	withOutcomeCovariateCount	withOutcomeCovariateMean	withOutcomeCovariateStDev	standardizedMeanDiff	rn
Synthea	1	cohort start	365	cohort start	1	1901	Charlson index - Romano adaptation	0	1.4448410	935	0.6144444	0.6533777	660	0.5307958	0.6301972	275	0.9549296	0.6352971	0.6702998	1
Synthea	1	cohort start	365	cohort start	1	4266809210	condition_era group (ConditionGroupEraLongTerm) during day -365 through 0 days relative to index: Diverticular disease	4266809	0.5045103	338	0.1877778	0.3905346	210	0.1453287	0.3524320	128	0.3605634	0.4801640	0.5110418	2
Synthea	1	cohort start	365	cohort start	1	4285898212	condition_era group (ConditionGroupEraShortTerm) during day -30 through 0 days relative to index: Polyp of colon	4285898	0.4321971	28	0.0155556	0.1237481	13	0.0089965	0.0944225	15	0.0422535	0.2011670	0.2116439	3
Synthea	1	cohort start	365	cohort start	1	2007	index month: 2	0	-0.2439688	124	0.0688889	0.2532651	104	0.0719723	0.2584421	20	0.0563380	0.2305733	-0.0638383	4
Synthea	1	cohort start	365	cohort start	1	6007	index month: 6	0	-0.1786601	144	0.0800000	0.2712932	122	0.0844291	0.2780302	22	0.0619718	0.2411044	-0.0862999	5

You can also select certain results such as the attrition:

attrition <- getPredictionPerformanceTable(
    connectionHandler = ConnectionHandler,
    schema = 'main', 
    table = 'attrition',
    performanceId = 1
)

knitr::kable(
  x = attrition , 
  caption = 'Example prediction attrition for performance 1.'
  )

Example prediction attrition for performance 1.
performanceId	outcomeId	description	targetCount	uniquePeople	outcomes
1	3	Original cohorts	1800	1800	355
1	3	Initial plpData cohort or population	1800	1800	355
1	3	No prior outcome	1800	1800	355
1	3	Removing non-outcome subjects with insufficient time at risk (if any)	1800	1800	355

Extracting Estimation Results

It is possible to extract cohort method and self controlled case series results and make nice plots.

To extract all the cohort method estimations for target id 1002 and outcome 3:

cmEst <- getCMEstimation(
    connectionHandler = ConnectionHandler,
    schema = 'main', 
    targetIds = 1002, 
    comparatorIds = 2002,
    outcomeIds = 3
)

knitr::kable(
  x = cmEst , 
  caption = 'Example cohort method estimation results.'
  )

Example cohort method estimation results.
databaseName	analysisId	description	targetName	targetId	comparatorName	comparatorId	outcomeName	outcomeId	calibratedRr	calibratedCi95Lb	calibratedCi95Ub	calibratedP	calibratedOneSidedP	calibratedLogRr	calibratedSeLogRr	targetSubjects	comparatorSubjects	targetDays	comparatorDays	targetOutcomes	comparatorOutcomes	targetEstimator
Synthea	1	No matching, simple outcome model	Celecoxib - first event with 365 prior obs first event with 365 prior obs	1002	Diclofenac - first event with 365 prior obs first event with 365 prior obs	2002	GI bleed	3	1.832771	0.9724055	3.454372	0.1078385	0.0639169	0.6058291	0.3233791	1800	830	540789	261005	355	124	ate
Synthea	2	Matching on ps and covariates, simple outcomeModel	Celecoxib - first event with 365 prior obs first event with 365 prior obs	1002	Diclofenac - first event with 365 prior obs first event with 365 prior obs	2002	GI bleed	3	NA	NA	NA	NA	NA	NA	NA	829	829	263258	260639	117	124	att

You can also extract any meta-analysis estimate for cohort method:

cmMe <- getCmMetaEstimation(
    connectionHandler = ConnectionHandler,
    schema = 'main'
)

knitr::kable(
  x = cmMe , 
  caption = 'Example cohort method meta analysis estimation results.'
  )

Example cohort method meta analysis estimation results.
databaseName	analysisId	description	targetName	targetId	comparatorName	comparatorId	outcomeName	outcomeId	calibratedRr	calibratedCi95Lb	calibratedCi95Ub	calibratedP	calibratedOneSidedP	calibratedLogRr	calibratedSeLogRr	targetSubjects	comparatorSubjects	targetDays	comparatorDays	targetOutcomes	comparatorOutcomes	nDatabases

and you can create a nice plot to show the results:

plotCmEstimates(
  cmData = cmEst, 
  cmMeta = NULL,
  targetName = 'Example Target', 
  comparatorName = 'Example Comp', 
  selectedAnalysisId = 1
)

Jenna Reps, Anthony Sena

2025-03-20