Creating Analysis Specification
Anthony G. Sena
Creating an analysis specification
This walkthrough shows how to use Strategus to define an analysis
specification for an example study. The study uses cohorts from the
example problem "What is the risk of gastrointestinal (GI) bleed in new
users of celecoxib compared to new users of diclofenac?", as described
in Chapter 12 of The Book of OHDSI, on Population-Level Estimation.
Setting up your R environment
Use renv and the renv.lock file from the Strategus study template to set
up your R environment. Copy the renv.lock file into the root of a new
project, then restore the environment by calling renv::restore(). This
ensures that you have all of the R dependencies, including the OHDSI
HADES libraries and Strategus. In short, the steps are: download the
renv.lock file to your machine, install renv, and restore the R
environment.
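The three setup steps can be sketched as a small helper. This is a minimal sketch, not the template's own script: `setupStudyEnvironment` and its `lockFileUrl` argument are hypothetical names, and the reader must supply the actual location of the study template's renv.lock file, which is not given here.

```r
# Hypothetical helper (not part of Strategus or renv).
# `lockFileUrl` is an assumption: pass the URL of the renv.lock file
# from the Strategus study template you are using.
setupStudyEnvironment <- function(lockFileUrl, projectDir = getwd()) {
  lockFile <- file.path(projectDir, "renv.lock")

  # Step 1: download the lock file into the project root
  utils::download.file(url = lockFileUrl, destfile = lockFile)

  # Step 2: install renv if it is not already available
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
  }

  # Step 3: restore the package library described by the lock file
  renv::restore(project = projectDir, lockfile = lockFile, prompt = FALSE)

  invisible(lockFile)
}
```

Calling `setupStudyEnvironment(lockFileUrl = <your URL>)` from the root of a new project would then perform all three steps. Restoring a full HADES environment can take a long time, so it is usually run once per project.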
Cohorts for the study
To start, we’ll need to define cohorts and negative control outcomes
to use in our example analysis specification. We’ve included the cohorts
and negative control outcomes in the Strategus package for this example,
and the code below loads them for use when assembling the analysis
specification.
# Load the example cohort definitions shipped with the Strategus package
cohortDefinitionSet <- CohortGenerator::getCohortDefinitionSet(
  settingsFileName = "testdata/Cohorts.csv",
  jsonFolder = "testdata/cohorts",
  sqlFolder = "testdata/sql",
  packageName = "Strategus"
)
# Load the negative control outcome concepts shipped with the package
ncoCohortSet <- CohortGenerator::readCsv(
  file = system.file("testdata/negative_controls_concept_set.csv",
    package = "Strategus"
  )
)
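The module settings later in this walkthrough refer to cohorts by numeric ID (1 and 2 for the exposures, 3 for GI bleed, 4 and 5 for the age-restricted exposures), so it can help to confirm up front that every referenced ID exists in the loaded set. The following is a minimal sketch: `findMissingCohortIds` is a hypothetical helper, it assumes the cohort definition set has a `cohortId` column, and a stand-in data frame is used so the example runs without the HADES packages.

```r
# Hypothetical helper: return the referenced IDs that are absent from the set.
# Assumes `cohortDefinitionSet` has a `cohortId` column.
findMissingCohortIds <- function(referencedIds, cohortDefinitionSet) {
  setdiff(referencedIds, cohortDefinitionSet$cohortId)
}

# Stand-in for the real cohort definition set (illustration only)
exampleCohorts <- data.frame(
  cohortId = 1:5,
  cohortName = c(
    "Celecoxib", "Diclofenac", "GI bleed",
    "Celecoxib Age >= 30", "Diclofenac Age >= 30"
  )
)

missingIds <- findMissingCohortIds(c(1, 2, 3, 4, 5), exampleCohorts)
if (length(missingIds) > 0) {
  stop("Cohort IDs not found: ", paste(missingIds, collapse = ", "))
}
```

With the real data, the same check would be `findMissingCohortIds(c(1, 2, 3, 4, 5), cohortDefinitionSet)`.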
Assembling HADES modules
The building blocks of the Strategus
specification are HADES modules. For purposes of this walk through, a
module is a specific analytic task you would like to perform. As shown
in the subsequent sections, the high-level pattern for using a module
consists of:
- Instantiate the module object. For example, CohortGenerator’s module
is created using:
cg <- CohortGeneratorModule$new()
- Create the module specifications using the settings function(s) from the module. See the module list for more details.
- Compose the analysis pipeline from one or more module settings.
CohortGenerator Module Settings
The following code instantiates a new CohortGenerator module, cgModule.
cgModule then exposes functions we can use to create the module
specifications to add to the analysis specification. In the analysis
specification, we will add the cohort definitions and negative control
outcomes to the shared resources section, since these elements may be
used by any of the HADES modules. To do this, we will use the
createCohortSharedResourceSpecifications and
createNegativeControlOutcomeCohortSharedResourceSpecifications functions
of the CohortGenerator module. In addition, we will use the
createModuleSpecifications function to specify the cohort generation
settings.
cgModule <- CohortGeneratorModule$new()
# Create the cohort definition shared resource element for the analysis specification
cohortDefinitionSharedResource <- cgModule$createCohortSharedResourceSpecifications(
  cohortDefinitionSet = cohortDefinitionSet
)
# Create the negative control outcome shared resource element for the analysis specification
ncoSharedResource <- cgModule$createNegativeControlOutcomeCohortSharedResourceSpecifications(
  negativeControlOutcomeCohortSet = ncoCohortSet,
  occurrenceType = "all",
  detectOnDescendants = TRUE
)
# Create the module specification
cohortGeneratorModuleSpecifications <- cgModule$createModuleSpecifications(
  generateStats = TRUE
)
CohortDiagnostics Module Settings
The following code creates the cohortDiagnosticsModuleSpecifications
to run cohort diagnostics on the cohorts in the study.
cdModule <- CohortDiagnosticsModule$new()
cohortDiagnosticsModuleSpecifications <- cdModule$createModuleSpecifications(
  runInclusionStatistics = TRUE,
  runIncludedSourceConcepts = TRUE,
  runOrphanConcepts = TRUE,
  runTimeSeries = FALSE,
  runVisitContext = TRUE,
  runBreakdownIndexEvents = TRUE,
  runIncidenceRate = TRUE,
  runCohortRelationship = TRUE,
  runTemporalCohortCharacterization = TRUE
)
CohortIncidence Module Settings
The following code creates the cohortIncidenceModuleSpecifications
to perform an incidence rate analysis for the target cohorts and outcome
in this study.
ciModule <- CohortIncidenceModule$new()
# Target cohorts, referenced by their cohort IDs
targets <- list(
  CohortIncidence::createCohortRef(id = 1, name = "Celecoxib"),
  CohortIncidence::createCohortRef(id = 2, name = "Diclofenac"),
  CohortIncidence::createCohortRef(id = 4, name = "Celecoxib Age >= 30"),
  CohortIncidence::createCohortRef(id = 5, name = "Diclofenac Age >= 30")
)
# Outcome definition: GI bleed (cohort 3)
outcomes <- list(CohortIncidence::createOutcomeDef(id = 1, name = "GI bleed", cohortId = 3, cleanWindow = 9999))
# Two time-at-risk windows: cohort start to end, and cohort start to start + 365 days
tars <- list(
  CohortIncidence::createTimeAtRiskDef(id = 1, startWith = "start", endWith = "end"),
  CohortIncidence::createTimeAtRiskDef(id = 2, startWith = "start", endWith = "start", endOffset = 365)
)
analysis1 <- CohortIncidence::createIncidenceAnalysis(
  targets = c(1, 2, 4, 5),
  outcomes = c(1),
  tars = c(1, 2)
)
irDesign <- CohortIncidence::createIncidenceDesign(
  targetDefs = targets,
  outcomeDefs = outcomes,
  tars = tars,
  analysisList = list(analysis1),
  strataSettings = CohortIncidence::createStrataSettings(
    byYear = TRUE,
    byGender = TRUE
  )
)
cohortIncidenceModuleSpecifications <- ciModule$createModuleSpecifications(
  irDesign = irDesign$toList()
)
Characterization Module Settings
The following code creates the characterizationModuleSpecifications
to perform a characterization analysis for the target cohorts and
outcome in this study.
cModule <- CharacterizationModule$new()
characterizationModuleSpecifications <- cModule$createModuleSpecifications(
  targetIds = c(1, 2),
  outcomeIds = 3
)
CohortMethod Module Settings
The following code creates the cohortMethodModuleSpecifications
to perform a comparative cohort analysis for this study.
cmModule <- CohortMethodModule$new()
# Negative control outcomes: assumed true effect size of 1
negativeControlOutcomes <- lapply(
  X = ncoCohortSet$cohortId,
  FUN = CohortMethod::createOutcome,
  outcomeOfInterest = FALSE,
  trueEffectSize = 1,
  priorOutcomeLookback = 30
)
# Outcome of interest: GI bleed (cohort 3)
outcomesOfInterest <- lapply(
  X = 3,
  FUN = CohortMethod::createOutcome,
  outcomeOfInterest = TRUE
)
# Combine negative controls and outcomes of interest into one list
outcomes <- append(
  negativeControlOutcomes,
  outcomesOfInterest
)
tcos1 <- CohortMethod::createTargetComparatorOutcomes(
  targetId = 1,
  comparatorId = 2,
  outcomes = outcomes,
  excludedCovariateConceptIds = c(1118084, 1124300)
)
tcos2 <- CohortMethod::createTargetComparatorOutcomes(
  targetId = 4,
  comparatorId = 5,
  outcomes = outcomes,
  excludedCovariateConceptIds = c(1118084, 1124300)
)
targetComparatorOutcomesList <- list(tcos1, tcos2)
covarSettings <- FeatureExtraction::createDefaultCovariateSettings(addDescendantsToExclude = TRUE)
getDbCmDataArgs <- CohortMethod::createGetDbCohortMethodDataArgs(
  washoutPeriod = 183,
  firstExposureOnly = TRUE,
  removeDuplicateSubjects = "remove all",
  maxCohortSize = 100000,
  covariateSettings = covarSettings
)
createStudyPopArgs <- CohortMethod::createCreateStudyPopulationArgs(
  minDaysAtRisk = 1,
  riskWindowStart = 0,
  startAnchor = "cohort start",
  riskWindowEnd = 30,
  endAnchor = "cohort end"
)
matchOnPsArgs <- CohortMethod::createMatchOnPsArgs()
fitOutcomeModelArgs <- CohortMethod::createFitOutcomeModelArgs(modelType = "cox")
createPsArgs <- CohortMethod::createCreatePsArgs(
  stopOnError = FALSE,
  control = Cyclops::createControl(cvRepetitions = 1)
)
computeSharedCovBalArgs <- CohortMethod::createComputeCovariateBalanceArgs()
computeCovBalArgs <- CohortMethod::createComputeCovariateBalanceArgs(
  covariateFilter = FeatureExtraction::getDefaultTable1Specifications()
)
cmAnalysis1 <- CohortMethod::createCmAnalysis(
  analysisId = 1,
  description = "No matching, simple outcome model",
  getDbCohortMethodDataArgs = getDbCmDataArgs,
  createStudyPopArgs = createStudyPopArgs,
  fitOutcomeModelArgs = fitOutcomeModelArgs
)
cmAnalysis2 <- CohortMethod::createCmAnalysis(
  analysisId = 2,
  description = "Matching on ps and covariates, simple outcomeModel",
  getDbCohortMethodDataArgs = getDbCmDataArgs,
  createStudyPopArgs = createStudyPopArgs,
  createPsArgs = createPsArgs,
  matchOnPsArgs = matchOnPsArgs,
  computeSharedCovariateBalanceArgs = computeSharedCovBalArgs,
  computeCovariateBalanceArgs = computeCovBalArgs,
  fitOutcomeModelArgs = fitOutcomeModelArgs
)
cmAnalysisList <- list(cmAnalysis1, cmAnalysis2)
analysesToExclude <- NULL
cohortMethodModuleSpecifications <- cmModule$createModuleSpecifications(
  cmAnalysisList = cmAnalysisList,
  targetComparatorOutcomesList = targetComparatorOutcomesList,
  analysesToExclude = analysesToExclude
)
SelfControlledCaseSeries Module Settings
The following code creates the sccsModuleSpecifications
to perform a self-controlled case series analysis for this study.
sccsModule <- SelfControlledCaseSeriesModule$new()
# Exposures-outcomes -----------------------------------------------------------
negativeControlOutcomeIds <- ncoCohortSet$cohortId
outcomeOfInterestIds <- c(3)
exposureOfInterestIds <- c(1, 2)
exposuresOutcomeList <- list()
for (exposureOfInterestId in exposureOfInterestIds) {
  # Pair each exposure with each outcome of interest
  for (outcomeOfInterestId in outcomeOfInterestIds) {
    exposuresOutcomeList[[length(exposuresOutcomeList) + 1]] <- SelfControlledCaseSeries::createExposuresOutcome(
      outcomeId = outcomeOfInterestId,
      exposures = list(SelfControlledCaseSeries::createExposure(exposureId = exposureOfInterestId))
    )
  }
  # Pair each exposure with each negative control outcome (true effect size 1)
  for (negativeControlOutcomeId in negativeControlOutcomeIds) {
    exposuresOutcomeList[[length(exposuresOutcomeList) + 1]] <- SelfControlledCaseSeries::createExposuresOutcome(
      outcomeId = negativeControlOutcomeId,
      exposures = list(SelfControlledCaseSeries::createExposure(exposureId = exposureOfInterestId, trueEffectSize = 1))
    )
  }
}
# Analysis settings ------------------------------------------------------------
getDbSccsDataArgs <- SelfControlledCaseSeries::createGetDbSccsDataArgs(
  studyStartDate = "",
  studyEndDate = "",
  maxCasesPerOutcome = 1e6,
  useNestingCohort = TRUE,
  nestingCohortId = 1,
  deleteCovariatesSmallCount = 0
)
# Note: despite "6AndOlder" in the variable name, the minimum age set here is 18
createStudyPopulation6AndOlderArgs <- SelfControlledCaseSeries::createCreateStudyPopulationArgs(
  minAge = 18,
  naivePeriod = 365
)
covarPreExp <- SelfControlledCaseSeries::createEraCovariateSettings(
  label = "Pre-exposure",
  includeEraIds = "exposureId",
  start = -30,
  end = -1,
  endAnchor = "era start"
)
covarExposureOfInt <- SelfControlledCaseSeries::createEraCovariateSettings(
  label = "Main",
  includeEraIds = "exposureId",
  start = 0,
  startAnchor = "era start",
  end = 0,
  endAnchor = "era end",
  profileLikelihood = TRUE,
  exposureOfInterest = TRUE
)
calendarTimeSettings <- SelfControlledCaseSeries::createCalendarTimeCovariateSettings(
  calendarTimeKnots = 5,
  allowRegularization = TRUE,
  computeConfidenceIntervals = FALSE
)
seasonalitySettings <- SelfControlledCaseSeries::createSeasonalityCovariateSettings(
  seasonKnots = 5,
  allowRegularization = TRUE,
  computeConfidenceIntervals = FALSE
)
createSccsIntervalDataArgs <- SelfControlledCaseSeries::createCreateSccsIntervalDataArgs(
  eraCovariateSettings = list(covarPreExp, covarExposureOfInt),
  seasonalityCovariateSettings = seasonalitySettings,
  calendarTimeCovariateSettings = calendarTimeSettings,
  minCasesForTimeCovariates = 100000
)
fitSccsModelArgs <- SelfControlledCaseSeries::createFitSccsModelArgs(
  control = Cyclops::createControl(
    cvType = "auto",
    selectorType = "byPid",
    startingVariance = 0.1,
    seed = 1,
    resetCoefficients = TRUE,
    noiseLevel = "quiet"
  )
)
sccsAnalysis1 <- SelfControlledCaseSeries::createSccsAnalysis(
  analysisId = 1,
  description = "SCCS age 18-",
  getDbSccsDataArgs = getDbSccsDataArgs,
  createStudyPopulationArgs = createStudyPopulation6AndOlderArgs,
  createIntervalDataArgs = createSccsIntervalDataArgs,
  fitSccsModelArgs = fitSccsModelArgs
)
sccsAnalysisList <- list(sccsAnalysis1)
# SCCS module specs ------------------------------------------------------------
sccsModuleSpecifications <- sccsModule$createModuleSpecifications(
  sccsAnalysisList = sccsAnalysisList,
  exposuresOutcomeList = exposuresOutcomeList,
  combineDataFetchAcrossOutcomes = FALSE
)
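Because the SCCS exposures-outcome list is built by nested loops, its length should equal the number of exposures multiplied by the number of outcomes of interest plus negative controls. A minimal sketch of that sanity check follows; `expectedPairCount` is a hypothetical helper, and the negative control IDs in the example are stand-in values, not the ones shipped with Strategus.

```r
# Hypothetical helper: expected number of exposure-outcome pairs when every
# exposure is paired with every outcome of interest and every negative control.
expectedPairCount <- function(exposureIds, outcomeIds, negativeControlIds) {
  length(exposureIds) * (length(outcomeIds) + length(negativeControlIds))
}

# Illustration with stand-in negative control IDs (assumed values)
examplePairCount <- expectedPairCount(
  exposureIds = c(1, 2),
  outcomeIds = c(3),
  negativeControlIds = c(101, 102, 103)
)
print(examplePairCount) # 2 * (1 + 3) = 8
```

In the walkthrough itself, the equivalent check would compare `length(exposuresOutcomeList)` with `expectedPairCount(exposureOfInterestIds, outcomeOfInterestIds, negativeControlOutcomeIds)`.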
PatientLevelPrediction Module Settings
The following code creates the plpModuleSpecifications
to perform a patient-level prediction analysis for this study.
plpModule <- PatientLevelPredictionModule$new()
# Build one model design for a given target/outcome pair
makeModelDesignSettings <- function(targetId, outcomeId, popSettings, covarSettings) {
  invisible(PatientLevelPrediction::createModelDesign(
    targetId = targetId,
    outcomeId = outcomeId,
    restrictPlpDataSettings = PatientLevelPrediction::createRestrictPlpDataSettings(),
    populationSettings = popSettings,
    covariateSettings = covarSettings,
    preprocessSettings = PatientLevelPrediction::createPreprocessSettings(),
    modelSettings = PatientLevelPrediction::setLassoLogisticRegression(),
    splitSettings = PatientLevelPrediction::createDefaultSplitSetting(),
    runCovariateSummary = TRUE
  ))
}
plpPopulationSettings <- PatientLevelPrediction::createStudyPopulationSettings(
  startAnchor = "cohort start",
  riskWindowStart = 1,
  endAnchor = "cohort start",
  riskWindowEnd = 365,
  minTimeAtRisk = 1
)
plpCovarSettings <- FeatureExtraction::createDefaultCovariateSettings()
modelDesignList <- list()
# exposureOfInterestIds and outcomeOfInterestIds were defined in the SCCS section above
for (i in seq_along(exposureOfInterestIds)) {
  for (j in seq_along(outcomeOfInterestIds)) {
    modelDesignList <- append(
      modelDesignList,
      list(
        makeModelDesignSettings(
          targetId = exposureOfInterestIds[i],
          outcomeId = outcomeOfInterestIds[j],
          popSettings = plpPopulationSettings,
          covarSettings = plpCovarSettings
        )
      )
    )
  }
}
plpModuleSpecifications <- plpModule$createModuleSpecifications(
  modelDesignList = modelDesignList
)
Strategus Analysis Specifications
Finally, we will use the various shared resources and module specifications to construct the full set of analysis specifications and save it to the file system in JSON format.
# The spelling "Specificiations" is kept exactly as in the source
analysisSpecifications <- createEmptyAnalysisSpecificiations() %>%
  addSharedResources(cohortDefinitionSharedResource) %>%
  addSharedResources(ncoSharedResource) %>%
  addModuleSpecifications(cohortGeneratorModuleSpecifications) %>%
  addModuleSpecifications(cohortDiagnosticsModuleSpecifications) %>%
  addModuleSpecifications(cohortIncidenceModuleSpecifications) %>%
  addModuleSpecifications(characterizationModuleSpecifications) %>%
  addModuleSpecifications(cohortMethodModuleSpecifications) %>%
  addModuleSpecifications(sccsModuleSpecifications) %>%
  addModuleSpecifications(plpModuleSpecifications)
# Save the assembled specification to disk as JSON
ParallelLogger::saveSettingsToJson(
  analysisSpecifications,
  file.path(params$analysisSettingsPath, params$analysisSettingsFileName)
)
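After saving, it can be useful to confirm which modules and how many shared resources ended up in the specification. The sketch below is illustrative only: `summarizeSpecification` is a hypothetical helper, and the field names `sharedResources`, `moduleSpecifications`, and `module` are assumptions about the structure of the specification object. A stand-in list is used so the example runs without Strategus installed.

```r
# Hypothetical helper: list the modules and count the shared resources.
# Assumes the specification is a list with `sharedResources` and
# `moduleSpecifications`, and that each module entry has a `module` name.
summarizeSpecification <- function(spec) {
  moduleNames <- vapply(
    spec$moduleSpecifications,
    function(moduleSpec) as.character(moduleSpec$module),
    character(1)
  )
  list(
    sharedResourceCount = length(spec$sharedResources),
    modules = unname(moduleNames)
  )
}

# Stand-in specification (illustration only)
exampleSpec <- list(
  sharedResources = list(list(), list()),
  moduleSpecifications = list(
    list(module = "CohortGeneratorModule"),
    list(module = "CohortMethodModule")
  )
)
specSummary <- summarizeSpecification(exampleSpec)
print(specSummary$modules)
```

For a real file, the specification could first be read back with `ParallelLogger::loadSettingsFromJson()` and then passed to the helper, provided the assumed field names match.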