Introduction

In this vignette we focus on running several different analyses on several target-comparator-outcome combinations. This can be useful when we want to explore the sensitivity to analysis choices, include controls, or run an experiment similar to the OMOP experiment to empirically identify the optimal analysis choices for a particular research question.

This vignette assumes you are already familiar with the CohortMethod package and are able to perform single studies. We will walk through all the steps needed to perform an exemplar set of analyses, and we have selected the well-studied topic of the effect of coxibs versus non-selective nonsteroidal anti-inflammatory drugs (NSAIDs) on gastrointestinal (GI) bleeding-related hospitalization. For simplicity, we focus on one coxib – celecoxib – and one non-selective NSAID – diclofenac. We will execute several variations of an analysis for the primary outcome and a large set of negative control outcomes.

General approach

The general approach to running a set of analyses is to specify the arguments of all the functions you would normally call, and to create sets of these function arguments. The final outcome models as well as the intermediate data objects will all be saved to disk for later retrieval.

An analysis will be executed by calling these functions in sequence:

  1. getDbCohortMethodData()
  2. createStudyPopulation()
  3. createPs() (optional)
  4. trimByPs() or trimByPsToEquipoise() (optional)
  5. matchOnPs(), matchOnPsAndCovariates(), stratifyByPs(), or stratifyByPsAndCovariates() (optional)
  6. computeCovariateBalance() (optional)
  7. fitOutcomeModel() (optional)

When you provide several analyses to the CohortMethod package, it will determine whether any of the analyses have anything in common, and will take advantage of this fact. For example, if we specify several analyses that only differ in the way the outcome model is fitted, then CohortMethod will extract the data and fit the propensity model only once, and re-use this in all the analyses.

The function arguments you need to define have been divided into four groups:

  1. Hypothesis of interest: arguments that are specific to a hypothesis of interest, in the case of the cohort method this is a combination of target, comparator, and outcome.
  2. Analyses: arguments that are not directly specific to a hypothesis of interest, such as the washout window, whether to include drugs as covariates, etc.
  3. Arguments that are the output of a previous function in the CohortMethod package, such as the cohortMethodData argument of the createPs function. These cannot be specified by the user.
  4. Arguments that are specific to an environment, such as the connection details for connecting to the server, and the name of the schema holding the CDM data.

There are two arguments (excludedCovariateConceptIds and includedCovariateConceptIds of the getDbCohortMethodData() function) that can be argued to belong to both group 1 and group 2. These arguments are therefore present in both groups, and when executing the analysis the union of the two lists of concept IDs will be used.
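
For example, the analysis-level exclusion is specified in the covariate settings shown later in this vignette (the excludedCovariateConceptIds argument of createDefaultCovariateSettings()), while the hypothesis-level exclusion could look like the sketch below. This sketch assumes createTargetComparatorOutcomes() accepts an excludedCovariateConceptIds argument, as in recent versions of the package, and uses the cohort and concept IDs defined later in this vignette; at execution time the union of both lists of concept IDs is used:

# Hypothesis-level exclusion of the celecoxib and diclofenac concepts:
tcosWithExclusions <- createTargetComparatorOutcomes(
  targetId = 1,
  comparatorId = 2,
  outcomes = list(createOutcome(outcomeId = 77)),
  excludedCovariateConceptIds = c(1118084, 1124300) # celecoxib and diclofenac
)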

Preparation for the example

We need to tell R how to connect to the server where the data are. CohortMethod uses the DatabaseConnector package, which provides the createConnectionDetails function. Type ?createConnectionDetails for the specific settings required for the various database management systems (DBMS). For example, one might connect to a PostgreSQL database using this code:

connectionDetails <- createConnectionDetails(dbms = "postgresql",
                                             server = "localhost/ohdsi",
                                             user = "joe",
                                             password = "supersecret")

cdmDatabaseSchema <- "my_cdm_data"
cohortDatabaseSchema <- "my_results"
cohortTable <- "my_cohorts"
options(sqlRenderTempEmulationSchema = NULL)

The last few lines define the cdmDatabaseSchema, cohortDatabaseSchema, and cohortTable variables. We’ll use these later to tell R where the data in CDM format live, and where we want to write intermediate tables. Note that for Microsoft SQL Server, database schemas need to specify both the database and the schema, so for example cdmDatabaseSchema <- "my_cdm_data.dbo". For database platforms that do not support temp tables, such as Oracle, it is also necessary to provide a schema where the user has write access that can be used to emulate temp tables. PostgreSQL supports temp tables, so we can set options(sqlRenderTempEmulationSchema = NULL) (or not set sqlRenderTempEmulationSchema at all).
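
For example, on a database platform without temp table support one could point sqlRenderTempEmulationSchema to a schema where the user has write access (the schema name below is just an illustration):

options(sqlRenderTempEmulationSchema = "my_scratch_schema")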

We need to define the exposures and outcomes for our study. Here, we will define our exposures using the OHDSI Capr package. We define two cohorts, one for celecoxib and one for diclofenac. For each cohort we require a prior diagnosis of ‘osteoarthritis of knee’ and 365 days of continuous prior observation. We also restrict to the first exposure per person:

library(Capr)

osteoArthritisOfKneeConceptId <- 4079750
celecoxibConceptId <- 1118084
diclofenacConceptId <- 1124300
osteoArthritisOfKnee <- cs(
  descendants(osteoArthritisOfKneeConceptId),
  name = "Osteoarthritis of knee"
)
attrition <- attrition(
  "prior osteoarthritis of knee" = withAll(
    atLeast(1, conditionOccurrence(osteoArthritisOfKnee), 
            duringInterval(eventStarts(-Inf, 0)))
  )
)
celecoxib <- cs(
  descendants(celecoxibConceptId),
  name = "Celecoxib"
)
diclofenac  <- cs(
  descendants(diclofenacConceptId),
  name = "Diclofenac"
)
celecoxibCohort <- cohort(
  entry = entry(
    drugExposure(celecoxib, firstOccurrence()),
    observationWindow = continuousObservation(priorDays = 365)
  ),
  attrition = attrition,
  exit = exit(endStrategy = drugExit(celecoxib,
                                     persistenceWindow = 30,
                                     surveillanceWindow = 0))
)
diclofenacCohort <- cohort(
  entry = entry(
    drugExposure(diclofenac, firstOccurrence()),
    observationWindow = continuousObservation(priorDays = 365)
  ),
  attrition = attrition,
  exit = exit(endStrategy = drugExit(diclofenac,
                                     persistenceWindow = 30,
                                     surveillanceWindow = 0))
)
# Note: this will automatically assign cohort IDs 1 and 2, respectively:
exposureCohorts <- makeCohortSet(celecoxibCohort, diclofenacCohort)

We’ll pull the outcome definition from the OHDSI PhenotypeLibrary:

library(PhenotypeLibrary)
outcomeCohorts <- getPlCohortDefinitionSet(77) # GI bleed

In addition to the outcome of interest, we also want to include a large set of negative control outcomes:

library(dplyr) # provides tibble(), bind_rows(), and the pipe used later on

negativeControlIds <- c(29735, 140673, 197494,
                        198185, 198199, 200528, 257315,
                        314658, 317376, 321319, 380731,
                        432661, 432867, 433516, 433701,
                        433753, 435140, 435459, 435524,
                        435783, 436665, 436676, 442619,
                        444252, 444429, 4131756, 4134120,
                        4134454, 4152280, 4165112, 4174262,
                        4182210, 4270490, 4286201, 4289933)
negativeControlCohorts <- tibble(
  cohortId = negativeControlIds,
  cohortName = sprintf("Negative control %d", negativeControlIds),
  outcomeConceptId = negativeControlIds
)

We combine the exposure and outcome cohort definitions, and use CohortGenerator to generate the cohorts:

allCohorts <- bind_rows(outcomeCohorts,
                        exposureCohorts)

library(CohortGenerator)
cohortTableNames <- getCohortTableNames(cohortTable = cohortTable)
createCohortTables(connectionDetails = connectionDetails,
                   cohortDatabaseSchema = cohortDatabaseSchema,
                   cohortTableNames = cohortTableNames)
generateCohortSet(connectionDetails = connectionDetails,
                  cdmDatabaseSchema = cdmDatabaseSchema,
                  cohortDatabaseSchema = cohortDatabaseSchema,
                  cohortTableNames = cohortTableNames,
                  cohortDefinitionSet = allCohorts)
generateNegativeControlOutcomeCohorts(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = cdmDatabaseSchema,
  cohortDatabaseSchema = cohortDatabaseSchema,
  cohortTable = cohortTable,
  negativeControlOutcomeCohortSet = negativeControlCohorts
)

If all went well, we now have a table with the cohorts of interest. We can check how many entries there are per cohort:

connection <- DatabaseConnector::connect(connectionDetails)
sql <- "SELECT cohort_definition_id, COUNT(*) AS count FROM @cohortDatabaseSchema.@cohortTable GROUP BY cohort_definition_id"
DatabaseConnector::renderTranslateQuerySql(
  connection = connection, 
  sql = sql,  
  cohortDatabaseSchema = cohortDatabaseSchema, 
  cohortTable = cohortTable
)
DatabaseConnector::disconnect(connection)
##    COHORT_DEFINITION_ID    COUNT
## 1               4182210  3999282
## 2                433516   225285
## 3                197494   109864
## 4                198199   215700
## 5                436665   667992
## 6                432661      990
## 7                257315   947702
## 8                200528   929099
## 9               4286201   209643
## 10                29735   306564
## 11               433701    86114
## 12               435783   165272
## 13               380731    93491
## 14              4134120    47673
## 15               432867 34228255
## 16               436676   397216
## 17               317376    70728
## 18               321319  1144743
## 19               433753   452773
## 20               435524   331191
## 21               314658  3166844
## 22               435459   150251
## 23               140673  7597288
## 24              4174262  1107081
## 25              4134454    16630
## 26               442619      929
## 27               444429        6
## 28                    2   185248
## 29                    1   116987
## 30                   77  1091702

Specifying hypotheses of interest

The first group of arguments define the target, comparator, and outcome. Here we demonstrate how to create one set, and add that set to a list:

outcomeOfInterest <- createOutcome(outcomeId = 77,
                                   outcomeOfInterest = TRUE)
negativeControlOutcomes <- lapply(
  negativeControlIds,
  function(outcomeId) createOutcome(outcomeId = outcomeId,
                                    outcomeOfInterest = FALSE,
                                    trueEffectSize = 1)
)
tcos <- createTargetComparatorOutcomes(
  targetId = 1,
  comparatorId = 2,
  outcomes = append(list(outcomeOfInterest),
                    negativeControlOutcomes)
)
targetComparatorOutcomesList <- list(tcos)

We first define the outcome of interest (GI bleed, cohort ID 77), explicitly stating this is an outcome of interest (outcomeOfInterest = TRUE), meaning we want the full set of artifacts generated for this outcome. We then create a set of negative control outcomes. Because we specify outcomeOfInterest = FALSE, many of the artifacts will not be saved (like the matched population) or not even generated at all (like the covariate balance). This can save a lot of compute time and disk space. We also provide the true effect size for these controls, which will be used later for empirical calibration. We set the target to be celecoxib (cohort ID 1), and the comparator to be diclofenac (cohort ID 2).

A convenient way to save targetComparatorOutcomesList to file is by using the saveTargetComparatorOutcomesList function, and we can load it again using the loadTargetComparatorOutcomesList function.
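
For example (the file name is arbitrary):

saveTargetComparatorOutcomesList(targetComparatorOutcomesList,
                                 "targetComparatorOutcomesList.json")
targetComparatorOutcomesList <- loadTargetComparatorOutcomesList(
  "targetComparatorOutcomesList.json"
)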

Specifying analyses

The second group of arguments are not specific to a hypothesis of interest, and comprise the majority of arguments. For each function that will be called during the execution of the analyses, a companion function is available that has (almost) the same arguments. For example, for the trimByPs() function there is the createTrimByPsArgs() function. These companion functions can be used to create the arguments to be used during execution:

covarSettings <- createDefaultCovariateSettings(
  excludedCovariateConceptIds = c(1118084, 1124300),
  addDescendantsToExclude = TRUE
)

getDbCmDataArgs <- createGetDbCohortMethodDataArgs(
  washoutPeriod = 183,
  restrictToCommonPeriod = FALSE,
  firstExposureOnly = TRUE,
  removeDuplicateSubjects = "remove all",
  studyStartDate = "",
  studyEndDate = "",
  covariateSettings = covarSettings
)

createStudyPopArgs <- createCreateStudyPopulationArgs(
  removeSubjectsWithPriorOutcome = TRUE,
  minDaysAtRisk = 1,
  riskWindowStart = 0,
  startAnchor = "cohort start",
  riskWindowEnd = 30,
  endAnchor = "cohort end"
)

fitOutcomeModelArgs1 <- createFitOutcomeModelArgs(modelType = "cox")

Any argument that is not explicitly specified by the user will assume the default value specified in the function. We can now combine the arguments for the various functions into a single analysis:

cmAnalysis1 <- createCmAnalysis(
  analysisId = 1,
  description = "No matching, simple outcome model",
  getDbCohortMethodDataArgs = getDbCmDataArgs,
  createStudyPopArgs = createStudyPopArgs,
  fitOutcomeModelArgs = fitOutcomeModelArgs1
)

Note that we have assigned an analysis ID (1) to this set of arguments. We can use this later to link the results back to this specific set of choices. We also include a short description of the analysis.

We can easily create more analyses, for example by using matching, stratification, inverse probability of treatment weighting, or by using more sophisticated outcome models:

createPsArgs <- createCreatePsArgs() # Use default settings only

matchOnPsArgs <- createMatchOnPsArgs(maxRatio = 100)

computeSharedCovBalArgs <- createComputeCovariateBalanceArgs()

computeCovBalArgs <- createComputeCovariateBalanceArgs(
  covariateFilter = getDefaultCmTable1Specifications()
)

fitOutcomeModelArgs2 <- createFitOutcomeModelArgs(
  modelType = "cox",
  stratified = TRUE
)

cmAnalysis2 <- createCmAnalysis(
  analysisId = 2,
  description = "Matching",
  getDbCohortMethodDataArgs = getDbCmDataArgs,
  createStudyPopArgs = createStudyPopArgs,
  createPsArgs = createPsArgs,
  matchOnPsArgs = matchOnPsArgs,
  computeSharedCovariateBalanceArgs = computeSharedCovBalArgs,
  computeCovariateBalanceArgs = computeCovBalArgs,
  fitOutcomeModelArgs = fitOutcomeModelArgs2
)

stratifyByPsArgs <- createStratifyByPsArgs(numberOfStrata = 5)

cmAnalysis3 <- createCmAnalysis(
  analysisId = 3,
  description = "Stratification",
  getDbCohortMethodDataArgs = getDbCmDataArgs,
  createStudyPopArgs = createStudyPopArgs,
  createPsArgs = createPsArgs,
  stratifyByPsArgs = stratifyByPsArgs,
  computeSharedCovariateBalanceArgs = computeSharedCovBalArgs,
  computeCovariateBalanceArgs = computeCovBalArgs,
  fitOutcomeModelArgs = fitOutcomeModelArgs2
)

fitOutcomeModelArgs3 <- createFitOutcomeModelArgs(
  modelType = "cox",
  inversePtWeighting = TRUE
)

cmAnalysis4 <- createCmAnalysis(
  analysisId = 4,
  description = "Inverse probability weighting",
  getDbCohortMethodDataArgs = getDbCmDataArgs,
  createStudyPopArgs = createStudyPopArgs,
  createPsArgs = createPsArgs,
  fitOutcomeModelArgs = fitOutcomeModelArgs3
)
## Note: Using propensity scores but not computing covariate balance
fitOutcomeModelArgs4 <- createFitOutcomeModelArgs(
  useCovariates = TRUE,
  modelType = "cox",
  stratified = TRUE
)

cmAnalysis5 <- createCmAnalysis(
  analysisId = 5,
  description = "Matching plus full outcome model",
  getDbCohortMethodDataArgs = getDbCmDataArgs,
  createStudyPopArgs = createStudyPopArgs,
  createPsArgs = createPsArgs,
  matchOnPsArgs = matchOnPsArgs,
  fitOutcomeModelArgs = fitOutcomeModelArgs4
)
## Note: Using propensity scores but not computing covariate balance
interactionCovariateIds <- c(
  8532001, # Female
  201826210, # T2DM
  21600960413 # concurrent use of antithrombotic agents
)

fitOutcomeModelArgs5 <- createFitOutcomeModelArgs(
  modelType = "cox",
  stratified = TRUE,
  interactionCovariateIds = interactionCovariateIds
)

cmAnalysis6 <- createCmAnalysis(
  analysisId = 6,
  description = "Stratification plus interaction terms",
  getDbCohortMethodDataArgs = getDbCmDataArgs,
  createStudyPopArgs = createStudyPopArgs,
  createPsArgs = createPsArgs,
  stratifyByPsArgs = stratifyByPsArgs,
  fitOutcomeModelArgs = fitOutcomeModelArgs5
)
## Note: Using propensity scores but not computing covariate balance

These analyses can be combined in a list:

cmAnalysisList <- list(cmAnalysis1, 
                       cmAnalysis2, 
                       cmAnalysis3, 
                       cmAnalysis4, 
                       cmAnalysis5, 
                       cmAnalysis6)

A convenient way to save cmAnalysisList to file is by using the saveCmAnalysisList function, and we can load it again using the loadCmAnalysisList function.
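
For example (again, the file name is arbitrary):

saveCmAnalysisList(cmAnalysisList, "cmAnalysisList.json")
cmAnalysisList <- loadCmAnalysisList("cmAnalysisList.json")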

Covariate balance

In our code, we specified that covariate balance must be computed for some of our analyses. For computational reasons, covariate balance has been split into two types: we can compute covariate balance for each target-comparator-outcome-analysis combination, and we can compute covariate balance for each target-comparator-analysis combination, i.e., across all outcomes. The latter is referred to as ‘shared covariate balance’. Since there can be many outcomes, it is often not feasible to recompute (or store) balance for all covariates for each outcome. Moreover, the differences between the study populations for the various outcomes are likely very small; the only differences arise from removing subjects with a prior outcome, which excludes different people from the study population depending on the outcome. We therefore typically compute the balance for all covariates across all outcomes (the shared balance), and only for a small subset of covariates for each outcome. In the code above, we use all covariates for the shared balance computation, which we typically use to evaluate whether our analysis achieved covariate balance. We limit the covariates for the per-outcome balance computations to those in the standard ‘table 1’ definition provided by the getDefaultCmTable1Specifications() function, which we can use to create a ‘table 1’ for each outcome.
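
Once the analyses have been executed and the file reference has been retrieved (both shown in the following sections), the shared balance can be loaded and inspected. This is a sketch; it assumes the file reference contains a sharedBalanceFile column, as in recent versions of the package:

sharedBalanceFile <- result %>%
  filter(targetId == 1,
         comparatorId == 2,
         outcomeId == 77,
         analysisId == 2) %>%
  pull(sharedBalanceFile)
sharedBalance <- readRDS(file.path(folder, sharedBalanceFile))
plotCovariateBalanceScatterPlot(sharedBalance)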

Executing multiple analyses

We can now run the analyses against the hypotheses of interest using the runCmAnalyses() function. This function will run all specified analyses against all hypotheses of interest, meaning that the total number of outcome models is length(cmAnalysisList) * length(targetComparatorOutcomesList) (if all analyses specify an outcome model should be fitted). If we do not want all combinations of analyses and hypotheses to be computed, we can skip certain combinations by using the analysesToExclude argument of runCmAnalyses(), as shown in the sketch below.
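
As a sketch of what this could look like (assuming, as in recent versions of the package, that analysesToExclude is a data frame with some combination of targetId, comparatorId, outcomeId, and analysisId columns):

# Hypothetical example: skip analysis 5 for the celecoxib target (cohort ID 1):
analysesToExclude <- data.frame(
  targetId = 1,
  analysisId = 5
)
# This data frame would then be passed to runCmAnalyses() via its
# analysesToExclude argument.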

multiThreadingSettings <- createDefaultMultiThreadingSettings(parallel::detectCores())

result <- runCmAnalyses(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = cdmDatabaseSchema,
  exposureDatabaseSchema = cohortDatabaseSchema,
  exposureTable = cohortTable,
  outcomeDatabaseSchema = cohortDatabaseSchema,
  outcomeTable = cohortTable,
  outputFolder = folder,
  cmAnalysisList = cmAnalysisList,
  targetComparatorOutcomesList = targetComparatorOutcomesList,
  multiThreadingSettings = multiThreadingSettings
)

In the code above, we first specify how many parallel threads CohortMethod can use. Many of the computations can be performed in parallel, and providing more than one CPU core can greatly speed up the computation. Here we specify that CohortMethod can use all the CPU cores detected in the system (using the parallel::detectCores() function).

We call runCmAnalyses(), providing the arguments for connecting to the database, which schemas and tables to use, as well as the analyses and hypotheses of interest. The folder specifies where the outcome models and intermediate files will be written.

Restarting

If for some reason the execution was interrupted, you can restart by re-issuing the runCmAnalyses() command. Any intermediate and final products that have already been completed and written to disk will be skipped.

Retrieving the results

The result of runCmAnalyses() is a data frame with one row per target-comparator-outcome-analysis combination. It provides the file names of the intermediate and end-result files that were constructed. For example, we can retrieve and plot the propensity scores for the combination of our target, comparator, outcome of interest, and analysis 5:

psFile <- result %>%
  filter(targetId == 1,
         comparatorId == 2,
         outcomeId == 77,
         analysisId == 5) %>%
  pull(psFile)
ps <- readRDS(file.path(folder, psFile))
plotPs(ps)

Note that some of the file names will appear several times in the table. For example, analyses 2 and 5 only differ in terms of the outcome model, and will share the same propensity score and matching files.
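
For example, we can confirm that analyses 2 and 5 point to the same propensity score file (this assumes dplyr is loaded, as before):

result %>%
  filter(targetId == 1,
         comparatorId == 2,
         outcomeId == 77,
         analysisId %in% c(2, 5)) %>%
  select(analysisId, psFile)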

We can always retrieve the file reference table again using the getFileReference() function:

result <- getFileReference(folder)

We can get a summary of the results using getResultsSummary():

resultsSum <- getResultsSummary(folder)
resultsSum
## # A tibble: 216 × 29
##    analysisId targetId comparatorId outcomeId trueEffectSize targetSubjects
##         <int>    <int>        <int>     <int>          <dbl>          <int>
##  1          1        1            2        77             NA          90716
##  2          1        1            2     29735              1          92988
##  3          1        1            2    140673              1          86766
##  4          1        1            2    197494              1          93931
##  5          1        1            2    198185              1          94107
##  6          1        1            2    198199              1          93564
##  7          1        1            2    200528              1          93396
##  8          1        1            2    257315              1          93058
##  9          1        1            2    314658              1          84073
## 10          1        1            2    317376              1          93965
## # ℹ 206 more rows
## # ℹ 23 more variables: comparatorSubjects <int>, targetDays <dbl>,
## #   comparatorDays <dbl>, targetOutcomes <dbl>, comparatorOutcomes <dbl>,
## #   rr <dbl>, ci95Lb <dbl>, ci95Ub <dbl>, p <dbl>, oneSidedP <dbl>,
## #   logRr <dbl>, seLogRr <dbl>, llr <dbl>, mdrr <dbl>, targetEstimator <chr>,
## #   calibratedRr <dbl>, calibratedCi95Lb <dbl>, calibratedCi95Ub <dbl>,
## #   calibratedP <dbl>, calibratedOneSidedP <dbl>, calibratedLogRr <dbl>, …

This tells us, per target-comparator-outcome-analysis combination, the estimated relative risk and 95% confidence interval, as well as the number of people in the treated and comparator group (after trimming and matching if applicable), and the number of outcomes observed for those groups within the specified risk windows.

Empirical calibration and negative control distribution

Because our study included negative control outcomes, our analysis summary also contains calibrated confidence intervals and p-values. We can also create the calibration effect plots for every analysis ID. In each plot, the blue dots represent our negative control outcomes, and the yellow diamond represents our health outcome of interest: GI bleed. An unbiased, well-calibrated analysis should have 95% of the negative controls between the dashed lines (i.e., 95% should have p > 0.05).

install.packages("EmpiricalCalibration")
library(EmpiricalCalibration)

# Analysis 1: No matching, simple outcome model
ncs <- resultsSum %>%
  filter(analysisId == 1,
         outcomeId != 77)
hoi <- resultsSum %>%
  filter(analysisId == 1,
         outcomeId == 77)
null <- fitNull(ncs$logRr, ncs$seLogRr)
plotCalibrationEffect(logRrNegatives = ncs$logRr, 
                      seLogRrNegatives = ncs$seLogRr, 
                      logRrPositives = hoi$logRr, 
                      seLogRrPositives = hoi$seLogRr, null)

# Analysis 2: Matching
ncs <- resultsSum %>%
  filter(analysisId == 2,
         outcomeId != 77)
hoi <- resultsSum %>%
  filter(analysisId == 2,
         outcomeId == 77)
null <- fitNull(ncs$logRr, ncs$seLogRr)
plotCalibrationEffect(logRrNegatives = ncs$logRr, 
                      seLogRrNegatives = ncs$seLogRr, 
                      logRrPositives = hoi$logRr, 
                      seLogRrPositives = hoi$seLogRr, null)

# Analysis 3: Stratification
ncs <- resultsSum %>%
  filter(analysisId == 3,
         outcomeId != 77)
hoi <- resultsSum %>%
  filter(analysisId == 3,
         outcomeId == 77)
null <- fitNull(ncs$logRr, ncs$seLogRr)
plotCalibrationEffect(logRrNegatives = ncs$logRr, 
                      seLogRrNegatives = ncs$seLogRr, 
                      logRrPositives = hoi$logRr, 
                      seLogRrPositives = hoi$seLogRr, null)

# Analysis 4: Inverse probability of treatment weighting
ncs <- resultsSum %>%
  filter(analysisId == 4,
         outcomeId != 77)
hoi <- resultsSum %>%
  filter(analysisId == 4,
         outcomeId == 77)
null <- fitNull(ncs$logRr, ncs$seLogRr)
plotCalibrationEffect(logRrNegatives = ncs$logRr, 
                      seLogRrNegatives = ncs$seLogRr, 
                      logRrPositives = hoi$logRr, 
                      seLogRrPositives = hoi$seLogRr, null)

# Analysis 5: Matching plus full outcome model
ncs <- resultsSum %>%
  filter(analysisId == 5,
         outcomeId != 77)
hoi <- resultsSum %>%
  filter(analysisId == 5,
         outcomeId == 77)
null <- fitNull(ncs$logRr, ncs$seLogRr)
plotCalibrationEffect(logRrNegatives = ncs$logRr, 
                      seLogRrNegatives = ncs$seLogRr, 
                      logRrPositives = hoi$logRr, 
                      seLogRrPositives = hoi$seLogRr, null)

Analysis 6 explored interactions with certain variables. The estimates for these interaction terms are stored in a separate results summary. We can examine whether these estimates are also consistent with the null. In this example we consider the interaction with ‘concurrent use of antithrombotic agents’ (covariate ID 21600960413):

interactionResultsSum <- getInteractionResultsSummary(folder)

# Analysis 6: Stratification plus interaction terms
ncs <- interactionResultsSum %>%
  filter(analysisId == 6,
         interactionCovariateId == 21600960413,
         outcomeId != 77)
hoi <- interactionResultsSum %>%
  filter(analysisId == 6,
         interactionCovariateId == 21600960413,
         outcomeId == 77)
null <- fitNull(ncs$logRr, ncs$seLogRr)
plotCalibrationEffect(logRrNegatives = ncs$logRr, 
                      seLogRrNegatives = ncs$seLogRr, 
                      logRrPositives = hoi$logRr, 
                      seLogRrPositives = hoi$seLogRr, null)
## Warning in checkWithinLimits(yLimits, c(seLogRrNegatives, seLogRrPositives), :
## Values are outside plotted range. Consider adjusting yLimits parameter
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_vline()`).

Exporting to CSV

The results generated so far all reside in binary objects on your local file system, mixing aggregate statistics such as hazard ratios with patient-level data such as the propensity score per person. How can we share our results with others, possibly outside our organization? This is where the exportToCsv() function comes in. This function exports all results, including diagnostics, to CSV (comma-separated values) files. These files contain only aggregate statistics, no patient-level data. CSV was chosen as the format to allow human review of the results.

exportToCsv(
  folder,
  exportFolder = file.path(folder, "export"),
  databaseId = "My CDM",
  minCellCount = 5,
  maxCores = parallel::detectCores()
)

Any person counts in the results that are smaller than the minCellCount argument will be blinded by replacing the count with the negative of minCellCount. For example, if the number of people with the outcome is 3 and minCellCount = 5, the count will be reported as -5, which the Shiny app will display as ‘<5’.
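
As an illustration of this rule (the helper below is hypothetical and not part of CohortMethod):

blindCount <- function(count, minCellCount = 5) {
  if (count > 0 && count < minCellCount) {
    return(-minCellCount) # displayed as '<5' in the Shiny app
  }
  return(count)
}
blindCount(3)  # returns -5
blindCount(12) # returns 12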

Information on the data model used to generate the CSV files can be retrieved using getResultsDataModelSpecifications():

getResultsDataModelSpecifications()
## # A tibble: 188 × 8
##    tableName   columnName dataType isRequired primaryKey minCellCount deprecated
##    <chr>       <chr>      <chr>    <chr>      <chr>      <chr>        <chr>     
##  1 cm_attriti… sequence_… int      Yes        Yes        No           No        
##  2 cm_attriti… descripti… varchar  Yes        No         No           No        
##  3 cm_attriti… subjects   int      Yes        No         Yes          No        
##  4 cm_attriti… exposure_… bigint   Yes        Yes        No           No        
##  5 cm_attriti… target_id  bigint   Yes        Yes        No           No        
##  6 cm_attriti… comparato… bigint   Yes        Yes        No           No        
##  7 cm_attriti… analysis_… int      Yes        Yes        No           No        
##  8 cm_attriti… outcome_id bigint   Yes        Yes        No           No        
##  9 cm_attriti… database_… varchar  Yes        Yes        No           No        
## 10 cm_follow_… target_id  bigint   Yes        Yes        No           No        
## # ℹ 178 more rows
## # ℹ 1 more variable: description <chr>

View results in a Shiny app

Finally, we can view the results in a Shiny app. For this we must first load the CSV files produced by exportToCsv() into a database. We could use the uploadExportedResults() function for this. However, if we just want to view the results ourselves, we can create a small SQLite database without having to set up a database server. In either case, we need to specify the names of the exposure and outcome cohorts we used in our study. We can create the SQLite database using:

cohorts <- data.frame(
  cohortDefinitionId = c(
    1,
    2,
    77),
  cohortName = c(
    "Celecoxib",
    "Diclofenac",
    "GI Bleed"
  )
)

insertExportedResultsInSqlite(
  sqliteFileName = file.path(folder, "myResults.sqlite"), 
  exportFolder = file.path(folder, "export"), 
  cohorts = cohorts
)

Next we launch the Shiny app using:

launchResultsViewerUsingSqlite(
  sqliteFileName = file.path(folder, "myResults.sqlite")
)
(Screenshot: the CohortMethod Shiny app)

Acknowledgments

Considerable work has been dedicated to providing the CohortMethod package.

citation("CohortMethod")
## To cite package 'CohortMethod' in publications use:
## 
##   Schuemie M, Suchard M, Ryan P (2024). _CohortMethod: New-User Cohort
##   Method with Large Scale Propensity and Outcome Models_. R package
##   version 5.4.0, https://github.com/OHDSI/CohortMethod,
##   <https://ohdsi.github.io/CohortMethod>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {CohortMethod: New-User Cohort Method with Large Scale Propensity and Outcome
## Models},
##     author = {Martijn Schuemie and Marc Suchard and Patrick Ryan},
##     year = {2024},
##     note = {R package version 5.4.0, 
## https://github.com/OHDSI/CohortMethod},
##     url = {https://ohdsi.github.io/CohortMethod},
##   }

Further, CohortMethod makes extensive use of the Cyclops package.

citation("Cyclops")
## To cite Cyclops in publications use:
## 
##   Suchard MA, Simpson SE, Zorych I, Ryan P, Madigan D (2013). "Massive
##   parallelization of serial inference algorithms for complex
##   generalized linear models." _ACM Transactions on Modeling and
##   Computer Simulation_, *23*, 10.
##   <https://dl.acm.org/doi/10.1145/2414416.2414791>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     author = {M. A. Suchard and S. E. Simpson and I. Zorych and P. Ryan and D. Madigan},
##     title = {Massive parallelization of serial inference algorithms for complex generalized linear models},
##     journal = {ACM Transactions on Modeling and Computer Simulation},
##     volume = {23},
##     pages = {10},
##     year = {2013},
##     url = {https://dl.acm.org/doi/10.1145/2414416.2414791},
##   }

This work is supported in part through the National Science Foundation grant IIS 1251151.