vignettes/SingleStudies.Rmd
SingleStudies.Rmd
This vignette describes how you can use the CohortMethod
package to perform a single new-user cohort study. We will walk through
all the steps needed to perform an exemplar study, and we have selected
the well-studied topic of the effect of coxibs versus non-selective
non-steroidal anti-inflammatory drugs (NSAIDs) on gastrointestinal (GI)
bleeding-related hospitalization. For simplicity, we focus on one coxib
– celecoxib – and one non-selective NSAID – diclofenac.
Before installing the CohortMethod
package make sure you
have Java available. Java can be downloaded from www.java.com. For Windows users, RTools
is also necessary. RTools can be downloaded from CRAN.
The CohortMethod
package is currently maintained in a Github repository, and
has dependencies on other packages in Github. All of these packages can
be downloaded and installed from within R using the drat
package:
install.packages("remotes")
remotes::install_github("ohdsi/CohortMethod")
Once installed, you can type library(CohortMethod)
to
load the package.
The first step in running the CohortMethod
is extracting
all necessary data from the database server holding the data in the
Observational Medical Outcomes Partnership (OMOP) Common Data Model
(CDM) format.
We need to tell R how to connect to the server where the data are.
CohortMethod
uses the DatabaseConnector
package, which provides the createConnectionDetails
function. Type ?createConnectionDetails
for the specific
settings required for the various database management systems (DBMS).
For example, one might connect to a PostgreSQL database using this
code:
connectionDetails <- createConnectionDetails(dbms = "postgresql",
server = "localhost/ohdsi",
user = "joe",
password = "supersecret")
cdmDatabaseSchema <- "my_cdm_data"
resultsDatabaseSchema <- "my_results"
options(sqlRenderTempEmulationSchema = NULL)
The last two lines define the cdmDatabaseSchema
and
resultSchema
variables. We’ll use these later to tell R
where the data in CDM format live, and where we want to write
intermediate tables. Note that for Microsoft SQL Server, databaseschemas
need to specify both the database and the schema, so for example
cdmDatabaseSchema <- "my_cdm_data.dbo"
.
We need to define the exposures and outcomes for our study. One could
use an external cohort definition tools, but in this example we do this
by writing SQL statements against the OMOP CDM that populate a table of
events in which we are interested. The resulting table should have the
same structure as the cohort
table in the CDM. This means
it should have the fields cohort_definition_id
,
cohort_start_date
, cohort_end_date
,and
subject_id
.
For our example study, we have created a file called coxibVsNonselVsGiBleed.sql with the following contents:
/***********************************
File coxibVsNonselVsGiBleed.sql
***********************************/
DROP TABLE IF EXISTS @resultsDatabaseSchema.coxibVsNonselVsGiBleed;
CREATE TABLE @resultsDatabaseSchema.coxibVsNonselVsGiBleed (
INT,
cohort_definition_id DATE,
cohort_start_date DATE,
cohort_end_date
subject_id BIGINT
);
INSERT INTO @resultsDatabaseSchema.coxibVsNonselVsGiBleed (
cohort_definition_id,
cohort_start_date,
cohort_end_date,
subject_id
)SELECT 1, -- Exposure
drug_era_start_date,
drug_era_end_date,
person_idFROM @cdmDatabaseSchema.drug_era
WHERE drug_concept_id = 1118084;-- celecoxib
INSERT INTO @resultsDatabaseSchema.coxibVsNonselVsGiBleed (
cohort_definition_id,
cohort_start_date,
cohort_end_date,
subject_id
)SELECT 2, -- Comparator
drug_era_start_date,
drug_era_end_date,
person_idFROM @cdmDatabaseSchema.drug_era
WHERE drug_concept_id = 1124300; --diclofenac
INSERT INTO @resultsDatabaseSchema.coxibVsNonselVsGiBleed (
cohort_definition_id,
cohort_start_date,
cohort_end_date,
subject_id
)SELECT 3, -- Outcome
condition_start_date,
condition_end_date,
condition_occurrence.person_idFROM @cdmDatabaseSchema.condition_occurrence
INNER JOIN @cdmDatabaseSchema.visit_occurrence
ON condition_occurrence.visit_occurrence_id = visit_occurrence.visit_occurrence_id
WHERE condition_concept_id IN (
SELECT descendant_concept_id
FROM @cdmDatabaseSchema.concept_ancestor
WHERE ancestor_concept_id = 192671 -- GI - Gastrointestinal haemorrhage
)AND visit_occurrence.visit_concept_id IN (9201, 9203);
This is parameterized SQL which can be used by the
SqlRender
package. We use parameterized SQL so we do not
have to pre-specify the names of the CDM and result schemas. That way,
if we want to run the SQL on a different schema, we only need to change
the parameter values; we do not have to change the SQL code. By also
making use of translation functionality in SqlRender
, we
can make sure the SQL code can be run in many different
environments.
library(SqlRender)
sql <- readSql("coxibVsNonselVsGiBleed.sql")
sql <- render(sql,
cdmDatabaseSchema = cdmDatabaseSchema,
resultsDatabaseSchema = resultsDatabaseSchema)
sql <- translate(sql, targetDialect = connectionDetails$dbms)
connection <- connect(connectionDetails)
executeSql(connection, sql)
In this code, we first read the SQL from the file into memory. In the
next line, we replace the two parameter names with the actual values. We
then translate the SQL into the dialect appropriate for the DBMS we
already specified in the connectionDetails
. Next, we
connect to the server, and submit the rendered and translated SQL.
If all went well, we now have a table with the events of interest. We can see how many events per type:
sql <- paste("SELECT cohort_definition_id, COUNT(*) AS count",
"FROM @resultsDatabaseSchema.coxibVsNonselVsGiBleed",
"GROUP BY cohort_definition_id")
sql <- render(sql, resultsDatabaseSchema = resultsDatabaseSchema)
sql <- translate(sql, targetDialect = connectionDetails$dbms)
querySql(connection, sql)
## cohort_concept_id count
## 1 1 50000
## 2 2 50000
## 3 3 15000
Now we can tell CohortMethod
to define the cohorts based
on our events, construct covariates, and extract all necessary data for
our analysis.
Important: The target and comparator drug must not
be included in the covariates, including any descendant concepts. You
will need to manually add the drugs and descendants to the
excludedCovariateConceptIds
of the covariate settings. In
this example code we exclude all NSAIDs from the covariates by pointing
to the concept ID of the NSAID class and specifying
addDescendantsToExclude = TRUE
.
nsaids <- 21603933
# Define which types of covariates must be constructed:
covSettings <- createDefaultCovariateSettings(excludedCovariateConceptIds = nsaids,
addDescendantsToExclude = TRUE)
#Load data:
cohortMethodData <- getDbCohortMethodData(
connectionDetails = connectionDetails,
cdmDatabaseSchema = cdmDatabaseSchema,
targetId = 1,
comparatorId = 2,
outcomeIds = 3,
studyStartDate = "",
studyEndDate = "",
exposureDatabaseSchema = resultsDatabaseSchema,
exposureTable = "coxibVsNonselVsGiBleed",
outcomeDatabaseSchema = resultsDatabaseSchema,
outcomeTable = "coxibVsNonselVsGiBleed",
cdmVersion = cdmVersion,
firstExposureOnly = TRUE,
removeDuplicateSubjects = "remove all",
restrictToCommonPeriod = FALSE,
washoutPeriod = 180,
covariateSettings = covSettings
)
cohortMethodData
## # CohortMethodData object
##
## Target cohort ID: 1
## Comparator cohort ID: 2
## Outcome cohort ID(s): 3
##
## Inherits from CovariateData:
## # CovariateData object
##
## All cohorts
##
## Inherits from Andromeda:
## # Andromeda object
## # Physical location: C:\Users\admin_mschuemi\AppData\Local\Temp\2\RtmpQZKV1W\file312c37fb2b73.sqlite
##
## Tables:
## $analysisRef (analysisId, analysisName, domainId, startDay, endDay, isBinary, missingMeansZero)
## $cohorts (rowId, personSeqId, personId, treatment, cohortStartDate, daysFromObsStart, daysToCohortEnd, daysToObsEnd)
## $covariateRef (covariateId, covariateName, analysisId, conceptId)
## $covariates (rowId, covariateId, covariateValue)
## $outcomes (rowId, outcomeId, daysToEvent)
There are many parameters, but they are all documented in the
CohortMethod
manual. The
createDefaultCovariateSettings
function is described in the
FeatureExtraction
package. In short, we are pointing the
function to the table created earlier and indicating which concept IDs
in that table identify the target, comparator and outcome. We instruct
that the default set of covariates should be constructed, including
covariates for all conditions, drug exposures, and procedures that were
found on or before the index date. To customize the set of covariates,
please refer to the FeatureExtraction
package vignette by
typing
vignette("UsingFeatureExtraction", package="FeatureExtraction")
.
All data about the cohorts, outcomes, and covariates are extracted
from the server and stored in the cohortMethodData
object.
This object uses the Andromeda
package to store information
in a way that ensures R does not run out of memory, even when the data
are large. We can use the generic summary()
function to
view some more information of the data we extracted:
summary(cohortMethodData)
## CohortMethodData object summary
##
## Target cohort ID: 1
## Comparator cohort ID: 2
## Outcome cohort ID(s): 3
##
## Target persons: 50000
## Comparator persons: 50000
##
## Outcome counts:
## Event count Person count
## 3 36380 7447
##
## Covariates:
## Number of covariates: 62004
## Number of non-zero covariate values: 41097434
Creating the cohortMethodData
file can take considerable
computing time, and it is probably a good idea to save it for future
sessions. Because cohortMethodData
uses
Andromeda
, we cannot use R’s regular save function.
Instead, we’ll have to use the saveCohortMethodData()
function:
saveCohortMethodData(cohortMethodData, "coxibVsNonselVsGiBleed.zip")
We can use the loadCohortMethodData()
function to load
the data in a future session.
Typically, a new user is defined as first time use of a drug (either
target or comparator), and typically a washout period (a minimum number
of days prior first use) is used to make sure it is truly first use.
When using the CohortMethod
package, you can enforce the
necessary requirements for new use in three ways:
getDbCohortMethodData
function, you can use the
firstExposureOnly
, removeDuplicateSubjects
,
restrictToCommonPeriod
, and washoutPeriod
arguments. (As shown in the example above).createStudyPopulation
function (see below) using the
firstExposureOnly
, removeDuplicateSubjects
,
restrictToCommonPeriod
, and washoutPeriod
arguments.The advantage of option 1 is that the input cohorts are already fully
defined outside of the CohortMethod
package, and for
example external cohort characterization tools can be used on the same
cohorts used in this package. The advantage of options 2 and 3 is that
it saves you the trouble of limiting to first use yourself, for example
allowing you to directly use the drug_era
table in the CDM.
Option 2 is more efficient than 3, since only data for first use will be
fetched, while option 3 is less efficient but allows you to compare the
original cohorts to the study population.
Typically, the exposure cohorts and outcome cohorts will be defined
independently of each other. When we want to produce an effect size
estimate, we need to further restrict these cohorts and put them
together, for example by removing exposed subjects that had the outcome
prior to exposure, and only keeping outcomes that fall within a defined
risk window. For this we can use the createStudyPopulation
function:
studyPop <- createStudyPopulation(
cohortMethodData = cohortMethodData,
outcomeId = 3,
firstExposureOnly = FALSE,
restrictToCommonPeriod = FALSE,
washoutPeriod = 0,
removeDuplicateSubjects = "keep all",
removeSubjectsWithPriorOutcome = TRUE,
minDaysAtRisk = 1,
riskWindowStart = 0,
startAnchor = "cohort start",
riskWindowEnd = 30,
endAnchor = "cohort end"
)
Note that we’ve set firstExposureOnly
and
removeDuplicateSubjects
to FALSE, and
washoutPeriod
to zero because we already filtered on these
arguments when using the getDbCohortMethodData
function.
During loading we set restrictToCommonPeriod
to FALSE, and
we do the same here because we do not want to force the comparison to
restrict only to time when both drugs are recorded. We specify the
outcome ID we will use, and that people with outcomes prior to the risk
window start date will be removed. The risk window is defined as
starting at the cohort start date (the index date,
riskWindowStart = 0
and
startAnchor = "cohort start"
), and the risk windows ends 30
days after the cohort ends (riskWindowEnd = 30
and
endAnchor = "cohort end"
). Note that the risk windows are
truncated at the end of observation or the study end date. We also
remove subjects who have no time at risk. To see how many people are
left in the study population we can always use the
getAttritionTable
function:
getAttritionTable(studyPop)
## # A tibble: 5 x 5
## description targetPersons comparatorPersons targetExposures comparatorExposures
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Original cohorts 856973 915830 1946114 1786318
## 2 First exp. only & removed s ... 373874 541386 373874 541386
## 3 Random sample 50000 50000 50000 50000
## 4 No prior outcome 48700 48715 48700 48715
## 5 Have at least 1 days at ris ... 48667 48688 48667 48688
One additional filtering step that is often used is matching or trimming on propensity scores, as will be discussed next.
The CohortMethod
can use propensity scores to adjust for
potential confounders. Instead of the traditional approach of using a
handful of predefined covariates, CohortMethod
typically
uses thousands to millions of covariates that are automatically
constructed based on conditions, procedures and drugs in the records of
the subjects.
We can fit a propensity model using the covariates constructed by the
getDbcohortMethodData()
function:
ps <- createPs(cohortMethodData = cohortMethodData, population = studyPop)
The createPs()
function uses the Cyclops
package to fit a large-scale regularized logistic regression.
To fit the propensity model, Cyclops
needs to know the
hyperparameter value which specifies the variance of the prior. By
default Cyclops
will use cross-validation to estimate the
optimal hyperparameter. However, be aware that this can take a really
long time. You can use the prior
and control
parameters of the createPs()
to specify
Cyclops
behavior, including using multiple CPUs to speed-up
the cross-validation.
We can compute the area under the receiver-operator curve (AUC) for the propensity score model:
computePsAuc(ps)
## [1] 0.81
We can also plot the propensity score distribution, although we prefer the preference score distribution:
plotPs(ps,
scale = "preference",
showCountsLabel = TRUE,
showAucLabel = TRUE,
showEquiposeLabel = TRUE)
It is also possible to inspect the propensity model itself by showing the covariates that have non-zero coefficients:
getPsModel(ps, cohortMethodData)
## # A tibble: 6 x 3
## coefficient covariateId covariateName
## <dbl> <dbl> <chr>
## 1 -3.47 1150871413 ...gh 0 days relative to index: misoprostol
## 2 -2.45 2016006 index year: 2016
## 3 -2.43 2017006 index year: 2017
## 4 -2.39 2018006 index year: 2018
## 5 -2.37 2015006 index year: 2015
## 6 -2.30 2014006 index year: 2014
One advantage of using the regularization when fitting the propensity model is that most coefficients will shrink to zero and fall out of the model. It is a good idea to inspect the remaining variables for anything that should not be there, for example variations of the drugs of interest that we forgot to exclude.
We can use the propensity scores to trim, stratify, match, or weigh our population. For example, one could trim to equipoise, meaning only subjects with a preference score between 0.25 and 0.75 are kept:
trimmedPop <- trimByPsToEquipoise(ps)
plotPs(trimmedPop, ps, scale = "preference")
Instead (or additionally), we could stratify the population based on the propensity score:
stratifiedPop <- stratifyByPs(ps, numberOfStrata = 5)
plotPs(stratifiedPop, ps, scale = "preference")
We can also match subjects based on propensity scores. In this example, we’re using one-to-one matching:
matchedPop <- matchOnPs(ps, caliper = 0.2, caliperScale = "standardized logit", maxRatio = 1)
plotPs(matchedPop, ps)
Note that for both stratification and matching it is possible to
specify additional matching criteria such as age and sex using the
stratifyByPsAndCovariates()
and
matchOnPsAndCovariates()
functions, respectively.
We can see the effect of trimming and/or matching on the population
using the getAttritionTable
function:
getAttritionTable(matchedPop)
## # A tibble: 6 x 5
## description targetPersons comparatorPersons targetExposures comparatorExposures
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Original cohorts 856973 915830 1946114 1786318
## 2 First exp. only & removed s ... 373874 541386 373874 541386
## 3 Random sample 50000 50000 50000 50000
## 4 No prior outcome 48700 48715 48700 48715
## 5 Have at least 1 days at ris ... 48667 48688 48667 48688
## 6 Matched on propensity score 22339 22339 22339 22339
Or, if we like, we can plot an attrition diagram:
drawAttritionDiagram(matchedPop)
To evaluate whether our use of the propensity score is indeed making the two cohorts more comparable, we can compute the covariate balance before and after trimming, matching, and/or stratifying:
balance <- computeCovariateBalance(matchedPop, cohortMethodData)
plotCovariateBalanceScatterPlot(balance, showCovariateCountLabel = TRUE, showMaxLabel = TRUE)
## Warning: Removed 23590 rows containing missing values (`geom_point()`).
plotCovariateBalanceOfTopVariables(balance)
The ‘before matching’ population is the population as extracted by
the getDbCohortMethodData
function, so before any further
filtering steps.
It is customary to include a table in your paper that lists some
select population characteristics before and after
matching/stratification/trimming. This is usually the first table, and
so will be referred to as ‘table 1’. To generate this table, you can use
the createCmTable1
function:
createCmTable1(balance)
Before matching After matching
Target Comparator Target Comparator
Characteristic % % Std. diff % % Std. diff
Age group
25 - 29 0.0 0.0
30 - 34 0.0 0.0
40 - 44 0.0 0.0 0.00 0.0 0.0 -0.01
45 - 49 0.1 0.1 0.00 0.0 0.1 -0.01
50 - 54 0.2 0.2 0.00 0.2 0.2 0.00
55 - 59 0.5 0.5 -0.01 0.5 0.6 -0.01
60 - 64 0.8 1.2 -0.04 1.1 1.1 -0.01
65 - 69 25.8 29.0 -0.07 29.6 29.2 0.01
70 - 74 25.0 25.0 0.00 25.1 24.9 0.01
75 - 79 20.6 18.9 0.04 18.8 19.0 -0.01
80 - 84 15.2 13.2 0.06 13.4 13.7 -0.01
85 - 89 8.3 8.0 0.01 7.8 7.8 0.00
90 - 94 3.0 3.2 -0.01 2.9 2.8 0.01
95 - 99 0.6 0.7 -0.02 0.6 0.6 0.00
100 - 104 0.1 0.1 0.00 0.1 0.0 0.02
Gender: female 59.5 61.8 -0.05 60.0 60.3 0.00
Medical history: General
Acute respiratory disease 18.4 22.2 -0.10 20.2 20.4 0.00
Attention deficit hyperactivity disorder 0.1 0.2 -0.03 0.2 0.2 0.00
Chronic liver disease 0.8 1.4 -0.05 1.0 1.1 -0.01
Chronic obstructive lung disease 10.2 10.4 -0.01 9.4 9.6 -0.01
Crohn's disease 0.3 0.4 -0.02 0.3 0.3 0.00
Dementia 2.9 3.3 -0.03 2.8 2.9 -0.01
Depressive disorder 6.2 9.3 -0.12 8.0 7.8 0.01
Diabetes mellitus 18.6 25.3 -0.16 21.7 21.5 0.00
Gastroesophageal reflux disease 10.0 15.1 -0.16 13.3 13.0 0.01
Gastrointestinal hemorrhage 3.6 3.3 0.02 2.3 2.2 0.01
Human immunodeficiency virus infection 0.0 0.1 -0.02 0.0 0.1 -0.01
Hyperlipidemia 31.5 49.0 -0.36 42.7 41.8 0.02
Hypertensive disorder 50.2 61.3 -0.22 56.8 56.5 0.01
Lesion of liver 0.6 0.8 -0.02 0.6 0.6 0.00
Obesity 3.8 7.3 -0.15 5.6 5.3 0.01
Osteoarthritis 47.0 49.3 -0.05 50.6 50.0 0.01
Pneumonia 4.7 4.7 0.00 4.0 4.2 -0.01
Psoriasis 1.0 1.4 -0.03 1.3 1.2 0.01
Renal impairment 4.7 10.5 -0.22 6.3 6.1 0.01
Rheumatoid arthritis 2.4 3.0 -0.04 2.9 2.9 0.00
Schizophrenia 0.1 0.1 -0.01 0.1 0.1 -0.01
Ulcerative colitis 0.4 0.5 -0.02 0.4 0.4 0.00
Urinary tract infectious disease 8.9 10.8 -0.06 9.7 9.6 0.00
Viral hepatitis C 0.1 0.3 -0.04 0.2 0.2 0.00
Medical history: Cardiovascular disease
Atrial fibrillation 8.7 9.6 -0.03 8.2 8.4 -0.01
Cerebrovascular disease 10.3 10.5 -0.01 9.5 9.6 0.00
Coronary arteriosclerosis 17.2 18.4 -0.03 16.4 16.7 -0.01
Heart disease 38.5 38.7 0.00 35.8 36.1 -0.01
Heart failure 7.8 8.2 -0.01 6.3 6.4 0.00
Ischemic heart disease 8.7 8.6 0.00 7.1 7.3 -0.01
Peripheral vascular disease 7.3 10.1 -0.10 7.9 8.0 0.00
Pulmonary embolism 0.7 0.8 -0.02 0.6 0.7 -0.01
Venous thrombosis 1.9 2.5 -0.04 2.2 2.2 0.00
Medical history: Neoplasms
Malignant lymphoma 0.8 0.9 -0.01 0.8 0.8 0.00
Malignant neoplasm of anorectum 0.5 0.3 0.03 0.4 0.3 0.00
Malignant neoplastic disease 17.3 17.8 -0.01 17.4 17.5 0.00
Malignant tumor of breast 3.0 3.2 -0.01 3.1 3.1 0.00
Malignant tumor of colon 0.9 0.7 0.02 0.8 0.7 0.00
Malignant tumor of lung 0.7 0.5 0.02 0.6 0.6 0.01
Malignant tumor of urinary bladder 0.8 0.8 0.00 0.8 0.8 0.00
Primary malignant neoplasm of prostate 3.9 3.4 0.02 3.6 3.6 0.00
Medication use
Agents acting on the renin-angiotensin system 44.2 49.6 -0.11 47.9 48.2 -0.01
Antibacterials for systemic use 60.9 65.6 -0.10 61.8 62.4 -0.01
Antidepressants 25.7 26.7 -0.02 26.2 26.5 -0.01
Antiepileptics 14.7 17.8 -0.08 16.8 16.8 0.00
Antiinflammatory and antirheumatic products 25.0 28.6 -0.08 28.9 28.8 0.00
Antineoplastic agents 4.1 5.1 -0.05 4.8 4.7 0.00
Antipsoriatics 0.7 1.1 -0.04 0.8 0.9 -0.01
Antithrombotic agents 22.1 19.7 0.06 19.0 19.4 -0.01
Beta blocking agents 34.4 38.1 -0.08 35.3 35.6 -0.01
Calcium channel blockers 26.6 28.7 -0.05 26.9 27.2 -0.01
Diuretics 42.4 41.2 0.02 40.5 41.3 -0.02
Drugs for acid related disorders 37.1 38.9 -0.04 35.6 36.0 -0.01
Drugs for obstructive airway diseases 38.2 44.7 -0.13 42.5 42.4 0.00
Drugs used in diabetes 17.3 21.1 -0.10 18.5 18.6 0.00
Immunosuppressants 3.0 4.9 -0.10 4.2 4.2 0.00
Lipid modifying agents 49.1 56.0 -0.14 54.5 54.3 0.00
Opioids 36.4 34.2 0.05 34.9 35.4 -0.01
Psycholeptics 30.2 29.8 0.01 30.0 30.5 -0.01
Psychostimulants, agents used for adhd and nootropics 1.5 1.8 -0.02 1.9 1.9 -0.01
For various reasons it might be necessary to insert the study
population back into the database, for example because we want to use an
external cohort characterization tool. We can use the
insertDbPopulation
function for this purpose:
insertDbPopulation(
population = matchedPop,
cohortIds = c(101,100),
connectionDetails = connectionDetails,
cohortDatabaseSchema = resultsDatabaseSchema,
cohortTable = "coxibVsNonselVsGiBleed",
createTable = FALSE,
cdmVersion = cdmVersion
)
This function will store the population in a table with the same
structure as the cohort
table in the CDM, in this case in
the same table where we had created our original cohorts.
Before we start fitting an outcome model, we might be interested to know whether we have sufficient power to detect a particular effect size. It makes sense to perform these power calculations once the study population has been fully defined, so taking into account loss to the various inclusion and exclusion criteria (such as no prior outcomes), and loss due to matching and/or trimming. Since the sample size is fixed in retrospective studies (the data has already been collected), and the true effect size is unknown, the CohortMethod package provides a function to compute the minimum detectable relative risk (MDRR) instead:
computeMdrr(
population = studyPop,
modelType = "cox",
alpha = 0.05,
power = 0.8,
twoSided = TRUE
)
## targetPersons comparatorPersons targetExposures comparatorExposures targetDays comparatorDays totalOutcomes mdrr se
## 1 48667 48688 48667 48688 7421404 3693928 554 1.26878 0.08497186
In this example we used the studyPop
object, so the
population before any matching or trimming. If we want to know the MDRR
after matching, we use the matchedPop
object we created
earlier instead:
computeMdrr(
population = matchedPop,
modelType = "cox",
alpha = 0.05,
power = 0.8,
twoSided = TRUE
)
## targetPersons comparatorPersons targetExposures comparatorExposures targetDays comparatorDays totalOutcomes mdrr se
## 1 22339 22339 22339 22339 3118703 1801997 226 1.451674 0.133038
Even thought the MDRR in the matched population is higher, meaning we have less power, we should of course not be fooled: matching most likely eliminates confounding, and is therefore preferred to not matching.
To gain a better understanding of the amount of follow-up available
we can also inspect the distribution of follow-up time. We defined
follow-up time as time at risk, so not censored by the occurrence of the
outcome. The getFollowUpDistribution
can provide a simple
overview:
getFollowUpDistribution(population = matchedPop)
## 100% 75% 50% 25% 0% Treatment
## 1 2 60 60 126 4184 1
## 2 2 45 60 67 2996 0
The output is telling us number of days of follow-up each quantile of the study population has. We can also plot the distribution:
plotFollowUpDistribution(population = matchedPop)
The outcome model is a model describing which variables are associated with the outcome.
In theory we could fit an outcome model without using the propensity scores. In this example we are fitting an outcome model using a Cox regression:
outcomeModel <- fitOutcomeModel(population = studyPop,
modelType = "cox")
outcomeModel
## Model type: cox
## Stratified: FALSE
## Use covariates: FALSE
## Use inverse probability of treatment weighting: FALSE
## Status: OK
##
## Estimate lower .95 upper .95 logRr seLogRr
## treatment 1.25115 1.03524 1.51802 0.22406 0.0976
But of course we want to make use of the matching done on the propensity score:
outcomeModel <- fitOutcomeModel(population = matchedPop,
modelType = "cox",
stratified = TRUE)
outcomeModel
## Model type: cox
## Stratified: TRUE
## Use covariates: FALSE
## Use inverse probability of treatment weighting: FALSE
## Status: OK
##
## Estimate lower .95 upper .95 logRr seLogRr
## treatment 0.9942982 0.7212731 1.3608869 -0.0057181 0.162
Note that we define the sub-population to be only those in the
matchedPop
object, which we created earlier by matching on
the propensity score. We also now use a stratified Cox model,
conditioning on the propensity score match sets.
Instead of matching or stratifying we can also perform Inverse Probability of Treatment Weighting (IPTW):
outcomeModel <- fitOutcomeModel(population = ps,
modelType = "cox",
inversePtWeighting = TRUE)
outcomeModel
## Model type: cox
## Stratified: FALSE
## Use covariates: FALSE
## Use inverse probability of treatment weighting: TRUE
## Status: OK
##
## Estimate lower .95 upper .95 logRr seLogRr
## treatment 1.15095 0.83724 1.61142 0.14059 0.167
We may be interested whether the effect is different across different groups in the population. To explore this, we may include interaction terms in the model. In this example we include three interaction terms:
interactionCovariateIds <- c(8532001, 201826210, 21600960413)
# 8532001 = Female
# 201826210 = Type 2 Diabetes
# 21600960413 = Concurent use of antithrombotic agents
outcomeModel <- fitOutcomeModel(population = matchedPop,
modelType = "cox",
stratified = TRUE,
interactionCovariateIds = interactionCovariateIds)
outcomeModel
## Model type: cox
## Stratified: TRUE
## Use covariates: FALSE
## Use inverse probability of treatment weighting: FALSE
## Status: OK
##
## Estimate lower .95 upper .95 logRr seLogRr
## treatment 1.24991 0.87272 1.80961 0.22307 0.1860
## treatment * condition_era group during day -365 through 0 days relative to index: Type 2 diabetes mellitus 1.05089 0.68593 1.62105 0.04964 0.2194
## treatment * drug_era group during day 0 through 0 days relative to index: ANTITHROMBOTIC AGENTS 0.63846 0.42639 0.96305 -0.44870 0.2078
## treatment * gender = FEMALE 0.78988 0.54227 1.14572 -0.23587 0.1908
Note that you can use the grepCovariateNames
to find
covariate IDs.
It is prudent to verify that covariate balance has also been achieved in the subgroups of interest. For example, we can check the covariate balance in the subpopulation of females:
balanceFemale <- computeCovariateBalance(population = matchedPop,
cohortMethodData = cohortMethodData,
subgroupCovariateId = 8532001)
plotCovariateBalanceScatterPlot(balanceFemale)
One final refinement would be to use the same covariates we used to
fit the propensity model to also fit the outcome model. This way we are
more robust against misspecification of the model, and more likely to
remove bias. For this we use the regularized Cox regression in the
Cyclops
package. (Note that the treatment variable is
automatically excluded from regularization.)
outcomeModel <- fitOutcomeModel(population = matchedPop,
cohortMethodData = cohortMethodData,
modelType = "cox",
stratified = TRUE,
useCovariates = TRUE)
outcomeModel
## Model type: cox
## Stratified: TRUE
## Use covariates: TRUE
## Use inverse probability of treatment weighting: FALSE
## Status: OK
## Prior variance: 0.0374185437128226
##
## Estimate lower .95 upper .95 logRr seLogRr
## treatment 0.9985318 0.6870719 1.4438456 -0.0014693 0.1894
We can inspect more details of the outcome model:
## 900000010805
## 0.9985318
## [1] 0.6870719 1.4438456
We can also see the covariates that ended up in the outcome model:
getOutcomeModel(outcomeModel, cohortMethodData)
## coefficient id name
## 1 -0.001469294 9e+11 Treatment
We can create the Kaplan-Meier plot:
plotKaplanMeier(matchedPop, includeZero = FALSE)
Note that the Kaplan-Meier plot will automatically adjust for any stratification, matching, or trimming that may have been applied.
We can also plot time-to-event, showing both events before and after the index date, and events during and outside the defined time-at-risk window. This plot can provide insight into the temporal pattern of the outcome relative to the exposures:
plotTimeToEvent(cohortMethodData = cohortMethodData,
outcomeId = 3,
firstExposureOnly = FALSE,
washoutPeriod = 0,
removeDuplicateSubjects = "keep all",
minDaysAtRisk = 1,
riskWindowStart = 0,
startAnchor = "cohort start",
riskWindowEnd = 30,
endAnchor = "cohort end")
Note that this plot does not show any adjustment for the propensity score.
Considerable work has been dedicated to provide the
CohortMethod
package.
citation("CohortMethod")
##
## To cite package 'CohortMethod' in publications use:
##
## Schuemie M, Suchard M, Ryan P (2023). _CohortMethod: New-User Cohort Method with Large Scale Propensity and Outcome Models_. https://ohdsi.github.io/CohortMethod,
## https://github.com/OHDSI/CohortMethod.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {CohortMethod: New-User Cohort Method with Large Scale Propensity and Outcome
## Models},
## author = {Martijn Schuemie and Marc Suchard and Patrick Ryan},
## year = {2023},
## note = {https://ohdsi.github.io/CohortMethod,
## https://github.com/OHDSI/CohortMethod},
## }
Further, CohortMethod
makes extensive use of the
Cyclops
package.
citation("Cyclops")
##
## To cite Cyclops in publications use:
##
## Suchard MA, Simpson SE, Zorych I, Ryan P, Madigan D (2013). "Massive parallelization of serial inference algorithms for complex generalized linear models." _ACM Transactions on
## Modeling and Computer Simulation_, *23*, 10. <https://dl.acm.org/doi/10.1145/2414416.2414791>.
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## author = {M. A. Suchard and S. E. Simpson and I. Zorych and P. Ryan and D. Madigan},
## title = {Massive parallelization of serial inference algorithms for complex generalized linear models},
## journal = {ACM Transactions on Modeling and Computer Simulation},
## volume = {23},
## pages = {10},
## year = {2013},
## url = {https://dl.acm.org/doi/10.1145/2414416.2414791},
## }
This work is supported in part through the National Science Foundation grant IIS 1251151.