Run self-controlled cohort — runSelfControlledCohort • SelfControlledCohort

runSelfControlledCohort generates population-level estimates by comparing exposed and unexposed time within an exposed cohort.

runSelfControlledCohort(
  connectionDetails = NULL,
  cdmDatabaseSchema,
  connection = NULL,
  cdmVersion = 5,
  tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
  oracleTempSchema = NULL,
  exposureIds = NULL,
  outcomeIds = NULL,
  exposureDatabaseSchema = cdmDatabaseSchema,
  exposureTable = "drug_era",
  outcomeDatabaseSchema = cdmDatabaseSchema,
  outcomeTable = "condition_era",
  firstExposureOnly = TRUE,
  firstOutcomeOnly = TRUE,
  minAge = "",
  maxAge = "",
  studyStartDate = "",
  studyEndDate = "",
  addLengthOfExposureExposed = TRUE,
  riskWindowStartExposed = 1,
  riskWindowEndExposed = 30,
  addLengthOfExposureUnexposed = TRUE,
  riskWindowEndUnexposed = -1,
  riskWindowStartUnexposed = -30,
  hasFullTimeAtRisk = FALSE,
  washoutPeriod = 0,
  followupPeriod = 0,
  computeTarDistribution = FALSE,
  computeThreads = 1,
  riskWindowsTable = "#risk_windows",
  resultsTable = "#results",
  resultsDatabaseSchema = NULL,
  postProcessFunction = NULL,
  postProcessArgs = list(),
  returnEstimates = TRUE
)

Arguments

connectionDetails: An R object of type connectionDetails created using the function createConnectionDetails in the DatabaseConnector package.
cdmDatabaseSchema: Name of database schema that contains the OMOP CDM and vocabulary.
connection: A DatabaseConnector connection instance.
cdmVersion: Defines the OMOP CDM version used: currently supports "4" and "5".
tempEmulationSchema: Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created.
oracleTempSchema: For Oracle only: the name of the database schema where you want all temporary tables to be managed. Requires create/insert permissions to this database.
exposureIds: A vector containing the drug_concept_ids or cohort_definition_ids of the exposures of interest. If empty, all exposures in the exposure table will be included.
outcomeIds: The condition_concept_ids or cohort_definition_ids of the outcomes of interest. If empty, all the outcomes in the outcome table will be included.
exposureDatabaseSchema: The name of the database schema where the exposure data used to define the exposure cohorts is available. If exposureTable = DRUG_ERA, exposureDatabaseSchema is not used but assumed to be cdmSchema. Requires read permissions to this database.
exposureTable: The name of the table that contains the exposure cohorts. If exposureTable <> DRUG_ERA, exposureTable is expected to have the format of the COHORT table: cohort_concept_id, SUBJECT_ID, COHORT_START_DATE, COHORT_END_DATE.
outcomeDatabaseSchema: The name of the database schema where the data used to define the outcome cohorts is available. If outcomeTable = CONDITION_ERA, outcomeDatabaseSchema is not used but assumed to be cdmSchema. Requires read permissions to this database.
outcomeTable: The name of the table that contains the outcome cohorts. If outcomeTable <> CONDITION_OCCURRENCE, outcomeTable is expected to have the format of the COHORT table: COHORT_DEFINITION_ID, SUBJECT_ID, COHORT_START_DATE, COHORT_END_DATE.
firstExposureOnly: If TRUE, only use the first occurrence of each drug concept id for each person.
firstOutcomeOnly: If TRUE, only use first occurrence of each condition concept id for each person.
minAge: Integer for minimum allowable age.
maxAge: Integer for maximum allowable age.
studyStartDate: Minimum allowable date for the index exposure. Date format is 'yyyymmdd'.
studyEndDate: Maximum allowable date for the index exposure. Date format is 'yyyymmdd'.
addLengthOfExposureExposed: If TRUE, use the duration from drugEraStart -> drugEraEnd as part of timeAtRisk.
riskWindowStartExposed: Integer of days to add to drugEraStart for start of timeAtRisk (0 to include index date, 1 to start the day after).
riskWindowEndExposed: Additional window to add to end of exposure period (if addLengthOfExposureExposed = TRUE, then add to exposure end date, else add to exposure start date).
addLengthOfExposureUnexposed: If TRUE, use the duration from exposure start -> exposure end as part of timeAtRisk looking back before exposure start.
riskWindowEndUnexposed: Integer of days to add to exposure start for end of timeAtRisk (0 to include index date, -1 to end the day before).
riskWindowStartUnexposed: Additional window to add to start of exposure period (if addLengthOfExposureUnexposed = TRUE, then add to exposure end date, else add to exposure start date).
hasFullTimeAtRisk: If TRUE, restrict to people who have full time-at-risk exposed and unexposed.
washoutPeriod: Integer to define required time observed before exposure start.
followupPeriod: Integer to define required time observed after exposure start.
computeTarDistribution: If TRUE, compute the distribution of time-at-risk and the average absolute time between treatment and outcome. Note that this may add significant computation time on some database engines.
computeThreads: Number of parallel threads for computing IRRs with exact confidence intervals.
riskWindowsTable: String: optionally store the risk windows in a (non-temporary) table.
resultsTable: String: optionally store the summary results (number of exposed/unexposed patients per outcome-exposure pair) in a (non-temporary) table. Note that this table does not store the rate ratios, only the values required to calculate rate ratios.
resultsDatabaseSchema: Schema to output results to. Ignored if resultsTable and riskWindowsTable are temporary.
postProcessFunction: Callback function to handle batches of data. Useful for massive result sets that overflow system memory. See example.
postProcessArgs: Arguments for post processing function callback.
returnEstimates: Boolean. Set to FALSE to skip returning the estimates; this is only useful when postProcessFunction is used.
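The risk window arguments above interact, so a small worked sketch may help. The helper below, computeRiskWindows, is hypothetical and not part of SelfControlledCohort; it only mirrors one reading of the argument descriptions for a single exposure era. In particular, it assumes that addLengthOfExposureUnexposed = TRUE extends the look-back window by the length of the exposure, which may differ from what the package's SQL actually does.

```r
# Hypothetical helper (not part of SelfControlledCohort): derives the exposed
# and unexposed time-at-risk windows for one exposure era, following the
# argument descriptions and using the same defaults as runSelfControlledCohort.
computeRiskWindows <- function(exposureStart,
                               exposureEnd,
                               addLengthOfExposureExposed = TRUE,
                               riskWindowStartExposed = 1,
                               riskWindowEndExposed = 30,
                               addLengthOfExposureUnexposed = TRUE,
                               riskWindowEndUnexposed = -1,
                               riskWindowStartUnexposed = -30) {
  exposureStart <- as.Date(exposureStart)
  exposureEnd <- as.Date(exposureEnd)
  exposureDays <- as.integer(exposureEnd - exposureStart)

  # Exposed window: the start offset is applied to the exposure start; the end
  # offset is applied to the exposure end when the exposure length is included,
  # otherwise to the exposure start.
  exposedStart <- exposureStart + riskWindowStartExposed
  exposedAnchor <- if (addLengthOfExposureExposed) exposureEnd else exposureStart
  exposedEnd <- exposedAnchor + riskWindowEndExposed

  # Unexposed window: ends relative to the exposure start and looks backwards.
  # ASSUMPTION: including the exposure length pushes the window start back by
  # the number of exposed days.
  unexposedEnd <- exposureStart + riskWindowEndUnexposed
  lookbackDays <- if (addLengthOfExposureUnexposed) exposureDays else 0L
  unexposedStart <- exposureStart + riskWindowStartUnexposed - lookbackDays

  data.frame(
    window = c("exposed", "unexposed"),
    start = c(exposedStart, unexposedStart),
    end = c(exposedEnd, unexposedEnd)
  )
}

# A 10-day exposure era evaluated with the default settings
windows <- computeRiskWindows("2020-03-01", "2020-03-11")
print(windows)
```

With the defaults, the exposed window runs from the day after exposure start to 30 days after exposure end, and the unexposed window ends the day before exposure start.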

Value

An object of type sccResults containing the results of the analysis.

Details

Population-level estimation method that estimates incidence rate ratios by comparing exposed and unexposed time within an exposed cohort. If multiple exposureIds and outcomeIds are provided, estimates will be generated for every combination of exposure and outcome.
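To make the comparison concrete, the sketch below enumerates exposure-outcome combinations and computes an incidence rate ratio with an exact confidence interval for one pair. The event counts and person-days are made up for illustration, irrSketch is a hypothetical helper, and base R's poisson.test is used as a stand-in; the package's internal procedure is not shown on this page and may differ.

```r
# Every combination of exposure and outcome gets its own estimate.
pairs <- expand.grid(
  exposureId = c(767410, 1314924, 907879),
  outcomeId = 444382
)
print(pairs)

# Hypothetical helper: incidence rate ratio of exposed versus unexposed time,
# with an exact confidence interval from stats::poisson.test (two-sample form).
irrSketch <- function(eventsExposed, daysExposed, eventsUnexposed, daysUnexposed) {
  test <- poisson.test(
    x = c(eventsExposed, eventsUnexposed),
    T = c(daysExposed, daysUnexposed)
  )
  data.frame(
    irr = unname(test$estimate),  # (eventsExposed / daysExposed) / (eventsUnexposed / daysUnexposed)
    lb95 = test$conf.int[1],
    ub95 = test$conf.int[2],
    p = test$p.value
  )
}

# Illustrative (made-up) counts: 30 outcomes in 10,000 exposed person-days
# versus 15 outcomes in 10,000 unexposed person-days.
estimate <- irrSketch(30, 10000, 15, 10000)
print(estimate)
```

Because both windows come from the same people, a ratio well above 1 suggests the outcome occurs more often after exposure than before it, under the assumptions of the self-controlled design.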

References

Ryan PB, Schuemie MJ, Madigan D. Empirical performance of a self-controlled cohort method: lessons for developing a risk identification and analysis system. Drug Safety 36 Suppl 1:S95-106, 2013.

Examples

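if (FALSE) {
# createConnectionDetails comes from the DatabaseConnector package
library(DatabaseConnector)
library(SelfControlledCohort)
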
connectionDetails <- createConnectionDetails(dbms = "sql server",
                                             server = "RNDUSRDHIT07.jnj.com")
sccResult <- runSelfControlledCohort(connectionDetails,
                                     cdmDatabaseSchema = "cdm_truven_mdcr.dbo",
                                     exposureIds = c(767410, 1314924, 907879),
                                     outcomeIds = 444382,
                                     outcomeTable = "condition_era")

# Use a callback function that writes each batch to a CSV file instead of keeping the results in memory
csvFileName <- "D:/path/to/output.csv"
writeSccData <- function(data, position, csvFileName) {
  vroom::vroom_write(data, csvFileName, delim = ",", append = position != 1, na = "")
}

runSelfControlledCohort(connectionDetails,
                        cdmDatabaseSchema = "cdm_truven_mdcr.dbo",
                        exposureIds = c(767410, 1314924, 907879),
                        outcomeIds = 444382,
                        outcomeTable = "condition_era",
                        postProcessFunction = writeSccData,
                        postProcessArgs = list(csvFileName = csvFileName),
                        returnEstimates = FALSE)
}