Synthesize positive controls

synthesizePositiveControls(
  connectionDetails,
  cdmDatabaseSchema,
  oracleTempSchema = NULL,
  tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
  exposureDatabaseSchema = cdmDatabaseSchema,
  exposureTable = "drug_era",
  outcomeDatabaseSchema = cdmDatabaseSchema,
  outcomeTable = "cohort",
  outputDatabaseSchema = outcomeDatabaseSchema,
  outputTable = outcomeTable,
  createOutputTable = FALSE,
  exposureOutcomePairs,
  modelType = "poisson",
  minOutcomeCountForModel = 100,
  minOutcomeCountForInjection = 25,
  minModelCount = 5,
  covariateSettings = FeatureExtraction::createCovariateSettings(
    useDemographicsAgeGroup = TRUE, useDemographicsGender = TRUE,
    useDemographicsIndexYear = TRUE, useDemographicsIndexMonth = TRUE,
    useConditionGroupEraLongTerm = TRUE, useDrugGroupEraLongTerm = TRUE,
    useProcedureOccurrenceLongTerm = TRUE, useMeasurementLongTerm = TRUE,
    useObservationLongTerm = TRUE, useCharlsonIndex = TRUE, useDcsi = TRUE,
    useChads2Vasc = TRUE, longTermStartDays = 365, endDays = 0),
  prior = Cyclops::createPrior("laplace", exclude = 0, useCrossValidation = TRUE),
  control = Cyclops::createControl(cvType = "auto", startingVariance = 0.1, seed = 1,
    resetCoefficients = TRUE, noiseLevel = "quiet", threads = 10),
  firstExposureOnly = FALSE,
  washoutPeriod = 183,
  riskWindowStart = 0,
  riskWindowEnd = 0,
  endAnchor = "cohort end",
  addIntentToTreat = FALSE,
  firstOutcomeOnly = FALSE,
  removePeopleWithPriorOutcomes = FALSE,
  maxSubjectsForModel = 1e+05,
  effectSizes = c(1, 1.25, 1.5, 2, 4),
  precision = 0.01,
  outputIdOffset = 1000,
  workFolder = "./SignalInjectionTemp",
  cdmVersion = "5",
  modelThreads = 1,
  generationThreads = 1
)

Arguments

connectionDetails

An R object of type ConnectionDetails created using the function createConnectionDetails in the DatabaseConnector package.

cdmDatabaseSchema

Name of the database schema that contains the OMOP CDM and vocabulary.

oracleTempSchema

DEPRECATED: use `tempEmulationSchema` instead.

tempEmulationSchema

Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created.

exposureDatabaseSchema

The name of the database schema where the exposure data used to define the exposure cohorts is available. If exposureTable = DRUG_ERA, exposureDatabaseSchema is not used but assumed to be cdmDatabaseSchema. Requires read permissions to this database.

exposureTable

The name of the table that contains the exposure cohorts. If exposureTable <> DRUG_ERA, then exposureTable is expected to have the format of the COHORT table: COHORT_CONCEPT_ID, SUBJECT_ID, COHORT_START_DATE, COHORT_END_DATE.

outcomeDatabaseSchema

The name of the database schema where the data used to define the outcome cohorts is available. If outcomeTable = CONDITION_ERA, outcomeDatabaseSchema is not used but assumed to be cdmDatabaseSchema. Requires read permissions to this database.

outcomeTable

The name of the table that contains the outcome cohorts. When the table name is not CONDITION_ERA, this table is expected to have the same format as the COHORT table: SUBJECT_ID, COHORT_START_DATE, COHORT_END_DATE, COHORT_CONCEPT_ID (CDM v4) or COHORT_DEFINITION_ID (CDM v5 and higher).

outputDatabaseSchema

The name of the database schema where the tables containing the new outcomes are located. Requires write permissions to this database.

outputTable

The name of the table that will contain the generated outcome cohorts.

createOutputTable

Should the output table be created prior to inserting the outcomes? If TRUE and the table already exists, it will first be deleted. If FALSE, the table is assumed to exist and the outcomes will be inserted. Any existing outcomes with the same IDs will first be deleted.

exposureOutcomePairs

A data frame with at least two columns:

  • "exposureId" containing the drug_concept_ID or cohort_concept_id of the exposure variable

  • "outcomeId" containing the condition_concept_ID or cohort_concept_id of the outcome variable
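As a minimal sketch of the expected shape, the data frame can be built in base R. The IDs below are made-up placeholders, not real concept or cohort IDs, and a real run would use the negative controls selected for the study.

```r
# Placeholder IDs for illustration only; replace with real negative-control pairs.
exposureOutcomePairs <- data.frame(
  exposureId = c(1, 2, 3),       # drug concept IDs or exposure cohort IDs
  outcomeId = c(101, 102, 103)   # condition concept IDs or outcome cohort IDs
)

# Both required columns must be present; extra columns are allowed.
stopifnot(all(c("exposureId", "outcomeId") %in% colnames(exposureOutcomePairs)))
```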

modelType

Can be either "poisson" or "survival".
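The difference between the two model types can be illustrated with a toy simulation in base R. This is a sketch under simplifying assumptions (a constant baseline rate and a fixed risk window, both made up here), not the package's internal algorithm, which fits an outcome model on covariates. All variable names are hypothetical.

```r
set.seed(42)

nExposures <- 1000
daysAtRisk <- rep(30, nExposures)   # assumed risk window length per exposure
baselineRate <- 0.001               # assumed outcomes per day at risk
targetRr <- 2                       # target effect size

# Rate of outcomes to add so the total rate becomes targetRr * baselineRate.
extraRate <- (targetRr - 1) * baselineRate

# Poisson-style injection: an exposure may receive several extra outcomes.
poissonInjected <- rpois(nExposures, lambda = extraRate * daysAtRisk)

# Survival-style injection: at most one extra outcome per exposure, added only
# if the simulated event time falls inside the risk window.
timeToEvent <- rexp(nExposures, rate = extraRate)
survivalInjected <- as.integer(timeToEvent <= daysAtRisk)

table(poissonInjected)
table(survivalInjected)
```

In the Poisson case some exposures can end up with two or more injected outcomes, whereas the survival case never exceeds one per exposure.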

minOutcomeCountForModel

Minimum number of outcome events required to build a model.

minOutcomeCountForInjection

Minimum number of outcome events required to inject a signal.

minModelCount

Minimum number of negative controls having enough outcomes to fit an outcome model.

covariateSettings

An object of type covariateSettings as created using the createCovariateSettings function in the FeatureExtraction package.

prior

The prior used to fit the outcome model. See createPrior for details.

control

The control object used to control the cross-validation used to determine the hyperparameters of the prior (if applicable). See createControl for details.

firstExposureOnly

Should signals be injected only for the first exposure? (i.e., assuming an acute effect)

washoutPeriod

Number of days at the start of observation during which no signals will be injected, but which will be used to determine whether an exposure or outcome is the first one, and for extracting covariates to build the outcome model.

riskWindowStart

The start of the risk window relative to the start of the exposure (in days). When 0, risk is assumed to start on the first day of exposure.

riskWindowEnd

The end of the risk window (in days) relative to the endAnchor.

endAnchor

The anchor point for the end of the risk window. Can be "cohort start" or "cohort end".
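How riskWindowStart, riskWindowEnd, and endAnchor combine can be sketched with a small base R helper. The function computeRiskWindow is hypothetical and not part of the package, and it ignores truncation at the end of the observation period.

```r
# Hypothetical helper: derives the risk window for a single exposure.
computeRiskWindow <- function(cohortStart, cohortEnd,
                              riskWindowStart = 0, riskWindowEnd = 0,
                              endAnchor = "cohort end") {
  endAnchor <- match.arg(endAnchor, c("cohort start", "cohort end"))
  # The window end is counted from whichever date endAnchor selects.
  anchorDate <- if (endAnchor == "cohort start") cohortStart else cohortEnd
  list(
    start = cohortStart + riskWindowStart,  # always relative to exposure start
    end = anchorDate + riskWindowEnd
  )
}

# Fixed 30-day window starting on the first day of exposure.
fixedWindow <- computeRiskWindow(
  cohortStart = as.Date("2020-01-01"),
  cohortEnd = as.Date("2020-03-01"),
  riskWindowEnd = 30,
  endAnchor = "cohort start"
)

# Defaults: the window spans the exposure itself.
onTreatmentWindow <- computeRiskWindow(
  cohortStart = as.Date("2020-01-01"),
  cohortEnd = as.Date("2020-03-01")
)
```

With the defaults (riskWindowStart = 0, riskWindowEnd = 0, endAnchor = "cohort end") the window runs from the first to the last day of exposure.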

addIntentToTreat

If true, the signal will be injected not only in the primary time at risk, but also after the time at risk (up until the observation period end). In both time periods, the target effect size will be enforced. This allows the same positive control synthesis to be used in both on-treatment and intent-to-treat analysis variants. However, this will preclude the controls from being used in self-controlled designs that consider the time after exposure. Requires firstExposureOnly = TRUE.

firstOutcomeOnly

Should only the first outcome per person be considered when modeling the outcome?

removePeopleWithPriorOutcomes

Remove people with prior outcomes?

maxSubjectsForModel

Maximum number of people used to fit an outcome model.

effectSizes

A numeric vector of effect sizes that should be inserted.

precision

The allowed ratio between target and injected signal size.

outputIdOffset

What should be the first new outcome ID that is to be created?

workFolder

Path to a folder where intermediate data will be stored.

cdmVersion

Define the OMOP CDM version used: currently supports "4" and "5".

modelThreads

Number of parallel threads to use when fitting outcome models.

generationThreads

Number of parallel threads to use when generating outcomes.

Value

A data.frame listing all the drug-outcome pairs in combination with the requested effect sizes and the effect sizes actually inserted (which may differ from the requested effect sizes because of sampling error).

Details

This function will insert additional outcomes for a given set of drug-outcome pairs. It is assumed that these drug-outcome pairs represent negative controls, so the true relative risk before inserting any outcomes should be 1. There are two models for inserting the outcomes during the specified risk window of the drug: a Poisson model assuming multiple outcomes could occur during a single exposure, and a survival model considering only one outcome per exposure.

It is possible to use bulk import to insert the generated outcomes in the database. This requires the environment variable 'USE_MPP_BULK_LOAD' to be set to 'TRUE'. See ?DatabaseConnector::insertTable for details on how to configure the bulk upload.
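The environment variable can be set from within the R session before calling the function. This sketch only sets and checks the variable; any further platform-specific configuration described in ?DatabaseConnector::insertTable is not shown.

```r
# Enable bulk import for the current R session only.
Sys.setenv(USE_MPP_BULK_LOAD = "TRUE")

# Confirm the value that will be seen by code reading the variable.
bulkLoadSetting <- Sys.getenv("USE_MPP_BULK_LOAD")
print(bulkLoadSetting)
```

Setting the variable this way lasts for the session; a persistent setting would typically go in an .Renviron file.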

References

Schuemie MJ, Hripcsak G, Ryan PB, Madigan D, Suchard MA. Empirical confidence interval calibration for population-level effect estimation studies in observational healthcare data. Proc Natl Acad Sci U S A. 2018 Mar 13;115(11):2571-2577.
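Examples

The following is a hedged sketch of a call, assuming the function is loaded from the OHDSI MethodEvaluation package and that a PostgreSQL server is available. The server, credentials, schema names, table name, and IDs are all placeholders, and the call is guarded so that nothing touches a database until the guard is changed.

```r
# Placeholder negative-control pairs; real IDs come from the study design.
# Enough pairs are needed to satisfy minModelCount (default 5).
pairs <- data.frame(exposureId = 1:10, outcomeId = 101:110)

runExample <- FALSE  # set to TRUE only after replacing the placeholders below

if (runExample) {
  connectionDetails <- DatabaseConnector::createConnectionDetails(
    dbms = "postgresql",
    server = "localhost/ohdsi",  # placeholder
    user = "user",               # placeholder
    password = "secret"          # placeholder
  )

  result <- MethodEvaluation::synthesizePositiveControls(
    connectionDetails = connectionDetails,
    cdmDatabaseSchema = "cdm",              # placeholder schema
    outputDatabaseSchema = "scratch",       # placeholder; needs write permissions
    outputTable = "positive_controls",      # placeholder table name
    createOutputTable = TRUE,
    exposureOutcomePairs = pairs,
    modelType = "survival",
    firstExposureOnly = TRUE,               # required by addIntentToTreat = TRUE
    addIntentToTreat = TRUE,
    effectSizes = c(1.5, 2, 4),
    workFolder = tempfile("SignalInjectionTemp")
  )
}
```

Writing to a dedicated output table with createOutputTable = TRUE keeps the synthesized outcomes separate from the original outcome cohorts, which is a design choice of this sketch rather than a requirement.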