The SelfControlledCohort package includes a suite of
diagnostics that evaluate whether the assumptions of the Self-Controlled
Cohort (SCC) design hold for a given analysis. These diagnostics run
automatically when runDiagnostics = TRUE and determine
whether study results should be unblinded (viewed) or
kept blinded until issues are resolved.
This vignette describes each diagnostic, the assumption it checks, and how results are interpreted.
Four core diagnostics are available to assess the validity of the SCC analysis.
| Diagnostic Name | Assumption Tested | Default Threshold |
|---|---|---|
| MDRR | Adequate statistical power | MDRR <= 10.0 |
| PRE_EXPOSURE | Correct temporal ordering | Rate Ratio <= 1.0, p > 0.05 |
| EVENT_DEPENDENT_OBSERVATION | Non-informative censoring | Proportion <= 10% |
| EASE | Low systematic error | EASE <= 0.25 |
Default thresholds are available via
getDefaultDiagnosticThresholds():
library(SelfControlledCohort)
str(getDefaultDiagnosticThresholds())
#> List of 6
#> $ mdrrMaxAcceptable : num 10
#> $ maxPreExposureProportion : num 0.05
#> $ preExposurePThreshold : num 0.05
#> $ maxEventDependentCensoring: num 0.25
#> $ minEventsPerWindow : num 3
#> $ easeMaxAcceptable : num 0.25

The MDRR quantifies the smallest rate ratio the study has 80% power to detect at alpha = 0.05. A high MDRR indicates that only very large effects would be detected, meaning the study is underpowered.
The calculation uses the signed root likelihood ratio (SRL1) method of Musonda et al. (2006), which was developed for self-controlled designs. It solves for the rate ratio at which the target power (80%) is reached, given the observed person-time and event counts in the exposed and unexposed windows.
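To make the idea of "solving for the rate ratio that reaches the target power" concrete, here is a minimal base-R sketch. It does not implement SRL1. It swaps in a simpler normal approximation to a binomial test, in which the share of events falling in the exposed window is compared with the exposed share of person-time. The function names `power_at_rr` and `mdrr_sketch` are hypothetical and are not part of the package, and the value returned will generally differ from the one `computeMdrrForRateRatio()` reports.

```r
# Approximate power of a two-sided binomial test of H0: p = r at rate ratio rr.
# n = total events, r = exposed share of person-time (normal approximation).
power_at_rr <- function(rr, n, r, alpha = 0.05) {
  p1 <- rr * r / (rr * r + 1 - r)  # expected exposed share of events under rr
  z <- qnorm(1 - alpha / 2)
  pnorm((sqrt(n) * abs(p1 - r) - z * sqrt(r * (1 - r))) / sqrt(p1 * (1 - p1)))
}

# Smallest rr > 1 that reaches the target power; NA if not reachable by `upper`.
mdrr_sketch <- function(exposedPersonTime, unexposedPersonTime,
                        exposedEvents, unexposedEvents,
                        alpha = 0.05, power = 0.8, upper = 1000) {
  n <- exposedEvents + unexposedEvents
  r <- exposedPersonTime / (exposedPersonTime + unexposedPersonTime)
  gap <- function(rr) power_at_rr(rr, n, r, alpha) - power
  if (gap(upper) < 0) return(NA_real_)
  uniroot(gap, lower = 1, upper = upper, tol = 1e-8)$root
}

mdrr_sketch(50000, 150000, 40, 90)
```

The root finder only searches above 1, so this sketch reports the detectable increase in rate, not a protective effect.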
# Well-powered study
computeMdrrForRateRatio(
  exposedPersonTime = 50000,
  unexposedPersonTime = 150000,
  exposedEvents = 40,
  unexposedEvents = 90
)
# Underpowered study (SRL1 solver returns NA if power cannot be met)
computeMdrrForRateRatio(
  exposedPersonTime = 500,
  unexposedPersonTime = 1500,
  exposedEvents = 3,
  unexposedEvents = 7
)

This diagnostic detects whether outcomes occur before the exposure start date at a rate higher than expected. In a properly specified SCC analysis, outcomes should not systematically precede exposure.
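As a rough illustration of this kind of check, the sketch below compares the outcome rate before exposure start with the rate after it, using base R's exact `poisson.test()` as a stand-in for the package's own test. The counts, person-time values, and the final flagging rule are all hypothetical and chosen for illustration only. The package's actual decision logic may differ.

```r
# Hypothetical aggregate counts for one target-outcome pair (illustrative only).
pre_events <- 30    # outcomes in the window before exposure start
pre_person_time <- 10000
post_events <- 25   # outcomes in the window after exposure start
post_person_time <- 10000

# Exact comparison of two Poisson rates: pre-exposure rate / post-exposure rate.
pre_test <- poisson.test(
  x = c(pre_events, post_events),
  T = c(pre_person_time, post_person_time),
  alternative = "greater"
)
rate_ratio <- unname(pre_test$estimate)
p_value <- pre_test$p.value

# Assumed rule for this sketch: flag only a significantly elevated pre-exposure rate.
flagged <- rate_ratio > 1 && p_value < 0.05
```

A one-sided alternative is used here because the concern is an excess of outcomes before exposure, not a deficit.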
Pre-exposure outcomes suggest one or more of:
The diagnostic is performed using a high-performance SQL query that aggregates counts directly in the database. For each target-outcome pair:
1. Outcomes are counted in the window before exposure_start_date and the window after.
2. The two rates are compared using rateratio.test::rateratio.test.

This diagnostic identifies whether the observation period ends shortly after an outcome event. If it does, the outcome may be causing censoring (e.g., the outcome leads to death or disenrollment), which biases the rate ratio.
The SCC design compares rates across exposed and unexposed windows within the same person. If observation tends to end after the outcome, then:
EASE quantifies the total expected systematic error in study estimates, combining both bias (deviation of the null distribution mean from zero) and imprecision (spread of the null distribution). It is computed from the null distribution fitted on negative control estimates.
Unlike the other diagnostics, EASE requires negative
controls and is computed after estimation
(during calibration). If no negativeControlPairs are
provided, the EASE diagnostic is simply skipped.
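1. A null distribution is fitted to the negative control estimates with EmpiricalCalibration::fitNull().
2. EASE is computed from the fitted null distribution with EmpiricalCalibration::computeExpectedAbsoluteSystematicError().

The resulting value represents the expected absolute difference between the estimated and true log rate ratio for a random study estimate drawn from this analysis.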
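The following base-R sketch shows one way the quantity could be derived by hand. It is not the EmpiricalCalibration implementation. It assumes a normal null distribution with mean `mu` and standard deviation `sigma`, fitted by maximum likelihood with each estimate's total variance taken as `sigma^2 + seLogRr^2`, and it uses the closed-form mean of a folded normal distribution for the expected absolute value. The helper names `fit_null_sketch` and `ease_sketch` are hypothetical.

```r
# Fit a normal null N(mu, sigma^2) by maximum likelihood, assuming each
# negative control estimate has total variance sigma^2 + seLogRr^2.
fit_null_sketch <- function(logRr, seLogRr) {
  neg_log_lik <- function(par) {
    sigma <- exp(par[2])  # optimise log(sigma) so sigma stays positive
    -sum(dnorm(logRr, mean = par[1], sd = sqrt(sigma^2 + seLogRr^2), log = TRUE))
  }
  fit <- optim(c(0, log(0.1)), neg_log_lik)
  c(mean = fit$par[1], sd = exp(fit$par[2]))
}

# E|X| for X ~ N(mu, sigma^2): the mean of a folded normal distribution.
ease_sketch <- function(mu, sigma) {
  sigma * sqrt(2 / pi) * exp(-mu^2 / (2 * sigma^2)) +
    mu * (1 - 2 * pnorm(-mu / sigma))
}

negatives <- data.frame(
  rr = c(1.2, 0.8, 1.0, 1.1, 0.95),
  seLogRr = c(0.2, 0.1, 0.3, 0.15, 0.25)
)
null_fit <- fit_null_sketch(log(negatives$rr), negatives$seLogRr)
ease_sketch(null_fit[["mean"]], null_fit[["sd"]])
```

With a null centred at zero, the result reduces to `sigma * sqrt(2 / pi)`, so the value grows with either a shifted mean (bias) or a wider spread (imprecision).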
# Compute EASE from negative control estimates
negatives <- data.frame(
  rr = c(1.2, 0.8, 1.0, 1.1, 0.95),
  seLogRr = c(0.2, 0.1, 0.3, 0.15, 0.25)
)
computeEase(negatives)

The individual diagnostics feed into a two-tier blinding system:
Diagnostics are run automatically when
runDiagnostics = TRUE (the default):
runSelfControlledCohort(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "cdm",
  exposureIds = c(1118084),
  outcomeIds = c(313217),
  databaseId = "my_db",
  resultExportPath = "results",
  runDiagnostics = TRUE
)
Results are saved to scc_diagnostics_summary.csv in the
export folder.
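A small sketch for inspecting that file after a run is shown below. The helper name `read_diagnostics_summary` is hypothetical, and the column layout of the CSV is not assumed here, which is why the example only calls `str()` on whatever is read.

```r
# Read the diagnostics summary from an export folder, or return NULL if absent.
read_diagnostics_summary <- function(resultExportPath) {
  path <- file.path(resultExportPath, "scc_diagnostics_summary.csv")
  if (!file.exists(path)) return(NULL)
  read.csv(path, stringsAsFactors = FALSE)
}

summary_df <- read_diagnostics_summary("results")
if (!is.null(summary_df)) str(summary_df)
```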
thresholds <- getDefaultDiagnosticThresholds()
thresholds$mdrrMaxAcceptable <- 15.0 # Allow higher MDRR
thresholds$maxPreExposureProportion <- 0.10 # Allow up to 10% pre-exposure
runSelfControlledCohort(
  ...,
  runDiagnostics = TRUE,
  diagnosticThresholds = thresholds
)