standardize_prevalence.Rd
Applies direct-method age-sex standardization to crude prevalence data using a reference population. Supports matching the reference population to demographic (age) bounds and right-truncating ages to accommodate real-world database patterns (e.g., Optum age masking).
standardize_prevalence(
prevalenceData,
referencePopulation,
ageMin = NULL,
ageMax = NULL,
ageRightTruncation = NULL
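)
prevalenceData: Data frame of stratified prevalence data. Required columns: age, gender, numerator, denominator. gender is given as a concept ID (8532 = Female, 8507 = Male). Results are grouped by analysisId and spanLabel, so these columns are expected as well.
referencePopulation: StandardizationReference object defining the standard population used for weighting.
ageMin: Numeric. Minimum age, used to filter both the reference population and the prevalence data. If NULL (default), no lower bound is applied.
ageMax: Numeric. Maximum age, used to filter both the reference population and the prevalence data. If NULL (default), no upper bound is applied.
ageRightTruncation: Numeric. Optional age threshold; all ages >= threshold are collapsed into a single "threshold+" group. Useful for databases that mask ages (e.g., Optum, where ages 70 and above are all reported as "70+"). The threshold must fall on an age-group boundary of the reference population. If NULL (default), no truncation is applied.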
Data frame of standardized prevalence results, with one row per analysisId-spanLabel combination. Columns:
analysisId: Character, unique identifier for the analysis
spanLabel: Character, label for the time span/period
totalNum: Integer, total numerator (cases) across all strata
totalDenom: Integer, total denominator (population) across all strata
crudeStat: Numeric, crude prevalence rate (per 100,000)
stdStat: Numeric, age-sex standardized prevalence rate (per 100,000)
reference_name: Character, name of the reference population used
reference_year: Integer, year of the reference population
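The two rates can be summarized as follows. The notation is introduced here for illustration only: g indexes the age-gender strata contributing to an output row, n_g and d_g are the stratum numerator and denominator, W_g is the reference weight before renormalization, and M is the set of strata present in both the prevalence data and the reference.

```latex
\text{crudeStat} = 10^{5}\,\frac{\sum_{g} n_g}{\sum_{g} d_g},
\qquad
\text{stdStat} = \sum_{g \in M} w_g \cdot 10^{5}\,\frac{n_g}{d_g},
\qquad
w_g = \frac{W_g}{\sum_{g' \in M} W_{g'}}
```

Because the renormalized weights w_g sum to 1, stdStat is a weighted average of the stratum-specific rates.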
Algorithm (direct-method standardization):
Step 1: Validate and apply right truncation to the reference population
If ageRightTruncation is specified, validate that the threshold falls on an age-group boundary (fails fast with an error if it falls mid-group)
Apply the truncation and the demographic (ageMin/ageMax) bounds to the reference
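Step 1 can be sketched in base R as below. This is an illustration, not the package's implementation: truncate_reference() is a hypothetical helper, and the reference layout (columns age, ageLow, ageHigh, gender, population) is an assumption, since the internal structure of a StandardizationReference object is not documented here. How ageMin/ageMax are matched against groups is also assumed.

```r
# Hypothetical helper sketching Step 1. Assumed reference layout: one row per
# age group x gender with columns age (label), ageLow and ageHigh (inclusive
# integer bounds), gender, and population.
truncate_reference <- function(ref, threshold = NULL, ageMin = NULL, ageMax = NULL) {
  # Demographic bounds (assumption: a group is kept if its lower bound
  # lies within [ageMin, ageMax])
  if (!is.null(ageMin)) ref <- ref[ref$ageLow >= ageMin, ]
  if (!is.null(ageMax)) ref <- ref[ref$ageLow <= ageMax, ]
  if (!is.null(threshold)) {
    # Fail fast if the threshold splits a group (e.g. 72 inside "70-74")
    midGroup <- ref$ageLow < threshold & ref$ageHigh >= threshold
    if (any(midGroup)) {
      stop("ageRightTruncation = ", threshold,
           " does not fall on an age-group boundary")
    }
    # Relabel every group at or above the threshold as "threshold+"
    ref$age[ref$ageLow >= threshold] <- paste0(threshold, "+")
  }
  # Sum populations within each (possibly collapsed) age-gender group
  aggregate(population ~ age + gender, data = ref, FUN = sum)
}

ref <- data.frame(
  age        = rep(c("00-64", "65-69", "70-74", "75-84"), times = 2),
  ageLow     = rep(c(0, 65, 70, 75), times = 2),
  ageHigh    = rep(c(64, 69, 74, 84), times = 2),
  gender     = rep(c("Female", "Male"), each = 4),
  population = c(100, 20, 15, 10, 90, 18, 12, 8),
  stringsAsFactors = FALSE
)
truncate_reference(ref, threshold = 70)
```

With threshold = 70 the "70-74" and "75-84" groups merge into "70+"; a threshold of 72 would stop with an error because it falls inside "70-74".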
Step 2: Filter and prepare the crude prevalence data
Convert gender concept IDs to labels (8532 → Female, 8507 → Male)
Convert age to numeric and apply the ageMin/ageMax filters
Apply right truncation: ages >= threshold become "threshold+"
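A minimal base R sketch of Step 2, assuming the raw age column holds numeric ages and gender holds the concept IDs listed above. prepare_crude() is a hypothetical name; how the package treats unknown concept IDs is not documented, and mapping them to NA is an assumption.

```r
# Hypothetical helper sketching Step 2 (not the package's implementation)
prepare_crude <- function(prev, ageMin = NULL, ageMax = NULL, threshold = NULL) {
  # Gender concept IDs -> labels; any other ID becomes NA (assumption)
  prev$gender <- ifelse(prev$gender == 8532, "Female",
                        ifelse(prev$gender == 8507, "Male", NA_character_))
  prev$age <- as.numeric(prev$age)
  # Bounds filters; which() also drops rows whose age is NA
  if (!is.null(ageMin)) prev <- prev[which(prev$age >= ageMin), ]
  if (!is.null(ageMax)) prev <- prev[which(prev$age <= ageMax), ]
  # Right truncation: ages at or above the threshold become "threshold+"
  if (is.null(threshold)) {
    prev$age <- as.character(prev$age)
  } else {
    prev$age <- ifelse(prev$age >= threshold, paste0(threshold, "+"),
                       as.character(prev$age))
  }
  prev
}

prev <- data.frame(age = c(30, 65, 72, 90), gender = c(8532, 8507, 8532, 8507),
                   numerator = 1:4, denominator = c(100, 200, 300, 400))
prepare_crude(prev, ageMin = 18, ageMax = 85, threshold = 70)
```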
Step 3: Map crude ages to reference age-group labels
"N+" values pass through unchanged
Single-year references: zero-pad ages to three digits ("000", "001", ...)
Grouped references: look up the age group each value falls into
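The age mapping could look roughly like the sketch below. The label formats are assumptions inferred from the description: single-year labels such as "000" to "099", grouped labels such as "00-04", and open-ended groups such as "70+". map_age_to_reference() is a hypothetical helper that expects the unique reference labels.

```r
# Hypothetical helper sketching Step 3; refLabels = unique reference age labels
map_age_to_reference <- function(age, refLabels) {
  out <- age
  isPlus <- grepl("\\+$", age)   # "N+" values pass through unchanged
  num <- as.numeric(age[!isPlus])
  closedLabels <- refLabels[!grepl("\\+$", refLabels)]
  if (all(grepl("^[0-9]{3}$", closedLabels))) {
    # Single-year reference: zero-pad to three digits ("003", "047", ...)
    out[!isPlus] <- sprintf("%03d", as.integer(num))
  } else {
    # Grouped reference: parse "lo-hi" bounds; "N+" groups are open-ended
    open <- grepl("\\+$", refLabels)
    lo <- as.numeric(sub("^([0-9]+).*$", "\\1", refLabels))
    hi <- rep(Inf, length(refLabels))
    hi[!open] <- as.numeric(sub("^[0-9]+-([0-9]+)$", "\\1", refLabels[!open]))
    # Look up the single group containing each age (NA if none matches)
    out[!isPlus] <- vapply(num, function(a) {
      hit <- refLabels[a >= lo & a <= hi]
      if (length(hit) == 1) hit else NA_character_
    }, character(1))
  }
  out
}

map_age_to_reference(c("3", "7", "65", "75", "70+"),
                     c("00-04", "05-09", "10-64", "65-69", "70+"))
```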
Step 4: Summarize by mapped age-gender group
Group by analysisId, spanLabel, age, and gender; sum numerator and denominator
Calculate the crude rate per 100,000 for each stratum
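In base R, the summarization step might be written with aggregate(). This is a sketch only: summarize_crude() and the output column name rate are assumptions for illustration.

```r
# Hypothetical helper sketching Step 4
summarize_crude <- function(prev) {
  # Sum counts within each analysis/span/age/gender stratum; rows with an NA
  # in any grouping or count column are dropped by the default na.action
  agg <- aggregate(cbind(numerator, denominator) ~ analysisId + spanLabel + age + gender,
                   data = prev, FUN = sum)
  # Crude stratum rate per 100,000
  agg$rate <- agg$numerator / agg$denominator * 1e5
  agg
}

prev <- data.frame(analysisId = "A1", spanLabel = "2020",
                   age = c("00-04", "00-04", "70+"), gender = c("Female", "Female", "Male"),
                   numerator = c(1, 2, 5), denominator = c(100, 200, 1000))
summarize_crude(prev)
```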
Step 5: Restrict the reference to the age-gender combinations present in the crude data and renormalize the weights
Filter ref_base to the age-gender pairs present in the prevalence data
Renormalize the weights so they sum to 1.0 within the matched subset
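The matching and renormalization can be sketched as follows, assuming the reference carries a weight column. match_and_renormalize() is a hypothetical helper; ref_base's actual columns are not documented here.

```r
# Hypothetical helper sketching Step 5
match_and_renormalize <- function(ref, crude) {
  # Composite age-gender key for matching strata
  key <- function(d) paste(d$age, d$gender, sep = "|")
  matched <- ref[key(ref) %in% key(crude), ]
  # Rescale so the matched weights sum to 1
  matched$weight <- matched$weight / sum(matched$weight)
  matched
}

ref <- data.frame(age = c("00-64", "65-69", "70+"), gender = "Female",
                  weight = c(0.5, 0.3, 0.2))
crude <- data.frame(age = c("00-64", "65-69"), gender = "Female")
match_and_renormalize(ref, crude)
```

Dropping the "70+" stratum leaves weights 0.5 and 0.3, which rescale to 0.625 and 0.375.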
Step 6: Join, standardize, and aggregate
inner_join the crude rates to the reference weights by (age, gender); strata without a match on either side are dropped
Calculate stdValue = rate × weight for each stratum
Aggregate: stdStat = sum(stdValue) per analysisId and spanLabel
The output includes totalNum, totalDenom, crudeStat, and stdStat
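The final join and aggregation could be sketched in base R, with merge() standing in for dplyr's inner_join (merge() performs an inner join by default). standardize_direct() is hypothetical, and computing totalNum/totalDenom over the joined strata only is an assumption; the package may total the counts before the join.

```r
# Hypothetical helper sketching Step 6 (base R stand-in for inner_join)
standardize_direct <- function(crude, ref) {
  # Inner join on (age, gender): strata missing on either side are dropped
  joined <- merge(crude, ref[, c("age", "gender", "weight")], by = c("age", "gender"))
  # Weighted contribution of each stratum
  joined$stdValue <- joined$rate * joined$weight
  out <- aggregate(cbind(numerator, denominator, stdValue) ~ analysisId + spanLabel,
                   data = joined, FUN = sum)
  names(out) <- c("analysisId", "spanLabel", "totalNum", "totalDenom", "stdStat")
  # Crude rate per 100,000 over the joined strata
  out$crudeStat <- out$totalNum / out$totalDenom * 1e5
  out[, c("analysisId", "spanLabel", "totalNum", "totalDenom", "crudeStat", "stdStat")]
}

crude <- data.frame(analysisId = "A1", spanLabel = "2020", age = "00-64",
                    gender = c("Female", "Male"), numerator = c(10, 30),
                    denominator = c(1000, 1000), rate = c(1000, 3000))
ref <- data.frame(age = "00-64", gender = c("Female", "Male"), weight = c(0.6, 0.4))
standardize_direct(crude, ref)
```

Here crudeStat is 40 / 2000 × 100,000 = 2000, while stdStat weights the stratum rates as 0.6 × 1000 + 0.4 × 3000 = 1800.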