Direct Method Standardization of Prevalence Rates — standardize

Applies direct-method age-sex standardization to crude prevalence data using a reference population. Supports matching demographic bounds between the data and the reference, and right truncation of age to handle real-world database patterns (e.g., Optum age masking).

standardize_prevalence(
  prevalenceData,
  referencePopulation,
  ageMin = NULL,
  ageMax = NULL,
  ageRightTruncation = NULL
)

Arguments

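prevalenceData: Data frame with stratified prevalence data. Required columns: age, gender, numerator, denominator. The grouping described in Details also uses analysisId and spanLabel, so those columns are expected as well.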
referencePopulation: StandardizationReference object defining the standard population for weighting
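ageMin: Numeric. Minimum age for filtering the reference population; the same bound is applied to the prevalence data (see Details, Step 2). If NULL (default), no lower bound is applied.
ageMax: Numeric. Maximum age for filtering the reference population; the same bound is applied to the prevalence data (see Details, Step 2). If NULL (default), no upper bound is applied.
ageRightTruncation: Numeric. Optional age threshold: ages >= threshold are collapsed into a single "threshold+" group. Useful for handling database age masking (e.g., Optum, where ages 70 and over are all reported as "70+"). The threshold must fall on an age-group boundary of the reference (see Details, Step 1). If NULL (default), no truncation is applied.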

Value

Data frame with standardized prevalence results. One row per analysis-span combination. Columns:

analysisId: Character, unique identifier for the analysis
spanLabel: Character, label for the time span/period
totalNum: Integer, total numerator (cases) across all strata
totalDenom: Integer, total denominator (population) across all strata
crudeStat: Numeric, crude prevalence rate (per 100,000)
stdStat: Numeric, age-sex standardized prevalence rate (per 100,000)
reference_name: Character, name of the reference population used
reference_year: Integer, year of the reference population

Details

Algorithm (direct method standardization):

Step 1: Validate & apply right truncation to reference population

If ageRightTruncation is specified: validate that the threshold falls on an age-group boundary of the reference (fails fast if it falls mid-group)
Apply the truncation and the demographic bounds (ageMin/ageMax) to the reference
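
The boundary check and collapse in Step 1 might look roughly like the sketch below. The internal layout of a StandardizationReference object is not described on this page, so the ageLow/ageHigh/pop columns and the truncate_reference helper are hypothetical and used only for illustration.

```r
# Hypothetical grouped reference (5-year bands, one gender); the real
# StandardizationReference layout may differ.
ref <- data.frame(
  ageLow  = c(60, 65, 70, 75),
  ageHigh = c(64, 69, 74, 120),
  gender  = "Female",
  pop     = c(400, 300, 200, 100)
)

# Illustrative helper (not part of the package): collapse every band at or
# above `threshold` into one "threshold+" group, failing fast when the
# threshold does not coincide with the lower bound of a band.
truncate_reference <- function(ref, threshold) {
  if (!threshold %in% ref$ageLow) {
    stop("ageRightTruncation = ", threshold,
         " is not at an age-group boundary")
  }
  ref$age <- ifelse(
    ref$ageLow >= threshold,
    paste0(threshold, "+"),
    paste0(ref$ageLow, "-", ref$ageHigh)
  )
  # Sum the population within each (possibly collapsed) age-gender group.
  aggregate(pop ~ age + gender, data = ref, FUN = sum)
}

ref70 <- truncate_reference(ref, 70)   # "70-74" and "75-120" become "70+"
```

Calling truncate_reference(ref, 72) stops with an error, because 72 falls inside the 70-74 band.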

Step 2: Filter & prepare crude prevalence

Convert gender concept IDs (8532→Female, 8507→Male)
Convert age to numeric, apply ageMin/ageMax filters
Apply right truncation: ages >= threshold → "threshold+"
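
As an illustration of Step 2, the following base R sketch prepares a small, made-up crude data frame. It assumes the ageMin/ageMax bounds are inclusive, which this page does not state; the data values are invented.

```r
# Made-up crude input; gender arrives as concept IDs, age as character.
crude <- data.frame(
  age         = c("45", "68", "72", "85"),
  gender      = c(8532, 8507, 8532, 8507),
  numerator   = c(5, 8, 12, 9),
  denominator = c(1000, 900, 800, 300)
)
ageMin <- 50
ageMax <- NULL
ageRightTruncation <- 70

# Concept IDs to labels (8532 -> Female, 8507 -> Male).
genderMap <- c("8532" = "Female", "8507" = "Male")
crude$gender <- unname(genderMap[as.character(crude$gender)])

# Numeric age, then optional bounds (assumed inclusive here).
crude$ageNum <- as.numeric(crude$age)
keep <- rep(TRUE, nrow(crude))
if (!is.null(ageMin)) keep <- keep & crude$ageNum >= ageMin
if (!is.null(ageMax)) keep <- keep & crude$ageNum <= ageMax
crude <- crude[keep, ]

# Right truncation: ages >= threshold are relabelled "threshold+".
crude$age <- as.character(crude$ageNum)
if (!is.null(ageRightTruncation)) {
  above <- crude$ageNum >= ageRightTruncation
  crude$age[above] <- paste0(ageRightTruncation, "+")
}
```

With these inputs the 45-year-old row is dropped by ageMin, and ages 72 and 85 both end up labelled "70+".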

Step 3: Map crude ages → reference group labels

"N+" values pass through unchanged
For single-year references: zero-pad to "000", "001", ...
For grouped references: lookup which age group each value falls into
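
The three mapping rules in Step 3 can be sketched as a single helper. The function name map_age_to_reference and its groupLow/groupLabels arguments are hypothetical; the sketch assumes grouped references can be described by the lower bound and label of each band.

```r
# Illustrative helper (not part of the package).
# age:         character vector of crude ages, possibly containing "N+" values
# groupLow:    sorted lower bounds of the reference age bands, or NULL for a
#              single-year reference
# groupLabels: labels of those bands, in the same order as groupLow
map_age_to_reference <- function(age, groupLow = NULL, groupLabels = NULL) {
  out <- age
  isOpen <- grepl("\\+$", age)            # "N+" values pass through unchanged
  ageNum <- as.numeric(age[!isOpen])
  if (is.null(groupLow)) {
    # Single-year reference: zero-pad to three digits.
    out[!isOpen] <- sprintf("%03d", as.integer(ageNum))
  } else {
    # Grouped reference: index of the last band whose lower bound <= age.
    idx <- findInterval(ageNum, groupLow)
    idx[idx == 0] <- NA                   # below the lowest band: no match
    out[!isOpen] <- groupLabels[idx]
  }
  out
}

singleYear <- map_age_to_reference(c("7", "42", "70+"))
# "007" "042" "70+"
grouped <- map_age_to_reference(
  c("61", "68", "70+"),
  groupLow = c(60, 65), groupLabels = c("60-64", "65-69")
)
# "60-64" "65-69" "70+"
```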

Step 4: Summarize by mapped age-gender groups

Group by analysisId, spanLabel, age, gender; sum numerator and denominator
Calculate crude rate per 100,000
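
Step 4 amounts to a grouped sum followed by a rate calculation. A minimal base R sketch with invented numbers, where several crude rows have already been mapped to the same reference age group:

```r
# Invented rows after age mapping: two rows per age-gender stratum.
mapped <- data.frame(
  analysisId  = "A1",
  spanLabel   = "2020",
  age         = c("65-69", "65-69", "70+", "70+"),
  gender      = c("Male", "Male", "Female", "Female"),
  numerator   = c(4, 6, 12, 8),
  denominator = c(500, 500, 800, 200)
)

# Sum numerator and denominator within each analysis/span/age/gender stratum.
strata <- aggregate(
  cbind(numerator, denominator) ~ analysisId + spanLabel + age + gender,
  data = mapped, FUN = sum
)

# Crude stratum-specific rate per 100,000.
strata$rate <- strata$numerator / strata$denominator * 1e5
```

Here the 65-69 Male stratum sums to 10 / 1000 (1,000 per 100,000) and the 70+ Female stratum to 20 / 1000 (2,000 per 100,000).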

Step 5: Re-filter reference to matched crude age-gender combos only, renormalize weights

Filter ref_base to only age-gender pairs present in prevalence data
Renormalize weights to sum to 1.0 within matched subset
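
The reason for Step 5 is that weights of unmatched strata would otherwise be silently lost in the inner join, leaving weights that sum to less than 1. A sketch with a hypothetical reference table (the age/gender/weight columns are assumptions about the layout):

```r
# Hypothetical reference weights over six strata (sum to 1.0).
refBase <- data.frame(
  age    = rep(c("60-64", "65-69", "70+"), times = 2),
  gender = rep(c("Female", "Male"), each = 3),
  weight = c(0.20, 0.15, 0.15, 0.20, 0.15, 0.15)
)

# Age-gender pairs actually present in the (invented) prevalence data.
observed <- data.frame(
  age    = c("65-69", "70+", "65-69"),
  gender = c("Female", "Female", "Male")
)

# Keep only matched strata, then rescale so the weights sum to 1 again.
refMatched <- merge(refBase, observed, by = c("age", "gender"))
refMatched$weight <- refMatched$weight / sum(refMatched$weight)
```

The three matched strata carried a total weight of 0.45 before rescaling; afterwards each has weight 1/3.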

Step 6: Join, standardize, and aggregate

inner_join crude rates to reference weights by (age, gender)
Calculate stdValue = rate × weight for each stratum
Aggregate: stdStat = sum(stdValue) per analysisId + spanLabel
Output includes totalNum, totalDenom, crudeStat, and stdStat
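
A small worked example of Step 6 in base R, using invented numbers. The page describes an inner_join; merge() is used here as a base R stand-in for it. Computing crudeStat from the summed totals is an assumption, since the page does not say how it is derived.

```r
# Invented stratum-level crude rates (per 100,000).
strata <- data.frame(
  analysisId  = "A1",
  spanLabel   = "2020",
  age         = c("65-69", "70+"),
  gender      = c("Male", "Female"),
  numerator   = c(10, 20),
  denominator = c(1000, 1000)
)
strata$rate <- strata$numerator / strata$denominator * 1e5

# Invented renormalized reference weights for the same strata.
weights <- data.frame(
  age    = c("65-69", "70+"),
  gender = c("Male", "Female"),
  weight = c(0.25, 0.75)
)

# Inner join on (age, gender), then weight each stratum-specific rate.
joined <- merge(strata, weights, by = c("age", "gender"))
joined$stdValue <- joined$rate * joined$weight

# Aggregate per analysisId + spanLabel.
result <- aggregate(
  cbind(numerator, denominator, stdValue) ~ analysisId + spanLabel,
  data = joined, FUN = sum
)
names(result)[names(result) == "numerator"]   <- "totalNum"
names(result)[names(result) == "denominator"] <- "totalDenom"
names(result)[names(result) == "stdValue"]    <- "stdStat"

# Crude rate from the totals (assumed definition).
result$crudeStat <- result$totalNum / result$totalDenom * 1e5
```

In this example stdStat is 1000 × 0.25 + 2000 × 0.75 = 1750 per 100,000, while crudeStat is 30 / 2000 = 1500 per 100,000; the difference reflects the reference population giving more weight to the older stratum than the study population does.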