Applies direct-method age-sex standardization to crude prevalence data using a reference population. Supports age bounds (ageMin, ageMax) and right truncation of ages to accommodate real-world database patterns (e.g., Optum age masking).

standardize_prevalence(
  prevalenceData,
  referencePopulation,
  ageMin = NULL,
  ageMax = NULL,
  ageRightTruncation = NULL
)

Arguments

prevalenceData

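Data frame with stratified prevalence data. Required columns: age, gender, numerator, denominator. The grouping described in Details also uses analysisId and spanLabel, and gender is converted from concept IDs (8532, 8507).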

referencePopulation

StandardizationReference object defining the standard population for weighting

ageMin

Numeric. Minimum age used to filter the reference population and, as described in Details, the prevalence data. If NULL (default), no lower bound is applied.

ageMax

Numeric. Maximum age used to filter the reference population and, as described in Details, the prevalence data. If NULL (default), no upper bound is applied.

ageRightTruncation

Numeric. Optional age threshold; ages >= threshold are collapsed into a single "threshold+" group. Useful for handling database age masking (e.g., Optum, where ages 70 and above are all reported as "70+"). If NULL (default), no truncation is applied.

Value

Data frame with standardized prevalence results. One row per analysisId-spanLabel combination. Columns:

  • analysisId: Character, unique identifier for the analysis

  • spanLabel: Character, label for the time span/period

  • totalNum: Integer, total numerator (cases) across all strata

  • totalDenom: Integer, total denominator (population) across all strata

  • crudeStat: Numeric, crude prevalence rate (per 100,000)

  • stdStat: Numeric, age-sex standardized prevalence rate (per 100,000)

  • reference_name: Character, name of the reference population used

  • reference_year: Integer, year of the reference population

Details

Algorithm (direct method standardization):

Step 1: Validate & apply right truncation to reference population

  • If ageRightTruncation is specified, validate that the threshold falls on an age-group boundary (fails fast if the threshold falls mid-group)

  • Apply truncation + demographic bounds to reference
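The boundary check in Step 1 could look roughly like the sketch below. The helper names (is_group_boundary, validate_truncation) and the label formats ("lo-hi" and "lo+") are assumptions made for illustration; they are not taken from the package.

```r
# Hypothetical helper: TRUE if `threshold` is the lower bound of some age group.
# Assumes labels such as "65-69" or "75+" (an assumption, not the package's format).
is_group_boundary <- function(threshold, group_labels) {
  # Leading digits of each label are taken as that group's lower bound
  lower_bounds <- as.numeric(sub("^([0-9]+).*$", "\\1", group_labels))
  threshold %in% lower_bounds
}

# Hypothetical fail-fast wrapper: stop with an error on a mid-group threshold
validate_truncation <- function(threshold, group_labels) {
  if (!is_group_boundary(threshold, group_labels)) {
    stop("ageRightTruncation = ", threshold, " is not at an age-group boundary")
  }
  invisible(TRUE)
}

groups <- c("0-4", "5-9", "60-64", "65-69", "70-74", "75+")
at_boundary <- is_group_boundary(70, groups)  # 70 starts the "70-74" group
mid_group   <- is_group_boundary(72, groups)  # 72 falls inside "70-74"
```

Under these assumptions, validate_truncation(72, groups) would raise an error before any weights are computed, which matches the fail-fast behaviour described above.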

Step 2: Filter & prepare crude prevalence

  • Convert gender concept IDs (8532→Female, 8507→Male)

  • Convert age to numeric, apply ageMin/ageMax filters

  • Apply right truncation: ages >= threshold → "threshold+"
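A minimal base R sketch of Step 2 follows. The example data, the variable names, and the choice of inclusive bounds (>= ageMin, <= ageMax) are assumptions; the documentation does not state whether the bounds are inclusive.

```r
# Illustrative input only; values are made up
prev <- data.frame(
  age = c("45", "70", "83", "12"),
  gender = c(8532, 8507, 8532, 8507),
  numerator = c(10, 20, 5, 1),
  denominator = c(1000, 800, 200, 900),
  stringsAsFactors = FALSE
)
age_min <- 18      # stands in for ageMin
age_max <- NULL    # stands in for ageMax (no upper bound)
threshold <- 70    # stands in for ageRightTruncation

# Convert gender concept IDs to labels (8532 -> Female, 8507 -> Male)
gender_map <- c("8532" = "Female", "8507" = "Male")
prev$gender <- unname(gender_map[as.character(prev$gender)])

# Convert age to numeric, then apply the bounds (inclusive here by assumption)
prev$age <- as.numeric(prev$age)
keep <- rep(TRUE, nrow(prev))
if (!is.null(age_min)) keep <- keep & prev$age >= age_min
if (!is.null(age_max)) keep <- keep & prev$age <= age_max
prev <- prev[keep, ]

# Right truncation: ages >= threshold become "<threshold>+"
prev$age <- ifelse(prev$age >= threshold,
                   paste0(threshold, "+"),
                   as.character(prev$age))
```

After these steps the age column is character, so open-ended labels such as "70+" can sit alongside single ages until they are mapped to reference groups in Step 3.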

Step 3: Map crude ages → reference group labels

  • "N+" values pass through unchanged

  • For single-year references: zero-pad to "000", "001", ...

  • For grouped references: lookup which age group each value falls into
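The three mapping rules in Step 3 can be sketched as a single function. The function name map_age_to_group and the grouped label format "lo-hi" are assumptions for illustration; the package's actual label format may differ.

```r
# Hypothetical mapper from crude age values (character) to reference labels
map_age_to_group <- function(age, ref_labels) {
  is_open <- grepl("\\+$", age)          # "N+" values pass through unchanged
  out <- age
  closed <- as.numeric(age[!is_open])
  closed_labels <- ref_labels[!grepl("\\+$", ref_labels)]

  if (all(grepl("^[0-9]{3}$", closed_labels))) {
    # Single-year reference: zero-pad to three digits ("000", "001", ...)
    out[!is_open] <- sprintf("%03d", as.integer(closed))
  } else {
    # Grouped reference: find the "lo-hi" group containing each age
    bounds <- strsplit(closed_labels, "-")
    lower <- as.numeric(vapply(bounds, `[`, character(1), 1))
    upper <- as.numeric(vapply(bounds, `[`, character(1), 2))
    out[!is_open] <- vapply(closed, function(a) {
      hit <- which(a >= lower & a <= upper)
      if (length(hit) == 1) closed_labels[hit] else NA_character_
    }, character(1))
  }
  out
}

grouped <- map_age_to_group(c("3", "67", "70+"),
                            c("0-4", "5-9", "65-69", "70+"))
single  <- map_age_to_group(c("7", "70+"), c("006", "007", "70+"))
```

In this sketch an age that matches no group maps to NA, so it would drop out at the join in Step 6 rather than being assigned to a neighbouring group.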

Step 4: Summarize by mapped age-gender groups

  • Group by analysisId, spanLabel, age, gender; sum numerator and denominator

  • Calculate crude rate per 100,000
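Step 4 amounts to a grouped sum followed by a rate calculation. The sketch below uses base R aggregate() on made-up data; the package itself may use a different grouping mechanism, and the column name rate is an assumption.

```r
# Illustrative rows after age mapping; several rows can share a stratum
mapped <- data.frame(
  analysisId = "a1",
  spanLabel = "2020",
  age = c("65-69", "65-69", "70+", "70+"),
  gender = c("Female", "Female", "Male", "Male"),
  numerator = c(10, 5, 20, 10),
  denominator = c(1000, 500, 800, 200),
  stringsAsFactors = FALSE
)

# Sum numerator and denominator within each analysis/span/age/gender stratum
strata <- aggregate(
  cbind(numerator, denominator) ~ analysisId + spanLabel + age + gender,
  data = mapped,
  FUN = sum
)

# Stratum-specific crude rate per 100,000
strata$rate <- strata$numerator / strata$denominator * 1e5
```

Summing before dividing matters: averaging the per-row rates would weight small and large rows equally, whereas the summed ratio reflects the pooled stratum.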

Step 5: Restrict reference to the age-gender combinations present in the crude data, then renormalize weights

  • Filter the reference (ref_base) to only the age-gender pairs present in the prevalence data

  • Renormalize weights to sum to 1.0 within matched subset
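A minimal sketch of Step 5, assuming the reference carries one weight per age-gender pair. The data frames and the column name weight are illustrative assumptions.

```r
# Illustrative reference weights (sum to 1 over all six strata)
ref <- data.frame(
  age = c("60-64", "65-69", "70+", "60-64", "65-69", "70+"),
  gender = rep(c("Female", "Male"), each = 3),
  weight = c(0.20, 0.15, 0.15, 0.20, 0.15, 0.15),
  stringsAsFactors = FALSE
)

# Age-gender pairs actually observed in the crude prevalence data
crude_keys <- data.frame(
  age = c("65-69", "70+"),
  gender = c("Female", "Male"),
  stringsAsFactors = FALSE
)

# Keep only matched reference strata, then rescale weights to sum to 1.0
matched <- merge(ref, crude_keys, by = c("age", "gender"))
matched$weight <- matched$weight / sum(matched$weight)
```

Because weights are rescaled over the matched subset, the standardized result describes only the strata observed in the data, which is worth keeping in mind when comparing databases with different age coverage.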

Step 6: Join, standardize, and aggregate

  • inner_join crude rates to reference weights by (age, gender)

  • Calculate stdValue = rate × weight for each stratum

  • Aggregate: stdStat = sum(stdValue) per analysisId + spanLabel

  • Output includes totalNum, totalDenom, crudeStat, and stdStat
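Step 6 can be sketched in base R as below, with merge() standing in for inner_join. The numbers are made up, and computing crudeStat from the totals (totalNum / totalDenom) is an assumption; the documentation only states that it is a crude rate per 100,000.

```r
# Illustrative stratum-level crude data (as produced by Step 4)
strata <- data.frame(
  analysisId = "a1",
  spanLabel = "2020",
  age = c("65-69", "70+"),
  gender = c("Female", "Male"),
  numerator = c(15, 30),
  denominator = c(1500, 1000),
  stringsAsFactors = FALSE
)
strata$rate <- strata$numerator / strata$denominator * 1e5

# Illustrative renormalized reference weights (as produced by Step 5)
weights <- data.frame(
  age = c("65-69", "70+"),
  gender = c("Female", "Male"),
  weight = c(0.25, 0.75),
  stringsAsFactors = FALSE
)

# Inner join on (age, gender); unmatched strata are dropped
joined <- merge(strata, weights, by = c("age", "gender"))
joined$stdValue <- joined$rate * joined$weight

# Aggregate to one row per analysisId + spanLabel
result <- aggregate(
  cbind(numerator, denominator, stdValue) ~ analysisId + spanLabel,
  data = joined,
  FUN = sum
)
names(result) <- c("analysisId", "spanLabel", "totalNum", "totalDenom", "stdStat")
result$crudeStat <- result$totalNum / result$totalDenom * 1e5
```

With these inputs the stratum rates are 1,000 and 3,000 per 100,000, so the weighted sum is 0.25 × 1,000 + 0.75 × 3,000 = 2,500, compared with a crude rate of 45 / 2,500 × 100,000 = 1,800.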