Tidy covariate data

tidyCovariateData(covariateData, covariates, covariateRef, populationSize,
  minFraction = 0.001, normalize = TRUE, removeRedundancy = TRUE)

Arguments

covariateData

An object as generated using the getDbCovariateData function. If provided, the covariates, covariateRef, and populationSize arguments will be ignored.

covariates

An ffdf object with the covariate values in sparse format. Will be ignored if covariateData is provided.

covariateRef

An ffdf object with the covariate definitions. Will be ignored if covariateData is provided. Only needed when removeRedundancy = TRUE.

populationSize

An integer specifying the total number of unique cohort entries (rowIds). Will be ignored if covariateData is provided. Only needed when removeRedundancy = TRUE.

minFraction

Minimum fraction of the population that should have a non-zero value for a covariate for that covariate to be kept. Set to 0 to disable filtering on frequency.

normalize

Normalize the covariates (by dividing by the max)?

removeRedundancy

Should redundant covariates be removed?

Details

Depending on the arguments, normalizes covariate values by dividing by the max, removes redundant covariates, and removes infrequent covariates. For temporal covariates, redundancy is evaluated per time ID.
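
The frequency filter and the normalization can be illustrated with a minimal base R sketch on a toy sparse table. This is not the package's implementation: the helpers filterInfrequent and normalizeByMax are hypothetical names, a plain data.frame stands in for the ffdf object, and the sketch assumes that the maximum is taken per covariate, that values are positive, and that a covariate is kept when its fraction is at least minFraction. Redundancy removal is not covered.

```r
# Toy sparse covariate table: one row per non-zero (rowId, covariateId) pair.
covariates <- data.frame(
  rowId          = c(1, 2, 3, 1, 2, 4),
  covariateId    = c(10, 10, 10, 20, 20, 30),
  covariateValue = c(1, 1, 1, 50, 100, 1)
)
populationSize <- 4  # total number of unique rowIds in the cohort

# Hypothetical helper: drop covariates that are non-zero for fewer than
# minFraction of the population.
filterInfrequent <- function(covariates, populationSize, minFraction) {
  nonZero <- covariates[covariates$covariateValue != 0, ]
  # Number of distinct rowIds with a non-zero value, per covariate.
  counts <- tapply(nonZero$rowId, nonZero$covariateId,
                   function(x) length(unique(x)))
  keep <- names(counts)[counts / populationSize >= minFraction]
  covariates[as.character(covariates$covariateId) %in% keep, ]
}

# Hypothetical helper: divide each value by the maximum observed for
# the same covariate (assumes positive values).
normalizeByMax <- function(covariates) {
  maxs <- tapply(covariates$covariateValue, covariates$covariateId, max)
  covariates$covariateValue <- covariates$covariateValue /
    as.numeric(maxs[as.character(covariates$covariateId)])
  covariates
}

# Filter first, then normalize the covariates that remain.
tidy <- normalizeByMax(
  filterInfrequent(covariates, populationSize, minFraction = 0.5)
)
```

In this toy example, covariate 30 is present for only 1 of 4 rows (0.25) and is dropped at minFraction = 0.5, while the values of covariate 20 are rescaled from 50 and 100 to 0.5 and 1.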