A function that wraps around FeatureExtraction::tidyCovariateData to normalise the data and remove rare or redundant features
Source: R/PreprocessingData.R
preprocessData.Rd
Usage
preprocessData(covariateData, preprocessSettings = createPreprocessSettings())
Details
Returns an object of class covariateData that has been processed.
Processing normalises the data and removes rare or redundant features.
Redundant features are features within the same analysisId that together
cover all observations.
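To make the three steps concrete, here is a minimal base R sketch on a toy covariate table. It is not the FeatureExtraction implementation. The data, the `covariateRef` mapping, the `minFraction` threshold of 0.3, and the rule of dropping the most prevalent covariate in a fully covering analysis are all assumptions chosen for illustration.

```r
# Hypothetical toy data in the sparse (rowId, covariateId, covariateValue) layout.
covariates <- data.frame(
  rowId          = c(1, 2, 3, 4, 1, 2, 3, 4, 1),
  covariateId    = c(1002, 1002, 1002, 1002, 8507001, 8507001, 8532001, 8532001, 4001),
  covariateValue = c(40, 80, 60, 20, 1, 1, 1, 1, 1)
)
# Hypothetical mapping from each covariate to its analysisId.
covariateRef <- data.frame(
  covariateId = c(1002, 8507001, 8532001, 4001),
  analysisId  = c(2, 1, 1, 3)
)
nSubjects   <- 4
minFraction <- 0.3  # illustrative threshold only

# Count the subjects that have a record for each covariate.
counts <- aggregate(rowId ~ covariateId, data = covariates,
                    FUN = function(x) length(unique(x)))
names(counts)[2] <- "n"
counts <- merge(counts, covariateRef, by = "covariateId")

# Rare features: present in fewer than minFraction of subjects.
rareIds <- counts$covariateId[counts$n / nSubjects < minFraction]

# Redundant features (assumed rule): if the covariates of one analysisId are
# mutually exclusive and together cover every subject, one of them carries no
# extra information, so the most prevalent one is dropped.
redundantIds <- c()
for (group in split(counts, counts$analysisId)) {
  if (nrow(group) > 1 && sum(group$n) == nSubjects) {
    redundantIds <- c(redundantIds, group$covariateId[which.max(group$n)])
  }
}

# Drop rare and redundant covariates, then scale each remaining covariate
# by its maximum value so that all values lie in (0, 1].
kept <- covariates[!covariates$covariateId %in% c(rareIds, redundantIds), ]
kept$covariateValue <- kept$covariateValue /
  ave(kept$covariateValue, kept$covariateId, FUN = max)
```

In this toy example covariate 4001 is flagged as rare, one of the two covariates in analysisId 1 is flagged as redundant, and the values of covariate 1002 are divided by their maximum of 80.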
Examples
library(dplyr)
data("simulationProfile")
plpData <- simulatePlpData(simulationProfile, n=1000)
#> Generating covariates
#> Generating cohorts
#> Generating outcomes
preProcessedData <- preprocessData(plpData$covariateData, createPreprocessSettings())
#> Removing 0 redundant covariates
#> Removing 0 infrequent covariates
#> Normalizing covariates
#> Tidying covariates took 0.622 secs
# check age is normalized by max value
preProcessedData$covariates %>% dplyr::filter(.data$covariateId == 1002)
#> # Source:   SQL [?? x 3]
#> # Database: sqlite 3.47.1 [/tmp/RtmpPJeNgk/file20b1759ba368.sqlite]
#>    rowId covariateId covariateValue
#>    <int>       <dbl>          <dbl>
#>  1     1        1002          0.851
#>  2     2        1002          0.830
#>  3     3        1002          0.702
#>  4     4        1002          0.766
#>  5     5        1002          0.723
#>  6     6        1002          1
#>  7     7        1002          0.723
#>  8     8        1002          0.872
#>  9     9        1002          0.915
#> 10    10        1002          0.809
#> # ℹ more rows