A wrapper around FeatureExtraction::tidyCovariateData that normalises the covariate data and removes rare or redundant features.


preprocessData(covariateData, preprocessSettings = createPreprocessSettings())



The covariate part of the training data created by splitData, after sampling and any required feature engineering have been applied


The settings for the preprocessing, created by createPreprocessSettings


The covariateData object with the processed covariates


Returns an object of class covariateData that has been processed. Processing includes normalising the data and removing rare or redundant features. Redundant features are features that, within a single analysisId, together cover all observations.
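
The three steps described above can be illustrated on a toy table in base R. This is a minimal sketch, not the implementation used by FeatureExtraction::tidyCovariateData: the data, the covariateRef mapping, the covariate 999 and the minFraction value of 0.3 are all hypothetical, and the redundancy rule is simplified to "within an analysisId that has more than one covariate, if the covariates together cover every row, drop the most frequent one".

```r
# Toy covariates in long (sparse) format. All values are made up for illustration.
covariates <- data.frame(
  rowId = c(1, 2, 3, 4, 1, 2, 3, 4, 1),
  covariateId = c(1002, 1002, 1002, 1002, 8507001, 8507001, 8532001, 8532001, 999),
  covariateValue = c(40, 47, 33, 36, 1, 1, 1, 1, 1)
)
# Hypothetical mapping from covariate to the analysis that produced it.
covariateRef <- data.frame(
  covariateId = c(1002, 8507001, 8532001, 999),
  analysisId = c(2, 1, 1, 3)
)
nRows <- length(unique(covariates$rowId))
minFraction <- 0.3  # assumed threshold, chosen so the toy data shows an effect

# Step 1: find infrequent covariates (present in fewer than minFraction of rows).
freq <- aggregate(rowId ~ covariateId, data = covariates, FUN = length)
names(freq)[2] <- "n"
rareIds <- freq$covariateId[freq$n / nRows < minFraction]

# Step 2: find redundant covariates (simplified rule, see the text above).
freq <- merge(freq, covariateRef, by = "covariateId")
redundantIds <- c()
for (id in unique(freq$analysisId)) {
  group <- freq[freq$analysisId == id, ]
  inGroup <- covariates$covariateId %in% group$covariateId
  covered <- length(unique(covariates$rowId[inGroup]))
  if (nrow(group) > 1 && covered == nRows) {
    # One covariate is implied by the others, so drop the most frequent one.
    redundantIds <- c(redundantIds, group$covariateId[which.max(group$n)])
  }
}

# Step 3: drop the flagged covariates and scale each remaining one by its maximum.
kept <- covariates[!covariates$covariateId %in% c(rareIds, redundantIds), ]
maxValue <- ave(kept$covariateValue, kept$covariateId, FUN = max)
kept$covariateValue <- kept$covariateValue / maxValue

print(rareIds)
print(redundantIds)
print(kept)
```

In this toy data the two covariates under analysisId 1 together cover all four rows, so one of them carries no extra information and is dropped. After scaling, the largest value of every remaining covariate is 1.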


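library(PatientLevelPrediction)
library(dplyr) # provides %>% and filter()
# load the simulation profile shipped with the package
data("simulationProfile")
plpData <- simulatePlpData(simulationProfile, n = 1000)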
#> Generating covariates
#> Generating cohorts
#> Generating outcomes
preProcessedData <- preprocessData(plpData$covariateData, createPreprocessSettings())
#> Removing 0 redundant covariates
#> Removing 0 infrequent covariates
#> Normalizing covariates
#> Tidying covariates took 0.617 secs
# check age is normalized by max value
preProcessedData$covariates %>% dplyr::filter(.data$covariateId == 1002)
#> # Source:   SQL [?? x 3]
#> # Database: sqlite 3.47.1 [/tmp/Rtmpox5Ykh/file22a36fae1313.sqlite]
#>    rowId covariateId covariateValue
#>    <int>       <dbl>          <dbl>
#>  1     1        1002          0.851
#>  2     2        1002          0.830
#>  3     3        1002          0.702
#>  4     4        1002          0.766
#>  5     5        1002          0.723
#>  6     6        1002          1    
#>  7     7        1002          0.723
#>  8     8        1002          0.872
#>  9     9        1002          0.915
#> 10    10        1002          0.809
#> # ℹ more rows