A function that wraps around FeatureExtraction::tidyCovariateData to normalise the data and remove rare or redundant features
Source: R/PreprocessingData.R
Usage
preprocessData(covariateData, preprocessSettings = createPreprocessSettings())
Details
Returns an object of class covariateData that has been processed:
the data are normalised, and rare or redundant features are removed.
Redundant features are features that, within an analysisId, together cover
all observations.
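The two ideas above (normalising by the maximum value, and a group of features that together cover all observations) can be illustrated on a toy table. This is a minimal base R sketch, not the FeatureExtraction implementation: the data frame, its ids and values, the normalisedValue column, and the choice of analysisId 1 for a pair of binary indicators are all assumptions made for illustration.

```r
# Toy sparse covariate table (all ids and values are made up for illustration).
covariates <- data.frame(
  rowId          = c(1, 2, 3, 4, 1, 2, 3, 4),
  covariateId    = c(1002, 1002, 1002, 1002, 8507001, 8507001, 8532001, 8532001),
  analysisId     = c(2, 2, 2, 2, 1, 1, 1, 1),
  covariateValue = c(31, 41, 39, 47, 1, 1, 1, 1)
)

# Normalise by max: divide each value by the largest value of its covariate.
maxPerCovariate <- ave(covariates$covariateValue, covariates$covariateId, FUN = max)
covariates$normalisedValue <- covariates$covariateValue / maxPerCovariate

# Redundancy among binary indicators: within the hypothetical analysisId 1,
# every rowId has one of the two covariates, so the pair covers all observations.
binary <- covariates[covariates$analysisId == 1, ]
nPeople <- length(unique(covariates$rowId))
coversEveryone <- length(unique(binary$rowId)) == nPeople

print(covariates)
print(coversEveryone)
```

After normalisation the largest value of each covariate is 1, and when a group of indicators covers everyone, one member of the group can be dropped without losing information.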
Examples
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
data("simulationProfile")
plpData <- simulatePlpData(simulationProfile, n=1000)
#> Generating covariates
#> Generating cohorts
#> Generating outcomes
preProcessedData <- preprocessData(plpData$covariateData, createPreprocessSettings())
#> Removing 2 redundant covariates
#> Removing 0 infrequent covariates
#> Normalizing covariates
#> Tidying covariates took 1.15 secs
# check that age (covariateId 1002) is normalized by its max value
preProcessedData$covariates %>% dplyr::filter(.data$covariateId == 1002)
#> # Source:   SQL [?? x 3]
#> # Database: DuckDB v1.2.2 [unknown@Linux 6.11.0-1014-azure:R 4.5.0//tmp/Rtmp78KWWW/file23e870cc51b6.duckdb]
#>    rowId covariateId covariateValue
#>    <int>       <dbl>          <dbl>
#>  1     1        1002          0.660
#>  2     2        1002          0.872
#>  3     3        1002          0.830
#>  4     4        1002          0.702
#>  5     5        1002          0.809
#>  6     6        1002          0.766
#>  7     7        1002          0.787
#>  8     8        1002          0.851
#>  9     9        1002          0.787
#> 10    10        1002          0.766
#> # ℹ more rows