Summarises the covariateData by calculating the count, mean and standard deviation of each covariate. If labels are given, the summary is also stratified by class label, and if strata are given (for example, assigning each patient to the train or test set), the values are also stratified by stratum.
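The per-covariate calculation can be sketched in base R. This is a minimal illustration, not the package's implementation. It assumes covariates are stored sparsely, with one row per non-zero rowId/covariateId pair as in the covariates table of a covariateData object, and that absent pairs count as zero. It also assumes a population (divide-by-N) standard deviation. The helper summariseCovariates and all data are hypothetical.

```r
# Hypothetical sparse covariates in the long format of covariateData$covariates:
# one row per non-zero (rowId, covariateId) pair.
covariates <- data.frame(
  rowId          = c(1, 1, 2, 3, 4),
  covariateId    = c(101, 102, 101, 101, 102),
  covariateValue = c(1, 1, 1, 1, 1)
)
cohort <- data.frame(rowId = 1:5)  # patient 5 has no non-zero covariates
labels <- data.frame(rowId = 1:5, outcomeCount = c(1, 0, 1, 0, 0))

# Hypothetical helper: count, mean and SD per covariate over the patients in
# rowIds, treating absent (rowId, covariateId) pairs as zero.
summariseCovariates <- function(covariates, rowIds) {
  n <- length(rowIds)
  sub <- covariates[covariates$rowId %in% rowIds, , drop = FALSE]
  values <- split(sub$covariateValue, sub$covariateId)
  sums <- vapply(values, sum, numeric(1))
  sumSquares <- vapply(values, function(v) sum(v^2), numeric(1))
  covariateMean <- unname(sums / n)
  data.frame(
    covariateId    = as.numeric(names(values)),
    CovariateCount = unname(vapply(values, length, integer(1))),
    CovariateMean  = covariateMean,
    # population variance E[x^2] - E[x]^2; pmax guards against rounding below 0
    CovariateStDev = sqrt(pmax(unname(sumSquares / n) - covariateMean^2, 0))
  )
}

# Overall summary, then one summary per class label (outcomeCount value).
overall <- summariseCovariates(covariates, cohort$rowId)
byLabel <- lapply(split(labels$rowId, labels$outcomeCount),
                  function(ids) summariseCovariates(covariates, ids))
overall
```

Here covariate 101 is non-zero for 3 of the 5 patients, so its mean is 0.6 and its standard deviation is sqrt(0.6 - 0.36), about 0.49. Within each label the denominator is the number of patients with that label, not the whole cohort.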
Usage
covariateSummary(
covariateData,
cohort,
labels = NULL,
strata = NULL,
variableImportance = NULL,
featureEngineering = NULL
)
Arguments
- covariateData
The covariateData part of the plpData that is extracted using getPlpData
- cohort
The patient cohort for which to calculate the summary
- labels
A data.frame with the columns rowId and outcomeCount
- strata
A data.frame containing the columns rowId and strataName
- variableImportance
A data.frame with the columns covariateId and value (the variable importance value)
- featureEngineering
(currently not used) A function or list of functions specifying any feature engineering used to create covariates before summarising
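The optional data.frame arguments can be built as below. This is a hedged sketch: the rowIds, outcome values, stratum names ("Train", "Test") and importance values are all made up, and hasColumns is a hypothetical helper, not part of the package. The final call is left commented out because it needs a plpData object such as the one created in the Examples.

```r
# All values below are hypothetical; in practice the rowIds must match
# the rowIds of the patients in the cohort being summarised.
rowIds <- 1:6

# labels: outcomeCount per patient (assumed 0 = no outcome, 1 = outcome).
labels <- data.frame(rowId = rowIds, outcomeCount = c(0, 1, 0, 0, 1, 0))

# strata: a named stratum per patient, here a hypothetical train/test split.
strata <- data.frame(
  rowId = rowIds,
  strataName = c("Train", "Train", "Train", "Train", "Test", "Test")
)

# variableImportance: one importance value per covariateId (made-up values).
variableImportance <- data.frame(
  covariateId = c(80180102, 81893102),
  value = c(0.8, 0.1)
)

# Hypothetical helper, not part of the package: checks required columns.
hasColumns <- function(df, cols) all(cols %in% names(df))
stopifnot(
  hasColumns(labels, c("rowId", "outcomeCount")),
  hasColumns(strata, c("rowId", "strataName")),
  hasColumns(variableImportance, c("covariateId", "value"))
)

# With plpData created as in the Examples (whose rowIds will differ), the call
# would look like:
# covSummary <- covariateSummary(plpData$covariateData, plpData$cohorts,
#                                labels = labels, strata = strata,
#                                variableImportance = variableImportance)
```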
Value
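A data.frame (tibble) with one row per covariate, containing the covariate details (covariateId, covariateName, analysisId, conceptId, valueAsConceptId, collisions) together with CovariateCount, CovariateMean and CovariateStDev for each specified stratification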
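A typical next step is to filter and rank the returned summary. The sketch below uses a small hand-made data.frame that mimics some of the unstratified columns shown in the Examples output. The numbers and covariate names are invented, and stratified output may use different column names, so treat the column names as an assumption.

```r
# Hypothetical rows mimicking the unstratified columns in the Examples output;
# the values are made up for illustration (as if from a cohort of 100).
covSummary <- data.frame(
  covariateId    = c(80180102, 81893102, 30753102),
  covariateName  = c("condition_occurrence A", "condition_occurrence B",
                     "condition_occurrence C"),
  CovariateCount = c(12L, 40L, 3L),
  CovariateMean  = c(0.12, 0.40, 0.03),
  CovariateStDev = c(0.325, 0.490, 0.171)
)

# Keep covariates present in at least 5% of the cohort, most common first.
common <- covSummary[covSummary$CovariateMean >= 0.05, ]
common <- common[order(common$CovariateMean, decreasing = TRUE), ]
common$covariateId
```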
Examples
data("simulationProfile")
plpData <- simulatePlpData(simulationProfile, n=100)
#> Generating covariates
#> Loading required package: FeatureExtraction
#> Loading required package: DatabaseConnector
#> Loading required package: Andromeda
#> Loading required package: dplyr
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
#> Generating cohorts
#> Generating outcomes
covSummary <- covariateSummary(plpData$covariateData, plpData$cohorts)
#> Calculating covariate summary @ 2025-02-17 17:02:32.168481
#> This can take a while...
#> calculating subset of strata 1
#> Restricting to subgroup
#> Calculating summary for subgroup
#> Aggregating with no labels or strata
#> Finished covariate summary @ 2025-02-17 17:02:32.421
#> Time to calculate covariate summary: 0.253 secs
head(covSummary)
#> # A tibble: 6 × 9
#>   covariateId covariateName     analysisId conceptId valueAsConceptId collisions
#>         <dbl> <chr>                  <dbl>     <dbl>            <dbl>      <dbl>
#> 1    80180102 condition_occurr…        102     80180               NA         NA
#> 2    81893102 condition_occurr…        102     81893               NA         NA
#> 3    30753102 condition_occurr…        102     30753               NA         NA
#> 4  4285898102 condition_occurr…        102   4285898               NA         NA
#> 5  4266809102 condition_occurr…        102   4266809               NA         NA
#> 6  4310024102 condition_occurr…        102   4310024               NA         NA
#> # ℹ 3 more variables: CovariateCount <int>, CovariateMean <dbl>,
#> #   CovariateStDev <dbl>