Creates a learning curve object, which can be plotted using the
plotLearningCurve() function.
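A learning curve trains the same kind of model on increasingly large portions of the training data and records held-out performance at each size. The base-R sketch below illustrates that idea on simulated data using glm() and an AUC computed by hand. It is a hypothetical illustration, not the PatientLevelPrediction implementation. The helpers simulateData, aucRank and sketchLearningCurve, and the toy data, are assumptions made for this example.

```r
# Hypothetical illustration of a learning curve; NOT the package's implementation.
set.seed(42)

# Assumed toy data: a binary outcome driven by two covariates
simulateData <- function(n) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  p <- plogis(-1 + 1.5 * x1 - 1.0 * x2)
  data.frame(x1 = x1, x2 = x2, y = rbinom(n, 1, p))
}

# AUC via the Mann-Whitney rank formulation
aucRank <- function(score, label) {
  r <- rank(score)
  nPos <- sum(label == 1)
  nNeg <- sum(label == 0)
  (sum(r[label == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

# Fit one model per training fraction and evaluate it on a fixed test set
sketchLearningCurve <- function(train, test, trainFractions) {
  rows <- lapply(trainFractions, function(f) {
    idx <- sample(nrow(train), size = floor(f * nrow(train)))
    sub <- train[idx, ]
    fit <- glm(y ~ x1 + x2, data = sub, family = binomial())
    pred <- predict(fit, newdata = test, type = "response")
    data.frame(trainFraction = f,
               trainEvents = sum(sub$y),
               testAUC = aucRank(pred, test$y))
  })
  do.call(rbind, rows)
}

simData <- simulateData(4000)
isTest <- seq_len(nrow(simData)) > 3000  # last 1000 rows are held out
curve <- sketchLearningCurve(simData[!isTest, ], simData[isTest, ],
                             c(0.25, 0.5, 0.75))
print(curve)
```

Plotting testAUC against trainEvents (or trainFraction) gives roughly the kind of curve plotLearningCurve() displays; the package itself fits whichever model you pass in modelSettings.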
Usage
createLearningCurve(
plpData,
outcomeId,
parallel = TRUE,
cores = 4,
modelSettings,
saveDirectory = NULL,
analysisId = "learningCurve",
populationSettings = createStudyPopulationSettings(),
splitSettings = createDefaultSplitSetting(),
trainFractions = c(0.25, 0.5, 0.75),
trainEvents = NULL,
sampleSettings = createSampleSettings(),
featureEngineeringSettings = createFeatureEngineeringSettings(),
preprocessSettings = createPreprocessSettings(minFraction = 0.001, normalize = TRUE),
logSettings = createLogSettings(),
executeSettings = createExecuteSettings(runSplitData = TRUE, runSampleData = FALSE,
runFeatureEngineering = FALSE, runPreprocessData = TRUE, runModelDevelopment = TRUE,
runCovariateSummary = FALSE)
)
Arguments
- plpData
An object of type plpData: the patient-level prediction data extracted from the CDM.
- outcomeId
(integer) The ID of the outcome.
- parallel
(logical) Whether to run the code in parallel.
- cores
The number of computer cores to use if running in parallel.
- modelSettings
An object of class modelSettings created using one of the functions:
  - setLassoLogisticRegression(): a lasso logistic regression model
  - setGradientBoostingMachine(): a gradient boosting machine
  - setAdaBoost(): an AdaBoost model
  - setRandomForest(): a random forest model
  - setDecisionTree(): a decision tree model
  - setKNN(): a k-nearest neighbours (KNN) model
- saveDirectory
The path to the directory where the results will be saved (if NULL, the working directory is used).
- analysisId
(character) Identifier for the analysis. It is used to create, e.g., the result folder. The default is "learningCurve".
- populationSettings
An object of type populationSettings created using createStudyPopulationSettings that specifies how the data class labels are defined and any additional exclusions to apply to the plpData cohort.
- splitSettings
An object of type splitSettings that specifies how to split the data into train/validation/test sets. The default settings can be created using createDefaultSplitSetting.
- trainFractions
A vector of training fractions to create models for. Note that providing trainEvents will override your input to trainFractions.
- trainEvents
The number of outcome events has been shown to be a key determinant of model performance. Therefore, it is recommended to provide trainEvents rather than trainFractions. Note that providing trainEvents will override your input to trainFractions. The format should be a vector of training event counts, for example c(500, 1000, 1500).
- sampleSettings
An object of type sampleSettings that specifies any under/over sampling to be done. The default is none.
- featureEngineeringSettings
An object of type featureEngineeringSettings specifying any feature engineering to be learned (using the training data).
- preprocessSettings
An object of type preprocessSettings. This setting specifies the minimum fraction of the target population who must have a covariate for it to be included in model training, and whether to normalise the covariates before training.
- logSettings
An object of type logSettings created using createLogSettings specifying how the logging is done.
- executeSettings
An object of type executeSettings specifying which parts of the analysis to run.
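Because trainEvents overrides trainFractions, each requested event count has to be translated into a training-set size. The sketch below shows one plausible conversion, assuming the outcome rate stays constant across subsamples of the training set so that events scale linearly with the number of rows. The helper eventsToFractions is hypothetical and not part of the package API; the package's own conversion may differ.

```r
# Hypothetical helper (not the package API): translate target event counts
# into training fractions, assuming a constant outcome rate in the training set.
eventsToFractions <- function(trainEvents, nTrain, nTrainEvents) {
  stopifnot(nTrain > 0, nTrainEvents > 0)
  fractions <- trainEvents / nTrainEvents
  if (any(fractions > 1)) {
    # More events requested than the training set contains
    warning("Some requested event counts exceed the available events; dropping them.")
    fractions <- fractions[fractions <= 1]
  }
  # Return each fraction with the implied number of training rows
  data.frame(trainFraction = fractions,
             trainRows = round(fractions * nTrain))
}

# Example: 10,000 training rows containing 2,000 outcome events
eventsToFractions(c(500, 1000, 1500), nTrain = 10000, nTrainEvents = 2000)
# trainFraction: 0.25, 0.50, 0.75; trainRows: 2500, 5000, 7500
```

Expressing the curve in events rather than fractions keeps the x-axis comparable across outcomes with very different incidence, which is the motivation the trainEvents description gives.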
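The default preprocessSettings, createPreprocessSettings(minFraction = 0.001, normalize = TRUE), combines two steps: dropping covariates that too few people have, and normalising the rest. The base-R sketch below applies both steps to a small dense covariate matrix. It is hypothetical: the helper preprocessSketch is not part of the package, the ">= minFraction" cut-off is an assumption, and normalisation is assumed to mean dividing each covariate by its maximum absolute value.

```r
# Hypothetical sketch of covariate preprocessing (not the package's code).
# Rows = people in the target population, columns = covariates.
preprocessSketch <- function(x, minFraction = 0.001, normalize = TRUE) {
  # Fraction of people with a non-zero value for each covariate
  prevalence <- colMeans(x != 0)
  x <- x[, prevalence >= minFraction, drop = FALSE]
  if (normalize) {
    # Assumed normalisation: divide each covariate by its max absolute value
    maxAbs <- apply(abs(x), 2, max)
    maxAbs[maxAbs == 0] <- 1  # avoid division by zero for all-zero columns
    x <- sweep(x, 2, maxAbs, "/")
  }
  x
}

x <- cbind(
  common = c(1, 2, 0, 4),    # present in 3 of 4 people
  rare   = c(0, 0, 0, 1),    # present in 1 of 4 people
  age    = c(40, 60, 80, 20)
)
preprocessSketch(x, minFraction = 0.5)
# keeps "common" and "age"; common: 0.25 0.50 0.00 1.00; age: 0.50 0.75 1.00 0.25
```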