Creates a learning curve object, which can be plotted using the plotLearningCurve() function.
createLearningCurve(
plpData,
outcomeId,
parallel = T,
cores = 4,
modelSettings,
saveDirectory = getwd(),
analysisId = "learningCurve",
populationSettings = createStudyPopulationSettings(),
splitSettings = createDefaultSplitSetting(),
trainFractions = c(0.25, 0.5, 0.75),
trainEvents = NULL,
sampleSettings = createSampleSettings(),
featureEngineeringSettings = createFeatureEngineeringSettings(),
preprocessSettings = createPreprocessSettings(minFraction = 0.001, normalize = T),
logSettings = createLogSettings(),
executeSettings = createExecuteSettings(runSplitData = T, runSampleData = F,
runfeatureEngineering = F, runPreprocessData = T, runModelDevelopment = T,
runCovariateSummary = F)
)
An object of type plpData: the patient-level prediction data extracted from the CDM.
(integer) The ID of the outcome.
Whether to run the code in parallel
The number of computer cores to use if running in parallel
An object of class modelSettings created using one of the functions:
- setLassoLogisticRegression(): a lasso logistic regression model
- setGradientBoostingMachine(): a gradient boosting machine
- setAdaBoost(): an AdaBoost model
- setRandomForest(): a random forest model
- setDecisionTree(): a decision tree model
- setKNN(): a KNN model
The path to the directory where the results will be saved (if NULL uses working directory)
(character) Identifier for the analysis. It is used to create, e.g., the result folder. Default is "learningCurve".
An object of type populationSettings, created using createStudyPopulationSettings(), that specifies how the data class labels are defined and, additionally, any exclusions to apply to the plpData cohort.
An object of type splitSettings that specifies how to split the data into train/validation/test. The default settings can be created using createDefaultSplitSetting().
A vector of training fractions to create models for. Note that providing trainEvents will override your input to trainFractions.
The number of events has been shown to be a determinant of model performance. Therefore, it is recommended to provide trainEvents rather than trainFractions. Note that providing trainEvents will override your input to trainFractions. The format should be as follows: c(500, 1000, 1500), a vector of training event counts.
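To make the relationship between trainEvents and trainFractions concrete, the base R sketch below converts target event counts into the approximate training fractions they would correspond to. It is an illustration only: the outcome labels are simulated, `eventsToFractions` is a hypothetical helper that is not part of PatientLevelPrediction, and it assumes subjects are sampled at random so that events scale proportionally with the fraction. The package's internal conversion may differ.

```r
# Illustration only: simulated binary outcome labels for a hypothetical
# training set of 20,000 subjects, 2,000 of whom have the outcome.
outcomeLabels <- rep(c(1, 0), times = c(2000, 18000))

# Hypothetical helper (not part of PatientLevelPrediction): converts target
# event counts into the fraction of the training set expected to contain them,
# assuming random sampling of subjects.
eventsToFractions <- function(trainEvents, labels) {
  totalEvents <- sum(labels == 1)
  if (any(trainEvents > totalEvents)) {
    stop("Requested more events than the training set contains.")
  }
  trainEvents / totalEvents
}

trainEvents <- c(500, 1000, 1500)
fractions <- eventsToFractions(trainEvents, outcomeLabels)
print(fractions)  # 0.25 0.50 0.75
```

With 2,000 events available, the event counts from the documented format happen to map onto the default trainFractions, which shows why the two arguments are alternatives rather than complements.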
An object of type sampleSettings that specifies any under- or over-sampling to be done. The default is none.
An object of class featureEngineeringSettings specifying any feature engineering to be learned (using the train data).
An object of class preprocessSettings. This setting specifies the minimum fraction of the target population who must have a covariate for it to be included in the model training, and whether to normalise the covariates before training.
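The effect of the two preprocessing options can be sketched in base R as follows. This is not the package's implementation: `preprocessSketch` and the covariate matrix are hypothetical, and the normalisation scheme shown (dividing each column by its maximum absolute value) is an assumption chosen for illustration.

```r
# Illustration only: a tiny hypothetical covariate matrix
# (rows = subjects, columns = covariates).
covariates <- matrix(
  c(1, 0, 0,
    1, 0, 4,
    1, 0, 2,
    0, 1, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("covA", "covB", "covC"))
)

# Hypothetical helper (not part of PatientLevelPrediction).
preprocessSketch <- function(x, minFraction = 0.001, normalize = TRUE) {
  # Fraction of subjects with a non-zero value for each covariate
  fractionPresent <- colMeans(x != 0)
  # Keep only covariates observed in at least minFraction of subjects
  x <- x[, fractionPresent >= minFraction, drop = FALSE]
  if (normalize) {
    # Assumed scheme: scale each column by its maximum absolute value
    x <- sweep(x, 2, apply(abs(x), 2, max), "/")
  }
  x
}

# A deliberately high threshold so that the rare covariate covB is dropped
processed <- preprocessSketch(covariates, minFraction = 0.5)
print(colnames(processed))  # "covA" "covC"
```

Here covB is present in only one of four subjects (0.25) and falls below the threshold, while the remaining columns are rescaled to lie between 0 and 1.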
An object of class logSettings, created using createLogSettings(), specifying how the logging is done.
An object of class executeSettings specifying which parts of the analysis to run.
A learning curve object containing the various performance measures obtained by the model for each training set fraction. It can be plotted using plotLearningCurve().
if (FALSE) {
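# define model
modelSettings <- PatientLevelPrediction::setLassoLogisticRegression()
# create learning curve
# (plpData and outcomeId are assumed to have been defined already)
learningCurve <- PatientLevelPrediction::createLearningCurve(
  plpData = plpData,
  outcomeId = outcomeId,
  modelSettings = modelSettings
)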
# plot learning curve
PatientLevelPrediction::plotLearningCurve(learningCurve)
}
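As a further sketch, the call below follows the recommendation to supply trainEvents rather than trainFractions. It uses only arguments that appear in the usage signature above. The objects `plpData` and `outcomeId` are assumed to exist in the session already, and the guard variable `canRun` is a hypothetical convenience so that the block does nothing when the package or those objects are unavailable.

```r
# Event counts in the format shown in the trainEvents description
trainEvents <- c(500, 1000, 1500)

# Only run when the package is installed and the assumed objects
# `plpData` and `outcomeId` already exist in the session.
canRun <- requireNamespace("PatientLevelPrediction", quietly = TRUE) &&
  exists("plpData") && exists("outcomeId")

if (canRun) {
  learningCurve <- PatientLevelPrediction::createLearningCurve(
    plpData = plpData,
    outcomeId = outcomeId,
    parallel = FALSE,
    modelSettings = PatientLevelPrediction::setLassoLogisticRegression(),
    saveDirectory = file.path(tempdir(), "learningCurve"),
    trainEvents = trainEvents
  )
  PatientLevelPrediction::plotLearningCurve(learningCurve)
}
```

Setting parallel = FALSE keeps the run on a single core, which can make the log output easier to follow while trying the function out.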