Creates a learning curve in parallel, which can be plotted using
the plotLearningCurve() function. Currently this functionality is
only supported by Lasso Logistic Regression.
createLearningCurvePar(
  population,
  plpData,
  modelSettings,
  testSplit = "stratified",
  testFraction = 0.25,
  trainFractions = c(0.25, 0.5, 0.75),
  trainEvents = NULL,
  splitSeed = NULL,
  nfold = 3,
  indexes = NULL,
  verbosity = "TRACE",
  minCovariateFraction = 0.001,
  normalizeData = T,
  saveDirectory = getwd(),
  savePlpData = F,
  savePlpResult = F,
  savePlpPlots = F,
  saveEvaluation = F,
  timeStamp = FALSE,
  analysisId = "lc-",
  cores = NULL
)
| Argument | Description |
|---|---|
| population | The population created using |
| plpData | An object of type |
| modelSettings | An object of class |
| testSplit | Specifies the type of evaluation used. Can be either |
| testFraction | The fraction of the data used as the testing set in the patient split evaluation. |
| trainFractions | A list of training fractions to create models for. Note, providing |
| trainEvents | Events have been shown to be a determinant of model performance. Therefore, it is recommended to provide |
| splitSeed | The seed used to split the testing and training set when using a 'person' type split |
| nfold | The number of folds used in the cross validation (default = |
| indexes | A data frame containing a rowId and an index column, where an index value of -1 means the row is in the test set and a positive integer represents the cross validation fold (default is |
| verbosity | Sets the verbosity level. If the log level is at or higher in priority than the logger threshold, a message will print. The levels are: |
| minCovariateFraction | Minimum covariate prevalence in population to avoid removal during preprocessing. |
| normalizeData | Whether to normalise the data |
| saveDirectory | Location to save log and results |
| savePlpData | Whether to save the plpData |
| savePlpResult | Whether to save the plpResult |
| savePlpPlots | Whether to save the plp plots |
| saveEvaluation | Whether to save the plp performance csv files |
| timeStamp | Include a timestamp in the log |
| analysisId | The analysis unique identifier |
| cores | The number of cores to use |
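
The indexes argument described in the table expects a data frame with a rowId column and an index column, where -1 marks the test set and positive integers mark cross validation folds. The following is a minimal base R sketch of how such a data frame might be built by hand. The simulated population, its outcomeCount column, and the helper assignIndex are hypothetical and only serve the illustration; this is not the package's own splitting code.

```r
# Hypothetical stand-in for a study population: one row per person.
# The outcomeCount column is an assumption made for this sketch.
set.seed(42)
population <- data.frame(
  rowId = 1:200,
  outcomeCount = rbinom(200, size = 1, prob = 0.2)
)

testFraction <- 0.25
nfold <- 3

# Hypothetical helper: returns a shuffled vector of length n holding
# -1 for test rows and fold numbers 1..nfold for the remaining rows.
assignIndex <- function(n, testFraction, nfold) {
  nTest <- round(n * testFraction)
  folds <- rep_len(seq_len(nfold), n - nTest)
  sample(c(rep(-1, nTest), folds))
}

# Assign indexes separately within each outcome stratum so that the
# test set and every fold keep roughly the same outcome rate.
indexes <- data.frame(rowId = population$rowId, index = NA_integer_)
for (outcome in unique(population$outcomeCount)) {
  inStratum <- population$outcomeCount == outcome
  indexes$index[inStratum] <- assignIndex(sum(inStratum), testFraction, nfold)
}

# Count of rows in the test set (-1) and in each fold
table(indexes$index)
```

A data frame of this shape could then be supplied as indexes = indexes in place of the default split.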
A learning curve object containing the various performance measures
obtained by the model for each training set fraction. It can be plotted
using plotLearningCurve.
if (FALSE) {
  # define model
  modelSettings <- setLassoLogisticRegression()

  # register parallel backend
  registerParallelBackend()

  # create learning curve
  learningCurve <- createLearningCurvePar(population, plpData, modelSettings)

  # plot learning curve
  plotLearningCurve(learningCurve)
}
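
To make the idea of a learning curve concrete without needing plpData, here is a self-contained base R sketch that runs on simulated data. It is an illustration under stated assumptions, not the package's implementation: it fits a plain (unpenalised) logistic regression with glm instead of Lasso Logistic Regression, runs sequentially rather than in parallel, and treats each training fraction as a share of the full data set. The simulated variables and the auc helper are hypothetical.

```r
# Simulated data: two covariates and a binary outcome (all hypothetical).
set.seed(1)
n <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 1.2 * x1 - 0.8 * x2))
dat <- data.frame(y = y, x1 = x1, x2 = x2)

# AUC via the rank-sum (Mann-Whitney) formulation.
auc <- function(score, label) {
  r <- rank(score)
  nPos <- sum(label == 1)
  nNeg <- sum(label == 0)
  (sum(r[label == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

testFraction <- 0.25
trainFractions <- c(0.25, 0.5, 0.75)

# Hold out a fixed test set; the rest forms the pool of training rows.
testRows <- sample(n, size = round(n * testFraction))
test <- dat[testRows, ]
trainPool <- dat[-testRows, ]

# For each training fraction, fit a model on a random subset of the
# pool and record its discrimination on the fixed test set.
learningCurve <- do.call(rbind, lapply(trainFractions, function(f) {
  rows <- sample(nrow(trainPool), size = round(n * f))
  fit <- glm(y ~ x1 + x2, data = trainPool[rows, ], family = binomial())
  pred <- predict(fit, newdata = test, type = "response")
  data.frame(
    trainFraction = f,
    trainEvents = sum(trainPool$y[rows]),
    testAUC = auc(pred, test$y)
  )
}))

print(learningCurve)
```

Each row of the resulting data frame pairs a training set size (and its event count) with test set performance, which is the kind of relationship a learning curve plot displays.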