ensemble - Create an ensembling model using different models — runEnsembleModel • PatientLevelPrediction

runEnsembleModel(
  population,
  dataList,
  modelList,
  testSplit = "time",
  testFraction = 0.2,
  stackerUseCV = TRUE,
  splitSeed = NULL,
  nfold = 3,
  saveDirectory = NULL,
  saveEnsemble = F,
  savePlpData = F,
  savePlpResult = F,
  savePlpPlots = F,
  saveEvaluation = F,
  analysisId = NULL,
  verbosity = "INFO",
  ensembleStrategy = "mean",
  cores = NULL
)

Arguments

population	The population created using createStudyPopulation() that will be used to develop the model
dataList	A list of objects of type `plpData`: the patient-level prediction data extracted from the CDM.
modelList	A list of base models, each created using one of the model functions in this package, to be combined in the final ensemble model. A base model can be any model implemented in this package.
testSplit	Either 'person' or 'time', specifying the type of evaluation used. 'time' finds the date after which testFraction of patients have their index date, then assigns patients with an index date before this date to the training set and patients with an index date after it to the test set. 'person' splits the data into a test set (testFraction of the data) and a train set (1-testFraction of the data). The split is stratified by the class label.
testFraction	The fraction of the data to be used as the test set in the patient split evaluation.
stackerUseCV	When stacking, you can either use the train CV predictions to train the stacker (TRUE) or hold out 20 percent of the data to train the stacker (FALSE)
splitSeed	The seed used to split the test/train set when using a person type testSplit
nfold	The number of folds used in the cross validation (default 3)
saveDirectory	The path to the directory where the results will be saved (if NULL uses working directory)
saveEnsemble	Binary indicating whether to save the ensemble
savePlpData	Binary indicating whether to save the plpData object (default is F)
savePlpResult	Binary indicating whether to save the object returned by runPlp (default is F)
savePlpPlots	Binary indicating whether to save the performance plots as pdf files (default is F)
saveEvaluation	Binary indicating whether to save the performance as csv files (default is F)
analysisId	The analysis ID
verbosity	Sets the level of the verbosity. If the log level is at or higher in priority than the logger threshold, a message will print. The levels are: DEBUG (highest verbosity, showing all debug statements); TRACE (shows information about the start and end of steps); INFO (shows informative messages; the default); WARN (shows warning messages); ERROR (shows error messages); FATAL (silent except for fatal errors).
ensembleStrategy	The strategy used for ensembling the outputs from the different models. It can be 'mean', 'product', 'weighted' or 'stacked'. 'mean': the average probability from the different models. 'product': the product rule. 'weighted': the weighted average probability from the different models, using train AUC as weights. 'stacked': the stacked ensemble trains a logistic regression on the outputs of the different models.
cores	The number of cores to use when training the ensemble
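
To illustrate the two testSplit options described above, here is a minimal base R sketch on hypothetical toy data. The data frame `toy`, its columns, and the splitting logic are assumptions made for illustration; this is not the package's internal implementation. The `set.seed()` call plays the role that splitSeed plays for a 'person' split.

```r
# Hypothetical toy data: 100 patients with unique index dates and a binary outcome
set.seed(42)
toy <- data.frame(
  rowId = 1:100,
  indexDate = as.Date("2020-01-01") + sample(0:364, 100),
  outcome = rep(c(0, 1), times = c(70, 30))
)
testFraction <- 0.2
nTest <- round(testFraction * nrow(toy))

# 'time' split: the cutoff is the latest index date kept in the training set,
# so that testFraction of patients have an index date after it
cutoff <- sort(toy$indexDate)[nrow(toy) - nTest]
timeTest <- toy$indexDate > cutoff

# 'person' split: sample testFraction of patients within each outcome class,
# which keeps the outcome rate the same in the train and test sets
personTest <- logical(nrow(toy))
for (label in unique(toy$outcome)) {
  idx <- which(toy$outcome == label)
  nPick <- round(testFraction * length(idx))
  personTest[idx[sample.int(length(idx), nPick)]] <- TRUE
}

# Cross-tabulate test membership against the outcome for each split type
table(time = timeTest, outcome = toy$outcome)
table(person = personTest, outcome = toy$outcome)
```

In the 'time' split the outcome rate in the test set depends on how outcomes are distributed over calendar time, whereas the 'person' split fixes it by construction.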

Details

This function applies a list of models and combines them into an ensemble model.
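
The four ensembleStrategy options can be sketched in base R on simulated predictions. Everything below is an assumption made for illustration: the simulated data, the helper functions `noisy()` and `auc()`, the reading of the product rule as a plain product of probabilities, and the normalisation of the AUC weights to sum to one. The package's own implementation may differ in these details. For brevity the stacker is fitted on the same predictions it combines, whereas stackerUseCV controls which predictions the real stacker is trained on.

```r
# Simulated outcome and predicted probabilities from three hypothetical base models
set.seed(1)
n <- 200
outcome <- rbinom(n, 1, 0.3)
noisy <- function(sd) plogis(-1 + 2 * outcome + rnorm(n, sd = sd))
preds <- cbind(model1 = noisy(1), model2 = noisy(1.5), model3 = noisy(2))

# Rank-based (Mann-Whitney) AUC, used to derive weights for the 'weighted' strategy
auc <- function(p, y) {
  r <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
trainAuc <- apply(preds, 2, auc, y = outcome)

# 'mean': average probability across the base models
meanEnsemble <- rowMeans(preds)

# 'product': product of the base model probabilities (assumed form of the product rule)
productEnsemble <- apply(preds, 1, prod)

# 'weighted': average weighted by train AUC (weights normalised to sum to 1)
weights <- trainAuc / sum(trainAuc)
weightedEnsemble <- as.vector(preds %*% weights)

# 'stacked': logistic regression on the base model predictions
stacker <- glm(outcome ~ ., data = data.frame(outcome, preds), family = binomial())
stackedEnsemble <- predict(stacker, type = "response")

head(cbind(meanEnsemble, productEnsemble, weightedEnsemble, stackedEnsemble))
```

The mean and weighted strategies need no extra fitting beyond the base models, while the stacked strategy learns how much to trust each base model from labelled predictions.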