runEnsembleModel(population, dataList, modelList, testSplit = "time",
  testFraction = 0.2, splitSeed = NULL, nfold = 3, saveDirectory = NULL,
  saveEnsemble = F, savePlpData = F, savePlpResult = F,
  savePlpPlots = F, saveEvaluation = F, analysisId = NULL,
  verbosity = "INFO", ensembleStrategy = "mean")



The study population, created using createStudyPopulation(), that will be used to develop the model.


A list of objects of type plpData: the patient-level prediction data extracted from the CDM.


A list of base models to combine in the final ensemble, each created using one of the model functions in this package. A base model can be any model implemented in this package.


Either 'person' or 'time', specifying the type of evaluation used. 'time' finds the date after which testFraction of patients have their index date, then assigns patients with an index date before this date to the training set and patients with an index date after it to the test set. 'person' splits the data into a test set (testFraction of the data) and a training set (1 - testFraction of the data). The split is stratified by the class label.
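
As a rough illustration of the two split types, the following base R sketch applies them to a toy data frame. The data frame, its column names (rowId, indexDate, outcomeCount) and the splitting code itself are assumptions made for illustration; they are not the package's internal implementation.

```r
# Hypothetical toy population: one row per patient, with an index date
# and a binary outcome label (20 positives, 80 negatives).
set.seed(1)  # plays the role of splitSeed in this sketch
population <- data.frame(
  rowId = 1:100,
  indexDate = as.Date("2015-01-01") + sample(0:99),
  outcomeCount = rep(c(1, 0, 0, 0, 0), times = 20)
)
testFraction <- 0.2

# 'time' split: pick the cut-off date so that about testFraction of
# patients have an index date after it; those patients form the test set.
sortedDates <- sort(population$indexDate)
nTrain <- round((1 - testFraction) * nrow(population))
cutoffDate <- sortedDates[nTrain]
timeTest <- population$indexDate > cutoffDate

# 'person' split: sample testFraction of patients within each outcome
# class, so the test set keeps the same class balance (stratification).
personTest <- rep(FALSE, nrow(population))
for (label in unique(population$outcomeCount)) {
  idx <- which(population$outcomeCount == label)
  nTest <- round(testFraction * length(idx))
  personTest[idx[sample.int(length(idx), nTest)]] <- TRUE
}

table(timeSplit = timeTest)
table(personSplit = personTest, outcome = population$outcomeCount)
```

In the time split every test patient has a later index date than every training patient, which mimics evaluating the model on future data. The person split ignores dates and instead preserves the outcome rate in both sets.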


The fraction of the data to be used as the test set in the patient split evaluation.


The seed used to split the data into test and training sets when testSplit is 'person'.


The number of folds used in the cross validation (default 3)


The path to the directory where the results will be saved (if NULL uses working directory)


Binary indicating whether to save the ensemble


Binary indicating whether to save the plpData object (default is F)


Binary indicating whether to save the object returned by runPlp (default is F)


Binary indicating whether to save the performance plots as pdf files (default is F)


Binary indicating whether to save the performance as csv files (default is F)


The analysis ID


Sets the level of the verbosity. If the log level is at or higher in priority than the logger threshold, a message will print. The levels are:

  • DEBUG: Highest verbosity, showing all debug statements

  • TRACE: Show information about the start and end of steps

  • INFO: Show informative messages (default)

  • WARN: Show warning messages

  • ERROR: Show error messages

  • FATAL: Be silent except for fatal errors


The strategy used for ensembling the outputs from the different models. It can be 'mean', 'product', 'weighted', or 'stacked':

  • 'mean': the average probability from the different models

  • 'product': the product rule

  • 'weighted': the weighted average probability from the different models, using the training AUC as weights

  • 'stacked': the stacked ensemble trains a logistic regression on the outputs of the different models
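
The four strategies can be sketched in base R on simulated predictions. Everything below is an assumption made for illustration: the simulated data, the helper function auc(), and in particular the reading of the product rule as a plain product of probabilities. The package's own implementation may differ, for example in how it normalises the product.

```r
# Simulated training data: a binary outcome and predicted probabilities
# from three hypothetical base models (one column per model).
set.seed(123)
n <- 200
outcome <- rbinom(n, 1, 0.3)
preds <- sapply(c(model1 = 1.0, model2 = 1.5, model3 = 0.5), function(signal) {
  plogis(-1 + signal * outcome + rnorm(n))
})

# Rank-based AUC (equivalent to the Mann-Whitney U statistic).
auc <- function(score, label) {
  r <- rank(score)
  nPos <- sum(label == 1)
  nNeg <- sum(label == 0)
  (sum(r[label == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

# 'mean': average probability across models.
meanEnsemble <- rowMeans(preds)

# 'product': assumed here to be the plain product of the probabilities.
productEnsemble <- apply(preds, 1, prod)

# 'weighted': average weighted by each model's training AUC.
trainAuc <- apply(preds, 2, auc, label = outcome)
weights <- trainAuc / sum(trainAuc)
weightedEnsemble <- as.vector(preds %*% weights)

# 'stacked': logistic regression fitted on the base-model predictions.
stackData <- data.frame(outcome = outcome, preds)
stackFit <- glm(outcome ~ ., data = stackData, family = binomial())
stackedEnsemble <- unname(predict(stackFit, type = "response"))

head(cbind(meanEnsemble, productEnsemble, weightedEnsemble, stackedEnsemble))
```

The mean and weighted strategies need no extra fitting step, whereas the stacked strategy learns how much to trust each base model from the training data, at the cost of fitting one more model.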