Extracting data from the OMOP CDM database

Functions for extracting the necessary data from a database in the OMOP Common Data Model format, and for saving/loading it.

createDatabaseDetails()
Create a setting that holds the details about the cdmDatabase connection for data extraction
createRestrictPlpDataSettings()
Define additional restriction settings used when calling getPlpData
getPlpData()
Extract the patient level prediction data from the server
getEunomiaPlpData()
Create a plpData object from the Eunomia database
savePlpData()
Save the plpData to folder
loadPlpData()
Load the plpData from a folder
getCohortCovariateData()
Extracts covariates based on cohorts
print(<plpData>)
Print a plpData object
print(<summary.plpData>)
Print a summary.plpData object
summary(<plpData>)
Summarize a plpData object
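The extraction functions above fit together as sketched below. This is an indicative example only: the connection values are placeholders and some argument names may differ by package version, so check each function's reference page.

```r
# Sketch of extracting plpData from an OMOP CDM database.
# Connection values are placeholders; argument names are indicative.
library(PatientLevelPrediction)

databaseDetails <- createDatabaseDetails(
  connectionDetails = DatabaseConnector::createConnectionDetails(
    dbms = "postgresql",
    server = "localhost/ohdsi",
    user = "user",
    password = "secret"
  ),
  cdmDatabaseSchema = "cdm",
  cohortDatabaseSchema = "results",
  cohortTable = "cohort",
  targetId = 1,    # cohort definition id of the target population
  outcomeIds = 2   # cohort definition id(s) of the outcome(s)
)

plpData <- getPlpData(
  databaseDetails = databaseDetails,
  covariateSettings = FeatureExtraction::createDefaultCovariateSettings(),
  restrictPlpDataSettings = createRestrictPlpDataSettings()
)

savePlpData(plpData, "./plpData")   # persist for later runs
plpData <- loadPlpData("./plpData")
summary(plpData)
```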

Settings for designing a prediction model

Design settings required when developing a model.

createStudyPopulationSettings()
Create the study population settings
createDefaultSplitSetting()
Create the settings for defining how the plpData are split into test/validation/train sets using default splitting functions (either random stratified by outcome, time or subject splitting)
createExistingSplitSettings()
Create the settings for defining how the plpData are split into test/validation/train sets using an existing split - good to use for reproducing results from a different run
createSampleSettings()
Create the settings for defining how the trainData from splitData are sampled using default sample functions.
createFeatureEngineeringSettings()
Create the settings for defining any feature engineering that will be done
createPreprocessSettings()
Create the settings for preprocessing the trainData.
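A typical design combines these settings objects, which are later passed to runPlp. The values below are illustrative defaults, not recommendations; argument names are indicative and should be checked against the reference pages.

```r
# Indicative sketch of a model design's settings (values are illustrative).
library(PatientLevelPrediction)

populationSettings <- createStudyPopulationSettings(
  removeSubjectsWithPriorOutcome = TRUE,
  requireTimeAtRisk = TRUE,
  riskWindowStart = 1,     # time-at-risk starts 1 day after index
  riskWindowEnd = 365      # and ends 365 days after index
)

splitSettings <- createDefaultSplitSetting(
  testFraction = 0.25,
  nfold = 3,
  splitSeed = 42,
  type = "stratified"      # stratified by outcome; "time" and "subject" also exist
)

sampleSettings <- createSampleSettings()                       # no sampling by default
featureEngineeringSettings <- createFeatureEngineeringSettings()
preprocessSettings <- createPreprocessSettings(
  minFraction = 0.001,     # drop covariates seen in < 0.1% of subjects
  normalize = TRUE
)
```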

Optional design settings

Settings for optional steps that can be used in the PLP pipeline

createCohortCovariateSettings()
Extracts covariates based on cohorts
createRandomForestFeatureSelection()
Create the settings for random forest based feature selection
createUnivariateFeatureSelection()
Create the settings for defining any feature selection that will be done
createSplineSettings()
Create the settings for adding a spline for continuous variables
createStratifiedImputationSettings()
Create the settings for using stratified imputation.
createNormalizer()
Create the settings for normalizing the data; type is either "minmax" or "robust"
createSimpleImputer()
Create Simple Imputer settings
createIterativeImputer()
Create Iterative Imputer settings
createRareFeatureRemover()
Create the settings for removing rare features

External validation

createValidationDesign()
createValidationDesign - Define the validation design for external validation
validateExternal()
validateExternal - Validate model performance on new data
createValidationSettings()
Define optional settings for performing external validation
recalibratePlp()
Recalibrate a model's predictions on new data
recalibratePlpRefit()
Recalibrate a model by refitting it on new data
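A rough sketch of the external validation flow: define a validation design for a previously developed model, then run it against one or more new databases. The argument names here are assumptions (and `validationDatabaseDetails` is a placeholder built with createDatabaseDetails()); consult the function reference before use.

```r
# Hypothetical sketch of external validation; argument names are assumptions.
library(PatientLevelPrediction)

design <- createValidationDesign(
  targetId = 1,
  outcomeId = 2,
  plpModelList = list(loadPlpModel("./plpResults/demo/plpResult/model")),
  recalibrate = "weakRecalibration"   # optionally recalibrate on the new data
)

results <- validateExternal(
  validationDesignList = list(design),
  databaseDetails = validationDatabaseDetails,  # from createDatabaseDetails()
  outputFolder = "./validation"
)
```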

Execution settings when developing a model

Execution settings required when developing a model.

createLogSettings()
Create the settings for logging the progression of the analysis
createExecuteSettings()
Creates list of settings specifying what parts of runPlp to execute
createDefaultExecuteSettings()
Creates default list of settings specifying what parts of runPlp to execute
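These two settings objects are usually created together and passed to runPlp. A minimal sketch, assuming the argument names shown (check the reference pages):

```r
# Minimal sketch of execution settings for runPlp.
library(PatientLevelPrediction)

logSettings <- createLogSettings(verbosity = "INFO", logName = "runPlp Log")
executeSettings <- createDefaultExecuteSettings()  # run every pipeline step
```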

Binary Classification Models

Functions for creating binary models

setAdaBoost()
Create setting for AdaBoost with python DecisionTreeClassifier base estimator
setDecisionTree()
Create setting for the scikit-learn DecisionTree with python
setGradientBoostingMachine()
Create setting for gradient boosting machine model using gbm_xgboost implementation
setLassoLogisticRegression()
Create modelSettings for lasso logistic regression
setMLP()
Create setting for neural network model with python's scikit-learn. For bigger models, consider using DeepPatientLevelPrediction package.
setNaiveBayes()
Create setting for naive bayes model with python
setRandomForest()
Create setting for random forest model using sklearn
setSVM()
Create setting for the python sklearn SVM (SVC function)
setIterativeHardThresholding()
Create setting for Iterative Hard Thresholding model
setLightGBM()
Create setting for gradient boosting machine model using lightGBM (https://github.com/microsoft/LightGBM/tree/master/R-package).
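Each set*() function returns a modelSettings object that is plugged into runPlp or createModelDesign. A hedged sketch; the hyperparameter names below are indicative and supplying a vector typically defines a grid to search over:

```r
# Indicative sketch of creating model settings.
library(PatientLevelPrediction)

# Cross-validated lasso logistic regression (Cyclops backend)
lrSettings <- setLassoLogisticRegression()

# Gradient boosting machine; vector-valued arguments define a hyperparameter grid
gbmSettings <- setGradientBoostingMachine(
  ntrees = 300,
  maxDepth = c(4, 6),
  learnRate = 0.1
)
```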

Survival Models

Functions for creating survival models

setCoxModel()
Create setting for lasso Cox model

Single Patient-Level Prediction Model

Functions for training/evaluating/applying a single patient-level-prediction model

runPlp()
runPlp - Develop and internally evaluate a model using specified settings
externalValidateDbPlp()
externalValidateDbPlp - Validate a model on new databases
savePlpModel()
Saves the plp model
loadPlpModel()
Loads the plp model
savePlpResult()
Saves the result from runPlp into the location directory
loadPlpResult()
Loads the runPlp result object from a folder
diagnosePlp()
diagnosePlp - Investigates the prediction problem settings; use before training a model
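Putting the pieces together, a single development run looks roughly like this. It is a sketch under the assumption that plpData was extracted with getPlpData; argument names are indicative, and runPlp accepts further settings (sampling, feature engineering, preprocessing, logging) not shown here.

```r
# Sketch of developing and internally validating one model with runPlp.
library(PatientLevelPrediction)

result <- runPlp(
  plpData = plpData,                   # from getPlpData()
  outcomeId = 2,
  analysisId = "demo",
  populationSettings = createStudyPopulationSettings(riskWindowEnd = 365),
  splitSettings = createDefaultSplitSetting(testFraction = 0.25, nfold = 3),
  modelSettings = setLassoLogisticRegression(),
  executeSettings = createDefaultExecuteSettings(),
  saveDirectory = "./plpResults"
)

savePlpResult(result, "./plpResults/demo")
viewPlp(result)   # interactively inspect performance and model settings
```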

Multiple Patient-Level Prediction Models

Functions for training multiple patient-level-prediction models in an efficient way.

createModelDesign()
Specify settings for developing a single model
runMultiplePlp()
Run a list of predictions analyses
validateMultiplePlp()
Externally validate multiple PLP models across new datasets
savePlpAnalysesJson()
Save the modelDesignList to a json file
loadPlpAnalysesJson()
Load the multiple prediction json settings from a file
diagnoseMultiplePlp()
Run a list of predictions diagnoses
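For a study with several designs, each createModelDesign() call specifies one model, and runMultiplePlp() executes the list while reusing shared data extraction where possible. A hedged sketch (argument names indicative; `databaseDetails` comes from createDatabaseDetails()):

```r
# Sketch of running multiple model designs in one study.
library(PatientLevelPrediction)

design1 <- createModelDesign(
  targetId = 1,
  outcomeId = 2,
  populationSettings = createStudyPopulationSettings(),
  covariateSettings = FeatureExtraction::createDefaultCovariateSettings(),
  modelSettings = setLassoLogisticRegression()
)

design2 <- createModelDesign(
  targetId = 1,
  outcomeId = 2,
  modelSettings = setGradientBoostingMachine()
)

runMultiplePlp(
  databaseDetails = databaseDetails,   # from createDatabaseDetails()
  modelDesignList = list(design1, design2),
  saveDirectory = "./multiplePlp"
)
```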

Individual pipeline functions

Functions for running parts of the PLP workflow

createStudyPopulation()
Create a study population
splitData()
Split the plpData into test/train sets using splitting settings of class splitSettings
preprocessData()
A function that wraps FeatureExtraction::tidyCovariateData to normalize the data and remove rare or redundant features
fitPlp()
Fit a model to the training data using the specified model settings
predictPlp()
Apply a fitted PLP model to plpData to obtain predicted risks
evaluatePlp()
Evaluate a model's predictions, computing discrimination and calibration measures
covariateSummary()
Summarize the covariates, overall and by outcome status

Saving results into database

Functions for saving the prediction model and performances into a database.

insertResultsToSqlite()
Create an sqlite database with the results
createPlpResultTables()
Create the results tables to store PatientLevelPrediction models and results into a database
createDatabaseSchemaSettings()
Create the PatientLevelPrediction database result schema settings
extractDatabaseToCsv()
Exports all the results from a database into csv files
insertCsvToDatabase()
Function to insert results into a database from csvs
migrateDataModel()
Migrate the results data model to the latest version
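As a minimal example, a finished result folder can be exported to a local sqlite database for sharing or viewing. The argument names are assumptions; insertResultsToSqlite also accepts cohort definitions and database metadata, so see its reference page.

```r
# Hypothetical sketch: export results to a local sqlite database.
library(PatientLevelPrediction)

insertResultsToSqlite(
  resultLocation = "./multiplePlp",          # folder written by runMultiplePlp()
  sqliteLocation = "./multiplePlp/sqlite"
)
```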

Shiny Viewers

Functions for viewing results via a shiny app

viewPlp()
viewPlp - Interactively view the performance and model settings
viewMultiplePlp()
Open a local shiny app for viewing the results of multiple PLP analyses
viewDatabaseResultPlp()
Open a local shiny app for viewing the results of a PLP analysis stored in a database

Plotting

Functions for various performance plots

plotPlp()
Plot all the PatientLevelPrediction plots
plotSparseRoc()
Plot the ROC curve using the sparse thresholdSummary data frame
plotSmoothCalibration()
Plot the smooth calibration as detailed in Van Calster et al. "A calibration hierarchy for risk models was defined: from utopia to empirical data" (2016)
plotSparseCalibration()
Plot the calibration
plotSparseCalibration2()
Plot the conventional calibration
plotNetBenefit()
Plot the net benefit
plotDemographicSummary()
Plot the observed vs. expected incidence, by age and gender
plotF1Measure()
Plot the F1 measure efficiency frontier using the sparse thresholdSummary data frame
plotGeneralizability()
Plot the train/test generalizability diagnostic
plotPrecisionRecall()
Plot the precision-recall curve using the sparse thresholdSummary data frame
plotPredictedPDF()
Plot the Predicted probability density function, showing prediction overlap between true and false cases
plotPreferencePDF()
Plot the preference score probability density function, showing prediction overlap between true and false cases
plotPredictionDistribution()
Plot the side-by-side boxplots of prediction distribution, by class
plotVariableScatterplot()
Plot the variable importance scatterplot
outcomeSurvivalPlot()
Plot the outcome incidence over time

Learning Curves

Functions for creating and plotting learning curves

createLearningCurve()
Create a learning curve by training the model on successively larger fractions of the training data
plotLearningCurve()
Plot a learning curve created with createLearningCurve

Simulation

Functions for simulating PLP data objects.

simulatePlpData()
Generate simulated data
simulationProfile
A simulation profile for generating synthetic patient level prediction data
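The bundled simulationProfile makes it possible to try the pipeline without any database access. A short sketch (the simulated data are synthetic, so any resulting performance is meaningless except for testing code):

```r
# Generate a synthetic plpData object for testing pipelines; no database needed.
library(PatientLevelPrediction)

data("simulationProfile", package = "PatientLevelPrediction")
plpData <- simulatePlpData(simulationProfile, n = 1000)
summary(plpData)
```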

Data manipulation functions

Functions for manipulating data

toSparseM()
Convert the plpData in COO format into a sparse R matrix
MapIds()
Map covariate and row Ids so they start from 1

Helper/utility functions

listAppend()
Join two lists
listCartesian()
Cartesian product
createTempModelLoc()
Create a temporary model location
configurePython()
Sets up a python environment to use for PLP (can be conda or venv)
setPythonEnvironment()
Use the python environment created using configurePython()

Evaluation measures

averagePrecision()
Calculate the average precision
brierScore()
Compute the Brier score (mean squared error between predicted probabilities and observed outcomes)
calibrationLine()
Fit a calibration line (intercept and slope) to the predictions
computeAuc()
Compute the area under the ROC curve
ici()
Calculate the Integrated Calibration Index from Austin and Steyerberg https://onlinelibrary.wiley.com/doi/full/10.1002/sim.8281
modelBasedConcordance()
Calculate the model-based concordance, which is a calculation of the expected discrimination performance of a model under the assumption the model predicts the "TRUE" outcome as detailed in van Klaveren et al. https://pubmed.ncbi.nlm.nih.gov/27251001/
computeGridPerformance()
Computes grid performance with a specified performance function
getCalibrationSummary()
Get a sparse summary of the calibration
getDemographicSummary()
Get a demographic summary
getThresholdSummary()
Calculate all measures for sparse ROC
getPredictionDistribution()
Calculates the prediction distribution

Saving/loading models as json

Functions for saving or loading models as json

sklearnFromJson()
Loads sklearn python model from json
sklearnToJson()
Saves sklearn python model object to json in path

Load/save for sharing

Functions for loading/saving objects for sharing

savePlpShareable()
Save the plp result as json files and csv files for transparent sharing
loadPlpShareable()
Loads the plp result saved as json/csv files for transparent sharing
loadPrediction()
Loads the prediction dataframe from a json file
savePrediction()
Saves the prediction dataframe to a json file

Feature importance

pfi()
Permutation Feature Importance

Other functions

predictCyclops()
Create predictive probabilities
predictGlm()
Predict using a logistic regression model
createGlmModel()
Create a generalized linear model from existing coefficients that can be applied with predictGlm
createSklearnModel()
Plug an existing scikit learn python model into the PLP framework