Extracting data from the OMOP CDM database

Functions for extracting the necessary data from a database in the OMOP Common Data Model format, and for saving/loading it.

createDatabaseDetails()
Create a setting that holds the details about the cdmDatabase connection for data extraction
createRestrictPlpDataSettings()
Define additional restriction settings used when calling getPlpData
getPlpData()
Extract the patient level prediction data from the server
getEunomiaPlpData()
Create a plpData object from the Eunomia database
savePlpData()
Save the plpData to folder
loadPlpData()
Load the plpData from a folder
getCohortCovariateData()
Extracts covariates based on cohorts
print(<plpData>)
Print a plpData object
print(<summary.plpData>)
Print a summary.plpData object
summary(<plpData>)
Summarize a plpData object
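The extraction functions above fit together as sketched below. This is an indicative example only: the connection values are placeholders and some argument names may differ by package version, so check each function's reference page.

```r
# Sketch of extracting plpData from an OMOP CDM database.
# Connection values are placeholders; argument names are indicative.
library(PatientLevelPrediction)

databaseDetails <- createDatabaseDetails(
  connectionDetails = DatabaseConnector::createConnectionDetails(
    dbms = "postgresql",
    server = "localhost/ohdsi",
    user = "user",
    password = "secret"
  ),
  cdmDatabaseSchema = "cdm",
  cohortDatabaseSchema = "results",
  cohortTable = "cohort",
  targetId = 1,    # cohort definition id of the target population
  outcomeIds = 2   # cohort definition id(s) of the outcome(s)
)

plpData <- getPlpData(
  databaseDetails = databaseDetails,
  covariateSettings = FeatureExtraction::createDefaultCovariateSettings(),
  restrictPlpDataSettings = createRestrictPlpDataSettings()
)

savePlpData(plpData, "./plpData")   # persist for later runs
plpData <- loadPlpData("./plpData")
summary(plpData)
```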

Settings for designing a prediction model

Design settings required when developing a model.

createStudyPopulationSettings()
Create the study population settings
createDefaultSplitSetting()
Create the settings for defining how the plpData are split into test/validation/train sets using default splitting functions (either random stratified by outcome, time or subject splitting)
createExistingSplitSettings()
Create the settings for defining how the plpData are split into test/validation/train sets using an existing split - good to use for reproducing results from a different run
createSampleSettings()
Create the settings for defining how the trainData from splitData are sampled using default sample functions.
createFeatureEngineeringSettings()
Create the settings for defining any feature engineering that will be done
createPreprocessSettings()
Create the settings for preprocessing the trainData.
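A typical design combines these settings objects, which are later passed to runPlp. The values below are illustrative defaults, not recommendations; argument names are indicative and should be checked against the reference pages.

```r
# Indicative sketch of a model design's settings (values are illustrative).
library(PatientLevelPrediction)

populationSettings <- createStudyPopulationSettings(
  removeSubjectsWithPriorOutcome = TRUE,
  requireTimeAtRisk = TRUE,
  riskWindowStart = 1,     # time-at-risk starts 1 day after index
  riskWindowEnd = 365      # and ends 365 days after index
)

splitSettings <- createDefaultSplitSetting(
  testFraction = 0.25,
  nfold = 3,
  splitSeed = 42,
  type = "stratified"      # stratified by outcome; "time" and "subject" also exist
)

sampleSettings <- createSampleSettings()                       # no sampling by default
featureEngineeringSettings <- createFeatureEngineeringSettings()
preprocessSettings <- createPreprocessSettings(
  minFraction = 0.001,     # drop covariates seen in < 0.1% of subjects
  normalize = TRUE
)
```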

Optional design settings

Settings for optional steps that can be used in the PLP pipeline

createCohortCovariateSettings()
Extracts covariates based on cohorts
createRandomForestFeatureSelection()
Create the settings for random forest based feature selection
createUnivariateFeatureSelection()
Create the settings for defining any feature selection that will be done
createSplineSettings()
Create the settings for adding a spline for continuous variables
createStratifiedImputationSettings()
Create the settings for using stratified imputation.
createNormalizer()
Create the settings for normalizing the data; type is either "minmax" or "robust"
createSimpleImputer()
Create Simple Imputer settings
createIterativeImputer()
Create Iterative Imputer settings
createRareFeatureRemover()
Create the settings for removing rare features

External validation

createValidationDesign()
createValidationDesign - Define the validation design for external validation
validateExternal()
validateExternal - Validate model performance on new data
createValidationSettings()
Define optional settings for performing external validation
recalibratePlp()
Recalibrate a model's predictions on new data
recalibratePlpRefit()
Recalibrate a model by refitting it on new data
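A rough sketch of the external validation flow: define a validation design for a previously developed model, then run it against one or more new databases. The argument names here are assumptions (and `validationDatabaseDetails` is a placeholder built with createDatabaseDetails()); consult the function reference before use.

```r
# Hypothetical sketch of external validation; argument names are assumptions.
library(PatientLevelPrediction)

design <- createValidationDesign(
  targetId = 1,
  outcomeId = 2,
  plpModelList = list(loadPlpModel("./plpResults/demo/plpResult/model")),
  recalibrate = "weakRecalibration"   # optionally recalibrate on the new data
)

results <- validateExternal(
  validationDesignList = list(design),
  databaseDetails = validationDatabaseDetails,  # from createDatabaseDetails()
  outputFolder = "./validation"
)
```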

Execution settings when developing a model

Execution settings required when developing a model.

createLogSettings()
Create the settings for logging the progression of the analysis
createExecuteSettings()
Creates list of settings specifying what parts of runPlp to execute
createDefaultExecuteSettings()
Creates default list of settings specifying what parts of runPlp to execute
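These two settings objects are usually created together and passed to runPlp. A minimal sketch, assuming the argument names shown (check the reference pages):

```r
# Minimal sketch of execution settings for runPlp.
library(PatientLevelPrediction)

logSettings <- createLogSettings(verbosity = "INFO", logName = "runPlp Log")
executeSettings <- createDefaultExecuteSettings()  # run every pipeline step
```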

Binary Classification Models

Functions for creating binary models

setAdaBoost()
Create setting for AdaBoost with python DecisionTreeClassifier base estimator
setDecisionTree()
Create setting for the scikit-learn DecisionTree with python
setGradientBoostingMachine()
Create setting for gradient boosting machine model using gbm_xgboost implementation
setLassoLogisticRegression()
Create modelSettings for lasso logistic regression
setMLP()
Create setting for neural network model with python's scikit-learn. For bigger models, consider using DeepPatientLevelPrediction package.
setNaiveBayes()
Create setting for naive bayes model with python
setRandomForest()
Create setting for random forest model using sklearn
setSVM()
Create setting for the python sklearn SVM (SVC function)
setIterativeHardThresholding()
Create setting for Iterative Hard Thresholding model
setLightGBM()
Create setting for gradient boosting machine model using lightGBM (https://github.com/microsoft/LightGBM/tree/master/R-package).
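Each set*() function returns a modelSettings object that is plugged into runPlp or createModelDesign. A hedged sketch; the hyperparameter names below are indicative and supplying a vector typically defines a grid to search over:

```r
# Indicative sketch of creating model settings.
library(PatientLevelPrediction)

# Cross-validated lasso logistic regression (Cyclops backend)
lrSettings <- setLassoLogisticRegression()

# Gradient boosting machine; vector-valued arguments define a hyperparameter grid
gbmSettings <- setGradientBoostingMachine(
  ntrees = 300,
  maxDepth = c(4, 6),
  learnRate = 0.1
)
```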

Survival Models

Functions for creating survival models

setCoxModel()
Create setting for lasso Cox model

Single Patient-Level Prediction Model

Functions for training/evaluating/applying a single patient-level-prediction model

runPlp()
runPlp - Develop and internally evaluate a model using specified settings
externalValidateDbPlp()
externalValidateDbPlp - Validate a model on new databases
savePlpModel()
Saves the plp model
loadPlpModel()
Loads the plp model
savePlpResult()
Saves the result from runPlp into the location directory
loadPlpResult()
Loads the runPlp result object from a folder
diagnosePlp()
diagnosePlp - Investigates the prediction problem settings; use before training a model
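Putting the pieces together, a single development run looks roughly like this. It is a sketch under the assumption that plpData was extracted with getPlpData; argument names are indicative, and runPlp accepts further settings (sampling, feature engineering, preprocessing, logging) not shown here.

```r
# Sketch of developing and internally validating one model with runPlp.
library(PatientLevelPrediction)

result <- runPlp(
  plpData = plpData,                   # from getPlpData()
  outcomeId = 2,
  analysisId = "demo",
  populationSettings = createStudyPopulationSettings(riskWindowEnd = 365),
  splitSettings = createDefaultSplitSetting(testFraction = 0.25, nfold = 3),
  modelSettings = setLassoLogisticRegression(),
  executeSettings = createDefaultExecuteSettings(),
  saveDirectory = "./plpResults"
)

savePlpResult(result, "./plpResults/demo")
viewPlp(result)   # interactively inspect performance and model settings
```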

Multiple Patient-Level Prediction Models

Functions for training multiple patient-level-prediction models in an efficient way.

createModelDesign()
Specify settings for developing a single model
runMultiplePlp()
Run a list of predictions analyses
validateMultiplePlp()
Externally validate multiple PLP models across new datasets
savePlpAnalysesJson()
Save the modelDesignList to a json file
loadPlpAnalysesJson()
Load the multiple prediction json settings from a file
diagnoseMultiplePlp()
Run a list of predictions diagnoses
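For a study with several designs, each createModelDesign() call specifies one model, and runMultiplePlp() executes the list while reusing shared data extraction where possible. A hedged sketch (argument names indicative; `databaseDetails` comes from createDatabaseDetails()):

```r
# Sketch of running multiple model designs in one study.
library(PatientLevelPrediction)

design1 <- createModelDesign(
  targetId = 1,
  outcomeId = 2,
  populationSettings = createStudyPopulationSettings(),
  covariateSettings = FeatureExtraction::createDefaultCovariateSettings(),
  modelSettings = setLassoLogisticRegression()
)

design2 <- createModelDesign(
  targetId = 1,
  outcomeId = 2,
  modelSettings = setGradientBoostingMachine()
)

runMultiplePlp(
  databaseDetails = databaseDetails,   # from createDatabaseDetails()
  modelDesignList = list(design1, design2),
  saveDirectory = "./multiplePlp"
)
```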

Individual pipeline functions

Functions for running parts of the PLP workflow

createStudyPopulation()
Create a study population
splitData()
Split the plpData into test/train sets using splitting settings of class splitSettings
preprocessData()
A function that wraps FeatureExtraction::tidyCovariateData to normalize the data and remove rare or redundant features
fitPlp()
Fit a model to the training data using the specified model settings
predictPlp()
Apply a fitted PLP model to plpData to obtain predicted risks
evaluatePlp()
Evaluate a model's predictions, computing discrimination and calibration measures
covariateSummary()
Summarize the covariates, overall and by outcome status

Saving results into database

Functions for saving the prediction model and performances into a database.

insertResultsToSqlite()
Create an sqlite database with the results
createPlpResultTables()
Create the results tables to store PatientLevelPrediction models and results into a database
createDatabaseSchemaSettings()
Create the PatientLevelPrediction database result schema settings
extractDatabaseToCsv()
Exports all the results from a database into csv files
insertCsvToDatabase()
Function to insert results into a database from csvs
migrateDataModel()
Migrate the results data model to the latest version
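As a minimal example, a finished result folder can be exported to a local sqlite database for sharing or viewing. The argument names are assumptions; insertResultsToSqlite also accepts cohort definitions and database metadata, so see its reference page.

```r
# Hypothetical sketch: export results to a local sqlite database.
library(PatientLevelPrediction)

insertResultsToSqlite(
  resultLocation = "./multiplePlp",          # folder written by runMultiplePlp()
  sqliteLocation = "./multiplePlp/sqlite"
)
```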

Shiny Viewers

Functions for viewing results via a shiny app

viewPlp()
viewPlp - Interactively view the performance and model settings
viewMultiplePlp()
Open a local shiny app for viewing the results of multiple PLP analyses
viewDatabaseResultPlp()
Open a local shiny app for viewing the results of a PLP analysis stored in a database

Plotting

Functions for various performance plots

plotPlp()
Plot all the PatientLevelPrediction plots
plotSparseRoc()
Plot the ROC curve using the sparse thresholdSummary data frame
plotSmoothCalibration()
Plot the smooth calibration as detailed in Van Calster et al. "A calibration hierarchy for risk models was defined: from utopia to empirical data" (2016)
plotSparseCalibration()
Plot the calibration
plotSparseCalibration2()
Plot the conventional calibration
plotNetBenefit()
Plot the net benefit
plotDemographicSummary()
Plot the observed vs. expected incidence, by age and gender
plotF1Measure()
Plot the F1 measure efficiency frontier using the sparse thresholdSummary data frame
plotGeneralizability()
Plot the train/test generalizability diagnostic
plotPrecisionRecall()
Plot the precision-recall curve using the sparse thresholdSummary data frame
plotPredictedPDF()
Plot the Predicted probability density function, showing prediction overlap between true and false cases
plotPreferencePDF()
Plot the preference score probability density function, showing prediction overlap between true and false cases
plotPredictionDistribution()
Plot the side-by-side boxplots of prediction distribution, by class
plotVariableScatterplot()
Plot the variable importance scatterplot
outcomeSurvivalPlot()
Plot the outcome incidence over time

Learning Curves

Functions for creating and plotting learning curves

createLearningCurve()
Create a learning curve by training the model on successively larger fractions of the training data
plotLearningCurve()
Plot a learning curve created with createLearningCurve

Simulation

Functions for simulating PLP data objects.

simulatePlpData()
Generate simulated data
simulationProfile
A simulation profile for generating synthetic patient level prediction data
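The bundled simulationProfile makes it possible to try the pipeline without any database access. A short sketch (the simulated data are synthetic, so any resulting performance is meaningless except for testing code):

```r
# Generate a synthetic plpData object for testing pipelines; no database needed.
library(PatientLevelPrediction)

data("simulationProfile", package = "PatientLevelPrediction")
plpData <- simulatePlpData(simulationProfile, n = 1000)
summary(plpData)
```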

Data manipulation functions

Functions for manipulating data

toSparseM()
Convert the plpData in COO format into a sparse R matrix
MapIds()
Map covariate and row Ids so they start from 1

Helper/utility functions

listAppend()
Join two lists
listCartesian()
Cartesian product
createTempModelLoc()
Create a temporary model location
configurePython()
Sets up a python environment to use for PLP (can be conda or venv)
setPythonEnvironment()
Use the python environment created using configurePython()

Evaluation measures

averagePrecision()
Calculate the average precision
brierScore()
Compute the Brier score (mean squared error between predicted probabilities and observed outcomes)
calibrationLine()
Fit a calibration line (intercept and slope) to the predictions
computeAuc()
Compute the area under the ROC curve
ici()
Calculate the Integrated Calibration Index from Austin and Steyerberg https://onlinelibrary.wiley.com/doi/full/10.1002/sim.8281
modelBasedConcordance()
Calculate the model-based concordance, which is a calculation of the expected discrimination performance of a model under the assumption the model predicts the "TRUE" outcome as detailed in van Klaveren et al. https://pubmed.ncbi.nlm.nih.gov/27251001/
computeGridPerformance()
Computes grid performance with a specified performance function
getCalibrationSummary()
Get a sparse summary of the calibration
getDemographicSummary()
Get a demographic summary
getThresholdSummary()
Calculate all measures for sparse ROC
getPredictionDistribution()
Calculates the prediction distribution

Saving/loading models as json

Functions for saving or loading models as json

sklearnFromJson()
Loads sklearn python model from json
sklearnToJson()
Saves sklearn python model object to json in path

Load/save for sharing

Functions for loading/saving objects for sharing

savePlpShareable()
Save the plp result as json files and csv files for transparent sharing
loadPlpShareable()
Loads the plp result saved as json/csv files for transparent sharing
loadPrediction()
Loads the prediction dataframe from a json file
savePrediction()
Saves the prediction dataframe to a json file

Feature importance

pfi()
Permutation Feature Importance

Other functions

predictCyclops()
Create predictive probabilities
predictGlm()
Predict using a logistic regression model
createGlmModel()
Create a generalized linear model from existing coefficients that can be applied with predictGlm
createSklearnModel()
Plug an existing scikit learn python model into the PLP framework