Extracting data from the OMOP CDM database

Functions for extracting the necessary data from a database in the OMOP Common Data Model (CDM) and for saving and loading it.

createDatabaseDetails()

Create a setting that holds the details of the CDM database connection used for data extraction

createRestrictPlpDataSettings()

Define extra restriction settings to apply when calling getPlpData()

getPlpData()

Get the patient level prediction data from the server

savePlpData()

Save the cohort data to a folder

loadPlpData()

Load the cohort data from a folder

getCohortCovariateData()

Extracts covariates based on cohorts

Settings for designing a prediction model

Design settings required when developing a model.

createStudyPopulationSettings()

Create the study population settings

createDefaultSplitSetting()

Create the settings for defining how the plpData are split into test/validation/train sets using the default splitting functions (random stratified by outcome, time-based, or subject-based splitting)
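A minimal Python sketch of the first option, random splitting stratified by outcome, is shown below. The function name `stratified_split`, its arguments, and the convention of marking the test set with -1 and training folds with 1..nfold are illustrative assumptions, not the package's R API.

```python
import random
from collections import defaultdict


def stratified_split(outcomes, test_fraction=0.25, nfold=3, seed=42):
    """Hypothetical sketch: assign each subject to the test set (-1) or to a
    training fold (1..nfold), stratified by outcome.

    outcomes: dict mapping subject id -> 0/1 outcome label.
    Returns a dict mapping subject id -> split index.
    """
    rng = random.Random(seed)

    # Group subjects by outcome so each class is split separately.
    by_class = defaultdict(list)
    for subject_id, label in outcomes.items():
        by_class[label].append(subject_id)

    assignment = {}
    for label in sorted(by_class):
        ids = sorted(by_class[label])
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        # The first n_test shuffled subjects of this class go to the test set.
        for subject_id in ids[:n_test]:
            assignment[subject_id] = -1
        # The rest are dealt round-robin into the training folds.
        for i, subject_id in enumerate(ids[n_test:]):
            assignment[subject_id] = i % nfold + 1
    return assignment


# 20 subjects with the outcome, 80 without.
outcomes = {i: 1 if i < 20 else 0 for i in range(100)}
splits = stratified_split(outcomes)
```

Because each outcome class is split separately, the test set and every training fold keep the same 20% outcome rate as the full example data.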

createSampleSettings()

Create the settings for defining how the trainData from splitData are sampled using default sample functions.

createFeatureEngineeringSettings()

Create the settings for defining any feature engineering that will be done

createPreprocessSettings()

Create the settings for preprocessing the trainData.

Optional design settings

Settings for optional steps that can be used in the PLP pipeline

createCohortCovariateSettings()

Create the settings for extracting covariates based on cohorts

createRandomForestFeatureSelection()

Create the settings for random forest-based feature selection

createUnivariateFeatureSelection()

Create the settings for univariate feature selection

createSplineSettings()

Create the settings for adding a spline for continuous variables

createStratifiedImputationSettings()

Create the settings for stratified imputation of continuous variables

External validation

createValidationDesign()

createValidationDesign - Define the validation design for external validation

validateExternal()

validateExternal - Validate model performance on new data

createValidationSettings()

Define optional settings for performing external validation

recalibratePlp()

Recalibrate the predictions of an existing model

recalibratePlpRefit()

Recalibrate a model by refitting it on new data

Execution settings when developing a model

Execution settings required when developing a model.

createLogSettings()

Create the settings for logging the progression of the analysis

createExecuteSettings()

Creates a list of settings specifying which parts of runPlp to execute

createDefaultExecuteSettings()

Creates the default list of settings specifying which parts of runPlp to execute

Binary Classification Models

Functions for setting binary classifiers and their hyper-parameter search.

setAdaBoost()

Create setting for AdaBoost with python DecisionTreeClassifier base estimator

setDecisionTree()

Create setting for the scikit-learn 1.0.1 DecisionTreeClassifier in Python

setGradientBoostingMachine()

Create setting for a gradient boosting machine model using the gbm_xgboost implementation

setKNN()

Create setting for a k-nearest neighbors (KNN) model

setLassoLogisticRegression()

Create setting for lasso logistic regression

setMLP()

Create setting for a neural network (multilayer perceptron) model in Python

setNaiveBayes()

Create setting for a naive Bayes model in Python

setRandomForest()

Create setting for a random forest model in Python (very fast)

setSVM()

Create setting for the scikit-learn SVM (SVC class) in Python

setIterativeHardThresholding()

Create setting for an iterative hard thresholding (IHT) regression model

setLightGBM()

Create setting for gradient boosting machine model using lightGBM (https://github.com/microsoft/LightGBM/tree/master/R-package).

Survival Models

Functions for setting survival models and their hyper-parameter search.

setCoxModel()

Create setting for lasso Cox model

Single Patient-Level Prediction Model

Functions for training, evaluating and applying a single patient-level prediction model

runPlp()

runPlp - Develop and internally evaluate a model using specified settings

externalValidateDbPlp()

externalValidateDbPlp - Validate a model on new databases

savePlpModel()

Saves the plp model

loadPlpModel()

Loads the plp model

savePlpResult()

Saves the result from runPlp into the location directory

loadPlpResult()

Loads the runPlp result saved by savePlpResult()

diagnosePlp()

diagnosePlp - Investigate the prediction problem settings; use before training a model

Multiple Patient-Level Prediction Models

Functions for training multiple patient-level prediction models in an efficient way.

createModelDesign()

Specify settings for developing a single model

runMultiplePlp()

Run a list of prediction analyses

validateMultiplePlp()

Externally validate multiple plp models across new datasets

savePlpAnalysesJson()

Save the modelDesignList to a json file

loadPlpAnalysesJson()

Load the multiple prediction json settings from a file

diagnoseMultiplePlp()

Run a list of prediction diagnoses

Individual pipeline functions

Functions for running parts of the PLP workflow

createStudyPopulation()

Create a study population

splitData()

Split the plpData into test/train sets using settings of class splitSettings

preprocessData()

A function that wraps around FeatureExtraction::tidyCovariateData to normalise the data and remove rare or redundant features
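The two tidying steps named above (removing rare features and normalising) can be sketched in Python as follows, assuming covariates are stored as (rowId, covariateId, value) triples. The function name `tidy_covariates`, the `min_fraction` threshold, and normalisation by each covariate's maximum value are assumptions made for illustration; the removal of redundant features is not shown.

```python
from collections import defaultdict


def tidy_covariates(triples, n_rows, min_fraction=0.001, normalize=True):
    """Hypothetical sketch of covariate tidying.

    triples: list of (row_id, covariate_id, value) in COO style.
    n_rows: total number of rows (people) in the data.
    Drops covariates present in fewer than min_fraction of rows, then
    divides each remaining covariate's values by that covariate's maximum.
    """
    rows_with_cov = defaultdict(set)
    max_value = {}
    for row_id, cov_id, value in triples:
        rows_with_cov[cov_id].add(row_id)
        max_value[cov_id] = max(max_value.get(cov_id, value), value)

    # Keep only covariates that are frequent enough.
    keep = {cov_id for cov_id, rows in rows_with_cov.items()
            if len(rows) / n_rows >= min_fraction}

    tidied = []
    for row_id, cov_id, value in triples:
        if cov_id not in keep:
            continue
        if normalize and max_value[cov_id] != 0:
            value = value / max_value[cov_id]
        tidied.append((row_id, cov_id, value))
    return tidied


triples = [(1, 10, 2.0), (2, 10, 4.0), (3, 10, 1.0), (1, 20, 1.0)]
tidied = tidy_covariates(triples, n_rows=4, min_fraction=0.5)
# Covariate 20 appears in 1 of 4 rows (25%) and is dropped;
# covariate 10 is rescaled by its maximum (4.0).
```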

fitPlp()

Fit a model to the training data using the specified model settings

predictPlp()

Apply a trained model to new data to obtain predicted risks

evaluatePlp()

Evaluate the performance of a model's predictions

covariateSummary()

Summarise the covariates, e.g. their prevalence in people with and without the outcome

Saving results into database

Functions for saving the prediction model and performances into a database.

insertResultsToSqlite()

Create an SQLite database with the results

createPlpResultTables()

Create the results tables to store PatientLevelPrediction models and results into a database

addMultipleRunPlpToDatabase()

Populate the PatientLevelPrediction results tables

addRunPlpToDatabase()

Function to add the run plp (development or validation) to database

createDatabaseSchemaSettings()

Create the PatientLevelPrediction database result schema settings

createDatabaseList()

Create a list with the database details and database meta data entries

addDiagnosePlpToDatabase()

Insert a diagnostic result into a PLP result schema database

addMultipleDiagnosePlpToDatabase()

Insert multiple diagnosePlp results saved to a directory into a PLP result schema database

extractDatabaseToCsv()

Exports all the results from a database into csv files

insertCsvToDatabase()

Function to insert results into a database from csvs

insertModelDesignInDatabase()

Insert a model design into a PLP result schema database

migrateDataModel()

Migrate the data model of a PLP results database

Shiny Viewers

Functions for viewing results via a shiny app

viewPlp()

viewPlp - Interactively view the performance and model settings

viewMultiplePlp()

Open a local Shiny app for viewing the results of multiple PLP analyses

viewDatabaseResultPlp()

Open a local Shiny app for viewing the results of PLP analyses stored in a database

Plotting

Functions for various performance plots

plotPlp()

Plot all the PatientLevelPrediction plots

plotSparseRoc()

Plot the ROC curve using the sparse thresholdSummary data frame

plotSmoothCalibration()

Plot the smooth calibration as detailed in Van Calster et al. "A calibration hierarchy for risk models was defined: from utopia to empirical data" (2016)

plotSparseCalibration()

Plot the calibration

plotSparseCalibration2()

Plot the conventional calibration

plotDemographicSummary()

Plot the observed vs. expected incidence by age and gender

plotF1Measure()

Plot the F1 measure efficiency frontier using the sparse thresholdSummary data frame

plotGeneralizability()

Plot the train/test generalizability diagnostic

plotPrecisionRecall()

Plot the precision-recall curve using the sparse thresholdSummary data frame

plotPredictedPDF()

Plot the predicted probability density function, showing prediction overlap between true and false cases

plotPreferencePDF()

Plot the preference score probability density function, showing prediction overlap between true and false cases

plotPredictionDistribution()

Plot the side-by-side boxplots of prediction distribution, by class

plotVariableScatterplot()

Plot the variable importance scatterplot

outcomeSurvivalPlot()

Plot the outcome incidence over time

Learning Curves

Functions for creating and plotting learning curves

createLearningCurve()

Create a learning curve by training and evaluating models on increasing amounts of training data

plotLearningCurve()

Plot a learning curve

Simulation

Functions for simulating plpData objects.

simulatePlpData()

Generate simulated data

plpDataSimulationProfile

A simulation profile

Data manipulation functions

Functions for manipulating data

toSparseM()

Convert the plpData in COO format into a sparse R matrix

MapIds()

Map covariate and row Ids so they start from 1
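A minimal Python sketch of these two steps: `map_ids` re-indexes row and covariate ids so both start from 1 and are contiguous, and `to_csr` turns the re-indexed COO triples into compressed sparse row (CSR) arrays. Both function names are hypothetical, and plain CSR arrays stand in here for the R sparse matrix that toSparseM() produces.

```python
def map_ids(triples):
    """Re-index (row_id, covariate_id, value) triples so that row and
    covariate ids each run 1..n in sorted order of the original ids."""
    row_map = {old: new for new, old in
               enumerate(sorted({r for r, _, _ in triples}), start=1)}
    cov_map = {old: new for new, old in
               enumerate(sorted({c for _, c, _ in triples}), start=1)}
    mapped = [(row_map[r], cov_map[c], v) for r, c, v in triples]
    return mapped, row_map, cov_map


def to_csr(mapped, n_rows):
    """Convert 1-based COO triples into 0-based CSR arrays."""
    ordered = sorted(mapped)  # sort by row, then column
    indptr = [0] * (n_rows + 1)
    for r, _, _ in ordered:
        indptr[r] += 1  # count entries per (1-based) row
    for i in range(1, n_rows + 1):
        indptr[i] += indptr[i - 1]  # cumulative sum gives row boundaries
    indices = [c - 1 for _, c, _ in ordered]
    data = [v for _, _, v in ordered]
    return indptr, indices, data


triples = [(101, 5, 1.0), (205, 9, 2.0), (101, 9, 3.0)]
mapped, row_map, cov_map = map_ids(triples)
indptr, indices, data = to_csr(mapped, len(row_map))
```

With SciPy installed, `scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(row_map), len(cov_map)))` would wrap these arrays as a sparse matrix.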

Helper/utility functions

listAppend()

Join two lists

listCartesian()

Cartesian product
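One common use of a Cartesian product in this setting is expanding a hyperparameter grid into the list of candidate settings to search. The following Python sketch of that idea uses `itertools.product`. The function name `list_cartesian` and the parameter names in the example are hypothetical.

```python
from itertools import product


def list_cartesian(grid):
    """Expand {name: [candidate values]} into one dict per combination."""
    names = list(grid)
    return [dict(zip(names, combo))
            for combo in product(*(grid[name] for name in names))]


# Hypothetical hyperparameter grid: 2 x 3 = 6 combinations.
combinations = list_cartesian({"ntrees": [100, 500], "maxDepth": [4, 10, 17]})
```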

createTempModelLoc()

Create a temporary model location

configurePython()

Sets up a virtual environment to use for PLP (either conda or a Python virtual environment)

setPythonEnvironment()

Use the virtual environment created using configurePython()

Evaluation measures

accuracy()

Calculate the accuracy

averagePrecision()

Calculate the average precision

brierScore()

Calculate the Brier score
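The Brier score is the mean squared difference between each predicted risk and the observed 0/1 outcome, so lower is better and 0 is perfect. A minimal Python sketch follows. The function name is illustrative.

```python
def brier_score(predicted, observed):
    """Mean of (predicted risk - observed outcome)^2 over all people."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("predicted and observed must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


# (0.1^2 + 0.2^2 + 0.6^2) / 3 = 0.41 / 3
score = brier_score([0.9, 0.2, 0.6], [1, 0, 0])
```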

calibrationLine()

Calculate the calibration line

computeAuc()

Compute the area under the ROC curve
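The area under the ROC curve equals the probability that a randomly chosen person with the outcome receives a higher predicted risk than a randomly chosen person without it, with ties counted as one half. The Python sketch below computes it directly from that pairwise definition. It is quadratic in the number of people, so it illustrates the quantity and does not reproduce how computeAuc() is implemented.

```python
def compute_auc(predicted, observed):
    """Probability that a random case outranks a random non-case (ties = 0.5)."""
    cases = [p for p, y in zip(predicted, observed) if y == 1]
    non_cases = [p for p, y in zip(predicted, observed) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one person with and one without the outcome")
    wins = 0.0
    for p_case in cases:
        for p_non in non_cases:
            if p_case > p_non:
                wins += 1.0
            elif p_case == p_non:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# 3 of the 4 case/non-case pairs are ranked correctly.
example_auc = compute_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```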

f1Score()

Calculate the f1Score

falseDiscoveryRate()

Calculate the falseDiscoveryRate

falseNegativeRate()

Calculate the falseNegativeRate

falseOmissionRate()

Calculate the falseOmissionRate

falsePositiveRate()

Calculate the falsePositiveRate

ici()

Calculate the Integrated Calibration Information from Austin and Steyerberg https://onlinelibrary.wiley.com/doi/full/10.1002/sim.8281

modelBasedConcordance()

Calculate the model-based concordance, the expected discrimination performance of a model under the assumption that the model's predicted risks are the true outcome risks, as detailed in van Klaveren et al. https://pubmed.ncbi.nlm.nih.gov/27251001/
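Under that assumption, an ordered pair of people (i, j) is an event/non-event pair with probability p_i(1 - p_j), and the pair is concordant when p_i > p_j. The Python sketch below weights every ordered pair accordingly. The function name and the choice to count ties as one half are assumptions, and the double loop is only suitable for small examples.

```python
def model_based_concordance(predicted):
    """Expected c-statistic if the predicted risks are the true risks."""
    numerator = 0.0
    denominator = 0.0
    for i, p_i in enumerate(predicted):
        for j, p_j in enumerate(predicted):
            if i == j:
                continue
            # Probability that i has the outcome and j does not.
            weight = p_i * (1.0 - p_j)
            denominator += weight
            if p_i > p_j:
                numerator += weight
            elif p_i == p_j:
                numerator += 0.5 * weight
    if denominator == 0.0:
        raise ValueError("no pair can be an event/non-event pair")
    return numerator / denominator


# Pairs: (0.8 event, 0.2 non-event) weight 0.64, concordant;
#        (0.2 event, 0.8 non-event) weight 0.04, discordant.
mbc = model_based_concordance([0.2, 0.8])  # 0.64 / 0.68
```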

negativeLikelihoodRatio()

Calculate the negativeLikelihoodRatio

negativePredictiveValue()

Calculate the negativePredictiveValue

positiveLikelihoodRatio()

Calculate the positiveLikelihoodRatio

positivePredictiveValue()

Calculate the positivePredictiveValue

sensitivity()

Calculate the sensitivity

specificity()

Calculate the specificity

computeGridPerformance()

Computes grid performance with a specified performance function

diagnosticOddsRatio()

Calculate the diagnostic odds ratio
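Most of the threshold-based measures listed above (accuracy() through specificity(), plus diagnosticOddsRatio()) are ratios of the four confusion-matrix counts at a chosen threshold. The Python sketch below computes them from TP, FP, FN and TN using the standard definitions. The function name, the dictionary keys, and returning NaN when a denominator is zero are illustrative choices. They do not describe how the package handles these cases.

```python
import math


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a denominator is zero.
    return numerator / denominator if denominator else math.nan


def threshold_measures(tp, fp, fn, tn):
    """Standard confusion-matrix measures from the four counts."""
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    fpr = _ratio(fp, fp + tn)
    fnr = _ratio(fn, fn + tp)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positivePredictiveValue": _ratio(tp, tp + fp),
        "negativePredictiveValue": _ratio(tn, tn + fn),
        "falsePositiveRate": fpr,
        "falseNegativeRate": fnr,
        "falseDiscoveryRate": _ratio(fp, fp + tp),
        "falseOmissionRate": _ratio(fn, fn + tn),
        "f1Score": _ratio(2 * tp, 2 * tp + fp + fn),
        "positiveLikelihoodRatio": _ratio(sensitivity, fpr),
        "negativeLikelihoodRatio": _ratio(fnr, specificity),
        "diagnosticOddsRatio": _ratio(tp * tn, fp * fn),
    }


m = threshold_measures(tp=40, fp=10, fn=20, tn=130)
```

The diagnostic odds ratio also equals the positive likelihood ratio divided by the negative likelihood ratio. For the example counts, both routes give 26.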

getCalibrationSummary()

Get a sparse summary of the calibration
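A calibration summary can be built by sorting people by predicted risk, cutting them into groups of roughly equal size, and comparing the mean predicted risk with the observed outcome rate in each group. The Python sketch below follows that approach. The equal-size grouping, the function name, and the output keys are assumptions and do not describe getCalibrationSummary().

```python
def calibration_summary(predicted, observed, n_bins=10):
    """Mean predicted risk vs observed outcome rate per risk-ordered group."""
    pairs = sorted(zip(predicted, observed))  # order people by predicted risk
    n = len(pairs)
    summary = []
    for b in range(n_bins):
        # Integer boundaries give groups whose sizes differ by at most one.
        group = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not group:
            continue
        summary.append({
            "bin": b + 1,
            "n": len(group),
            "averagePredicted": sum(p for p, _ in group) / len(group),
            "observedIncidence": sum(y for _, y in group) / len(group),
        })
    return summary


summary = calibration_summary([0.4, 0.1, 0.3, 0.2], [1, 0, 1, 0], n_bins=2)
```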

getDemographicSummary()

Get the calibration per age/gender group

getThresholdSummary()

Calculate all measures for sparse ROC

getThresholdSummary_binary()

Calculate all measures for sparse ROC when the prediction is binary classification

getPredictionDistribution()

Calculates the prediction distribution

getPredictionDistribution_binary()

Calculates the prediction distribution for binary classification

Saving/loading models as json

Functions for saving or loading models as json

sklearnFromJson()

Loads sklearn python model from json

sklearnToJson()

Saves sklearn python model object to json in path

Load/save for sharing

Functions for loading/saving objects for sharing

savePlpShareable()

Save the plp result as json files and csv files for transparent sharing

loadPlpShareable()

Loads the plp result saved as json/csv files for transparent sharing

loadPrediction()

Loads the prediction dataframe from csv

savePrediction()

Saves the prediction dataframe to RDS

Feature importance

pfi()

Calculate the permutation feature importance (pfi) for a PLP model
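Permutation feature importance measures how much a performance metric drops when one feature's values are randomly shuffled across people. Shuffling breaks that feature's link with the outcome and leaves the others intact. The self-contained Python sketch below assumes AUC as the metric and a model given as a plain prediction function. All names are hypothetical.

```python
import random


def auc(predicted, observed):
    """Pairwise AUC; ties between a case and a non-case count one half."""
    cases = [p for p, y in zip(predicted, observed) if y == 1]
    non_cases = [p for p, y in zip(predicted, observed) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in cases for q in non_cases)
    return wins / (len(cases) * len(non_cases))


def permutation_importance(predict, rows, observed, feature, n_repeats=5, seed=0):
    """Mean drop in AUC after shuffling column `feature` of `rows`."""
    rng = random.Random(seed)
    baseline = auc([predict(row) for row in rows], observed)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in rows]
        rng.shuffle(column)
        # Rebuild each row with only the chosen feature replaced.
        permuted = [row[:feature] + [value] + row[feature + 1:]
                    for row, value in zip(rows, column)]
        drops.append(baseline - auc([predict(row) for row in permuted], observed))
    return sum(drops) / n_repeats


def predict(row):
    # Toy model that uses only feature 0.
    return row[0]


rows = [[0.1, 7], [0.2, 3], [0.3, 9], [0.7, 1], [0.8, 4], [0.9, 2]]
observed = [0, 0, 0, 1, 1, 1]
importance_0 = permutation_importance(predict, rows, observed, feature=0)
importance_1 = permutation_importance(predict, rows, observed, feature=1)
```

Feature 1 is ignored by the toy model, so shuffling it leaves every prediction unchanged and its importance is exactly 0.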

Other functions

predictCyclops()

Create predictive probabilities
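For a logistic regression model, predictive probabilities come from applying the logistic (inverse logit) function to the linear predictor. The linear predictor is the intercept plus the sum of each coefficient times its covariate value. The Python sketch below computes this for one person with sparse covariates. It covers only the logistic case, and the function name and dictionary-based inputs are assumptions.

```python
import math


def predict_logistic(intercept, coefficients, covariates):
    """Predicted probability for one person.

    coefficients: dict covariate id -> fitted coefficient.
    covariates: dict covariate id -> value (absent covariates are 0).
    """
    linear = intercept + sum(coefficients.get(cov_id, 0.0) * value
                             for cov_id, value in covariates.items())
    # Numerically stable logistic function for large |linear|.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    z = math.exp(linear)
    return z / (1.0 + z)


# Linear predictor: -2.0 + 0.7 * 1 = -1.3 (covariate 1003 has no coefficient).
p = predict_logistic(-2.0, {1001: 0.7, 1002: -0.3}, {1001: 1, 1003: 1})
```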