• Clean up dependencies: remove tibble; use IHT and ParallelLogger from CRAN
  • Use cohortIds for cohortCovariates to comply with FeatureExtraction
  • Add cdmDatabaseName from DatabaseDetails to model output
  • Fix bug when attributes weren’t preserved on trainData$covariateData after split
  • Fix warnings in tests and speed them up
  • Fix bug in assignment operator in configurePython
  • Delay evaluation of plpData when using do.call, as in learningCurves and runMultiplePlp
  • Speed up population generation when subjectIds are distinct
  • Fix bug when population was still generated when provided to runPlp
  • Fix bug with OhdsiShinyModules version check (issue 415)
  • Fix sklearnToJson to be compatible with scikit-learn>=1.3
  • Fix GitHub Actions so the Python version is not hardcoded to 3.7
  • added spline feature engineering
  • added age/sex stratified imputation feature engineering
  • changed result table execution date types to varchar
  • updated covariateSummary to use feature engineering
  • fixed bug introduced with new reticulate update in model saving to json tests
  • fixed bug with database insert if result is incomplete
  • updated/fixed documentation (Egill)
  • added model path to models (Henrik)
  • updated hyper-parameter saving to data.frame and made consistent
  • fixed bug with multiple covariate settings in diagnose plp
  • added min cell count when exporting database results to csv files
  • LightGBM added (thanks Jin Choi and Chungsoo Kim)
  • fixed minor bugs when uploading results to database
  • added ensure_installed("ResultModelManager") to getDataMigrator()
  • shiny app is now using ShinyAppBuilder with a config saved in the /inst folder
  • fixed bugs introduced when sklearn inputs changed
  • added sklearn model being saved as jsons
  • made changes around the DatabaseConnector getTableNames function so it works with the updated DatabaseConnector
  • removed check RAM stop (now it just warns)
  • Updated test to skip test for FE setting if the model does not fit (this was causing occasional test fail)
  • replaced .data$ with quoted column names in all dplyr::select calls to remove warnings
  • Fix bug with python type being required to be int
  • Allow priorType to be passed down to getCV function in case prior is not ‘laplace’
  • Fixed bug where the seed specified in the Cyclops model settings was not passed to Cyclops
  • fixed issue with shiny viewer converting connection details to large json
  • added check for cdmDatabaseId into createDatabaseDetails
  • added test for check for cdmDatabaseId into createDatabaseDetails to error when NULL
  • removed session$onSessionEnded(shiny::stopApp) from shiny server
  • fixed Cox predictions
  • forced cdmDatabaseId to be a string when an integer is input
  • replaced utils::read.csv with readr::read_csv when inserting results from csv
  • replaced gsub with sub when inserting csvs to database
  • saved result specification csv on Windows to fix odd formatting issue
  • fixed sample data bugs
  • updated to use v1.0.0 of OhdsiShinyModules
  • updated plp database result tables to use the same structure for cohort and database as other HADES packages
  • added function to insert csv results into plp database result tables
  • added input for databaseId (database and version) when extracting data to be consistent with other HADES packages. This is saved in plp objects (see the sketch after this list).
  • fixed issue with ‘preprocess’ vs ‘preprocessing’ inconsistently used across models
  • added metaData tracking for feature engineering or preprocessing when predicting
  • fixed issue with FE using trainData$covariateData metaData rather than trainData
  • fixed bug when using sameData for FE
  • pulled in multiple bug fixes and test improvements from Egill
  • pulled in fix for learning curves from Henrik
  • Pulled in fix for feature engineering from Solomon
  • Cleaned check messages about comparing class(x) with a string by changing to inherits()
  • removed json saving for sklearn models since sklearn-json is no longer working for the latest sklearn
  • renamed the input corresponding to the string that gets appended to the results table names to tablePrefix
  • fixed issues with system.file() from SqlRender code breaking the tests
  • added an input fileAppend to the function that exports the database tables to csv files (see the sketch after this list)
  • moved the plp model (including preprocessing details) outside of the result database (into a specified folder) due to the size of the objects (too large to insert into the database).
  • added saving of plp models into the result database
  • added default cohortDefinitions in runMultiplePlp
  • added modelType to all models for database upload
  • moved FeatureExtraction to depends
  • fixed using inherits()
  • moved most of the shiny app code into OhdsiShinyModules
  • removed shiny dependencies and added OhdsiShinyModules to suggests
  • fixed bug with linux sklearn saving
  • replaced cohortId with targetId for consistency throughout the code
  • replaced targetId in the model design with cohortId for consistency throughout the code
  • replaced plpDataSettings with restrictPlpDataSettings to improve naming consistency
  • added ability to use an initial population in runPlp by adding the population to plpData$population (see the sketch after this list)
  • added splitSettings into modelDesign
  • replaced saving json settings with ParallelLogger function
  • updated database result schema: removed researcher_id from the tables (if desired, a new table with the setting_ids and researcher_id could be added); removed the study tables; and revised the results table into a performances table that references model_design_id and development_database_id, so validation results without a model can be inserted
  • added diagnostic code based on PROBAST
  • added diagnostic shiny module
  • added code to create sqlite database and populate in uploadToDatabase
  • add code to convert runPlp+val to sqlite database when viewing shiny
  • added code to extract database results into csv files: extractDatabaseToCsv()
  • pulled in GBM update (default hyper-parameters and variable importance fix) work done by Egill (egillax)
  • updated installation documents
  • added tryCatch around plots to prevent code stopping
  • updated result schema (added model_design table with settings and added attrition table)
  • updated shiny app for new database result schema
  • removed the C++ code for AUC and the Rcpp dependency; now using pROC instead, which is faster
  • made covariate summary optional when externally validating
  • updated json structure for specifying study design (made it friendlier to read)
  • includes smooth calibration plot fix - work done by Alex (rekkasa)
  • fixed bug with multiple sample methods or feature engineering settings causing invalid error
  • plpModel now saved as json files when possible
  • Updated runPlp to make it more modular
  • now possible to customise data splitting, feature engineering, sampling (over/under) and the learning algorithm (see the sketch after this list)
  • added function for extracting cohort covariates
  • updated evaluation to evaluate per strata (evaluation column)
  • updated plpModel structure
  • updated runPlp structure
  • updated shiny and package to use tidyr and not reshape2
  • sklearn learning algorithms share the same fit function
  • r learning algorithms share the same fit function
  • interface to cyclops code revised
  • ensemble learning removed (will be in separate package)
  • deep learning removed (will be in DeepPatientLevelPrediction package)
  • revised toSparseM() to do the conversion in one go but check RAM availability beforehand
  • removed temporal plpData conversion in toSparseM (this will be done in DeepPatientLevelPrediction)
  • shiny can now read csv results
  • objects loaded via loadPlpFromCsv() can be saved using savePlpResult() (see the sketch after this list)
  • added database result storage
  • added interface to database results in shiny
  • merged in shinyRepo that changed the shiny app to make it modular and added new features
  • removed deep learning as this is being added into new OHDSI package DeepPatientLevelPrediction
  • save xgboost model as json file for transparency
  • set connectionDetails to NULL in getPlpData
  • updated Andromeda functions - restrict to population and tidy covariates for speed
  • quick fix for GBM survival predicting negative values
  • fixed occasional demoSum error for survival models
  • updated index creation to use Andromeda function
  • fixed bug when normalize data is false
  • fixed bugs when using a single feature (GBM + python)
  • updated GBM
  • updated calibration slope
  • fixed missing age/gender in prediction
  • fixed shiny intercept bug
  • fixed diagnostic
  • fixed missing covariateSettings in load csv plp
  • Removed plpData from evaluation
  • Added recalibration into externalVal
  • Updated shiny app for recalibration
  • Added population creation setting to use cohortEndDate as timeAtRisk end
  • fixed tests
  • Reduced imports by adding code to install some dependencies when used
  • fixed csv result saving bug when there are no model parameters
  • fixed r check vignette issues
  • added conda install to test
  • finalised permutation feature importance
  • fixed deepNN index issue (reported on github - thanks dapritchard)
  • add compression to python pickles
  • removed requirement to have outcomeCount for prediction with python models
  • cleaned all checks
  • fixed bug in python toSparseMatrix
  • fixed warning in studyPop
  • fixed bug (identified by Chungsoo) in covariateSummary
  • fixed bug with thresholdSummary
  • edited threshold summary function to make it cleaner
  • added ensemble support where you can combine multiple models into an ensemble
  • cleaned up the notes and tests
  • updated simulated data covariateId in tests to use integer64
  • fixed description imports (and sorted them)
  • fixed Cox model calibration plots
  • fixed int64 conversion bug
  • added baseline risk to Cox model
  • updated shiny: added attrition and hyper-parameter grid search into settings
  • updated shiny app added 95% CI to AUC in summary, size is now complete data size and there is a column valPercent that tells what percentage of the data were used for validation
  • updated GBMsurvival to use survival metrics and c-stat
  • added survival metrics
  • added updates and fixes into master from development branch
  • fixed bug with PDW data extraction due to multiple person_id columns
  • fixed bug in shiny app converting covariate values due to tibble
  • added calibration updates: cal-in-large, weak cal
  • updated smooth cal plot (sample for speed in big data)
  • defaulted to 100 values in calibrationSummary + updated cal plot
  • fixed backwards compatibility with normalization
  • fixed python joblib dependency
  • fixed bug in preprocessing
  • added cross validation aucs to LR, GBM, RF and MLP
  • added more settings into MLP
  • added threads option in LR
  • fixed minor bug with shiny dependency
  • fixed some tests
  • added standardizedMeanDiff to covariateSummary
  • updated createStudyPopulation to make it cleaner to read and count outcome per TAR
  • Andromeda replaced ff data objects
  • added age/gender into cohort
  • fixed python warnings
  • updated shiny plp viewer
  • Fixed bug when running multiple analyses using a data extraction sample with multiple covariate settings
  • improved shiny PLP viewer
  • added diagnostic shiny viewer
  • updated external validate code to enable custom covariates using ATLAS cohorts
  • fixed issues with startAnchor and endAnchor
  • Deprecated the addExposureDaysToStart and addExposureDaysToEnd arguments in createStudyPopulation, adding new arguments called startAnchor and endAnchor. The hope is that this is less confusing (see the sketch after this list).
  • fixed transfer learning code (can now transfer or fine-tune model)
  • made view plp shiny apps work when some results are missing
  • set up testing
  • fixed build warnings
  • added tests to get >70% coverage (keras tests too slow for travis)
  • Fixed minor bugs
  • Fixed deep learning code and removed pythonInR dependency
  • combined shiny into one file with one interface
  • added recalibration using 25% sample in existing models
  • added option to provide score to probabilities for existing models
  • fixed warnings with some plots
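
For the bullet about adding a databaseId (database and version) when extracting data, a minimal sketch of how this might look when creating the database details is shown below. The exact argument names (cdmDatabaseName, cdmDatabaseId, etc.) are assumptions based on the notes above and should be checked against ?createDatabaseDetails.

```r
# Minimal sketch (assumed argument names): supplying a database name and
# database/version id when creating database details, so they are saved in the
# resulting plp objects and model output.
library(PatientLevelPrediction)

connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  server = "localhost/ohdsi",
  user = "user",
  password = "secret"
)

databaseDetails <- createDatabaseDetails(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "cdm",
  cdmDatabaseName = "My CDM",     # human-readable name included in the model output
  cdmDatabaseId = "my_cdm_v1",    # database/version id; coerced to a string if an integer is supplied
  cohortDatabaseSchema = "results",
  cohortTable = "cohort",
  targetId = 1,
  outcomeDatabaseSchema = "results",
  outcomeTable = "cohort",
  outcomeIds = 2
)
```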
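
For the bullets about exporting the database result tables to csv files with a minimum cell count and a fileAppend input, a hedged sketch follows. The helper createDatabaseSchemaSettings() and the argument names (csvFolder, minCellCount, fileAppend) are assumptions about the current interface; see ?extractDatabaseToCsv.

```r
# Minimal sketch (assumed argument names): exporting the plp result database
# tables to csv files, censoring small counts and appending a string to the
# exported file names.
conn <- DatabaseConnector::connect(connectionDetails)

extractDatabaseToCsv(
  conn = conn,
  databaseSchemaSettings = createDatabaseSchemaSettings(
    resultSchema = "plp_results",
    tablePrefix = "plp_"
  ),
  csvFolder = file.path(tempdir(), "plpCsvExport"),
  minCellCount = 5,       # counts below this are censored in the exported files
  fileAppend = "study1"   # string appended to each exported csv file name
)
```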
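
For the bullet about using an initial population in runPlp by adding it to plpData$population, the intended workflow might look like the following sketch; it assumes the population is first built with createStudyPopulation() and that runPlp() reuses it when present.

```r
# Minimal sketch: attach an existing study population to plpData so that runPlp
# reuses it instead of regenerating the population (per the note above).
population <- createStudyPopulation(
  plpData = plpData,
  outcomeId = 2,
  populationSettings = createStudyPopulationSettings(
    riskWindowStart = 1,
    riskWindowEnd = 365
  )
)

plpData$population <- population  # runPlp picks this up as the initial population

result <- runPlp(
  plpData = plpData,
  outcomeId = 2,
  modelSettings = setLassoLogisticRegression(),
  saveDirectory = file.path(tempdir(), "plpInitialPopulation")
)
```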
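
For the bullets about the more modular runPlp, where data splitting, sampling, feature engineering and the learning algorithm can each be customised, a hedged sketch is below. The settings-constructor names (createDefaultSplitSetting, createSampleSettings, createFeatureEngineeringSettings, createPreprocessSettings) reflect my understanding of the current API and should be verified against the package documentation.

```r
# Minimal sketch (assumed function/argument names): each analysis component is
# supplied to runPlp as its own settings object.
result <- runPlp(
  plpData = plpData,
  outcomeId = 2,
  analysisId = "modularDemo",
  populationSettings = createStudyPopulationSettings(riskWindowStart = 1, riskWindowEnd = 365),
  splitSettings = createDefaultSplitSetting(testFraction = 0.25, nfold = 3, splitSeed = 42),
  sampleSettings = createSampleSettings(),                          # "none" = no over/under-sampling
  featureEngineeringSettings = createFeatureEngineeringSettings(),  # "none" = no feature engineering
  preprocessSettings = createPreprocessSettings(minFraction = 0.001, normalize = TRUE),
  modelSettings = setLassoLogisticRegression(),
  saveDirectory = file.path(tempdir(), "plpModularDemo")
)
```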
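
For the bullet noting that objects loaded via loadPlpFromCsv() can be saved using savePlpResult(), a short sketch is below; the argument names are assumptions and worth checking in the help pages.

```r
# Minimal sketch: load results shared as csv files and re-save them as a
# standard plpResult object.
plpResult <- loadPlpFromCsv("path/to/csvResults")
savePlpResult(plpResult, dirPath = file.path(tempdir(), "plpResultFromCsv"))
```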
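
For the bullets about startAnchor/endAnchor replacing addExposureDaysToStart/addExposureDaysToEnd, and about using the cohort end date as the time-at-risk end, a sketch using the settings-based interface is below; it assumes the createStudyPopulationSettings() argument names.

```r
# Minimal sketch (assumed argument names): time-at-risk defined with anchors
# rather than the deprecated addExposureDays* arguments. endAnchor = "cohort end"
# makes the time-at-risk end relative to the cohort end date.
populationSettings <- createStudyPopulationSettings(
  riskWindowStart = 1,
  startAnchor = "cohort start",  # day offset is relative to cohort start
  riskWindowEnd = 0,
  endAnchor = "cohort end"       # day offset is relative to cohort end
)
```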

Small bug fixes:
  • added analysisId into model saving/loading
  • made external validation saving recursive
  • added removal of patients with negative TAR when creating population
  • added option to apply model without preprocessing settings (make them NULL)
  • updated create study population to remove patients with negative time-at-risk

Changes:
  • merged in bug fix from Martijn
  • fixed AUC bug causing crash with big data
  • updated SQL code to be compatible with v6.0 OMOP CDM
  • added save option to external validate PLP

Changes:
  • Updated splitting functions to include a split by subject and renamed personSplitter to randomSplitter
  • Cast indices to integer in python functions to fix bug with non-integer sparse matrix indices

Changes:
  • Added GLM status to log (will now inform about any fitting issue in log)
  • Added GBM survival model (still under development)
  • Added RF quantile regression (still under development)
  • Updated viewMultiplePlp() to match PLP skeleton package app
  • Updated single plp vignette with additional example
  • Merged in deep learning updates from Chan

Changes:
  • Updated website

Changes:
  • Added more tests
  • test files now match R files

Changes:
  • Fixed ensemble stacker

Changes:
  • Using reticulate for python interface
  • Speed improvements
  • Bug fixes