This page describes how you can use renv to make sure the right dependencies are installed when people run your study package. Please read the entire page, even when you’re familiar with renv.
The renv package offers two main benefits:
Specifying the exact package versions that must be loaded for executing a specific study. It offers functions for recording the versions of packages that are currently used, and for restoring a library of packages on the same computer (in the future) or other computers, thus greatly improving reproducibility.
Isolating the package library to a (RStudio) project. This means that different projects (e.g. different studies) can use very different package versions on the same computer without conflicting. When switching from one project to another, R will switch to the appropriate library.
At the core of renv is the ‘renv.lock’ file that specifies all required packages. This file should be in the root folder of the project. The structure of the lock file is not very complicated, and it can easily be edited by hand if need be. An example of a full lock file can be found here. Here’s an example partial lock file, with just 3 entries:
{
  "R" : {
    "Version" : "4.0.3",
    "Repositories" : [
      {
        "Name" : "CRAN",
        "URL" : "https://cloud.r-project.org"
      }
    ]
  },
  "Packages" : {
    "rlang" : {
      "Package" : "rlang",
      "Version" : "0.4.11",
      "Source" : "Repository",
      "Repository" : "CRAN"
    },
    "FeatureExtraction" : {
      "Package" : "FeatureExtraction",
      "Version" : "3.1.1",
      "Source" : "GitHub",
      "RemoteType" : "github",
      "RemoteHost" : "api.github.com",
      "RemoteRepo" : "FeatureExtraction",
      "RemoteUsername" : "ohdsi",
      "RemoteRef" : "v3.1.1"
    },
    "Covid19EstimationHydroxychloroquine" : {
      "Package" : "Covid19EstimationHydroxychloroquine",
      "Version" : "0.0.1",
      "Source" : "GitHub",
      "RemoteType" : "github",
      "RemoteHost" : "api.github.com",
      "RemoteRepo" : "Covid19EstimationHydroxychloroquine",
      "RemoteUsername" : "ohdsi-studies",
      "RemoteRef" : "main"
    }
  }
}
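Because the lock file is plain JSON, its entries can also be inspected from R. The following is a minimal sketch, not part of renv or OhdsiRTools: it parses a shortened copy of the example with RJSONIO (assumed to be installed) and tabulates the Package, Version, and Source fields of each entry. The variable and helper names are illustrative only.

```r
# Shortened copy of the example lock file, held in a string for illustration.
lockFileText <- '{
  "R" : { "Version" : "4.0.3" },
  "Packages" : {
    "rlang" : {
      "Package" : "rlang",
      "Version" : "0.4.11",
      "Source" : "Repository",
      "Repository" : "CRAN"
    },
    "FeatureExtraction" : {
      "Package" : "FeatureExtraction",
      "Version" : "3.1.1",
      "Source" : "GitHub",
      "RemoteRef" : "v3.1.1"
    }
  }
}'

# Parse the JSON text. For a real project, RJSONIO::fromJSON("renv.lock")
# reads the file from the project root instead.
lock <- RJSONIO::fromJSON(lockFileText, asText = TRUE)

# Illustrative helper: extract one field from every package entry.
getField <- function(field) {
  vapply(lock$Packages, function(entry) entry[[field]], character(1), USE.NAMES = FALSE)
}

# One row per lock file entry.
entries <- data.frame(package = getField("Package"),
                      version = getField("Version"),
                      source = getField("Source"),
                      stringsAsFactors = FALSE)
print(entries)
```

A table like this makes it easy to spot which entries come from CRAN and which from GitHub before sharing the lock file.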
We can see different types of entries in a lock file. Every entry has a name and, as the example shows, the following fields: Package, Version, and Source.
In an OHDSI lock file, the following types can be encountered:
"rlang" : {
"Package" : "rlang",
"Version" : "0.4.11",
"Source" : "Repository",
"Repository" : "CRAN"
},
Most packages, like rlang, but also several HADES packages like DatabaseConnector, will be available from CRAN, and will therefore have Source equal to “Repository”, and one additional field: Repository, which is “CRAN” in the entry above.
"FeatureExtraction" : {
"Package" : "FeatureExtraction",
"Version" : "3.1.1",
"Source" : "GitHub",
"RemoteType" : "github",
"RemoteHost" : "api.github.com",
"RemoteRepo" : "FeatureExtraction",
"RemoteUsername" : "ohdsi",
"RemoteRef" : "v3.1.1"
},
Many HADES packages need to be installed from GitHub rather than CRAN. See the ‘Availability’ column on the Package statuses page to learn which packages are not in CRAN. For these packages, Source is equal to “GitHub”. Other required fields, as shown in the entry above, are RemoteType, RemoteHost, RemoteRepo, RemoteUsername, and RemoteRef.
"Covid19EstimationHydroxychloroquine" : {
"Package" : "Covid19EstimationHydroxychloroquine",
"Version" : "0.0.1",
"Source" : "GitHub",
"RemoteType" : "github",
"RemoteHost" : "api.github.com",
"RemoteRepo" : "Covid19EstimationHydroxychloroquine",
"RemoteUsername" : "ohdsi-studies",
"RemoteRef" : "main"
}
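As discussed later in this document, sometimes we would like to include an OHDSI study package in the lock file as well. These entries are identical in terms of Source, RemoteType, and RemoteHost to those for HADES GitHub packages, but differ in these fields: RemoteRepo (the study repository), RemoteUsername (“ohdsi-studies” in the entry above), and RemoteRef (the “main” branch in the entry above).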
One could construct a lock file by hand, but this would take a lot of work. The renv package itself provides the snapshot() function for automatically constructing lock files, but this function only really works well for dependencies from CRAN. For OHDSI studies it is therefore recommended to use the createRenvLockFile() function in the OhdsiRTools package. You can install OhdsiRTools using remotes:
remotes::install_github("ohdsi/OhdsiRTools")
The createRenvLockFile() function has two modes: “auto”, and “description”. The description mode is for more advanced users.
In auto mode, the createRenvLockFile() function will leverage renv::init() to scan all R scripts in the project folder and subfolders for references to packages, and will include those (and their dependencies). The advantage of this mode is that it automatically captures all dependencies. The disadvantage is that this may pull in too many dependencies, making for large lock files, and challenges at partner sites.
The auto mode is the default, and comes with two potentially important arguments, rootPackage and includeRootPackage. To include the root package in the lock file, set includeRootPackage = TRUE.
For example, we could create a lock file for the Covid19EstimationHydroxychloroquine study package using this command:
OhdsiRTools::createRenvLockFile(rootPackage = "Covid19EstimationHydroxychloroquine",
                                includeRootPackage = TRUE)
In description mode, the createRenvLockFile() function will only include the dependencies of the study package as documented in the package’s DESCRIPTION file. The advantage of this mode is that it can lead to substantially smaller lock files. The disadvantage is that it requires the user to accurately document all dependencies in the DESCRIPTION file.
In description mode, these arguments of createRenvLockFile() are of particular importance: rootPackage, includeRootPackage, and additionalRequiredPackages. The last covers additional required packages, such as keyring. We can specify those here.
Important: The createRenvLockFile() function uses the information in the ‘DESCRIPTION’ file of your study package, so make sure this contains all dependencies. A good way to verify that all required packages are in the DESCRIPTION is by running R check on your study package. The function also assumes you have the right package versions installed, so make sure your study package runs before calling createRenvLockFile().
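Since description mode relies on the DESCRIPTION file, it can help to look at what that file declares before creating the lock file. The sketch below uses only base R. listDeclaredDependencies() is a hypothetical helper, not an OhdsiRTools function, and the DESCRIPTION contents are made up for illustration. It complements running R check and does not replace it.

```r
# Made-up DESCRIPTION file, written to a temporary location for illustration.
descriptionFile <- tempfile()
writeLines(c("Package: MyStudyPackage",
             "Version: 0.0.1",
             "Depends: R (>= 4.0.0), DatabaseConnector (>= 5.0.0)",
             "Imports: FeatureExtraction, rlang"),
           descriptionFile)

# Hypothetical helper: names of the packages listed under Depends and Imports.
listDeclaredDependencies <- function(descriptionFile) {
  fields <- read.dcf(descriptionFile, fields = c("Depends", "Imports"))
  # Split the comma-separated lists, skipping fields that are absent (NA).
  declared <- unlist(strsplit(fields[!is.na(fields)], ","))
  # Drop version constraints such as "(>= 4.0.0)" and surrounding whitespace.
  declared <- trimws(sub("\\(.*\\)", "", declared))
  # R itself is not a package dependency.
  setdiff(declared, "R")
}

declared <- listDeclaredDependencies(descriptionFile)
print(declared)
```

Comparing this list against the packages your study code actually calls is a quick sanity check before running createRenvLockFile() in description mode.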
For example, we could create a lock file for the Covid19EstimationHydroxychloroquine study package using this command:
OhdsiRTools::createRenvLockFile(mode = "description",
                                rootPackage = "Covid19EstimationHydroxychloroquine",
                                additionalRequiredPackages = "keyring",
                                includeRootPackage = TRUE)
The root package is your study package. If your study package is in the ohdsi-studies GitHub, it is recommended to include it in the lock file. That way, we can use renv to automatically install that package as well. Otherwise, people will have to clone your study package repo, run renv::restore(), and then build the study package itself.
There are two ways to use the lock file, depending on whether the root package is included in the lock file.
When the study package is included in the lock file, as described above, we can use renv to install not only the dependencies, but the study package itself as well. This means the user will not need to clone the study package repo, and does not have to build the study package separately.
Follow these steps to install the study package and all its dependencies:
Make sure R, RTools, Java, and RStudio are properly configured as described here.
Create an empty folder or new RStudio project, and in R, use the following code to install the study package and its dependencies:
install.packages("renv")
download.file("https://raw.githubusercontent.com/ohdsi-studies/Covid19SubjectsAesiIncidenceRate/master/renv.lock", "renv.lock")
renv::init()
# When asked, choose to "Restore the project from the lockfile".
Where the URL used in download.file() points to the lock file in the ohdsi-studies repo.
Important: the URL should point to the raw lock file contents, not the GitHub page with additional information about the file. So “https://raw.githubusercontent.com/ohdsi-studies/Covid19SubjectsAesiIncidenceRate/master/renv.lock”, not “https://github.com/ohdsi-studies/Covid19SubjectsAesiIncidenceRate/blob/master/renv.lock”.
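If you only have the GitHub page URL at hand, the raw URL can be derived from it. The sketch below is one way to do that in base R. toRawGitHubUrl() is a hypothetical helper written for this page, and it assumes the page URL follows the “github.com/owner/repo/blob/branch/path” pattern shown above.

```r
# Hypothetical helper: turn a GitHub "blob" page URL into the raw-content URL.
toRawGitHubUrl <- function(url) {
  sub("^https://github\\.com/([^/]+)/([^/]+)/blob/",
      "https://raw.githubusercontent.com/\\1/\\2/",
      url)
}

pageUrl <- "https://github.com/ohdsi-studies/Covid19SubjectsAesiIncidenceRate/blob/master/renv.lock"
rawUrl <- toRawGitHubUrl(pageUrl)
print(rawUrl)
```

A URL that is already in raw form does not match the pattern and is returned unchanged, so the helper can be applied before download.file() either way.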
When the study package itself is not included in the lock file, the user will need to clone the project (which requires git to be installed), and build the study package after all dependencies have been loaded.
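Follow these steps to install the study package and all its dependencies: clone the study package repo, run renv::restore() in the cloned project, and then build the study package.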
Even though renv tries to make sure all the versions specified in the lock file are loaded in the library, for many reasons it may be that the wrong versions are loaded at runtime. One scenario where this has happened is when the user forgets to run renv before running the study package. To make absolutely sure the right versions are executed, it is recommended to add a function to the study package that checks all dependency versions, and execute that function as a first step in the study’s main function. Below is an example function. Note that this function depends on RJSONIO and dplyr, so make sure these are in your lock file as well.
verifyDependencies <- function() {
  # Read the package entries from the lock file in the working directory
  expected <- RJSONIO::fromJSON("renv.lock")
  expected <- dplyr::bind_rows(expected$Packages)
  # Base packages ship with R itself, so they are not checked
  basePackages <- rownames(installed.packages(priority = "base"))
  expected <- expected[!expected$Package %in% basePackages, ]
  # Normalize installed and expected versions to comparable strings
  observedVersions <- sapply(sapply(expected$Package, packageVersion), paste, collapse = ".")
  expectedVersions <- sapply(sapply(expected$Version, numeric_version), paste, collapse = ".")
  mismatchIdx <- which(observedVersions != expectedVersions)
  if (length(mismatchIdx) > 0) {
    # Report every mismatching package in a single error message
    lines <- sapply(mismatchIdx, function(idx) sprintf("- Package %s version %s should be %s",
                                                       expected$Package[idx],
                                                       observedVersions[idx],
                                                       expectedVersions[idx]))
    message <- paste(c("Mismatch between required and installed package versions. Did you forget to run renv::restore()?",
                       lines),
                     collapse = "\n")
    stop(message)
  }
}
Once this function has been added to the study package, it is recommended to call it as the first step of your study’s main function.
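A minimal sketch of such a main function follows. The name execute() and its outputFolder argument are assumptions made for illustration. The only part that matters is that verifyDependencies(), defined in the study package as described above, is the first statement.

```r
# Hypothetical main function of a study package.
execute <- function(outputFolder) {
  # First step: stops with an error if installed versions differ from renv.lock.
  verifyDependencies()

  # Only reached when all dependency versions match.
  if (!dir.exists(outputFolder)) {
    dir.create(outputFolder, recursive = TRUE)
  }
  # ... the study's analysis steps would follow here ...
  invisible(outputFolder)
}
```

Because the check runs before anything else, a version mismatch stops the study before any output is written.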
Once you’re done creating a lock file, and writing the instructions for how people can install your study package including its dependencies, make sure to test the instructions. Since renv isolates the study project, you can even do this using the same machine used to create the package and lock file (just in a different folder).
If you wish to update a dependency in the lock file (like your study package), there are two options, depending on whether the version number of the package has increased:
If the version number has increased (for example the version number of your study package increased because you changed it in the DESCRIPTION file), you can simply update the version number in the lock file. See details on the lock file earlier in the document. Then, to update the package library in your R project, you can call
renv::restore(packages = c("myPackage"))
Where ‘myPackage’ is the package you’re updating.
If the package has changed, but the version number hasn’t (for example because you just made a minor change to your study package) you should be aware that renv maintains a package cache. That means that, even if you were to delete the entire project library, renv would still install the old version of the package from the cache, rather than installing it from its source.
To completely purge a package from the renv cache, and install it again, make sure to restart R (so the package isn’t locked), and use:
# Remove the package from the library:
remove.packages("myPackage")
# Purge the package from the renv cache:
renv::purge("myPackage")
# Restore the new version of the package from its source:
renv::restore(packages = c("myPackage"))
Where ‘myPackage’ is the package you’re updating.
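Which of the two options applies can be checked by comparing version numbers. The sketch below uses base R’s compareVersion(). chooseUpdateOption() is a hypothetical helper and the version strings are made up for illustration.

```r
# Hypothetical helper: pick the update route described above by comparing
# the version recorded in the lock file with the package's new version.
chooseUpdateOption <- function(lockedVersion, newVersion) {
  comparison <- utils::compareVersion(newVersion, lockedVersion)
  if (comparison > 0) {
    "update the version in the lock file, then restore"
  } else if (comparison == 0) {
    "purge the package from the cache, then restore"
  } else {
    "new version is older than the locked version"
  }
}

print(chooseUpdateOption(lockedVersion = "0.0.1", newVersion = "0.0.2"))
print(chooseUpdateOption(lockedVersion = "0.0.1", newVersion = "0.0.1"))
```

The third branch covers a case this page does not discuss (a version lower than the one in the lock file), so the helper only flags it.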