This page describes how you can use renv to make sure the right dependencies are installed when people run your study package. Please read the entire page, even if you’re familiar with renv.
The renv package offers two main benefits:
Specifying the exact package versions that must be loaded for executing a specific study. It offers functions for recording the versions of packages that are currently used, and for restoring a library of packages on the same computer (in the future) or other computers, thus greatly improving reproducibility.
Isolating the package library to a (RStudio) project. This means that different projects (e.g. different studies) can use very different package versions on the same computer without conflicting. When switching from one project to another, R will switch to the appropriate library.
At the core of renv is the ‘renv.lock’ file that specifies all required packages. This file should be in the root folder of the project. The structure of the lock file is not very complicated, and it can easily be edited by hand if need be. An example of a full lock file can be found here.
Here’s an example partial lock file, with just 3 entries:
{
  "R" : {
    "Version" : "4.0.3",
    "Repositories" : [
      {
        "Name" : "CRAN",
        "URL" : "https://cloud.r-project.org"
      }
    ]
  },
  "Packages" : {
    "rlang" : {
      "Package" : "rlang",
      "Version" : "0.4.11",
      "Source" : "Repository",
      "Repository" : "CRAN"
    },
    "FeatureExtraction" : {
      "Package" : "FeatureExtraction",
      "Version" : "3.1.1",
      "Source" : "GitHub",
      "RemoteType" : "github",
      "RemoteHost" : "api.github.com",
      "RemoteRepo" : "FeatureExtraction",
      "RemoteUsername" : "ohdsi",
      "RemoteRef" : "v3.1.1"
    },
    "Covid19EstimationHydroxychloroquine" : {
      "Package" : "Covid19EstimationHydroxychloroquine",
      "Version" : "0.0.1",
      "Source" : "GitHub",
      "RemoteType" : "github",
      "RemoteHost" : "api.github.com",
      "RemoteRepo" : "Covid19EstimationHydroxychloroquine",
      "RemoteUsername" : "ohdsi-studies",
      "RemoteRef" : "main"
    }
  }
}
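We can see different types of entries in a lock file. Every entry is keyed by the package name and has the following fields: Package, Version, and Source.
In an OHDSI lock file, the following types can be encountered: packages from CRAN, HADES packages from GitHub, and OHDSI study packages from GitHub. An entry for a CRAN package looks like this:
    "rlang" : {
      "Package" : "rlang",
      "Version" : "0.4.11",
      "Source" : "Repository",
      "Repository" : "CRAN"
    },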
Most packages, like rlang, but also several HADES packages like DatabaseConnector, are available from CRAN. These entries have Source equal to “Repository”, and one additional field, Repository, which is set to “CRAN”.
An entry for a HADES package installed from GitHub looks like this:
    "FeatureExtraction" : {
      "Package" : "FeatureExtraction",
      "Version" : "3.1.1",
      "Source" : "GitHub",
      "RemoteType" : "github",
      "RemoteHost" : "api.github.com",
      "RemoteRepo" : "FeatureExtraction",
      "RemoteUsername" : "ohdsi",
      "RemoteRef" : "v3.1.1"
    },
Many HADES packages need to be installed from GitHub rather than CRAN. See the ‘Availability’ column on the Package statuses page to learn which packages are not in CRAN. For these packages, Source is equal to “GitHub”. The other required fields are RemoteType, RemoteHost, RemoteRepo, RemoteUsername, and RemoteRef, as in the FeatureExtraction entry above.
An entry for an OHDSI study package looks like this:
    "Covid19EstimationHydroxychloroquine" : {
      "Package" : "Covid19EstimationHydroxychloroquine",
      "Version" : "0.0.1",
      "Source" : "GitHub",
      "RemoteType" : "github",
      "RemoteHost" : "api.github.com",
      "RemoteRepo" : "Covid19EstimationHydroxychloroquine",
      "RemoteUsername" : "ohdsi-studies",
      "RemoteRef" : "main"
    }
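As discussed later in this document, sometimes we would like to include an OHDSI study package in the lock file as well. These entries are identical in terms of Source, RemoteType, and RemoteHost to those for HADES GitHub packages, but differ in these fields: RemoteRepo (the name of the study repo), RemoteUsername (“ohdsi-studies” rather than “ohdsi”), and RemoteRef (“main” in the example above).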
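To make the difference between the two kinds of GitHub entries concrete, the sketch below builds both as plain R lists and reports which fields they share. The helper makeGitHubEntry() is hypothetical (it is not part of renv or OhdsiRTools); it only mirrors the field names and values shown in the example entries, and it assumes the repo name equals the package name, as is the case in those examples.

```r
# Hypothetical helper, for illustration only: builds a GitHub lock file entry
# as a named R list, using the field names from the example entries.
# Assumes the GitHub repo has the same name as the package.
makeGitHubEntry <- function(package, version, username, ref) {
  list(
    Package = package,
    Version = version,
    Source = "GitHub",
    RemoteType = "github",
    RemoteHost = "api.github.com",
    RemoteRepo = package,
    RemoteUsername = username,
    RemoteRef = ref
  )
}

hadesEntry <- makeGitHubEntry("FeatureExtraction", "3.1.1", "ohdsi", "v3.1.1")
studyEntry <- makeGitHubEntry("Covid19EstimationHydroxychloroquine", "0.0.1",
                              "ohdsi-studies", "main")

# Names of the fields that have identical values in both entries:
sameFields <- names(hadesEntry)[mapply(identical, hadesEntry, studyEntry)]
print(sameFields)
```

Only the fields that identify the package and where it lives on GitHub differ between the two entries.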
One could construct a lock file by hand, but this would take a lot of work. The renv package itself provides the snapshot() function for automatically constructing lock files, but this function only really works well for dependencies from CRAN. For OHDSI studies it is therefore recommended to use the createRenvLockFile() function in the OhdsiRTools package. You can install OhdsiRTools using remotes:
remotes::install_github("ohdsi/OhdsiRTools")
The createRenvLockFile() function has two modes: “auto” and “description”. The description mode is for more advanced users.
In auto mode, the createRenvLockFile() function will leverage renv::init() to scan all R scripts in the project folder and subfolders for references to packages, and will include those (and their dependencies). The advantage of this mode is that it automatically captures all dependencies. The disadvantage is that this may pull in too many dependencies, making for large lock files and challenges at partner sites.
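The auto mode is the default, and comes with two potentially important arguments: rootPackage, the name of the study package, and includeRootPackage, which determines whether the study package itself is added to the lock file. To include it, set includeRootPackage = TRUE.
For example, we could create a lock file for the Covid19EstimationHydroxychloroquine study package using this command: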
OhdsiRTools::createRenvLockFile(rootPackage = "Covid19EstimationHydroxychloroquine",
                                includeRootPackage = TRUE)
In description mode, the createRenvLockFile() function will only include the dependencies of the study package as documented in the package’s DESCRIPTION file. The advantage of this mode is that it can lead to substantially smaller lock files. The disadvantage is that it requires the user to accurately document all dependencies in the DESCRIPTION file.
In description mode, the rootPackage, includeRootPackage, and additionalRequiredPackages arguments of createRenvLockFile() are of particular importance. Extra packages that should end up in the lock file, such as keyring, can be specified through additionalRequiredPackages.
Important: The createRenvLockFile() function uses the information in the ‘DESCRIPTION’ file of your study package, so make sure this contains all dependencies. A good way to verify that all required packages are in the DESCRIPTION is to run R check on your study package. The function also assumes you have the right package versions installed, so make sure your study package runs before calling createRenvLockFile().
For example, we could create a lock file for the Covid19EstimationHydroxychloroquine study package using this command:
OhdsiRTools::createRenvLockFile(mode = "description",
                                rootPackage = "Covid19EstimationHydroxychloroquine",
                                additionalRequiredPackages = "keyring",
                                includeRootPackage = TRUE)
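Because description mode relies entirely on the DESCRIPTION file, it can help to list what is declared there before creating the lock file. The sketch below uses only base R. The function listDeclaredDependencies() and the example DESCRIPTION contents are hypothetical, and the sketch is not meant to reproduce what createRenvLockFile() does internally.

```r
# Write a small example DESCRIPTION file to a temporary folder.
# The package name and dependencies are illustrative, not from a real study.
descriptionFile <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: MyStudyPackage",
  "Version: 0.0.1",
  "Imports:",
  "    DatabaseConnector (>= 4.0.0),",
  "    FeatureExtraction,",
  "    rlang",
  "Suggests: testthat"
), descriptionFile)

# Hypothetical helper: list the packages named in the given DESCRIPTION fields.
listDeclaredDependencies <- function(path, fields = c("Depends", "Imports")) {
  description <- read.dcf(path, fields = fields)
  entries <- unlist(strsplit(description[!is.na(description)], ","))
  # Drop version constraints such as "(>= 4.0.0)" and surrounding whitespace:
  packages <- trimws(sub("\\(.*\\)", "", entries))
  # Drop empty entries and the dependency on R itself:
  setdiff(packages[packages != ""], "R")
}

print(listDeclaredDependencies(descriptionFile))
```

Comparing this list against the packages your study code actually uses is a quick complement to running R check.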
The root package is your study package. If your study package is in the ohdsi-studies GitHub, it is recommended to include it in the lock file. That way, we can use renv to automatically install that package as well. Otherwise, people will have to clone your study package repo, run renv::restore(), and then build the study package itself.
There are two ways to use the lock file, depending on whether the root package is included in the lock file.
When the study package is included in the lock file, as described above, we can use renv to install not only the dependencies, but the study package itself as well. This means the user will not need to clone the study package repo, and does not have to build the study package separately.
Follow these steps to install the study package and all its dependencies:
Make sure R, RTools, Java, and RStudio are properly configured as described here.
Create an empty folder or new RStudio project, and in R, use the following code to install the study package and its dependencies:
install.packages("renv")
download.file("https://raw.githubusercontent.com/ohdsi-studies/Covid19SubjectsAesiIncidenceRate/master/renv.lock", "renv.lock")
renv::init()
# When asked, choose to "Restore the project from the lockfile".
Here, the URL used in download.file() points to the lock file in the ohdsi-studies repo.
Important: the URL should point to the raw lock file contents, not the GitHub page with additional information about the file. So “https://raw.githubusercontent.com/ohdsi-studies/Covid19SubjectsAesiIncidenceRate/master/renv.lock”, not “https://github.com/ohdsi-studies/Covid19SubjectsAesiIncidenceRate/blob/master/renv.lock”.
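If you only have the GitHub page URL at hand, the raw URL can be derived from it with a simple string substitution. The sketch below shows one way to do this in base R; toRawGitHubUrl() is a hypothetical helper, and it assumes the page URL follows the usual github.com/<owner>/<repo>/blob/<branch>/<path> pattern.

```r
# Hypothetical helper: rewrite a GitHub "blob" page URL to the matching raw URL.
# URLs that do not match the pattern are returned unchanged.
toRawGitHubUrl <- function(url) {
  sub("^https://github\\.com/([^/]+)/([^/]+)/blob/",
      "https://raw.githubusercontent.com/\\1/\\2/",
      url)
}

pageUrl <- "https://github.com/ohdsi-studies/Covid19SubjectsAesiIncidenceRate/blob/master/renv.lock"
rawUrl <- toRawGitHubUrl(pageUrl)
print(rawUrl)
```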
When the study package itself is not included in the lock file, the user will need to clone the project (which requires git to be installed), and build the study package after all dependencies have been loaded.
To install the study package and all its dependencies in this case, clone the study package repo, run renv::restore() in the cloned project, and then build the study package itself.
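As a rough sketch, the part that happens in R after cloning could look like the function below. It is a minimal illustration, not an official procedure: installStudyFromClone() is a hypothetical name, and the sketch assumes that renv is already installed, that the working directory is the cloned study repo, and that this folder is opened as the active project so that the renv project library is in use.

```r
# Hypothetical helper: restore dependencies and install the study package
# from a local clone. Assumes the working directory is the cloned repo and
# that renv is installed.
installStudyFromClone <- function() {
  if (!file.exists("renv.lock")) {
    stop("No renv.lock found in ", getwd())
  }
  # Install the package versions recorded in the lock file:
  renv::restore()
  # Build and install the study package itself from the current folder:
  install.packages(".", repos = NULL, type = "source")
}
```

In RStudio, building the package through the Build pane achieves the same as the final install.packages() call.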
Even though renv tries to make sure all the versions specified in the lock file are loaded in the library, for many reasons it may be that the wrong versions are loaded at runtime. One scenario where this has happened is when the user forgets to run renv::restore() before running the study package. To make absolutely sure the right versions are executed, it is recommended to add a function to the study package that checks all dependency versions, and to execute that function as the first step in the study’s main function. Below is an example function. Note that this function depends on RJSONIO and dplyr, so make sure these are in your lock file as well.
verifyDependencies <- function() {
  # Read the lock file from the working directory. The second top-level
  # element holds the package entries:
  expected <- RJSONIO::fromJSON("renv.lock")
  expected <- dplyr::bind_rows(expected[[2]])
  # Base packages ship with R itself, so they are not checked:
  basePackages <- rownames(installed.packages(priority = "base"))
  expected <- expected[!expected$Package %in% basePackages, ]
  # Normalize installed and expected versions to strings like "1.2.3":
  observedVersions <- sapply(sapply(expected$Package, packageVersion), paste, collapse = ".")
  expectedVersions <- sapply(sapply(expected$Version, numeric_version), paste, collapse = ".")
  mismatchIdx <- which(observedVersions != expectedVersions)
  if (length(mismatchIdx) > 0) {
    lines <- sapply(mismatchIdx, function(idx) {
      sprintf("- Package %s version %s should be %s",
              expected$Package[idx],
              observedVersions[idx],
              expectedVersions[idx])
    })
    message <- paste(c("Mismatch between required and installed package versions. Did you forget to run renv::restore()?",
                       lines),
                     collapse = "\n")
    stop(message)
  }
}
Once this function has been added to the study package, it is recommended to call it at the very start of your study’s main function.
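A minimal sketch of such a main function is shown below. The names execute() and outputFolder are hypothetical, and verifyDependencies() is replaced by a placeholder here only so that the sketch runs on its own; in a study package it would be the version-checking function described above.

```r
# Placeholder so this sketch runs on its own. In the study package this would
# be the real verifyDependencies() function, which stops on a mismatch.
verifyDependencies <- function() {
  invisible(TRUE)
}

# Hypothetical main function of a study package.
execute <- function(outputFolder) {
  # First make sure the right package versions are loaded:
  verifyDependencies()

  # The rest of the study code would follow here:
  if (!dir.exists(outputFolder)) {
    dir.create(outputFolder, recursive = TRUE)
  }
  invisible(outputFolder)
}
```

Because the check comes first and calls stop() on a mismatch, no study code runs against the wrong package versions.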
Once you’re done creating a lock file, and writing the instructions for how people can install your study package including its dependencies, make sure to test the instructions. Since renv isolates the study project, you can even do this using the same machine used to create the package and lock file (just in a different folder).
If you wish to update a dependency in the lock file (like your study package), there are two options, depending on whether the version number of the package has increased:
If the version number has increased (for example the version number of your study package increased because you changed it in the DESCRIPTION file), you can simply update the version number in the lock file. See details on the lock file earlier in the document. Then, to update the package library in your R project, you can call
renv::restore(packages = c("myPackage"))
Where ‘myPackage’ is the package you’re updating.
If the package has changed, but the version number hasn’t (for example because you just made a minor change to your study package), you should be aware that renv maintains a package cache. That means that, even if you were to delete the entire project library, renv would still install the old version of the package from the cache, rather than installing it from its source.
To completely purge a package from the renv cache, and install it again, make sure to restart R (so the package isn’t locked), and use:
# Remove the package from the library:
remove.packages("myPackage")
# Purge the package from the renv cache:
renv::purge("myPackage")
# Restore the new version of the package from its source:
renv::restore(packages = c("myPackage"))
Where ‘myPackage’ is the package you’re updating.