vignettes/CreatingModules.Rmd
CreatingModules.Rmd
This document aims to document the steps necessary to create analytic module that is compatible with Strategus. Please treat this document as a work in progress as Strategus is still under development.
A Strategus analytic module is an R Project that uses renv. NOTE: Please make sure you are using renv > 1.0.0 when creating a new analytic module to make sure it is compatible with Strategus.
A Strategus module will contain the following files:
[01;34mmodule_skeleton
[0m
+--
[01;34mextras
[0m
| +--
[32mCreateJobContextForTesting.R
[0m
| \--
[32mModuleMaintenance.R
[0m
+--
[32mMain.R
[0m
+-- MetaData.json
+-- NEWS.md
+-- README.md
+--
[01;34mrenv
[0m
| +--
[32mactivate.R
[0m
| +-- settings.dcf
| \-- settings.json
+-- renv.lock
+--
[01;34mresources
[0m
| +-- exampleAnalysisSpecifications.json
| \-- resultsDataModelSpecification.csv
+--
[32mSettingsFunctions.R
[0m
+--
[01;34mtests
[0m
| +--
[32mtest-eunomia.R
[0m
| +-- testJobContext.rds
| \--
[32mtestScript.R
[0m
\-- YourProjectModule.Rproj
Here we will detail how each file is used by Strategus and what is required in the contents of the file.
renv
This is the R Project (.Rproj) file for your module and should end in
“Module”. You may create this as a standard R Project via RStudio. Once
the project is created, please use renv::init()
to set up
the renv
folder as shown above. This will create the
necessary .RProfile
in the root of your project and the
renv
subfolder with the necessary R code for
renv
’s operations.
This is a standard NEWS markdown file that is a change log for your module. See this post for more information.
MetaData.json holds information that describes the module and its dependencies:
{
"Name": "CohortGeneratorModule",
"Version": "0.0.1",
"Dependencies": ["SomePrerequisiteModule"],
"TablePrefix": "cg_"
}
To detail the contents of the JSON file:
[]
.resultsDataModelSpecification.csv
.
Please see Main.R for more information on how this
value is used.This file holds the main executable for your module. This file must
contain a function called execute(jobContext)
.
execute <- function(jobContext) {
# VALIDATE THE jobContext
rlang::inform("Validating inputs")
# YOUR VALIDATION CODE GOES HERE....
# EXECUTE THE ANALYTICS
rlang::inform("Executing")
# YOUR EXECUTION CODE GOES HERE....
# ASSEMBLE AND .ZIP THE RESULTS
rlang::inform("Export data")
# YOUR CODE GOES HERE....
}
As shown in the code above, your execute(jobContext)
should handle: validating the jobContext
object to ensure
it has all of the information necessary for your code to function, a
section to execute the analytics and finally code to assemble the
output. Here we will describe the requirements for the way in which your
module must output its results:
writeCsv()
which will ensure your output is formatted
properly. For more information, please see: https://ohdsi.github.io/CohortGenerator/reference/writeCsv.html.
IMPORTANT: File names must correspond to the
table names that are specified in the
resultsModuleSpecification.csv
.resultsModuleSpecification.csv
in your output directory.
The format of this file is as follows:table_name,column_name,data_type,is_required,primary_key,empty_is_na
my_table,cohort_id,bigint,Yes,Yes,No
my_table,cohort_name,varchar,Yes,No,No
my_table,generation_status,varchar,No,No,No
my_table,start_time,Timestamp,No,No,No
my_table,end_time,Timestamp,No,No,No
my_table,database_id,varchar,Yes,Yes,No
The resultsModuleSpecification.csv
has the following
columns:
This file contains one or more functions required to create the module settings for use in Strategus. We plan to later remove this requirement when we can describe the module specification using the OpenAPI 3.0 Specification. For now, your module should contain a function similar to the following:
createCohortGeneratorModuleSpecifications <- function(incremental = TRUE,
generateStats = TRUE) {
analysis <- list()
for (name in names(formals(createCohortGeneratorModuleSpecifications))) {
analysis[[name]] <- get(name)
}
checkmate::assert_file_exists("MetaData.json")
moduleInfo <- ParallelLogger::loadSettingsFromJson("MetaData.json")
specifications <- list(module = moduleInfo$Name,
version = moduleInfo$Version,
remoteRepo = "github.com",
remoteUsername = "ohdsi",
settings = analysis)
class(specifications) <- c("CohortGeneratorModuleSpecifications", "ModuleSpecifications")
return(specifications)
}
As shown above, this example comes from the
CohortGeneratorModule
and the function name reflects the
fact that the function will create the settings used to dictate the
behavior of the module. The parameters of the function will differ based
on the requirements of your module - if there are choices to be made
when running your module, you should include these as parameters to your
module specification function.
Internal to the function above, the formal parameters to the function
are used to construct a list()
named analysis
which holds the analysis settings. Next the MetaData.json
file is used to obtain the module name/version for inclusion in the
specifications
list. The specifications
list
contains the remoteRepo
and remoteUsername
properties to indicate where your module is stored on GitHub. Finally,
we set the class()
of the specifications
object to
c("CohortGeneratorModuleSpecifications", "ModuleSpecifications")
.
For your module, you will want to substitute
"CohortGeneratorModuleSpecifications"
for the name of your
module and retain the "ModuleSpecifications"
in the
vector.
The following JSON fragment shows how the output of
createCohortGeneratorModuleSpecifications()
is used in the
moduleSpecifications
section of the overall analysis
settings JSON for Strategus:
{
"sharedResources": [
{
"cohortDefinitions": [
{
"cohortId": "1",
"cohortName": "celecoxib",
"cohortDefinition": "...truncated..."
}
],
"attr_class": ["CohortDefinitionSharedResources", "SharedResources"]
}
],
"moduleSpecifications": [
{
"module": "CohortGeneratorModule",
"version": "0.0.1",
"remoteRepo": "github.com",
"remoteUsername": "ohdsi",
"settings": {
"incremental": true,
"generateStats": true
},
"attr_class": ["CohortGeneratorModuleSpecifications", "ModuleSpecifications"]
}
],
"attr_class": "AnalysisSpecifications"
}
Each module will make use of renv to
capture its R package dependencies. Furthermore, Strategus will make use
of the renv
settings in your module to create a run-time
environment when executing your module to ensure all of the necessary
dependencies are available to your module.
It is recommended to use the HADES-wide renv.lock file which is
available at https://github.com/OHDSI/Hades/blob/main/hadesWideReleases.
Find the most recent release based on the folder name and copy the
renv.lock
file into the root of your module project.
If you need to install additional dependencies for your project, use
renv::record()
to record it in the lock file.
It is advisable to add an extras
folder to your project
to include other useful files for managing and testing your module.
We’ll detail those files here:
This file is used to store utility functions for your module, such as the code mentioned earlier for generating the renv.lock file. Here is an example of the contents of ModuleMaintenance.R:
# Format and check code:
styler::style_dir()
OhdsiRTools::updateCopyrightYearFolder()
OhdsiRTools::findNonAsciiStringsInFolder()
devtools::spell_check()
# Generate renv lock file and activate renv:
OhdsiRTools::createRenvLockFile(
rootPackage = "CohortGenerator",
includeRootPackage = TRUE,
mode = "description",
additionalRequiredPackages = c("checkmate", "CirceR")
)
renv::init()
The following file is used to create a test jobContext for use in the
execute(jobContext)
as described in Main.R:
https://github.com/OHDSI/CohortGeneratorModule/blob/main/extras/test/CreateJobContext.R
And the following file is used to create a test harness for running your module:
https://github.com/OHDSI/CohortGeneratorModule/blob/main/extras/test/TestModuleStandalone.R