vignettes/WritingHydraConfigs.Rmd
This vignette describes how developers of package skeletons can write Hydra configuration files. Hydra configuration files tell Hydra how to hydrate the skeleton according to a study specifications object. The Hydra configuration file should be embedded in the skeleton zip file.
The study specifications are generated by some external editor, such as ATLAS. Study specifications will be in JSON format. The specifications will define every aspect of the study, such as the cohorts to use as exposures and outcomes, and what covariates to include. Here are the first few lines of an example study specification:
{
  "id": 1,
  "version": "v0.9.0",
  "name": "Study of some cohorts of interest",
  "packageName": "ASimpleStudy",
  "skeletonType": "SimpleExampleStudy",
  "skeletonVersion": "v0.0.1",
  "createdBy": "schuemie@ohdsi.org",
  "createdDate": "2018-03-09T18:25:43.511Z",
  "modifiedBy": "schuemie@ohdsi.org",
  "modifiedDate": "2018-04-13T18:25:43.511Z",
  "cohortDefinitions": [{
    "id": 1,
A package skeleton is an R package for fully executing a specific type of study, such as a new-user cohort study or a predictive modeling study. The skeleton has placeholders for study elements such as those specified in the study specifications. Package skeletons are provided as zip files, and are embedded inside Hydra (see the inst/skeletons folder).
By hydration we mean the process by which the package skeleton is configured according to the study specification to become a fully executable study package. A study package will perform all tasks necessary to execute the study at a site, including creation of the cohorts needed in the study.
The hydration configuration is a JSON file. Below is an example configuration:
{
  "skeletonType": "SimpleExampleStudy",
  "skeletonVersion": "v1.0.0",
  "requiredHydraVersion": "v0.0.1",
  "actions": [{
    "type": "fileNameFindAndReplace",
    "input": "packageName",
    "find": "SimpleExampleStudy"
  },{
    "type": "stringFindAndReplace",
    "input": "packageName",
    "find": "SimpleExampleStudy"
  },{
    "type": "jsonArrayToCsv",
    "input": "cohortDefinitions",
    "mapping": [{"source": "id", "target": "cohortId"},
                {"source": "id", "target": "atlasId"},
                {"source": "name", "target": "name", "modifiers": ["convertToFileName"]}],
    "output": "inst/settings/CohortsToCreate.csv"
  },{
    "type": "jsonArrayToJson",
    "input": "cohortDefinitions",
    "fileName": "name",
    "payload": "expression",
    "output": "inst/cohorts"
  },{
    "type": "jsonArrayToSql",
    "input": "cohortDefinitions",
    "fileName": "name",
    "payload": "expression",
    "output": "inst/sql/sql_server"
  }]
}
The configuration consists of some metadata, such as “skeletonType”, followed by an array of actions. In this example, there are five actions that Hydra executes in sequence after unzipping the skeleton:
1. Find all files named SimpleExampleStudy.*, and rename them to the name specified in the “packageName” attribute in the study specifications (while keeping the same extension).
2. Find all mentions of the string SimpleExampleStudy in all files in the package folder, and replace them with the string specified in the “packageName” attribute in the study specifications.
3. Create the file inst/settings/CohortsToCreate.csv, and populate it with the elements from the cohortDefinitions array in the study specifications.
4. Create JSON files in the inst/cohorts folder, one for each element in the cohortDefinitions array in the study specifications.
5. Create SQL files in the inst/sql/sql_server folder, one for each element in the cohortDefinitions array in the study specifications. The SQL is generated by Hydra using Circe.

Each action can be made conditional by adding a condition field. For example, in this configuration the jsonToRargs action is only executed if the doPositiveControlSynthesis field in the study specifications resolves to the value ‘true’ (or ‘1’).
{
  "type": "jsonToRargs",
  "input": "positiveControlSynthesisArgs",
  "condition": "doPositiveControlSynthesis",
  "file": "R/SynthesizePositiveControls.R",
  "startTag": "# Start positiveControlSynthesisArgs",
  "endTag": "# End positiveControlSynthesisArgs"
}
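To make the simplest case concrete, the following is a minimal Python sketch of a condition that names a single field. It is an illustration only, not Hydra's implementation: the function name `condition_met`, the case-insensitive comparison, and the `runDiagnostics` field are assumptions made for this example, and boolean expressions are deliberately not handled.

```python
def condition_met(spec, condition):
    """Return True when the named top-level field resolves to 'true' or '1'.

    Assumption for this sketch: the comparison ignores case and whitespace,
    and a missing field counts as not met.
    """
    value = spec.get(condition)
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("true", "1")


# Hypothetical study specification fragment
spec = {"doPositiveControlSynthesis": True, "runDiagnostics": "0"}

run_synthesis = condition_met(spec, "doPositiveControlSynthesis")
run_diagnostics = condition_met(spec, "runDiagnostics")
```

With this specification, the action guarded by doPositiveControlSynthesis would run, while one guarded by the hypothetical runDiagnostics field would be skipped.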
Boolean logic is also permitted in condition fields. Here are some more examples of conditions:
"(mainSettings.subsetting.type == 'foo') & bar != 3"
"fooOption IN ('bar', 'pie', 'sky')"
"(foo == 1) | (bar == 2)"
All available action types and their arguments are described below. Whenever an element in the study specification is referenced, nested elements can be accessed by using a dot (‘.’). For example, “estimationAnalysisSettings.analysisSpecification” returns the analysisSpecification element of the estimationAnalysisSettings root element.
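The dot notation can be pictured as walking down the parsed JSON one key at a time. The Python sketch below illustrates that idea; the function name `resolve_path` and the contents of the example specification are hypothetical, and Hydra's own lookup may treat missing elements differently.

```python
def resolve_path(spec, dotted_path):
    """Follow a dot-separated path through nested dictionaries.

    Raises KeyError if any element along the path is missing.
    """
    node = spec
    for key in dotted_path.split("."):
        node = node[key]
    return node


# Hypothetical study specification fragment
spec = {
    "estimationAnalysisSettings": {
        "analysisSpecification": {"targetId": 1}
    }
}

element = resolve_path(spec, "estimationAnalysisSettings.analysisSpecification")
```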
Finds all files with the given name, and renames them to a target name (while maintaining the file extension).
Arguments:
Finds all mentions of a string in all files in the package, and replaces them with a target string.
*.jar excludes all jar files.
Convert an array in the study specifications to a CSV file in the package, where each element of the array will generate one row in the CSV file.
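As an illustration of the array-to-CSV conversion, the Python sketch below applies a mapping like the one in the example configuration (source, target, and optional modifiers) to a small array. It is a sketch under stated assumptions, not Hydra's code: the cohort names are made up, and `convert_to_file_name` is a hypothetical stand-in for the convertToFileName modifier, whose exact behavior is not described here.

```python
import csv
import io
import re


def convert_to_file_name(value):
    # Hypothetical stand-in: replace anything that is not a letter,
    # digit, or underscore with an underscore.
    return re.sub(r"[^A-Za-z0-9_]", "_", str(value))


MODIFIERS = {"convertToFileName": convert_to_file_name}


def json_array_to_csv(elements, mapping):
    """Write one header row, then one CSV row per array element."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([m["target"] for m in mapping])
    for element in elements:
        row = []
        for m in mapping:
            value = element[m["source"]]
            for name in m.get("modifiers", []):
                value = MODIFIERS[name](value)
            row.append(value)
        writer.writerow(row)
    return buffer.getvalue()


# Made-up cohort definitions for illustration
cohort_definitions = [
    {"id": 1, "name": "New users of drug A"},
    {"id": 2, "name": "Outcome B"},
]
mapping = [
    {"source": "id", "target": "cohortId"},
    {"source": "id", "target": "atlasId"},
    {"source": "name", "target": "name", "modifiers": ["convertToFileName"]},
]

csv_text = json_array_to_csv(cohort_definitions, mapping)
```

Note that the same source field (id) can feed more than one target column, as it does for cohortId and atlasId in the example configuration.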
Convert an array in the study specifications to a set of JSON files in the package, where each element of the array will generate one JSON file.
Convert an array in the study specifications to a set of SQL files in the package, where each element of the array will generate one SQL file. SQL files are automatically generated using Circe.
Convert a single element in the study specifications to a SQL file in the package using Circe.
Convert a single element in the study specifications to a JSON file in the package.
Convert an element in the study specifications to R arguments.
For example, imagine the study specifications contain this element:
"args": {
  "foo": "Hello",
  "bar": {
    "x": 123,
    "y": 456
  }
}
And the Hydra configuration specifies:
{
  "type": "jsonToRargs",
  "input": "args",
  "file": "R/fooBar.R",
  "startTag": "# Start fooBar",
  "endTag": "# End fooBar",
  "argumentFunctions": [{"source": "bar", "function": "createBar"}]
}
If the original file fooBar.R in the skeleton contains:
doFooBar(
  # Start fooBar
  foo = "Bye",
  bar = NULL
  # End fooBar
)
The hydrated fooBar.R will become:
doFooBar(
  foo = "Hello",
  bar = createBar(x = 123,
                  y = 456)
)
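The replacement between the start and end tags can be sketched in a few lines of Python. This is only an illustration of the idea, assuming that nested elements are wrapped in the function named in argumentFunctions and that the tag lines themselves are dropped, as in the hydrated example; the helper names are hypothetical, and the whitespace of the generated arguments differs from the output shown in this vignette.

```python
import json


def to_r_literal(value):
    """Render a JSON scalar as an R literal (simple cases only)."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, str):
        return json.dumps(value)  # double-quoted, with escapes
    return str(value)


def render_args(args, argument_functions):
    """Render a JSON object as R arguments, one per line."""
    wrappers = {f["source"]: f["function"] for f in argument_functions}
    parts = []
    for name, value in args.items():
        if isinstance(value, dict):
            inner = ", ".join(f"{k} = {to_r_literal(v)}" for k, v in value.items())
            parts.append(f"{name} = {wrappers[name]}({inner})")
        else:
            parts.append(f"{name} = {to_r_literal(value)}")
    return ",\n".join(parts)


def replace_between_tags(text, start_tag, end_tag, replacement):
    """Replace the tag lines and everything between them."""
    lines = text.split("\n")
    stripped = [line.strip() for line in lines]
    start = stripped.index(start_tag)
    end = stripped.index(end_tag, start)
    return "\n".join(lines[:start] + [replacement] + lines[end + 1:])


skeleton = 'doFooBar(\n# Start fooBar\nfoo = "Bye",\nbar = NULL\n# End fooBar\n)'
args = {"foo": "Hello", "bar": {"x": 123, "y": 456}}
argument_functions = [{"source": "bar", "function": "createBar"}]

hydrated = replace_between_tags(
    skeleton, "# Start fooBar", "# End fooBar",
    render_args(args, argument_functions),
)
```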