(Adapted from https://style.tidyverse.org/)
The styler package is highly recommended for automatically applying some (but not all) of the style recommendations here. styler is available as a stand-alone R package, but also comes with a handy RStudio add-in.
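As a minimal sketch of how styler might be used from the R console, assuming the package has been installed from CRAN: style_text() restyles code held in a character string, and style_file() rewrites a file in place. The file path in the commented-out line is a hypothetical placeholder.

```r
# Assumes styler is installed, e.g. via install.packages("styler")
messyCode <- "average<-mean(feet/12+inches,na.rm=TRUE)"

if (requireNamespace("styler", quietly = TRUE)) {
  # Restyle a snippet held in a character string and print the result
  styledCode <- styler::style_text(text = messyCode)
  print(styledCode)

  # Restyle a whole file in place ("R/myAnalysis.R" is a hypothetical path)
  # styler::style_file(path = "R/myAnalysis.R")
}
```

Because styler covers only part of these recommendations, naming conventions and named arguments still need to be checked by hand.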
We use camelCase in R. Function and variable names all start with lowercase. Package names start with uppercase.
Examples:
cohortData <- loadCohortData("myFolder")
SqlRender package
Function names typically start with a verb. Variable names are typically nouns. Do not encode the data type in the variable names. Also, everything is data, so there is no need to say so in a name unless it is unavoidable.
Good
fitOutcomeModel function
computeCovariateBalance function
population argument
Bad
sampling as variable name (not a noun)
namesVector, covariatesDf (encodes the data type)
getResultData (everything is data)
Place spaces around all infix operators (=, +, -, <-, etc.). The same rule applies when using = in function calls. Always put a space after a comma, and never before (just like in regular English).
Good
average <- mean(feet / 12 + inches, na.rm = TRUE)
Bad
average<-mean(feet/12+inches,na.rm=TRUE)
There’s a small exception to this rule: :, :: and ::: don’t need spaces around them.
Good
x <- 1:10
base::get
Bad
x <- 1 : 10
base :: get
Place a space before left parentheses, except in a function call.
Good
if (debug) {
  do(x)
}
plot(x, y)
Bad
if(debug){
  do(x)
}
plot (x, y)
Extra spacing (i.e., more than one space in a row) is ok if it improves alignment of equal signs or assignments (<-).
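A short illustration of what such alignment might look like; the variable names and values are made up for the example:

```r
# Extra spaces line up the assignments so the values are easy to scan
cohortId     <- 1
cohortName   <- "test"
minCellCount <- 5

# The same idea applied to equal signs inside a function call
settings <- list(
  cohortId     = cohortId,
  cohortName   = cohortName,
  minCellCount = minCellCount
)
```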
Do not place spaces around code in parentheses or square brackets (unless there’s a comma, in which case see above).
Good
if (debug) {
  do(x)
}
diamonds[5, ]
Bad
if ( debug ) { # No spaces around debug
  do(x)
}
x[1,] # Needs a space after the comma
x[1 ,] # Space goes after comma not before
Curly braces
An opening curly brace should never go on its own line and should always be followed by a new line. A closing curly brace should always go on its own line, unless it’s followed by else.
Always indent the code inside curly braces. It’s ok to leave very short statements on the same line:
if (y < 0 && debug) {
  message("Y is negative")
}
Strive to limit your code to 100 characters per line. This fits comfortably on a printed page with a reasonably sized font. If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function.
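One way to check the limit is to count characters per line. The sketch below uses only base R; findLongLines is a hypothetical helper written for this example, not a function from any package.

```r
# Report which lines exceed a maximum width.
# findLongLines is a hypothetical helper, not part of any package.
findLongLines <- function(codeLines, maxWidth = 100) {
  widths <- nchar(x = codeLines, type = "chars")
  tooLong <- which(widths > maxWidth)
  data.frame(lineNumber = tooLong, width = widths[tooLong])
}

# Two example lines: a short one and one well over 100 characters
codeLines <- c(
  "x <- 5",
  paste0("y <- c(", paste(rep(x = "1", times = 60), collapse = ", "), ")")
)
longLines <- findLongLines(codeLines = codeLines, maxWidth = 100)
print(longLines)
```

To check a real script, the lines could be read with readLines() and passed to the same helper.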
When indenting your code, use two spaces. Never use tabs or mix tabs and spaces.
Hint: In RStudio you can use ctrl-i to automatically indent the code for you.
Use <-, not =, for assignment.
Good
x <- 5
Bad
x = 5
If-then-else clauses should always use curly brackets, even if there’s only one clause and it’s one statement.
Good
if (a == b) {
  doSomething()
}
Bad
if (a == b) doSomething()
When calling a function that has more than one argument, make sure to refer to each argument by name instead of relying on the order of arguments.
Good
translateSql(sql = "COMMIT", targetDialect = "PDW")
Bad
translateSql("COMMIT", "PDW")
Comment your code only where the intent is not immediately obvious.
Each line of a comment should begin with the comment symbol and a single space: "# ". Comments should explain the why, not the what.
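To illustrate the difference between the why and the what, here is a small made-up example. The comment explains why a divisor was chosen rather than restating what the division does:

```r
ageInDays <- c(365, 800, 12000)

# Divide by 365.25 rather than 365 so that leap years do not inflate the
# computed age for older persons
ageInYears <- ageInDays / 365.25
```

A comment such as "# Divide ageInDays by 365.25" would only repeat the code and add nothing for the reader.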
Use commented lines of - to break up your file into easily readable chunks, for example:
## Load data ---------------------------
x <- readRDS("data.rds")
## Plot data ---------------------------
plot(x)
Opening curly brackets should precede a new line. A closing curly bracket should be followed by a new line except when it is followed by else or a closing parenthesis.
Good
if (a == b) {
  doSomething()
} else {
  doSomethingElse()
}
Bad
if (a == b)
{
  doSomething()
}
else
{
  doSomethingElse()
}
Pipes should always be at the end of the line.
Good
foo %>%
  filter(x > 0) %>%
  group_by(y) %>%
  summarize(total = sum(x))
Bad
foo %>% filter(x > 0) %>% group_by(y) %>% summarize(total = sum(x))
Dplyr joins and merge statements should always have a ‘by’ argument.
Good
foo %>%
  inner_join(bar, by = "covariateId")
Bad
foo %>%
  inner_join(bar)
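The same rule applies to base R merge(). A minimal sketch with made-up data frames:

```r
person <- data.frame(personId = c(1, 2, 3), yearOfBirth = c(1980, 1975, 1990))
observation <- data.frame(personId = c(1, 3), covariateId = c(1001, 1002))

# Naming the key keeps the join condition explicit and stable, even if the
# two data frames later gain other columns with matching names
covariates <- merge(x = person, y = observation, by = "personId")
print(covariates)
```

Without by, merge() joins on every column name the two data frames share, so adding a column to either input can silently change the result.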
The OHDSI code style for SQL is heavily inspired by the Poor Man’s T-SQL Formatter, which is available as a NotePad++ plugin. The only difference with the default settings is that in OHDSI, commas are trailing. You can automatically format your SQL correctly by using the Poor Man’s T-SQL Formatter Online Tool (but don’t forget to set Trailing Commas).
Because several database platforms are case-insensitive and tend to convert table and field names to either uppercase (e.g. Oracle) or lowercase (e.g. PostgreSQL), we use snake_case. All names should be in lowercase. Reserved words should be in upper case.
Good
SELECT COUNT(*) AS person_count FROM person
Bad
SELECT COUNT(*) AS personCount FROM person
SELECT COUNT(*) AS Person_Count FROM person
SELECT COUNT(*) AS PERSON_COUNT FROM person
select count(*) as person_count from person
Commas should be trailing.
Good
SELECT COUNT(*) AS person_count,
	condition_concept_id,
	condition_type_concept_id
FROM condition_era
GROUP BY condition_concept_id,
	condition_type_concept_id
Bad
SELECT COUNT(*) AS person_count
	,condition_concept_id
	,condition_type_concept_id
FROM condition_era
GROUP BY condition_concept_id
	,condition_type_concept_id
Indentation is done using tabs. Field definitions are followed by a new line.
Good
SELECT COUNT(*) AS person_count,
	condition_type_concept_id
FROM (
	SELECT *
	FROM condition_era
	WHERE condition_concept_id = 123
	) tmp
GROUP BY condition_type_concept_id;
Bad
SELECT COUNT(*) AS person_count, condition_type_concept_id
FROM (SELECT * FROM condition_era WHERE condition_concept_id = 123) tmp
GROUP BY condition_type_concept_id;
In R we use camel case, and in SQL we use snake case. Therefore, on the interface between the two languages we must convert from one convention to the other. To facilitate this, the OHDSI SqlRender and DatabaseConnector packages provide various features.
In general, in R we can convert from one case to another using the camelCaseToSnakeCase and snakeCaseToCamelCase functions in the SqlRender package:
library(SqlRender)
data <- data.frame(cohortId = 1,
                   cohortName = "test")
colnames(data) <- camelCaseToSnakeCase(colnames(data))
colnames(data)
# [1] "cohort_id" "cohort_name"
colnames(data) <- snakeCaseToCamelCase(colnames(data))
colnames(data)
# [1] "cohortId" "cohortName"
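To make the conversion itself concrete, the following base R sketch mimics it for simple names. toSnakeCase and toCamelCase are hypothetical helpers written for illustration only. They are not the SqlRender implementation and may differ from it on edge cases such as digits or consecutive capitals.

```r
# Hypothetical helpers that mimic the conversions for simple names.
# They are illustrations only, not the SqlRender implementation.
toSnakeCase <- function(camelCaseNames) {
  # Put an underscore before every uppercase letter, then lowercase everything
  withUnderscores <- gsub(pattern = "([A-Z])", replacement = "_\\1", x = camelCaseNames)
  tolower(withUnderscores)
}

toCamelCase <- function(snakeCaseNames) {
  # Uppercase the letter after each underscore and drop the underscore;
  # the \\U escape in the replacement requires perl = TRUE
  gsub(
    pattern = "_([a-z])",
    replacement = "\\U\\1",
    x = tolower(snakeCaseNames),
    perl = TRUE
  )
}

snakeNames <- toSnakeCase(c("cohortId", "cohortName"))
camelNames <- toCamelCase(snakeNames)
print(snakeNames)
print(camelNames)
```

In real code the SqlRender functions remain the better choice, because they keep the conversion consistent across OHDSI packages.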
When downloading data, a shortcut is to use the snakeCaseToCamelCase argument of the querySql function in the DatabaseConnector package:
sql <- "SELECT cohort_definition_id, subject_id FROM cdm.cohort;"
cohort <- querySql(connection = connection,
                   sql = sql,
                   snakeCaseToCamelCase = TRUE)
Here, cohort will be a data frame with columns cohortDefinitionId and subjectId.
When uploading data, a shortcut is to use the camelCaseToSnakeCase argument of the insertTable function in the DatabaseConnector package:
dataToInsert <- data.frame(cohortId = 1,
                           cohortName = "test")
insertTable(connection = connection,
            tableName = "my_table",
            data = dataToInsert,
            camelCaseToSnakeCase = TRUE)
Here, the table called my_table will have columns named cohort_id and cohort_name.