(Adapted from HADES)
The styler package is highly recommended for automatically applying some (but not all) of the style recommendations here. styler is available as a stand-alone R package, but also comes with a handy RStudio add-in.
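As a minimal sketch of calling styler from the R console, assuming the package is installed: style_text() restyles a string of code and style_file() rewrites a file in place. The code string and the file path below are illustrative only. styler's default style may differ from this guide on some points (indentation, for example), so review the result before committing it.

```r
# Illustrative only: requires the styler package to be installed.
messy <- "average<-mean(feet/12+inches,na.rm=TRUE)"

if (requireNamespace("styler", quietly = TRUE)) {
	# Print a restyled version of the code string to the console.
	print(styler::style_text(messy))

	# Restyle a file in place (hypothetical path, uncomment to use).
	# styler::style_file("R/myFunctions.R")
}
```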
We use camelCase in R. Function and variable names all start with lowercase. Package names start with uppercase.
Examples:
cohortData <- loadCohortData("myFolder")
SqlRender package
Function names typically start with a verb. Variable names are typically nouns. Do not encode the data type in the variable names. Also, everything is data, so no need to say that unless unavoidable.
Good
fitOutcomeModel function.
computeCovariateBalance function.
population argument.
Bad
sampling as variable name (not a noun)
namesVector, covariatesDf (encodes the data type)
getResultData (everything is data)
Place spaces around all infix operators (=, +, -, <-, etc.). The same rule applies when using = in function calls. Always put a space after a comma, and never before (just like in regular English).
Good
average <- mean(feet / 12 + inches, na.rm = TRUE)
Bad
average<-mean(feet/12+inches,na.rm=TRUE)
There’s a small exception to this rule: :, :: and ::: don’t need spaces around them.
Good
x <- 1:10
base::get
Bad
x <- 1 : 10
base :: get
Place a space before left parentheses, except in a function call.
Good
if (debug) {
	do(x)
}
plot(x, y)
Bad
if(debug){
	do(x)
}
plot (x, y)
Extra spacing (i.e., more than one space in a row) is ok if it improves alignment of equal signs or assignments (<-).
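For illustration, a small sketch of alignment spacing. The variable and element names are made up for this example:

```r
# Hypothetical values, defined only so the example runs.
feet   <- 5
inches <- 11

# Extra spaces align the equal signs of the named list elements.
heights <- list(
	totalInches = feet * 12 + inches,
	totalFeet   = feet + inches / 12
)
```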
Do not place spaces around code in parentheses or square brackets (unless there’s a comma, in which case see above).
Good
if (debug) {
	do(x)
}
diamonds[5, ]
Bad
if ( debug ) { # No spaces around debug
	do(x)
}
x[1,] # Needs a space after the comma
x[1 ,] # Space goes after comma not before
Curly braces
An opening curly brace should never go on its own line and should always be followed by a new line. A closing curly brace should always go on its own line, unless it’s followed by else.
Always indent the code inside curly braces. It’s ok to leave very short statements on the same line:
if (y < 0 && debug) {
	message("Y is negative")
}
Strive to limit your code to 100 characters per line. This fits comfortably on a printed page with a reasonably sized font. If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function.
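One way to check the line-length limit is a few lines of base R. The sketch below uses a hypothetical helper, findLongLines (not part of any package), that reports which lines of a file exceed a given width. Note that nchar() counts a tab as a single character, so the visual width of indented lines may be larger than reported.

```r
# Hypothetical helper: return the numbers of the lines longer than maxWidth.
findLongLines <- function(fileName, maxWidth = 100) {
	lines <- readLines(con = fileName, warn = FALSE)
	which(nchar(lines) > maxWidth)
}

# Usage with a temporary file containing one short and one long line.
tempFileName <- tempfile(fileext = ".R")
writeLines(text = c("x <- 5", strrep("#", 120)), con = tempFileName)
longLines <- findLongLines(fileName = tempFileName, maxWidth = 100)
unlink(tempFileName)
```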
When indenting your code, use tabs. Never use spaces or mix tabs and spaces.
Hint: In RStudio you can use ctrl-i to automatically indent the code for you.
Use <-, not =, for assignment.
Good
x <- 5
Bad
x = 5
If-then-else clauses should always use curly brackets, even if there’s only one clause and it’s one statement.
Good
if (a == b) {
	doSomething()
}
Bad
if (a == b) doSomething()
When calling a function that has more than one argument, make sure to refer to each argument by name instead of relying on the order of arguments.
Good
translateSql(sql = "COMMIT", targetDialect = "PDW")
Bad
translateSql("COMMIT", "PDW")
Comment your code only where the intent is not immediately obvious. Each line of a comment should begin with the comment symbol and a single space: "# ". Comments should explain the why, not the what.
Use commented lines of - to break up your file into easily readable chunks, for example:
## Load data ---------------------------
x <- readRDS("data.rds")
## Plot data ---------------------------
plot(x)
Opening curly brackets should precede a new line. A closing curly bracket should be followed by a new line except when it is followed by else or a closing parenthesis.
Good
if (a == b) {
	doSomething()
} else {
	doSomethingElse()
}
Bad
if (a == b)
{
	doSomething()
}
else
{
	doSomethingElse()
}
Pipes should always be at the end of the line.
Good
foo %>%
	filter(x > 0) %>%
	group_by(y) %>%
	summarize(total = sum(x))
Bad
foo %>% filter(x > 0) %>% group_by(y) %>% summarize(total = sum(x))
Dplyr joins and merge statements should always have a ‘by’ argument.
Good
foo %>%
	inner_join(bar, by = "covariateId")
Bad
foo %>%
	inner_join(bar)
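The same rule applies to base R's merge(). A minimal sketch with made-up data frames (all names other than covariateId are illustrative):

```r
# Made-up data for illustration.
covariates <- data.frame(covariateId = c(1, 2, 3), covariateValue = c(0.5, 1.0, 0.0))
covariateRef <- data.frame(covariateId = c(1, 2), covariateName = c("age", "sex"))

# Naming the join column makes the intent explicit and avoids accidental
# joins on other columns that happen to share a name.
merged <- merge(x = covariates, y = covariateRef, by = "covariateId")
```

Without the by argument, merge() joins on all columns the two data frames have in common, which can change silently when a column is added to either side.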
The CHoRUS code style for SQL is heavily inspired by the Poor Man’s T-SQL Formatter, which is available as a Notepad++ plugin. The only difference from the default settings is that in CHoRUS, commas are trailing. You can automatically format your SQL correctly by using the Poor Man’s T-SQL Formatter Online Tool (but don’t forget to set Trailing Commas).
Because several database platforms are case-insensitive and tend to convert table and field names to either uppercase (e.g. Oracle) or lowercase (e.g. PostgreSQL), we use snake_case. All names should be in lowercase. Reserved words should be in upper case.
Good
SELECT COUNT(*) AS person_count FROM person
Bad
SELECT COUNT(*) AS personCount FROM person
SELECT COUNT(*) AS Person_Count FROM person
SELECT COUNT(*) AS PERSON_COUNT FROM person
select count(*) as person_count from person
Commas should be trailing.
Good
SELECT COUNT(*) AS person_count,
	condition_concept_id,
	condition_type_concept_id
FROM condition_era
GROUP BY condition_concept_id,
	condition_type_concept_id
Bad
SELECT COUNT(*) AS person_count
	,condition_concept_id
	,condition_type_concept_id
FROM condition_era
GROUP BY condition_concept_id
	,condition_type_concept_id
Indentation is done using tabs. Field definitions are followed by a new line.
Good
SELECT COUNT(*) AS person_count,
	condition_type_concept_id
FROM (
	SELECT *
	FROM condition_era
	WHERE condition_concept_id = 123
	) tmp
GROUP BY condition_type_concept_id;
Bad
SELECT COUNT(*) AS person_count, condition_type_concept_id
FROM (SELECT * FROM condition_era WHERE condition_concept_id = 123) tmp
GROUP BY condition_type_concept_id;