vignettes/DbiAndDbplyr.Rmd
DbiAndDbplyr.Rmd
This vignette describes how to use the DatabaseConnector
package through the DBI
and dbplyr
interfaces.
It assumes you already know how to create a connection as described in
the ‘Connecting to a database’ vignette.
All functions of the DatabaseConnector
DBI
interface apply SQL translation, thus making it an interface to a
virtual database platform speaking OHDSISql as defined in
SqlRender
.
We can use the dbConnect()
function, which is equivalent
to the connect()
function:
connection <- dbConnect(
DatabaseConnectorDriver(),
dbms = "postgresql",
server = "localhost/postgres",
user = "joe",
password = "secret"
)
## Connecting using PostgreSQL driver
dbIsValid(conn)
## [1] TRUE
Querying and executing SQL can be done through the usual
DBI
functions. SQL statements are assumed to be written in
‘OhdsiSql’, a subset of SQL Server SQL (see the SqlRender
package for details), and are automatically translated to the
appropriate SQL dialect. For example:
dbGetQuery(connection, "SELECT TOP 3 * FROM cdmv5.person")
## person_id gender_concept_id year_of_birth
## 1 1 8507 1975
## 2 2 8507 1976
## 3 3 8507 1977
Or:
res <- dbSendQuery(connection, "SELECT TOP 3 * FROM cdmv5.person")
dbFetch(res)
## person_id gender_concept_id year_of_birth
## 1 1 8507 1975
## 2 2 8507 1976
## 3 3 8507 1977
dbHasCompleted(res)
## [1] TRUE
dbClearResult(res)
dbDisconnect(res)
We can create a table based on a DatabaseConnector
connection. The inDatabaseSchema()
function allows us to
use standard databaseSchema
notation promoted by
SqlRender:
library(dpylr)
person <- tbl(connection, inDatabaseSchema("cdmv5", "person"))
person
## person_id gender_concept_id year_of_birth
## 1 1 8507 1975
## 2 2 8507 1976
## 3 3 8507 1977
we can apply the usual dplyr
syntax:
person %>%
filter(gender_concept_id == 8507) %>%
count() %>%
pull()
## [1] 1234
The dbplyr
package does not support date functions, but
DatabaseConnector
provides the dateDiff()
,
dateAdd()
, eoMonth()
,
dateFromParts()
, year()
, month()
,
and day()
functions that will correctly translate to
various data platforms:
observationPeriod <- tbl(connection, inDatabaseSchema("cdmv5", "observation_period"))
observationPeriod %>%
filter(
dateDiff("day", observation_period_start_date, observation_period_end_date) > 365
) %>%
count() %>%
pull()
## [1] 987
Because of the many idiosyncrasies in how different dataplatforms
store and transform table and field names, it is currently not possible
to use any names that would require quotes. So for example the names
person
, person_id
, and
observation_period
are fine, but Person ID
and
Obs. Period
are not. In general, it is highly recommend to
use lower case snake_case for database table and field
names.
The DBI
interface uses temp table emulation on those
platforms that do not support real temp tables. This does require that
for those platforms the user specify a tempEmulationSchema
,
preferably using
option(sqlRenderTempEmulationSchema = "a_schema")
Where "a_schema"
refers to a schema where the user has
write access. If we know we will need temp tables, we can use the
assertTempEmulationSchemaSet()
to verify this option has
been set. This function will throw an error if it is not set, but only
if the provided dbms is a platform that requires temp table
emulation.
In OHDSISql
, temp tables are referred to using a ‘#’
prefix. For example:
dbWriteTable(connection, "#temp", cars)
## Inserting data took 0.053 secs
The copy_to
function creates a temp table:
carsTable <- copy_to(connection, cars)
## Created a temporary table named #cars
The compute()
function also creates a temp table, for
example:
tempTable <- person %>%
filter(gender_concept_id == 8507) %>%
compute()
## Created a temporary table named #dbplyr_001
Emulated temp tables are not really temporary, and therefore have to
be removed when no longer needed. A convenient way to drop all emulated
temp tables created so far in an R session is using the
dropEmulatedTempTables()
function:
dropEmulatedTempTables(connection)
In our example, this does not do anything because were using a PostgreSQL server, which does natively support temp tables.
We can use the dbDisconnect()
function, which is
equivalent to the disconnect()
function:
dbDisconnect(connection)