R/Sql.R
renderTranslateQueryApplyBatched.Rd
This function renders and translates SQL, sends it to the server, and processes the data in batches with a callback function. Note that the callback function should perform a row-wise operation. This is designed to work with massive data that won't fit into memory.
The batch sizes are determined by the Java virtual machine and will depend on the data.
renderTranslateQueryApplyBatched(
  connection,
  sql,
  fun,
  args = list(),
  errorReportFile = file.path(getwd(), "errorReportSql.txt"),
  snakeCaseToCamelCase = FALSE,
  tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
  integerAsNumeric = getOption("databaseConnectorIntegerAsNumeric", default = TRUE),
  integer64AsNumeric = getOption("databaseConnectorInteger64AsNumeric", default = TRUE),
  ...
)
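As a quick orientation to the batch-wise pattern implied by the signature above, here is a minimal, hypothetical sketch (the connection object, the schema value, and the query are placeholders): the callback receives one data frame per batch together with the starting row position of that batch, and the values it returns are collected into a list.
countRows <- function(data, position, ...) {
  # Hypothetical callback: simply report the number of rows in this batch.
  return(nrow(data))
}
rowCounts <- renderTranslateQueryApplyBatched(
  connection = connection,
  sql = "SELECT * FROM @schema.person;",
  fun = countRows,
  schema = "cdm_synpuf"
)
sum(unlist(rowCounts)) # total number of rows processed across all batches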
The connection to the database server created using either connect() or DBI::dbConnect().
The SQL to be sent.
Function to apply to each batch. Must take a data frame and an integer position as parameters.
List of arguments to be passed to the function call (see the sketch after this argument list).
The file where an error report will be written if an error occurs. Defaults to 'errorReportSql.txt' in the current working directory.
If TRUE, field names are assumed to use snake_case and are converted to camelCase.
Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created.
Logical: should 32-bit integers be converted to numeric (double) values? If FALSE, 32-bit integers will be represented using R's native integer class.
Logical: should 64-bit integers be converted to numeric (double) values? If FALSE, 64-bit integers will be represented using bit64::integer64.
Parameters that will be used to render the SQL.
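To make the division of labour between args and the ... parameters concrete, here is a hedged sketch (summarizeBatch, the connection, and the query are hypothetical): named values supplied through ... are substituted into the SQL by SqlRender, while the list given to args is forwarded to every call of fun.
summarizeBatch <- function(data, position, digits, ...) {
  # Hypothetical callback: mean of the first (and only) selected column.
  return(round(mean(data[[1]], na.rm = TRUE), digits = digits))
}
renderTranslateQueryApplyBatched(
  connection = connection,
  sql = "SELECT year_of_birth FROM @schema.person;",
  fun = summarizeBatch,
  args = list(digits = 1), # passed on to summarizeBatch()
  schema = "cdm_synpuf"    # used to render @schema in the SQL
)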
Invisibly returns a list of outputs from each call to the provided function.
Fields will be automatically converted for improved consistency in these situations:
SQLite: Fields with names ending in _date will be converted to Date fields. Rationale: SQLite does not support DATE fields.
SQLite: Fields with names ending in _datetime will be converted to POSIXct fields. Rationale: SQLite does not support DATETIME fields.
BigQuery and Snowflake: Integer fields will be converted to integer if the values fit in a 32-bit integer, or remain integer64 otherwise. Rationale: these platforms do not distinguish between INT and BIGINT.
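These conversions can be observed by inspecting the column classes of a batch, for example with a hypothetical callback like the one below (the connection and schema are placeholders):
inspectClasses <- function(data, position, ...) {
  # Hypothetical callback: print the column classes of the first batch only.
  if (position == 1) {
    print(sapply(data, class))
  }
  return(NULL)
}
renderTranslateQueryApplyBatched(
  connection = connection,
  sql = "SELECT person_id, birth_datetime FROM @schema.person;",
  fun = inspectClasses,
  schema = "cdm_synpuf"
)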
This function calls the render and translate functions in the SqlRender package before calling querySql().
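For reference, the rendering and translation steps can also be performed manually. The sketch below shows a non-batched equivalent that calls SqlRender directly; the connection is a placeholder and the target dialect is assumed to be PostgreSQL, so in practice it should match the connected platform.
sql <- SqlRender::render("SELECT * FROM @schema.person;", schema = "cdm_synpuf")
sql <- SqlRender::translate(sql, targetDialect = "postgresql")
person <- querySql(connection, sql)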
if (FALSE) { # \dontrun{
connectionDetails <- createConnectionDetails(
  dbms = "postgresql",
  server = "localhost",
  user = "root",
  password = "blah",
  schema = "cdm_v4"
)
connection <- connect(connectionDetails)
# First example: write data to a large CSV file:
filepath <- "myBigFile.csv"
writeBatchesToCsv <- function(data, position, ...) {
  # write.csv() ignores 'append', so use write.table() to append batches after the first:
  write.table(data, filepath, sep = ",", row.names = FALSE,
              append = position != 1, col.names = position == 1)
  return(NULL)
}
renderTranslateQueryApplyBatched(connection,
  "SELECT * FROM @schema.person;",
  schema = "cdm_synpuf",
  fun = writeBatchesToCsv
)
# Second example: write data to Andromeda
# (Alternative to querySqlToAndromeda if some local computation needs to be applied)
bigResults <- Andromeda::andromeda()
writeBatchesToAndromeda <- function(data, position, ...) {
  data$p <- EmpiricalCalibration::computeTraditionalP(data$logRr, data$logSeRr)
  if (position == 1) {
    bigResults$rrs <- data
  } else {
    Andromeda::appendToTable(bigResults$rrs, data)
  }
  return(NULL)
}
sql <- "SELECT target_id, comparator_id, log_rr, log_se_rr FROM @schema.my_results;"
renderTranslateQueryApplyBatched(connection,
  sql,
  fun = writeBatchesToAndromeda,
  schema = "my_results",
  snakeCaseToCamelCase = TRUE
)
disconnect(connection)
} # }