Runs the cohort diagnostics on all (or a subset of) the cohorts instantiated using the CohortGenerator package. Assumes the cohorts have already been instantiated.
Characterization:
If the runTemporalCohortCharacterization argument is TRUE, a default covariateSettings object is created
using FeatureExtraction::createTemporalCovariateSettings (see getDefaultCovariateSettings() in the usage).
Alternatively, a custom covariate settings object may be created, using the default as an example.
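As a rough illustration of the alternative mentioned above, a custom temporal covariate settings object could be built along the following lines. The selected covariates and time windows are assumptions chosen for illustration, not the package defaults, and the call is guarded so the snippet still runs when FeatureExtraction is not installed.

```r
# Illustrative time windows in days relative to cohort start.
# These are assumed values, not those used by getDefaultCovariateSettings().
startDays <- c(-365, -30, 0, 1)
endDays <- c(-31, -1, 0, 30)

customSettings <- NULL
if (requireNamespace("FeatureExtraction", quietly = TRUE)) {
  # Build a temporal covariate settings object for a few covariate types.
  customSettings <- FeatureExtraction::createTemporalCovariateSettings(
    useDemographicsGender = TRUE,
    useConditionOccurrence = TRUE,
    useDrugExposure = TRUE,
    temporalStartDays = startDays,
    temporalEndDays = endDays
  )
}
```

The resulting object could then be passed as temporalCovariateSettings. Whether executeDiagnostics accepts arbitrary time windows is not stated on this page, so verify that before relying on custom windows.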
Usage
executeDiagnostics(
cohortDefinitionSet,
exportFolder,
databaseId,
cohortDatabaseSchema,
databaseName = NULL,
databaseDescription = NULL,
connectionDetails = NULL,
connection = NULL,
cdmDatabaseSchema,
tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
cohortTable = "cohort",
cohortTableNames = CohortGenerator::getCohortTableNames(cohortTable = cohortTable),
vocabularyDatabaseSchema = cdmDatabaseSchema,
cohortIds = NULL,
cdmVersion = 5,
runInclusionStatistics = TRUE,
runIncludedSourceConcepts = TRUE,
runOrphanConcepts = TRUE,
runTimeSeries = FALSE,
runVisitContext = TRUE,
runBreakdownIndexEvents = TRUE,
runIncidenceRate = TRUE,
runCohortRelationship = TRUE,
runTemporalCohortCharacterization = TRUE,
temporalCovariateSettings = getDefaultCovariateSettings(),
minCellCount = 5,
minCharacterizationMean = 0.01,
irWashoutPeriod = 0,
incremental = FALSE,
incrementalFolder = file.path(exportFolder, "incremental"),
runFeatureExtractionOnSample = FALSE,
sampleN = 1000,
seed = 64374,
seedArgs = NULL
)
Arguments
- cohortDefinitionSet
A data.frame of cohorts. It must include the columns cohortId, cohortName, json, and sql.
- exportFolder
The folder where the output will be exported to. If this folder does not exist it will be created.
- databaseId
A short string for identifying the database (e.g. 'Synpuf').
- cohortDatabaseSchema
Schema name where your cohort table resides. Note that for SQL Server, this should include both the database and schema name, for example 'scratch.dbo'.
- databaseName
The full name of the database. If NULL, defaults to the value in the cdm_source table.
- databaseDescription
A short description (several sentences) of the database. If NULL, defaults to the value in the cdm_source table.
- connectionDetails
An object of type connectionDetails as created using the createConnectionDetails function in the DatabaseConnector package. Can be left NULL if connection is provided.
- connection
An object of type connection as created using the connect function in the DatabaseConnector package. Can be left NULL if connectionDetails is provided, in which case a new connection will be opened at the start of the function, and closed when the function finishes.
- cdmDatabaseSchema
Schema name where your patient-level data in OMOP CDM format resides. Note that for SQL Server, this should include both the database and schema name, for example 'cdm_data.dbo'.
- tempEmulationSchema
Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created.
- cohortTable
Name of the cohort table.
- cohortTableNames
Cohort table names used by the CohortGenerator package.
- vocabularyDatabaseSchema
Schema name where your OMOP vocabulary data resides. This is commonly the same as cdmDatabaseSchema. Note that for SQL Server, this should include both the database and schema name, for example 'vocabulary.dbo'.
- cohortIds
Optionally, provide a subset of cohort IDs to restrict the diagnostics to.
- cdmVersion
The version of the OMOP CDM. Default 5. (Note: only 5 is supported.)
- runInclusionStatistics
Generate and export statistics on the cohort inclusion rules?
- runIncludedSourceConcepts
Generate and export the source concepts included in the cohorts?
- runOrphanConcepts
Generate and export potential orphan concepts?
- runTimeSeries
Generate and export the time series diagnostics?
- runVisitContext
Generate and export index-date visit context?
- runBreakdownIndexEvents
Generate and export the breakdown of index events?
- runIncidenceRate
Generate and export the cohort incidence rates?
- runCohortRelationship
Generate and export the cohort relationship? Cohort relationship checks the temporal relationship between two or more cohorts.
- runTemporalCohortCharacterization
Generate and export the temporal cohort characterization? Only records with values greater than 0.001 are returned.
- temporalCovariateSettings
Either an object of type covariateSettings as created using the createTemporalCovariateSettings function in the FeatureExtraction package, or a list of such objects.
- minCellCount
The minimum cell count for fields containing person counts or fractions.
- minCharacterizationMean
The minimum mean value for characterization output. Values below this will be cut off from the output. This helps reduce the file size of the characterization output, but removes information on covariates that have very low values. The default is 0.01 (i.e. 1 percent), as shown in the usage above.
- irWashoutPeriod
Number of days of washout to include in the calculation of incidence rates. Default is 0.
- incremental
Create only cohort diagnostics that haven't been created before?
- incrementalFolder
If incremental = TRUE, specify a folder where records are kept of which cohort diagnostics have been executed.
- runFeatureExtractionOnSample
Logical. If TRUE, the function will operate on a sample of the data. Default is FALSE, meaning the function will operate on the full data set.
- sampleN
Integer. The number of records to include in the sample if runFeatureExtractionOnSample is TRUE. Default is 1000. Ignored if runFeatureExtractionOnSample is FALSE.
- seed
Integer. The seed for the random number generator used to create the sample. This ensures that the same sample can be drawn again in future runs. Default is 64374.
- seedArgs
List. Additional arguments to pass to the sampling function. This can be used to control aspects of the sampling process beyond the seed and sample size.
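To make the effect of minCharacterizationMean concrete, the following base R sketch applies the same kind of cut-off to a small, made-up table of covariate means. The data frame and its column names are hypothetical, and this is only an illustration of the thresholding idea, not the package's internal implementation.

```r
# Hypothetical characterization output: one row per covariate.
covariateResults <- data.frame(
  covariateId = c(1001, 1002, 1003, 1004),
  mean = c(0.25, 0.012, 0.004, 0.0005)
)

# Threshold taken from the default shown in the usage signature.
minCharacterizationMean <- 0.01

# Keep only covariates whose mean is not below the threshold.
filteredResults <- covariateResults[
  covariateResults$mean >= minCharacterizationMean, ,
  drop = FALSE
]
print(filteredResults)
```

With this threshold, the two rarest covariates are dropped, which is the trade-off described for minCharacterizationMean: smaller output at the cost of losing very low-prevalence covariates.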
Details
The cohortDefinitionSet
argument must be a data frame with at least the following columns. These fields will be exported as is to the cohort table that is part of the Cohort Diagnostics results data model. Any additional fields found will be stored as a JSON object in the metadata field of the cohort table:
- cohortId
The cohort Id is the id used to identify a cohort definition. This is required to be unique. It will be used to create file names.
- cohortName
The full name of the cohort. This will be shown in the Shiny app.
- json
The JSON cohort definition for the cohort.
- sql
The SQL of the cohort definition rendered from the cohort json.
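The column requirements listed above can be checked before calling executeDiagnostics. The helper below is a minimal sketch in base R. The function name checkCohortDefinitionSet and the example data are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): verifies the required
# columns and the uniqueness of cohortId described in the Details section.
checkCohortDefinitionSet <- function(cohortDefinitionSet) {
  requiredColumns <- c("cohortId", "cohortName", "json", "sql")
  missingColumns <- setdiff(requiredColumns, names(cohortDefinitionSet))
  if (length(missingColumns) > 0) {
    stop("Missing required columns: ", paste(missingColumns, collapse = ", "))
  }
  if (anyDuplicated(cohortDefinitionSet$cohortId) > 0) {
    stop("cohortId values must be unique")
  }
  invisible(TRUE)
}

# Made-up cohort set with placeholder JSON and SQL.
exampleSet <- data.frame(
  cohortId = c(1, 2),
  cohortName = c("Cohort A", "Cohort B"),
  json = c("{}", "{}"),
  sql = c("SELECT 1;", "SELECT 2;"),
  stringsAsFactors = FALSE
)
checkCohortDefinitionSet(exampleSet)
```

Failing early with a clear message is usually cheaper than discovering a missing column partway through a long diagnostics run.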
Examples
if (FALSE) { # \dontrun{
# Load cohorts (assumes that they have already been instantiated)
cohortTableNames <- CohortGenerator::getCohortTableNames(cohortTable = "cohort")
cohorts <- CohortGenerator::getCohortDefinitionSet(packageName = "MyGreatPackage")
connectionDetails <- DatabaseConnector::createConnectionDetails(
dbms = "postgresql",
server = "ohdsi.com",
port = 5432,
user = "me",
password = "secure"
)
executeDiagnostics(
cohortDefinitionSet = cohorts,
exportFolder = "export",
cohortTableNames = cohortTableNames,
cohortDatabaseSchema = "results",
cdmDatabaseSchema = "cdm",
databaseId = "mySpecialCdm",
connectionDetails = connectionDetails
)
# Use a custom set of cohorts defined in a data.frame
cohorts <- data.frame(
cohortId = c(100),
cohortName = c("Cohort Name"),
logicDescription = c("My Cohort"),
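# Collapse each file into a single string so the data.frame has one row
sql = paste(readLines("path_to.sql"), collapse = "\n"),
json = paste(readLines("path_to.json"), collapse = "\n")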
)
executeDiagnostics(
cohortDefinitionSet = cohorts,
exportFolder = "export",
cohortTable = "cohort",
cohortDatabaseSchema = "results",
cdmDatabaseSchema = "cdm",
databaseId = "mySpecialCdm",
connectionDetails = connectionDetails
)
} # }
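Several of the arguments described earlier can be combined: reusing an already open connection, restricting the run to a subset of cohort IDs, and running incrementally. The sketch below wraps such a call in a hypothetical function, runSubsetDiagnostics, so that nothing is executed until it is called with a live connection. It assumes executeDiagnostics is called from the CohortDiagnostics package, and the schema names and databaseId are placeholders borrowed from the examples.

```r
# Hypothetical wrapper: defining it does not touch the database.
runSubsetDiagnostics <- function(connection, cohorts, cohortIds) {
  CohortDiagnostics::executeDiagnostics(
    cohortDefinitionSet = cohorts,
    exportFolder = "export",
    databaseId = "mySpecialCdm",
    cohortDatabaseSchema = "results",
    cdmDatabaseSchema = "cdm",
    connection = connection,        # reuse an existing connection
    cohortIds = cohortIds,          # restrict to a subset of cohorts
    incremental = TRUE,             # skip diagnostics already recorded
    incrementalFolder = file.path("export", "incremental")
  )
}
```

Because connection is supplied, connectionDetails is left at its default of NULL, and the caller remains responsible for closing the connection afterwards.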