Runs the cohort diagnostics on all (or a subset of) the cohorts. Assumes the cohorts have already been instantiated with the CohortGenerator package.

Characterization: If the runTemporalCohortCharacterization argument is TRUE, a default covariateSettings object is created using FeatureExtraction::createTemporalCovariateSettings (see getDefaultCovariateSettings()). Alternatively, a custom covariate settings object may be supplied, using the default as an example.
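As an illustration of the alternative mentioned above, the following is a minimal sketch of building a custom temporal covariate settings object. The time windows and the choice of analyses are hypothetical and are not the package defaults; CohortDiagnostics may expect particular windows or analyses to be present, so treat this as a starting point rather than a recommended configuration. The FeatureExtraction call is guarded so the snippet still runs when that package is not installed.

```r
# Hypothetical time windows, in days relative to the cohort start date.
# These values are illustrative only and are NOT the package defaults.
temporalStartDays <- c(-365, -30, 0, 1, 31)
temporalEndDays <- c(-31, -1, 0, 30, 365)

# Each window needs a start and an end, and start must not exceed end.
stopifnot(length(temporalStartDays) == length(temporalEndDays))
stopifnot(all(temporalStartDays <= temporalEndDays))

customSettings <- NULL
if (requireNamespace("FeatureExtraction", quietly = TRUE)) {
  # Assumption: conditions and drug exposures are the covariates of interest.
  customSettings <- FeatureExtraction::createTemporalCovariateSettings(
    useConditionOccurrence = TRUE,
    useDrugExposure = TRUE,
    temporalStartDays = temporalStartDays,
    temporalEndDays = temporalEndDays
  )
}
```

The resulting object could then be passed as temporalCovariateSettings = customSettings in the call to executeDiagnostics.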

Usage

executeDiagnostics(
  cohortDefinitionSet,
  exportFolder,
  databaseId,
  cohortDatabaseSchema,
  databaseName = NULL,
  databaseDescription = NULL,
  connectionDetails = NULL,
  connection = NULL,
  cdmDatabaseSchema,
  tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
  cohortTable = "cohort",
  cohortTableNames = CohortGenerator::getCohortTableNames(cohortTable = cohortTable),
  vocabularyDatabaseSchema = cdmDatabaseSchema,
  cohortIds = NULL,
  cdmVersion = 5,
  runInclusionStatistics = TRUE,
  runIncludedSourceConcepts = TRUE,
  runOrphanConcepts = TRUE,
  runTimeSeries = FALSE,
  runVisitContext = TRUE,
  runBreakdownIndexEvents = TRUE,
  runIncidenceRate = TRUE,
  runCohortRelationship = TRUE,
  runTemporalCohortCharacterization = TRUE,
  temporalCovariateSettings = getDefaultCovariateSettings(),
  minCellCount = 5,
  minCharacterizationMean = 0.01,
  irWashoutPeriod = 0,
  incremental = FALSE,
  incrementalFolder = file.path(exportFolder, "incremental"),
  runFeatureExtractionOnSample = FALSE,
  sampleN = 1000,
  seed = 64374,
  seedArgs = NULL
)

Arguments

cohortDefinitionSet

A data.frame of cohorts. Must include the columns cohortId, cohortName, json, and sql.

exportFolder

The folder where the output will be exported to. If this folder does not exist it will be created.

databaseId

A short string for identifying the database (e.g. 'Synpuf').

cohortDatabaseSchema

Schema name where your cohort table resides. Note that for SQL Server, this should include both the database and schema name, for example 'scratch.dbo'.

databaseName

The full name of the database. If NULL, defaults to the value in the cdm_source table.

databaseDescription

A short description (several sentences) of the database. If NULL, defaults to the value in the cdm_source table.

connectionDetails

An object of type connectionDetails as created using the createConnectionDetails function in the DatabaseConnector package. Can be left NULL if connection is provided.

connection

An object of type connection as created using the connect function in the DatabaseConnector package. Can be left NULL if connectionDetails is provided, in which case a new connection will be opened at the start of the function, and closed when the function finishes.

cdmDatabaseSchema

Schema name where your patient-level data in OMOP CDM format resides. Note that for SQL Server, this should include both the database and schema name, for example 'cdm_data.dbo'.

tempEmulationSchema

Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created.

cohortTable

Name of the cohort table.

cohortTableNames

Cohort table names used by the CohortGenerator package.

vocabularyDatabaseSchema

Schema name where your OMOP vocabulary data resides. This is commonly the same as cdmDatabaseSchema. Note that for SQL Server, this should include both the database and schema name, for example 'vocabulary.dbo'.

cohortIds

Optionally, provide a subset of cohort IDs to restrict the diagnostics to.

cdmVersion

The version of the OMOP CDM. Default 5. (Note: only 5 is supported.)

runInclusionStatistics

Generate and export statistics on the cohort inclusion rules?

runIncludedSourceConcepts

Generate and export the source concepts included in the cohorts?

runOrphanConcepts

Generate and export potential orphan concepts?

runTimeSeries

Generate and export the time series diagnostics?

runVisitContext

Generate and export index-date visit context?

runBreakdownIndexEvents

Generate and export the breakdown of index events?

runIncidenceRate

Generate and export the cohort incidence rates?

runCohortRelationship

Generate and export the cohort relationship? Cohort relationship checks the temporal relationship between two or more cohorts.

runTemporalCohortCharacterization

Generate and export the temporal cohort characterization? Only records with values greater than 0.001 are returned.

temporalCovariateSettings

Either an object of type covariateSettings as created using the createTemporalCovariateSettings function in the FeatureExtraction package, or a list of such objects.

minCellCount

The minimum cell count for fields containing person counts or fractions.

minCharacterizationMean

The minimum mean value for characterization output. Values below this are cut off from the output. This helps reduce the file size of the characterization output, but removes information on covariates that have very low values. The default is 0.01 (i.e. 1 percent).

irWashoutPeriod

Number of days of washout to include in the calculation of incidence rates. Default is 0.

incremental

Create only cohort diagnostics that haven't been created before?

incrementalFolder

If incremental = TRUE, specify a folder where records are kept of which cohort diagnostics have been executed.

runFeatureExtractionOnSample

Logical. If TRUE, the function will operate on a sample of the data. Default is FALSE, meaning the function will operate on the full data set.

sampleN

Integer. The number of records to include in the sample if runFeatureExtractionOnSample is TRUE. Default is 1000. Ignored if runFeatureExtractionOnSample is FALSE.

seed

Integer. The seed for the random number generator used to create the sample. This ensures that the same sample can be drawn again in future runs. Default is 64374.

seedArgs

List. Additional arguments to pass to the sampling function. This can be used to control aspects of the sampling process beyond the seed and sample size.
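The role of sampleN and seed can be illustrated with a small base R sketch. This is not the package's internal sampling code, which is not described here; the helper drawSample and the subject IDs are hypothetical. It only shows why fixing the seed lets the same sample be drawn again in a later run.

```r
# Hypothetical subject IDs standing in for the members of one cohort.
subjectIds <- 1:5000
sampleN <- 1000
seed <- 64374

# Hypothetical helper: draw at most n IDs, reproducibly for a given seed.
drawSample <- function(ids, n, seed) {
  # If the cohort is no larger than the sample size, keep everyone.
  if (length(ids) <= n) {
    return(ids)
  }
  set.seed(seed)
  sample(ids, size = n)
}

firstRun <- drawSample(subjectIds, sampleN, seed)
secondRun <- drawSample(subjectIds, sampleN, seed)

# Same seed and same input give the same sample.
sameSample <- identical(firstRun, secondRun)
```

A cohort with fewer than sampleN members is returned unchanged in this sketch; whether the package behaves the same way is an assumption.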

Details

The cohortDefinitionSet argument must be a data frame with at least the following columns. These fields will be exported as is to the cohort table that is part of the Cohort Diagnostics results data model. Any additional fields found will be stored as a JSON object in the metadata field of the cohort table:

cohortId

The ID used to identify a cohort definition. It must be unique, and it will be used to create file names.

cohortName

The full name of the cohort. This will be shown in the Shiny app.

json

The JSON cohort definition for the cohort.

sql

The SQL of the cohort definition rendered from the cohort json.
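The column requirements above can be checked before calling executeDiagnostics. The following is a minimal base R sketch; the helper checkCohortDefinitionSet and the placeholder cohort contents are hypothetical and are not part of the package.

```r
requiredColumns <- c("cohortId", "cohortName", "json", "sql")

# Hypothetical helper: stop early if the data frame is not usable.
checkCohortDefinitionSet <- function(cohortDefinitionSet) {
  missingColumns <- setdiff(requiredColumns, names(cohortDefinitionSet))
  if (length(missingColumns) > 0) {
    stop("Missing required columns: ", paste(missingColumns, collapse = ", "))
  }
  if (anyDuplicated(cohortDefinitionSet$cohortId) > 0) {
    stop("cohortId values must be unique")
  }
  invisible(TRUE)
}

# Placeholder cohorts; json and sql are dummy strings, not real definitions.
cohorts <- data.frame(
  cohortId = c(100, 101),
  cohortName = c("Cohort A", "Cohort B"),
  json = c("{}", "{}"),
  sql = c("SELECT 1;", "SELECT 1;"),
  logicDescription = c("First cohort", "Second cohort"),
  stringsAsFactors = FALSE
)

checkCohortDefinitionSet(cohorts)
```

The extra logicDescription column is allowed; as described above, additional fields are stored in the metadata field of the cohort table.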

Examples

if (FALSE) { # \dontrun{
# Load cohorts (assumes that they have already been instantiated)
cohortTableNames <- CohortGenerator::getCohortTableNames(cohortTable = "cohort")
cohorts <- CohortGenerator::getCohortDefinitionSet(packageName = "MyGreatPackage")
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  server = "ohdsi.com",
  port = 5432,
  user = "me",
  password = "secure"
)

executeDiagnostics(
  cohortDefinitionSet = cohorts,
  exportFolder = "export",
  cohortTableNames = cohortTableNames,
  cohortDatabaseSchema = "results",
  cdmDatabaseSchema = "cdm",
  databaseId = "mySpecialCdm",
  connectionDetails = connectionDetails
)

# Use a custom set of cohorts defined in a data.frame
cohorts <- data.frame(
  cohortId = c(100),
  cohortName = c("Cohort Name"),
  logicDescription = c("My Cohort"),
  sql = paste(readLines("path_to.sql"), collapse = "\n"),
  json = paste(readLines("path_to.json"), collapse = "\n")
)
executeDiagnostics(
  cohortDefinitionSet = cohorts,
  exportFolder = "export",
  cohortTable = "cohort",
  cohortDatabaseSchema = "results",
  cdmDatabaseSchema = "cdm",
  databaseId = "mySpecialCdm",
  connectionDetails = connectionDetails
)
} # }