Execute cohort diagnostics — executeDiagnostics • CohortDiagnostics

Runs the cohort diagnostics on all (or a subset of) the cohorts. Assumes the cohorts have already been instantiated with the CohortGenerator package.

Characterization: If the runTemporalCohortCharacterization argument is TRUE, a default covariateSettings object is created using FeatureExtraction::createTemporalCovariateSettings (see getDefaultCovariateSettings). Alternatively, a custom covariate settings object may be created, using the default as an example, and passed via temporalCovariateSettings.

Usage

executeDiagnostics(
  cohortDefinitionSet,
  exportFolder,
  databaseId,
  cohortDatabaseSchema,
  databaseName = NULL,
  databaseDescription = NULL,
  connectionDetails = NULL,
  connection = NULL,
  cdmDatabaseSchema,
  tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
  cohortTable = "cohort",
  cohortTableNames = CohortGenerator::getCohortTableNames(cohortTable = cohortTable),
  vocabularyDatabaseSchema = cdmDatabaseSchema,
  cohortIds = NULL,
  cdmVersion = 5,
  runInclusionStatistics = TRUE,
  runIncludedSourceConcepts = TRUE,
  runOrphanConcepts = TRUE,
  runTimeSeries = FALSE,
  runVisitContext = TRUE,
  runBreakdownIndexEvents = TRUE,
  runIncidenceRate = TRUE,
  runCohortRelationship = TRUE,
  runTemporalCohortCharacterization = TRUE,
  temporalCovariateSettings = getDefaultCovariateSettings(),
  minCellCount = 5,
  minCharacterizationMean = 0.01,
  irWashoutPeriod = 0,
  incremental = FALSE,
  incrementalFolder = file.path(exportFolder, "incremental"),
  runFeatureExtractionOnSample = FALSE,
  sampleN = 1000,
  seed = 64374,
  seedArgs = NULL
)

Arguments

cohortDefinitionSet: A data.frame of cohorts. It must include the columns cohortId, cohortName, json, and sql.
exportFolder: The folder where the output will be exported to. If this folder does not exist it will be created.
databaseId: A short string for identifying the database (e.g. 'Synpuf').
cohortDatabaseSchema: Schema name where your cohort table resides. Note that for SQL Server, this should include both the database and schema name, for example 'scratch.dbo'.
databaseName: The full name of the database. If NULL, defaults to the value in the cdm_source table.
databaseDescription: A short description (several sentences) of the database. If NULL, defaults to the value in the cdm_source table.
connectionDetails: An object of type connectionDetails as created using the createConnectionDetails function in the DatabaseConnector package. Can be left NULL if connection is provided.
connection: An object of type connection as created using the connect function in the DatabaseConnector package. Can be left NULL if connectionDetails is provided, in which case a new connection will be opened at the start of the function, and closed when the function finishes.
cdmDatabaseSchema: Schema name where your patient-level data in OMOP CDM format resides. Note that for SQL Server, this should include both the database and schema name, for example 'cdm_data.dbo'.
tempEmulationSchema: Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created.
cohortTable: Name of the cohort table.
cohortTableNames: Cohort table names used by the CohortGenerator package, as returned by CohortGenerator::getCohortTableNames.
vocabularyDatabaseSchema: Schema name where your OMOP vocabulary data resides. This is commonly the same as cdmDatabaseSchema. Note that for SQL Server, this should include both the database and schema name, for example 'vocabulary.dbo'.
cohortIds: Optionally, provide a subset of cohort IDs to restrict the diagnostics to.
cdmVersion: The version of the OMOP CDM. Default 5. (Note: only 5 is supported.)
runInclusionStatistics: Generate and export statistics on the cohort inclusion rules?
runIncludedSourceConcepts: Generate and export the source concepts included in the cohorts?
runOrphanConcepts: Generate and export potential orphan concepts?
runTimeSeries: Generate and export the time series diagnostics?
runVisitContext: Generate and export index-date visit context?
runBreakdownIndexEvents: Generate and export the breakdown of index events?
runIncidenceRate: Generate and export the cohort incidence rates?
runCohortRelationship: Generate and export the cohort relationship? Cohort relationship checks the temporal relationship between two or more cohorts.
runTemporalCohortCharacterization: Generate and export the temporal cohort characterization? Records with values below minCharacterizationMean are omitted.
temporalCovariateSettings: Either an object of type covariateSettings as created using the createTemporalCovariateSettings function in the FeatureExtraction package, or a list of such objects.
minCellCount: The minimum cell count for fields containing person counts or fractions.
minCharacterizationMean: The minimum mean value for characterization output. Values below this will be cut off from the output. This helps reduce the file size of the characterization output, but removes information on covariates that have very low values. The default is 0.01 (i.e. 1 percent).
irWashoutPeriod: Number of washout days to include in the calculation of incidence rates. Default is 0.
incremental: Create only cohort diagnostics that haven't been created before?
incrementalFolder: If incremental = TRUE, specify a folder where records are kept of which cohort diagnostics have been executed.
runFeatureExtractionOnSample: Logical. If TRUE, the function will operate on a sample of the data. Default is FALSE, meaning the function will operate on the full data set.
sampleN: Integer. The number of records to include in the sample if runFeatureExtractionOnSample is TRUE. Default is 1000. Ignored if runFeatureExtractionOnSample is FALSE.
seed: Integer. The seed for the random number generator used to create the sample. This ensures that the same sample can be drawn again in future runs. Default is 64374.
seedArgs: List. Additional arguments to pass to the sampling function. This can be used to control aspects of the sampling process beyond the seed and sample size.
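
As a rough illustration of what the minCharacterizationMean cutoff described above means, the following base R sketch filters a toy characterization table. The data frame, its column names, and the helper filterByMean are hypothetical and are not part of CohortDiagnostics; the package's internal filtering may differ, for example in whether the boundary is inclusive.

```r
# Toy characterization output; the columns are illustrative only.
characterization <- data.frame(
  covariateId = c(1001, 1002, 1003, 1004),
  mean = c(0.25, 0.012, 0.004, 0.0005)
)

# Hypothetical helper: keep rows whose mean is at or above the threshold.
# Assumption: an inclusive boundary; the package may treat it differently.
filterByMean <- function(data, minCharacterizationMean = 0.01) {
  data[data$mean >= minCharacterizationMean, , drop = FALSE]
}

kept <- filterByMean(characterization, minCharacterizationMean = 0.01)
print(kept$covariateId)
```

Lowering the threshold keeps more low-prevalence covariates at the cost of a larger export file, which is the trade-off the argument description points to.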

Details

The cohortDefinitionSet argument must be a data frame with at least the following columns. These fields will be exported as is to the cohort table that is part of the Cohort Diagnostics results data model. Any additional fields found will be stored as a JSON object in the metadata field of the cohort table:

cohortId: The cohort Id is the id used to identify a cohort definition. This is required to be unique. It will be used to create file names.
cohortName: The full name of the cohort. This will be shown in the Shiny app.
json: The JSON cohort definition for the cohort.
sql: The SQL of the cohort definition rendered from the cohort json.
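
The column requirements above can be checked before running the diagnostics. The sketch below uses base R only; checkCohortDefinitionSet is a hypothetical helper written for this illustration, not a function exported by CohortDiagnostics, and the json and sql values are placeholders rather than real cohort definitions.

```r
# Hypothetical helper, not part of CohortDiagnostics: verifies that the
# required columns are present and that cohortId values are unique.
checkCohortDefinitionSet <- function(cohortDefinitionSet) {
  requiredColumns <- c("cohortId", "cohortName", "json", "sql")
  missingColumns <- setdiff(requiredColumns, names(cohortDefinitionSet))
  if (length(missingColumns) > 0) {
    stop("Missing required columns: ", paste(missingColumns, collapse = ", "))
  }
  if (anyDuplicated(cohortDefinitionSet$cohortId) > 0) {
    stop("cohortId values must be unique")
  }
  invisible(TRUE)
}

cohorts <- data.frame(
  cohortId = c(100, 101),
  cohortName = c("Cohort A", "Cohort B"),
  json = c("{}", "{}"),               # placeholders for real JSON definitions
  sql = c("SELECT 1;", "SELECT 1;"),  # placeholders for rendered cohort SQL
  logicDescription = c("Extra field A", "Extra field B") # additional field
)

checkCohortDefinitionSet(cohorts)
```

Here logicDescription stands in for an additional field of the kind that, per the description above, ends up in the metadata field of the cohort table.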

Examples

if (FALSE) { # \dontrun{
# Load cohorts (assumes that they have already been instantiated)
cohortTableNames <- CohortGenerator::getCohortTableNames(cohortTable = "cohort")
cohorts <- CohortGenerator::getCohortDefinitionSet(packageName = "MyGreatPackage")
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  server = "ohdsi.com",
  port = 5432,
  user = "me",
  password = "secure"
)

executeDiagnostics(
  cohortDefinitionSet = cohorts,
  exportFolder = "export",
  cohortTableNames = cohortTableNames,
  cohortDatabaseSchema = "results",
  cdmDatabaseSchema = "cdm",
  databaseId = "mySpecialCdm",
  connectionDetails = connectionDetails
)

# Use a custom set of cohorts defined in a data.frame
cohorts <- data.frame(
  cohortId = c(100),
  cohortName = c("Cohort Name"),
  logicDescription = c("My Cohort"),
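  # Collapse each file into a single string so the data frame has exactly one row
  sql = paste(readLines("path_to.sql"), collapse = "\n"),
  json = paste(readLines("path_to.json"), collapse = "\n")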
)
executeDiagnostics(
  cohortDefinitionSet = cohorts,
  exportFolder = "export",
  cohortTable = "cohort",
  cohortDatabaseSchema = "results",
  cdmDatabaseSchema = "cdm",
  databaseId = "mySpecialCdm",
  connectionDetails = connectionDetails
)
} # }
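
The incremental and sampling arguments can be combined with a cohort subset in the same way. The following is a minimal sketch that only assembles the argument list, so it runs without a database; the folder name, schema names, and cohort ID are placeholder values, and the actual call is left inside if (FALSE) because it needs a live connection and instantiated cohorts.

```r
exportFolder <- "export"

# Arguments taken from the usage above; all values are placeholders.
diagnosticsArgs <- list(
  exportFolder = exportFolder,
  databaseId = "mySpecialCdm",
  cohortDatabaseSchema = "results",
  cdmDatabaseSchema = "cdm",
  cohortIds = c(100),                  # restrict diagnostics to one cohort
  incremental = TRUE,                  # skip diagnostics already recorded
  incrementalFolder = file.path(exportFolder, "incremental"),
  runFeatureExtractionOnSample = TRUE, # characterize a sample of records
  sampleN = 1000,
  seed = 64374
)

if (FALSE) { # requires a live database connection
  # Assumes `cohorts` and `connectionDetails` were created as in the examples above.
  do.call(
    CohortDiagnostics::executeDiagnostics,
    c(
      list(cohortDefinitionSet = cohorts, connectionDetails = connectionDetails),
      diagnosticsArgs
    )
  )
}
```

Keeping the arguments in a list makes it easy to reuse the same settings across several databases by changing only databaseId and the schema names.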