Introduction

This vignette describes how you can use the Characterization package for various descriptive studies using OMOP CDM data. The Characterization package currently contains three different types of analyses:

  • Aggregate Covariates: this returns the mean feature value for a set of features specified by the user for i) the Target cohort population, ii) the Outcome cohort population, iii) the Target population patients who had the outcome during some user-specified time-at-risk and iv) the Target population patients who did not have the outcome during some user-specified time-at-risk.
  • Dechallenge Rechallenge: this is mainly aimed at investigating whether a drug and an event are causally related by checking whether the drug is stopped close in time to the event occurrence (a dechallenge), whether the drug is then restarted (a rechallenge) and, if so, whether the event occurs again (a failed rechallenge). In this analysis, the Target cohorts are the drug users of interest and the Outcome cohorts are the medical events the drug may cause. The user must also specify how close in time a drug must be stopped after the outcome to count as a dechallenge and how close in time an outcome must occur after restarting the drug to count as a failed rechallenge.
  • Time-to-event: this returns descriptive results showing the timing between the target cohort and outcome. This can help identify whether the outcome often precedes the target cohort or whether it generally comes after.

Setup

In this vignette we will show working examples using the Eunomia R package that contains simulated data. Run the following code to install the Eunomia R package:

install.packages("remotes")
remotes::install_github("ohdsi/Eunomia")

Eunomia can be used to create a temporary SQLite database containing the simulated data. The function getEunomiaConnectionDetails creates connection details pointing to a SQLite database in a temporary location. The function createCohorts then populates that database with a set of example cohorts, ready to be used.

connectionDetails <- Eunomia::getEunomiaConnectionDetails()
Eunomia::createCohorts(connectionDetails = connectionDetails)
## Connecting using SQLite driver
## Creating cohort: Celecoxib
## Executing SQL took 0.0138 secs
## Creating cohort: Diclofenac
## Executing SQL took 0.0122 secs
## Creating cohort: GiBleed
## Executing SQL took 0.0228 secs
## Creating cohort: NSAIDs
## Executing SQL took 0.0641 secs
## Cohorts created in table main.cohort
##   cohortId       name
## 1        1  Celecoxib
## 2        2 Diclofenac
## 3        3    GiBleed
## 4        4     NSAIDs
##                                                                                        description
## 1    A simplified cohort definition for new users of celecoxib, designed specifically for Eunomia.
## 2    A simplified cohort definition for new users of diclofenac, designed specifically for Eunomia.
## 3 A simplified cohort definition for gastrointestinal bleeding, designed specifically for Eunomia.
## 4       A simplified cohort definition for new users of NSAIDs, designed specifically for Eunomia.
##   count
## 1  1844
## 2   850
## 3   479
## 4  2694

We also need to have the Characterization package installed and loaded:

remotes::install_github("ohdsi/FeatureExtraction")
remotes::install_github("ohdsi/Characterization")
library(Characterization)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Examples

Aggregate Covariates

To run an ‘Aggregate Covariate’ analysis you need to create a setting object using createAggregateCovariateSettings. This requires specifying:

  • one or more targetIds (these must be pre-generated in a cohort table)
  • one or more outcomeIds (these must be pre-generated in a cohort table)
  • the covariate settings using FeatureExtraction::createCovariateSettings or by creating your own custom feature extraction code.
  • the time-at-risk settings:
      • riskWindowStart
      • startAnchor
      • riskWindowEnd
      • endAnchor

Using the Eunomia data, where we previously generated four cohorts, we can use cohort ids 1, 2 and 4 as the targetIds and cohort id 3 as the outcomeIds:

exampleTargetIds <- c(1, 2, 4)
exampleOutcomeIds <- 3

If we want to get information on the sex assigned at birth, age at index and the Charlson comorbidity index, we can create the settings using FeatureExtraction::createCovariateSettings:

exampleCovariateSettings <- FeatureExtraction::createCovariateSettings(
  useDemographicsGender = TRUE,
  useDemographicsAge = TRUE,
  useCharlsonIndex = TRUE
)

If we want to create the aggregate features for all our target cohorts, our outcome cohort and each target cohort restricted to those with a record of the outcome from 1 day after the target cohort start date until 365 days after the target cohort start date, excluding covariates with a mean value below 0.01, we can run:

exampleAggregateCovariateSettings <- createAggregateCovariateSettings(
  targetIds = exampleTargetIds,
  outcomeIds = exampleOutcomeIds,
  riskWindowStart = 1, startAnchor = "cohort start",
  riskWindowEnd = 365, endAnchor = "cohort start",
  covariateSettings = exampleCovariateSettings,
  minCharacterizationMean = 0.01
)
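To make the anchor semantics concrete, here is a small base-R sketch (with made-up dates, used purely for illustration) showing the time-at-risk window these settings imply for a single patient:

```r
# Hypothetical cohort entry date for one patient (illustration only)
cohortStart <- as.Date("2015-06-01")

# With both startAnchor and endAnchor set to "cohort start", the
# time-at-risk window is offset from the cohort start date:
tarStart <- cohortStart + 1    # riskWindowStart = 1
tarEnd   <- cohortStart + 365  # riskWindowEnd = 365

format(tarStart) # "2015-06-02"
format(tarEnd)   # "2016-05-31"
```

Had endAnchor been set to "cohort end", tarEnd would instead be offset from the cohort end date.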

Next we use exampleAggregateCovariateSettings as the settings for computeAggregateCovariateAnalyses, together with the Eunomia connectionDetails. In Eunomia, the OMOP CDM data and the cohort table are in the ‘main’ schema, and the cohort table is named ‘cohort’. The following code applies the aggregate covariates analysis with the previously specified settings to the simulated Eunomia data:

agc <- computeAggregateCovariateAnalyses(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",
  cdmVersion = 5,
  targetDatabaseSchema = "main",
  targetTable = "cohort",
  aggregateCovariateSettings = exampleAggregateCovariateSettings,
  databaseId = "Eunomia",
  runId = 1
)

If you would like to save the results, you can use the function saveAggregateCovariateAnalyses; the saved results can then be loaded using loadAggregateCovariateAnalyses.
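As a sketch of how that might look (the argument names below are assumptions and may differ between package versions, so check the package documentation):

```r
# Sketch only: argument names are assumptions, not confirmed by this vignette
savedPath <- file.path(tempdir(), "agcResults")
saveAggregateCovariateAnalyses(result = agc, fileName = savedPath)

# In a later session, the results can be loaded back:
agc <- loadAggregateCovariateAnalyses(fileName = savedPath)
```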

The results are Andromeda objects that can be viewed using dplyr. There are four tables:

  • covariates:
agc$covariates %>%
  collect() %>%
  kableExtra::kbl()
databaseId runId cohortDefinitionId covariateId sumValue averageValue
Eunomia 1 1 8507001 237 0.4947808
Eunomia 1 1 8532001 242 0.5052192
Eunomia 1 2 8507001 180 0.5070423
Eunomia 1 2 8532001 175 0.4929577
Eunomia 1 3 8507001 57 0.4596774
Eunomia 1 3 8532001 67 0.5403226
Eunomia 1 4 8507001 237 0.4947808
Eunomia 1 4 8532001 242 0.5052192
Eunomia 1 5 8507001 894 0.4966667
Eunomia 1 5 8532001 906 0.5033333
Eunomia 1 6 8507001 395 0.4759036
Eunomia 1 6 8532001 435 0.5240964
Eunomia 1 7 8507001 1289 0.4901141
Eunomia 1 7 8532001 1341 0.5098859
Eunomia 1 8 8507001 180 0.5070423
Eunomia 1 8 8532001 175 0.4929577
Eunomia 1 9 8507001 57 0.4596774
Eunomia 1 9 8532001 67 0.5403226
Eunomia 1 10 8507001 237 0.4947808
Eunomia 1 10 8532001 242 0.5052192
Eunomia 1 11 8507001 180 0.5070423
Eunomia 1 11 8532001 175 0.4929577
Eunomia 1 12 8507001 57 0.4596774
Eunomia 1 12 8532001 67 0.5403226
Eunomia 1 13 8507001 237 0.4947808
Eunomia 1 13 8532001 242 0.5052192
Eunomia 1 14 8507001 914 0.4956616
Eunomia 1 14 8532001 930 0.5043384
Eunomia 1 15 8507001 407 0.4788235
Eunomia 1 15 8532001 443 0.5211765
Eunomia 1 16 8507001 1321 0.4903489
Eunomia 1 16 8532001 1373 0.5096511
Eunomia 1 17 8507001 237 0.4947808
Eunomia 1 17 8532001 242 0.5052192
Eunomia 1 18 8507001 180 0.5070423
Eunomia 1 18 8532001 175 0.4929577
Eunomia 1 19 8507001 57 0.4596774
Eunomia 1 19 8532001 67 0.5403226
Eunomia 1 20 8507001 237 0.4947808
Eunomia 1 20 8532001 242 0.5052192
  • covariatesContinuous:
agc$covariatesContinuous %>%
  collect() %>%
  kableExtra::kbl()
databaseId runId cohortDefinitionId covariateId countValue minValue maxValue averageValue standardDeviation medianValue p10Value p25Value p75Value p90Value
Eunomia 1 3 1901 41 0 2 0.4274194 0.4606464 0 0 0 1 1
Eunomia 1 9 1901 41 0 2 0.4274194 0.4606464 0 0 0 1 1
Eunomia 1 12 1901 41 0 2 0.4274194 0.4606464 0 0 0 1 1
Eunomia 1 19 1901 41 0 2 0.4274194 0.4606464 0 0 0 1 1
Eunomia 1 8 1901 275 0 2 0.9549296 0.4233403 1 0 1 1 2
Eunomia 1 11 1901 275 0 2 0.9549296 0.4233403 1 0 1 1 2
Eunomia 1 2 1901 275 0 2 0.9577465 0.4256226 1 0 1 1 2
Eunomia 1 18 1901 275 0 2 0.9577465 0.4256226 1 0 1 1 2
Eunomia 1 6 1901 296 0 3 0.4024096 0.3450454 0 0 0 1 1
Eunomia 1 15 1901 304 0 3 0.4035294 0.3446752 0 0 0 1 1
Eunomia 1 10 1901 316 0 2 0.8183716 0.4280688 1 0 0 1 2
Eunomia 1 13 1901 316 0 2 0.8183716 0.4280688 1 0 0 1 2
Eunomia 1 1 1901 316 0 2 0.8204593 0.4299773 1 0 0 1 2
Eunomia 1 4 1901 316 0 2 0.8204593 0.4299773 1 0 0 1 2
Eunomia 1 17 1901 316 0 2 0.8204593 0.4299773 1 0 0 1 2
Eunomia 1 20 1901 316 0 2 0.8204593 0.4299773 1 0 0 1 2
Eunomia 1 5 1901 935 0 2 0.6144444 0.3867813 1 0 0 1 1
Eunomia 1 14 1901 958 0 2 0.6144252 0.3865994 1 0 0 1 1
Eunomia 1 7 1901 1231 0 3 0.5475285 0.3777510 0 0 0 1 1
Eunomia 1 16 1901 1262 0 3 0.5478842 0.3775118 0 0 0 1 1
Eunomia 1 9 1002 124 32 46 38.8709677 3.4000663 39 34 36 41 44
Eunomia 1 12 1002 124 32 46 38.8709677 3.4000663 39 34 36 41 44
Eunomia 1 3 1002 124 32 47 38.9758065 3.4226973 39 34 36 41 44
Eunomia 1 19 1002 124 32 47 38.9758065 3.4226973 39 34 36 41 44
Eunomia 1 8 1002 355 32 46 38.7746479 3.2746121 39 35 36 41 43
Eunomia 1 11 1002 355 32 46 38.7746479 3.2746121 39 35 36 41 43
Eunomia 1 2 1002 355 32 46 38.9014085 3.2449654 39 35 36 41 43
Eunomia 1 18 1002 355 32 46 38.9014085 3.2449654 39 35 36 41 43
Eunomia 1 10 1002 479 32 46 38.7995825 3.3042257 39 34 36 41 44
Eunomia 1 13 1002 479 32 46 38.7995825 3.3042257 39 34 36 41 44
Eunomia 1 1 1002 479 32 47 38.9206681 3.2884308 39 35 36 41 44
Eunomia 1 4 1002 479 32 47 38.9206681 3.2884308 39 35 36 41 44
Eunomia 1 17 1002 479 32 47 38.9206681 3.2884308 39 35 36 41 44
Eunomia 1 20 1002 479 32 47 38.9206681 3.2884308 39 35 36 41 44
Eunomia 1 6 1002 830 31 46 38.5746988 3.2910429 39 34 36 41 43
Eunomia 1 15 1002 850 31 46 38.5658824 3.2816028 39 34 36 41 43
Eunomia 1 5 1002 1800 31 47 38.6450000 3.3212435 39 34 36 41 43
Eunomia 1 14 1002 1844 31 47 38.6469631 3.3192555 39 34 36 41 43
Eunomia 1 7 1002 2630 31 47 38.6228137 3.3112779 39 34 36 41 43
Eunomia 1 16 1002 2694 31 47 38.6213808 3.3070275 39 34 36 41 43
  • covariateRef:
agc$covariateRef %>%
  collect() %>%
  kableExtra::kbl()
databaseId runId covariateId covariateName analysisId conceptId
Eunomia 1 8507001 gender = MALE 1 8507
Eunomia 1 8532001 gender = FEMALE 1 8532
Eunomia 1 1901 Charlson index - Romano adaptation 901 0
Eunomia 1 1002 age in years 2 0
  • analysisRef:
agc$analysisRef %>%
  collect() %>%
  kableExtra::kbl()
databaseId runId analysisId analysisName domainId startDay endDay isBinary missingMeansZero
Eunomia 1 1 DemographicsGender Demographics NA NA Y NA
Eunomia 1 901 CharlsonIndex Condition NA 0 N Y
Eunomia 1 2 DemographicsAge Demographics NA NA N Y

Dechallenge Rechallenge

To run a ‘Dechallenge Rechallenge’ analysis you need to create a setting object using createDechallengeRechallengeSettings. This requires specifying:

  • one or more targetIds (these must be pre-generated in a cohort table)
  • one or more outcomeIds (these must be pre-generated in a cohort table)
  • dechallengeStopInterval
  • dechallengeEvaluationWindow

Using the Eunomia data, where we previously generated four cohorts, we can use cohort ids 1, 2 and 4 as the targetIds and cohort id 3 as the outcomeIds:

exampleTargetIds <- c(1, 2, 4)
exampleOutcomeIds <- 3

If we want to run the dechallenge rechallenge analysis for all our target cohorts and our outcome cohort, with a 30-day dechallengeStopInterval and a 31-day dechallengeEvaluationWindow, we can run:

exampleDechallengeRechallengeSettings <- createDechallengeRechallengeSettings(
  targetIds = exampleTargetIds,
  outcomeIds = exampleOutcomeIds,
  dechallengeStopInterval = 30,
  dechallengeEvaluationWindow = 31
)

We can then run the analysis on the Eunomia data using computeDechallengeRechallengeAnalyses and the settings previously specified:

dc <- computeDechallengeRechallengeAnalyses(
  connectionDetails = connectionDetails,
  targetDatabaseSchema = "main",
  targetTable = "cohort",
  dechallengeRechallengeSettings = exampleDechallengeRechallengeSettings,
  databaseId = "Eunomia"
)
## Inputs checked
## Connecting using SQLite driver
## Computing dechallenge rechallenge results
## Executing SQL took 0.00901 secs
## Computing dechallenge rechallenge for 3 target ids and 1 outcome ids took 0.138 secs

If you would like to save the results you can use the function saveDechallengeRechallengeAnalyses and this can then be loaded using loadDechallengeRechallengeAnalyses.

The results are Andromeda objects that can be viewed using dplyr. There is just one table named dechallengeRechallenge:

dc$dechallengeRechallenge %>%
  collect() %>%
  kableExtra::kbl()
databaseId dechallengeStopInterval dechallengeEvaluationWindow targetCohortDefinitionId outcomeCohortDefinitionId numExposureEras numPersonsExposed numCases dechallengeAttempt dechallengeFail dechallengeSuccess rechallengeAttempt rechallengeFail rechallengeSuccess pctDechallengeAttempt pctDechallengeSuccess pctDechallengeFail pctRechallengeAttempt pctRechallengeSuccess pctRechallengeFail

Next, it is possible to compute and extract the failed rechallenge cases:

failed <- computeRechallengeFailCaseSeriesAnalyses(
  connectionDetails = connectionDetails,
  targetDatabaseSchema = "main",
  targetTable = "cohort",
  dechallengeRechallengeSettings = exampleDechallengeRechallengeSettings,
  outcomeDatabaseSchema = "main",
  outcomeTable = "cohort",
  databaseId = "Eunomia"
)
## Inputs checked
## Connecting using SQLite driver
## Computing dechallenge rechallenge results
## Executing SQL took 0.0966 secs
## Computing dechallenge failed case series for 3 target IDs and 1 outcome IDs took 0.202 secs

The results are Andromeda objects that can be viewed using dplyr. There is just one table named rechallengeFailCaseSeries:

failed$rechallengeFailCaseSeries %>%
  collect() %>%
  kableExtra::kbl()
databaseId dechallengeStopInterval dechallengeEvaluationWindow targetCohortDefinitionId outcomeCohortDefinitionId personKey subjectId dechallengeExposureNumber dechallengeExposureStartDateOffset dechallengeExposureEndDateOffset dechallengeOutcomeNumber dechallengeOutcomeStartDateOffset rechallengeExposureNumber rechallengeExposureStartDateOffset rechallengeExposureEndDateOffset rechallengeOutcomeNumber rechallengeOutcomeStartDateOffset

Time to Event

To run a ‘Time-to-event’ analysis you need to create a setting object using createTimeToEventSettings. This requires specifying:

  • one or more targetIds (these must be pre-generated in a cohort table)
  • one or more outcomeIds (these must be pre-generated in a cohort table)

exampleTimeToEventSettings <- createTimeToEventSettings(
  targetIds = exampleTargetIds,
  outcomeIds = exampleOutcomeIds
)

We can then run the analysis on the Eunomia data using computeTimeToEventAnalyses and the settings previously specified:

tte <- computeTimeToEventAnalyses(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",
  targetDatabaseSchema = "main",
  targetTable = "cohort",
  timeToEventSettings = exampleTimeToEventSettings,
  databaseId = "Eunomia"
)
## Connecting using SQLite driver
## Uploading #cohort_settings
## 
## Inserting data took 0.00793 secs
## Computing time to event results
## Executing SQL took 0.0525 secs
## Computing time-to-event for T-O pairs took 0.22 secs

If you would like to save the results you can use the function saveTimeToEventAnalyses and this can then be loaded using loadTimeToEventAnalyses.

The results are Andromeda objects that can be viewed using dplyr. There is just one table named timeToEvent:

tte$timeToEvent %>%
  collect() %>%
  top_n(10) %>%
  kableExtra::kbl()
## Selecting by timeScale
databaseId targetCohortDefinitionId outcomeCohortDefinitionId outcomeType targetOutcomeType timeToEvent numEvents timeScale
Eunomia 1 3 first After last target end 30 109 per 30-day
Eunomia 1 3 first After last target end 60 114 per 30-day
Eunomia 1 3 first After last target end 90 132 per 30-day
Eunomia 2 3 first After last target end 30 46 per 30-day
Eunomia 2 3 first After last target end 60 39 per 30-day
Eunomia 2 3 first After last target end 90 39 per 30-day
Eunomia 4 3 first After last target end 30 155 per 30-day
Eunomia 4 3 first After last target end 60 153 per 30-day
Eunomia 4 3 first After last target end 90 171 per 30-day
Eunomia 1 3 first After last target end 365 355 per 365-day
Eunomia 2 3 first After last target end 365 124 per 365-day
Eunomia 4 3 first After last target end 365 479 per 365-day

Run Multiple

If you want to run multiple analyses (of the three previously shown) you can use createCharacterizationSettings. You need to input a list of each of the settings (or NULL if you do not want to run one type of analysis). To run all the analyses previously shown in one function:

characterizationSettings <- createCharacterizationSettings(
  timeToEventSettings = list(
    exampleTimeToEventSettings
  ),
  dechallengeRechallengeSettings = list(
    exampleDechallengeRechallengeSettings
  ),
  aggregateCovariateSettings = list(
    exampleAggregateCovariateSettings
  )
)

# save the settings using
saveCharacterizationSettings(
  settings = characterizationSettings,
  saveDirectory = file.path(tempdir(), "saveSettings")
)

# the settings can be loaded
characterizationSettings <- loadCharacterizationSettings(
  saveDirectory = file.path(tempdir(), "saveSettings")
)

runCharacterizationAnalyses(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",
  targetDatabaseSchema = "main",
  targetTable = "cohort",
  outcomeDatabaseSchema = "main",
  outcomeTable = "cohort",
  characterizationSettings = characterizationSettings,
  saveDirectory = file.path(tempdir(), "example"),
  tablePrefix = "c_",
  databaseId = "1"
)

This will create a SQLite database with all the analyses saved into the saveDirectory. You can export the results as csv files using:

connectionDetailsT <- DatabaseConnector::createConnectionDetails(
  dbms = "sqlite",
  server = file.path(tempdir(), "example", "sqliteCharacterization", "sqlite.sqlite")
)

exportDatabaseToCsv(
  connectionDetails = connectionDetailsT,
  resultSchema = "main",
  targetDialect = "sqlite",
  tablePrefix = "c_",
  saveDirectory = file.path(tempdir(), "csv")
)
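The exported folder can then be inspected with base R; a minimal sketch (the exact file names produced by the export are not shown in this vignette, so none are assumed here):

```r
# List the exported csv files and read each into a data.frame
csvFolder <- file.path(tempdir(), "csv")
csvFiles <- list.files(csvFolder, pattern = "\\.csv$", full.names = TRUE)
results <- lapply(csvFiles, read.csv)
names(results) <- basename(csvFiles)
```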