vignettes/UsingCharacterizationPackage.Rmd
This vignette describes how you can use the Characterization package for various descriptive studies using OMOP CDM data. The Characterization package currently contains three different types of analyses: aggregate covariate analyses, dechallenge-rechallenge analyses and time-to-event analyses.
In this vignette we will show working examples using the Eunomia R package that contains simulated data. Run the following code to install the Eunomia R package:
install.packages("remotes")
remotes::install_github("ohdsi/Eunomia")
Eunomia can be used to create a temporary SQLite database with the simulated data. The function getEunomiaConnectionDetails creates a SQLite connection to a temporary location. The function createCohorts then populates the temporary SQLite database with the simulated cohorts, ready to be used.
connectionDetails <- Eunomia::getEunomiaConnectionDetails()
Eunomia::createCohorts(connectionDetails = connectionDetails)
## Connecting using SQLite driver
## Creating cohort: Celecoxib
## Executing SQL took 0.0138 secs
## Creating cohort: Diclofenac
## Executing SQL took 0.0122 secs
## Creating cohort: GiBleed
## Executing SQL took 0.0228 secs
## Creating cohort: NSAIDs
## Executing SQL took 0.0641 secs
## Cohorts created in table main.cohort
## cohortId name
## 1 1 Celecoxib
## 2 2 Diclofenac
## 3 3 GiBleed
## 4 4 NSAIDs
## description
## 1 A simplified cohort definition for new users of celecoxib, designed specifically for Eunomia.
## 2 A simplified cohort definition for new users of diclofenac, designed specifically for Eunomia.
## 3 A simplified cohort definition for gastrointestinal bleeding, designed specifically for Eunomia.
## 4 A simplified cohort definition for new users of NSAIDs, designed specifically for Eunomia.
## count
## 1 1844
## 2 850
## 3 479
## 4 2694
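As a quick sanity check, we can connect to the temporary database and count the cohort rows directly. This is a sketch assuming the DatabaseConnector package is installed (it is a dependency of the OHDSI stack); the main.cohort table location is taken from the createCohorts output above:

```r
# Connect to the temporary Eunomia SQLite database and count the
# persons per cohort in main.cohort, then disconnect
con <- DatabaseConnector::connect(connectionDetails)
DatabaseConnector::querySql(
  connection = con,
  sql = "SELECT cohort_definition_id, COUNT(*) AS persons
         FROM main.cohort
         GROUP BY cohort_definition_id"
)
DatabaseConnector::disconnect(con)
```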
We also need to have the Characterization package installed and loaded:
remotes::install_github("ohdsi/FeatureExtraction")
remotes::install_github("ohdsi/Characterization")
library(Characterization)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
To run an ‘Aggregate Covariate’ analysis you need to create a settings object using createAggregateCovariateSettings. This requires specifying:

- the target cohort ids (targetIds) and the outcome cohort ids (outcomeIds)
- the time-at-risk window relative to the target cohort dates
- the covariate settings, created using FeatureExtraction::createCovariateSettings or by creating your own custom feature extraction code.

Using the Eunomia data where we previously generated four cohorts, we can use cohort ids 1, 2 and 4 as the targetIds and cohort id 3 as the outcomeIds:
exampleTargetIds <- c(1, 2, 4)
exampleOutcomeIds <- 3
If we want to get information on the sex assigned at birth, age at index and Charlson Comorbidity Index, we can create the settings using FeatureExtraction::createCovariateSettings:
exampleCovariateSettings <- FeatureExtraction::createCovariateSettings(
  useDemographicsGender = TRUE,
  useDemographicsAge = TRUE,
  useCharlsonIndex = TRUE
)
If we want to create the aggregate features for all our target cohorts, our outcome cohort, and each target cohort restricted to those with a record of the outcome from 1 day after the target cohort start date until 365 days after the target cohort start date, excluding covariates with mean values below 0.01, we can run:
exampleAggregateCovariateSettings <- createAggregateCovariateSettings(
targetIds = exampleTargetIds,
outcomeIds = exampleOutcomeIds,
riskWindowStart = 1, startAnchor = "cohort start",
riskWindowEnd = 365, endAnchor = "cohort start",
covariateSettings = exampleCovariateSettings,
minCharacterizationMean = 0.01
)
Next we need to use the exampleAggregateCovariateSettings as the settings for computeAggregateCovariateAnalyses. We need to use the Eunomia connectionDetails, and in Eunomia the OMOP CDM data and cohort table are in the ‘main’ schema; the cohort table name is ‘cohort’. The following code will apply the aggregate covariates analysis using the previously specified settings on the simulated Eunomia data:
agc <- computeAggregateCovariateAnalyses(
connectionDetails = connectionDetails,
cdmDatabaseSchema = "main",
cdmVersion = 5,
targetDatabaseSchema = "main",
targetTable = "cohort",
aggregateCovariateSettings = exampleAggregateCovariateSettings,
databaseId = "Eunomia",
runId = 1
)
If you would like to save the results you can use the function saveAggregateCovariateAnalyses, and these can then be loaded using loadAggregateCovariateAnalyses.
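As a sketch of that round trip (the argument names result and fileName here are assumptions; check the help pages for the exact signatures):

```r
# Save the aggregate covariate results to a file and load them back.
# The argument names below are assumed -- see
# ?saveAggregateCovariateAnalyses for the actual parameters.
savePath <- file.path(tempdir(), "agc.zip")
saveAggregateCovariateAnalyses(result = agc, fileName = savePath)
agc <- loadAggregateCovariateAnalyses(fileName = savePath)
```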
The results are Andromeda objects that can be viewed using dplyr. There are four tables: covariates, covariatesContinuous, covariateRef and analysisRef:
databaseId | runId | cohortDefinitionId | covariateId | sumValue | averageValue |
---|---|---|---|---|---|
Eunomia | 1 | 1 | 8507001 | 237 | 0.4947808 |
Eunomia | 1 | 1 | 8532001 | 242 | 0.5052192 |
Eunomia | 1 | 2 | 8507001 | 180 | 0.5070423 |
Eunomia | 1 | 2 | 8532001 | 175 | 0.4929577 |
Eunomia | 1 | 3 | 8507001 | 57 | 0.4596774 |
Eunomia | 1 | 3 | 8532001 | 67 | 0.5403226 |
Eunomia | 1 | 4 | 8507001 | 237 | 0.4947808 |
Eunomia | 1 | 4 | 8532001 | 242 | 0.5052192 |
Eunomia | 1 | 5 | 8507001 | 894 | 0.4966667 |
Eunomia | 1 | 5 | 8532001 | 906 | 0.5033333 |
Eunomia | 1 | 6 | 8507001 | 395 | 0.4759036 |
Eunomia | 1 | 6 | 8532001 | 435 | 0.5240964 |
Eunomia | 1 | 7 | 8507001 | 1289 | 0.4901141 |
Eunomia | 1 | 7 | 8532001 | 1341 | 0.5098859 |
Eunomia | 1 | 8 | 8507001 | 180 | 0.5070423 |
Eunomia | 1 | 8 | 8532001 | 175 | 0.4929577 |
Eunomia | 1 | 9 | 8507001 | 57 | 0.4596774 |
Eunomia | 1 | 9 | 8532001 | 67 | 0.5403226 |
Eunomia | 1 | 10 | 8507001 | 237 | 0.4947808 |
Eunomia | 1 | 10 | 8532001 | 242 | 0.5052192 |
Eunomia | 1 | 11 | 8507001 | 180 | 0.5070423 |
Eunomia | 1 | 11 | 8532001 | 175 | 0.4929577 |
Eunomia | 1 | 12 | 8507001 | 57 | 0.4596774 |
Eunomia | 1 | 12 | 8532001 | 67 | 0.5403226 |
Eunomia | 1 | 13 | 8507001 | 237 | 0.4947808 |
Eunomia | 1 | 13 | 8532001 | 242 | 0.5052192 |
Eunomia | 1 | 14 | 8507001 | 914 | 0.4956616 |
Eunomia | 1 | 14 | 8532001 | 930 | 0.5043384 |
Eunomia | 1 | 15 | 8507001 | 407 | 0.4788235 |
Eunomia | 1 | 15 | 8532001 | 443 | 0.5211765 |
Eunomia | 1 | 16 | 8507001 | 1321 | 0.4903489 |
Eunomia | 1 | 16 | 8532001 | 1373 | 0.5096511 |
Eunomia | 1 | 17 | 8507001 | 237 | 0.4947808 |
Eunomia | 1 | 17 | 8532001 | 242 | 0.5052192 |
Eunomia | 1 | 18 | 8507001 | 180 | 0.5070423 |
Eunomia | 1 | 18 | 8532001 | 175 | 0.4929577 |
Eunomia | 1 | 19 | 8507001 | 57 | 0.4596774 |
Eunomia | 1 | 19 | 8532001 | 67 | 0.5403226 |
Eunomia | 1 | 20 | 8507001 | 237 | 0.4947808 |
Eunomia | 1 | 20 | 8532001 | 242 | 0.5052192 |
databaseId | runId | cohortDefinitionId | covariateId | countValue | minValue | maxValue | averageValue | standardDeviation | medianValue | p10Value | p25Value | p75Value | p90Value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Eunomia | 1 | 3 | 1901 | 41 | 0 | 2 | 0.4274194 | 0.4606464 | 0 | 0 | 0 | 1 | 1 |
Eunomia | 1 | 9 | 1901 | 41 | 0 | 2 | 0.4274194 | 0.4606464 | 0 | 0 | 0 | 1 | 1 |
Eunomia | 1 | 12 | 1901 | 41 | 0 | 2 | 0.4274194 | 0.4606464 | 0 | 0 | 0 | 1 | 1 |
Eunomia | 1 | 19 | 1901 | 41 | 0 | 2 | 0.4274194 | 0.4606464 | 0 | 0 | 0 | 1 | 1 |
Eunomia | 1 | 8 | 1901 | 275 | 0 | 2 | 0.9549296 | 0.4233403 | 1 | 0 | 1 | 1 | 2 |
Eunomia | 1 | 11 | 1901 | 275 | 0 | 2 | 0.9549296 | 0.4233403 | 1 | 0 | 1 | 1 | 2 |
Eunomia | 1 | 2 | 1901 | 275 | 0 | 2 | 0.9577465 | 0.4256226 | 1 | 0 | 1 | 1 | 2 |
Eunomia | 1 | 18 | 1901 | 275 | 0 | 2 | 0.9577465 | 0.4256226 | 1 | 0 | 1 | 1 | 2 |
Eunomia | 1 | 6 | 1901 | 296 | 0 | 3 | 0.4024096 | 0.3450454 | 0 | 0 | 0 | 1 | 1 |
Eunomia | 1 | 15 | 1901 | 304 | 0 | 3 | 0.4035294 | 0.3446752 | 0 | 0 | 0 | 1 | 1 |
Eunomia | 1 | 10 | 1901 | 316 | 0 | 2 | 0.8183716 | 0.4280688 | 1 | 0 | 0 | 1 | 2 |
Eunomia | 1 | 13 | 1901 | 316 | 0 | 2 | 0.8183716 | 0.4280688 | 1 | 0 | 0 | 1 | 2 |
Eunomia | 1 | 1 | 1901 | 316 | 0 | 2 | 0.8204593 | 0.4299773 | 1 | 0 | 0 | 1 | 2 |
Eunomia | 1 | 4 | 1901 | 316 | 0 | 2 | 0.8204593 | 0.4299773 | 1 | 0 | 0 | 1 | 2 |
Eunomia | 1 | 17 | 1901 | 316 | 0 | 2 | 0.8204593 | 0.4299773 | 1 | 0 | 0 | 1 | 2 |
Eunomia | 1 | 20 | 1901 | 316 | 0 | 2 | 0.8204593 | 0.4299773 | 1 | 0 | 0 | 1 | 2 |
Eunomia | 1 | 5 | 1901 | 935 | 0 | 2 | 0.6144444 | 0.3867813 | 1 | 0 | 0 | 1 | 1 |
Eunomia | 1 | 14 | 1901 | 958 | 0 | 2 | 0.6144252 | 0.3865994 | 1 | 0 | 0 | 1 | 1 |
Eunomia | 1 | 7 | 1901 | 1231 | 0 | 3 | 0.5475285 | 0.3777510 | 0 | 0 | 0 | 1 | 1 |
Eunomia | 1 | 16 | 1901 | 1262 | 0 | 3 | 0.5478842 | 0.3775118 | 0 | 0 | 0 | 1 | 1 |
Eunomia | 1 | 9 | 1002 | 124 | 32 | 46 | 38.8709677 | 3.4000663 | 39 | 34 | 36 | 41 | 44 |
Eunomia | 1 | 12 | 1002 | 124 | 32 | 46 | 38.8709677 | 3.4000663 | 39 | 34 | 36 | 41 | 44 |
Eunomia | 1 | 3 | 1002 | 124 | 32 | 47 | 38.9758065 | 3.4226973 | 39 | 34 | 36 | 41 | 44 |
Eunomia | 1 | 19 | 1002 | 124 | 32 | 47 | 38.9758065 | 3.4226973 | 39 | 34 | 36 | 41 | 44 |
Eunomia | 1 | 8 | 1002 | 355 | 32 | 46 | 38.7746479 | 3.2746121 | 39 | 35 | 36 | 41 | 43 |
Eunomia | 1 | 11 | 1002 | 355 | 32 | 46 | 38.7746479 | 3.2746121 | 39 | 35 | 36 | 41 | 43 |
Eunomia | 1 | 2 | 1002 | 355 | 32 | 46 | 38.9014085 | 3.2449654 | 39 | 35 | 36 | 41 | 43 |
Eunomia | 1 | 18 | 1002 | 355 | 32 | 46 | 38.9014085 | 3.2449654 | 39 | 35 | 36 | 41 | 43 |
Eunomia | 1 | 10 | 1002 | 479 | 32 | 46 | 38.7995825 | 3.3042257 | 39 | 34 | 36 | 41 | 44 |
Eunomia | 1 | 13 | 1002 | 479 | 32 | 46 | 38.7995825 | 3.3042257 | 39 | 34 | 36 | 41 | 44 |
Eunomia | 1 | 1 | 1002 | 479 | 32 | 47 | 38.9206681 | 3.2884308 | 39 | 35 | 36 | 41 | 44 |
Eunomia | 1 | 4 | 1002 | 479 | 32 | 47 | 38.9206681 | 3.2884308 | 39 | 35 | 36 | 41 | 44 |
Eunomia | 1 | 17 | 1002 | 479 | 32 | 47 | 38.9206681 | 3.2884308 | 39 | 35 | 36 | 41 | 44 |
Eunomia | 1 | 20 | 1002 | 479 | 32 | 47 | 38.9206681 | 3.2884308 | 39 | 35 | 36 | 41 | 44 |
Eunomia | 1 | 6 | 1002 | 830 | 31 | 46 | 38.5746988 | 3.2910429 | 39 | 34 | 36 | 41 | 43 |
Eunomia | 1 | 15 | 1002 | 850 | 31 | 46 | 38.5658824 | 3.2816028 | 39 | 34 | 36 | 41 | 43 |
Eunomia | 1 | 5 | 1002 | 1800 | 31 | 47 | 38.6450000 | 3.3212435 | 39 | 34 | 36 | 41 | 43 |
Eunomia | 1 | 14 | 1002 | 1844 | 31 | 47 | 38.6469631 | 3.3192555 | 39 | 34 | 36 | 41 | 43 |
Eunomia | 1 | 7 | 1002 | 2630 | 31 | 47 | 38.6228137 | 3.3112779 | 39 | 34 | 36 | 41 | 43 |
Eunomia | 1 | 16 | 1002 | 2694 | 31 | 47 | 38.6213808 | 3.3070275 | 39 | 34 | 36 | 41 | 43 |
databaseId | runId | covariateId | covariateName | analysisId | conceptId |
---|---|---|---|---|---|
Eunomia | 1 | 8507001 | gender = MALE | 1 | 8507 |
Eunomia | 1 | 8532001 | gender = FEMALE | 1 | 8532 |
Eunomia | 1 | 1901 | Charlson index - Romano adaptation | 901 | 0 |
Eunomia | 1 | 1002 | age in years | 2 | 0 |
databaseId | runId | analysisId | analysisName | domainId | startDay | endDay | isBinary | missingMeansZero |
---|---|---|---|---|---|---|---|---|
Eunomia | 1 | 1 | DemographicsGender | Demographics | NA | NA | Y | NA |
Eunomia | 1 | 901 | CharlsonIndex | Condition | NA | 0 | N | Y |
Eunomia | 1 | 2 | DemographicsAge | Demographics | NA | NA | N | Y |
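For example, the binary covariate results can be pulled into an in-memory data frame with dplyr::collect(). The table name agc$covariates is an assumption based on the tables shown above:

```r
library(dplyr)

# Bring the binary covariates table from the Andromeda object into
# memory, then restrict to the gender covariates seen above
agc$covariates %>%
  collect() %>%
  filter(covariateId %in% c(8507001, 8532001)) %>%
  arrange(cohortDefinitionId, covariateId)
```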
To run a ‘Dechallenge Rechallenge’ analysis you need to create a settings object using createDechallengeRechallengeSettings. This requires specifying:

- the target cohort ids (targetIds) and the outcome cohort ids (outcomeIds)
- the dechallengeStopInterval and the dechallengeEvaluationWindow.

Using the Eunomia data where we previously generated four cohorts, we can use cohort ids 1, 2 and 4 as the targetIds and cohort id 3 as the outcomeIds:
exampleTargetIds <- c(1, 2, 4)
exampleOutcomeIds <- 3
If we want to create the dechallenge rechallenge results for all our target cohorts and our outcome cohort, with a 30-day dechallengeStopInterval and a 31-day dechallengeEvaluationWindow, we can run:
exampleDechallengeRechallengeSettings <- createDechallengeRechallengeSettings(
targetIds = exampleTargetIds,
outcomeIds = exampleOutcomeIds,
dechallengeStopInterval = 30,
dechallengeEvaluationWindow = 31
)
We can then run the analysis on the Eunomia data using computeDechallengeRechallengeAnalyses and the settings previously specified:
dc <- computeDechallengeRechallengeAnalyses(
connectionDetails = connectionDetails,
targetDatabaseSchema = "main",
targetTable = "cohort",
dechallengeRechallengeSettings = exampleDechallengeRechallengeSettings,
databaseId = "Eunomia"
)
## Inputs checked
## Connecting using SQLite driver
## Computing dechallenge rechallenge results
## Executing SQL took 0.00901 secs
## Computing dechallenge rechallenge for 3 target ids and 1 outcome id took 0.138 secs
If you would like to save the results you can use the function saveDechallengeRechallengeAnalyses, and these can then be loaded using loadDechallengeRechallengeAnalyses.
The results are Andromeda objects that can be viewed using dplyr. There is just one table, named dechallengeRechallenge:
databaseId | dechallengeStopInterval | dechallengeEvaluationWindow | targetCohortDefinitionId | outcomeCohortDefinitionId | numExposureEras | numPersonsExposed | numCases | dechallengeAttempt | dechallengeFail | dechallengeSuccess | rechallengeAttempt | rechallengeFail | rechallengeSuccess | pctDechallengeAttempt | pctDechallengeSuccess | pctDechallengeFail | pctRechallengeAttempt | pctRechallengeSuccess | pctRechallengeFail |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
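A sketch of inspecting that table with dplyr (the table name dc$dechallengeRechallenge is taken from the text above; the column selection matches the header shown):

```r
library(dplyr)

# Bring the dechallenge/rechallenge summary into memory and look at
# the dechallenge counts per target-outcome pair
dc$dechallengeRechallenge %>%
  collect() %>%
  select(
    targetCohortDefinitionId, outcomeCohortDefinitionId,
    dechallengeAttempt, dechallengeSuccess, dechallengeFail
  )
```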
Next it is possible to compute and extract the failed rechallenge cases:
failed <- computeRechallengeFailCaseSeriesAnalyses(
connectionDetails = connectionDetails,
targetDatabaseSchema = "main",
targetTable = "cohort",
dechallengeRechallengeSettings = exampleDechallengeRechallengeSettings,
outcomeDatabaseSchema = "main",
outcomeTable = "cohort",
databaseId = "Eunomia"
)
## Inputs checked
## Connecting using SQLite driver
## Computing dechallenge rechallenge results
## Executing SQL took 0.0966 secs
## Computing dechallenge failed case series for 3 target IDs and 1 outcome IDs took 0.202 secs
The results are Andromeda objects that can be viewed using dplyr. There is just one table, named rechallengeFailCaseSeries:
databaseId | dechallengeStopInterval | dechallengeEvaluationWindow | targetCohortDefinitionId | outcomeCohortDefinitionId | personKey | subjectId | dechallengeExposureNumber | dechallengeExposureStartDateOffset | dechallengeExposureEndDateOffset | dechallengeOutcomeNumber | dechallengeOutcomeStartDateOffset | rechallengeExposureNumber | rechallengeExposureStartDateOffset | rechallengeExposureEndDateOffset | rechallengeOutcomeNumber | rechallengeOutcomeStartDateOffset |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
To run a ‘Time-to-event’ analysis you need to create a settings object using createTimeToEventSettings. This requires specifying the targetIds and outcomeIds:
exampleTimeToEventSettings <- createTimeToEventSettings(
targetIds = exampleTargetIds,
outcomeIds = exampleOutcomeIds
)
We can then run the analysis on the Eunomia data using computeTimeToEventAnalyses and the settings previously specified:
tte <- computeTimeToEventAnalyses(
connectionDetails = connectionDetails,
cdmDatabaseSchema = "main",
targetDatabaseSchema = "main",
targetTable = "cohort",
timeToEventSettings = exampleTimeToEventSettings,
databaseId = "Eunomia"
)
## Connecting using SQLite driver
## Uploading #cohort_settings
##
## Inserting data took 0.00793 secs
## Computing time to event results
## Executing SQL took 0.0525 secs
## Computing time-to-event for T-O pairs took 0.22 secs
If you would like to save the results you can use the function saveTimeToEventAnalyses, and these can then be loaded using loadTimeToEventAnalyses.
The results are Andromeda objects that can be viewed using dplyr. There is just one table, named timeToEvent:
## Selecting by timeScale
databaseId | targetCohortDefinitionId | outcomeCohortDefinitionId | outcomeType | targetOutcomeType | timeToEvent | numEvents | timeScale |
---|---|---|---|---|---|---|---|
Eunomia | 1 | 3 | first | After last target end | 30 | 109 | per 30-day |
Eunomia | 1 | 3 | first | After last target end | 60 | 114 | per 30-day |
Eunomia | 1 | 3 | first | After last target end | 90 | 132 | per 30-day |
Eunomia | 2 | 3 | first | After last target end | 30 | 46 | per 30-day |
Eunomia | 2 | 3 | first | After last target end | 60 | 39 | per 30-day |
Eunomia | 2 | 3 | first | After last target end | 90 | 39 | per 30-day |
Eunomia | 4 | 3 | first | After last target end | 30 | 155 | per 30-day |
Eunomia | 4 | 3 | first | After last target end | 60 | 153 | per 30-day |
Eunomia | 4 | 3 | first | After last target end | 90 | 171 | per 30-day |
Eunomia | 1 | 3 | first | After last target end | 365 | 355 | per 365-day |
Eunomia | 2 | 3 | first | After last target end | 365 | 124 | per 365-day |
Eunomia | 4 | 3 | first | After last target end | 365 | 479 | per 365-day |
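For example, to look at only the 30-day time scale results, sorted by target cohort and time window (a sketch; the table name tte$timeToEvent is taken from the text above):

```r
library(dplyr)

# Restrict the time-to-event results to the per-30-day scale and
# order them by target cohort and time window
tte$timeToEvent %>%
  collect() %>%
  filter(timeScale == "per 30-day") %>%
  arrange(targetCohortDefinitionId, timeToEvent)
```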
If you want to run multiple analyses (of the three previously shown) you can use createCharacterizationSettings. You need to input a list of each of the settings (or NULL if you do not want to run one type of analysis). To run all the analyses previously shown in one function:
characterizationSettings <- createCharacterizationSettings(
timeToEventSettings = list(
exampleTimeToEventSettings
),
dechallengeRechallengeSettings = list(
exampleDechallengeRechallengeSettings
),
aggregateCovariateSettings = list(
exampleAggregateCovariateSettings
)
)
# save the settings using
saveCharacterizationSettings(
settings = characterizationSettings,
saveDirectory = file.path(tempdir(), "saveSettings")
)
# the settings can be loaded
characterizationSettings <- loadCharacterizationSettings(
saveDirectory = file.path(tempdir(), "saveSettings")
)
runCharacterizationAnalyses(
connectionDetails = connectionDetails,
cdmDatabaseSchema = "main",
targetDatabaseSchema = "main",
targetTable = "cohort",
outcomeDatabaseSchema = "main",
outcomeTable = "cohort",
characterizationSettings = characterizationSettings,
saveDirectory = file.path(tempdir(), "example"),
tablePrefix = "c_",
databaseId = "1"
)
This will create a SQLite database with all the analyses saved into the saveDirectory. You can export the results as csv files using:
connectionDetailsT <- DatabaseConnector::createConnectionDetails(
dbms = "sqlite",
server = file.path(tempdir(), "example", "sqliteCharacterization", "sqlite.sqlite")
)
exportDatabaseToCsv(
connectionDetails = connectionDetailsT,
resultSchema = "main",
targetDialect = "sqlite",
tablePrefix = "c_",
saveDirectory = file.path(tempdir(), "csv")
)
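The exported files can then be read back with base R. This is a sketch; the exact file names written by exportDatabaseToCsv are not shown above, so we simply list whatever was produced in the save directory:

```r
# List the exported csv files and read them all into a list of data
# frames, named by file; inspect list.files(csvDir) first to see what
# exportDatabaseToCsv actually wrote
csvDir <- file.path(tempdir(), "csv")
csvFiles <- list.files(csvDir, pattern = "\\.csv$", full.names = TRUE)
results <- lapply(csvFiles, utils::read.csv)
names(results) <- basename(csvFiles)
```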