clinChar_demo.Rmd
When conducting real-world evidence (RWE) studies, analysts often
need to populate table-shells defined by clinical colleagues. In
principle, these table-shells should be simple to generate, as they
consist primarily of counts and percentages for categorical variables
and summary statistics for continuous variables. Surprisingly, this task
proves to be non-trivial when using the available OHDSI tools. The best
available tool for characterization is FeatureExtraction
which is a tool designed to extract as many clinical features as
possible to use as covariates for high-dimensional modelling. While
helpful for modelling, populating table-shells is awkward because it
pulls all concepts from a domain and performs characterization at the
individual concept level. One could also use the web application ATLAS
to characterize, but it does not fit programmatic driven pipelines. With
some of these gaps in mind, we designed a new OHDSI tool titled
ClinicalCharacteristics
, which is a table shell approach to
OMOP characterization.
A characterization requires the identification of a clinical event during a specified window of time relative to the index date of the cohort of interest. For example, if we wish to look at baseline characteristics, we anchor our enumeration on the date at which the person enters the cohort and consider a window of time as baseline (say -365 days to -1 days relative to index). There are a few assumptions we need to make from this construction: a) what type of event are we looking for, b) when are we looking, relative to index, and must it be during observed time and c) how do we want to summarize this information.
ClinicalCharacteristics
can summarize the following
types of OHDSI objects:
Beyond types of domains to summarize, there are different ways of
performing an aggregation of information.
ClinicalCharacteristics
can create tables based on these
value types:
Another consideration is whether to summarize occurrence based on any
time or observed time. When we define a time window for characterization
this does not consider whether the persons were under continuous
observation. To add this feature in we need to specify to count only
persons who were actively followed during the interval in both the
numerator and denominator. ClinicalCharacteristics
specifies if we enumerate at any time (naively apply
the time window) or observed (apply the time window in
the context of observed time). It is recommended to use
any time for most characterization, as it is the
simplest to interpret.
As do all OHDSI tools we need to setup the database credentials of
the dbms hosting the OMOP data we wish to use with
ClinicalCharacteristics
. We use the HADES package DatabaseConnector
to create a connectionDetails
object that can be used to
establish a database connection. Below we provide an example of setting
up connection details, where users can find more references from the
DatabaseConnector
documentation.
# set a environment variable defining the jar folder path
#Sys.setenv("DATABASECONNECTOR_JAR_FOLDER" = "c:/temp/jdbcDrivers")
# download the jdbc driver
DatabaseConnector::downloadJdbcDrivers("postgresql")
connectionDetails <- DatabaseConnector::createConnectionDetails(
dbms = "postgresql",
user = "joe",
password = "secret",
connectionString = "jdbc:postgresql://localhost:5432/postgres"
)
# test the connection
con <- DatabaseConnector::connect(connectionDetails)
DatabaseConnector::disconnect(con)
Something we are introducing within the
ClinicalCharacteristics
package is a class called
ExecutionSettings
. This object maintains all the settings
needed to execute a query on your dbms. Key parameters include:
In ClinicalCharacteristics
we can set up our execution
settings object as follows:
library(ClinicalCharacteristics)
executionSettings <- createExecutionSettings(
connectionDetails = connectionDetails,
cdmDatabaseSchema = "omop_schema",
workDatabaseSchema = "my_schema",
tempEmulationSchema = "my_schema",
cohortTable = "my_cohorts",
cdmSourceName = "my_omop"
)
The convenience of this object is that it serves as a container to pass parameters to steps that execute the sql. Note that users also do not need to worry about opening and closing connections when using these tools.
Replacing parameters in the executionSettings object is done through simple assignment:
executionSettings$cdmSourceName <- "my_other_omop"
executionSettings$workDatabaseSchema <- "my_other_schema"
ClinicalCharacteristics
works under the assumption that
cohorts have already been instantiated in the cohort table. For new
OHDSI users, please review the CohortGenerator
package
on how to instantiate cohorts in the cohort table.
Our key distinguishing feature to the FeatureExtraction
package is the ability to characterize using concept sets as opposed to
individual concept ids. There are two ways to import concept sets into
the ClinicalCharacteristics
work flow: a) import from json
or b) create on the fly. Both methods apply the R package Capr
.
Users can export a concept set definition json from ATLAS and save as a file with json extension. To import the concept set into R as an object do as follows:
conceptSetFile <- here::here("my_concept_set.json")
cs1 <- Capr::readConceptSet(
path = conceptSetFile,
name = "My Concept Set"
)
We recommend using this method if you have complex concept sets.
Users can also use Capr
to create a simple concept set
using the standard concept id of interest.
cs2 <- Capr::cs(Capr::descendants(201826), name = "t2d")
This function creates a concept set class in Capr. We take the standard concept for type 2 diabetes and include its descendants to make a concept set. We recommend this method when building simple concept sets.
Now that we have setup the requirements for
ClinicalCharacteristics
, we can begin to use the program to
develop table shells. The goal is to populate a TableShell
class object with the details of how you want to populate your table.
The key elements of the table shell are: the title, the target cohorts
to use and the table line items.
First we must specify which cohorts we want to characterize in our shell. Assuming they have been instantiated in a specific cohort table, we can load them as follows:
targetCohorts <- list(
createCohortInfo(id = 1, name = "T2D"),
createCohortInfo(id = 2, name = "T1D")
)
We list all the cohorts we want to use. You may add as many
instantiated cohorts as you would like into this list and
ClinicalCharacteristics
will render the same table shell
per target cohort.
Each element of the table is called a line item. The line item uniquely summarizes events with a particular statistic type at a specified time window. For example, if we want to characterize CKD we need to specify the line item:
ckd_cs <- Capr::cs(Capr::descendants(46271022), name = "ckd")
createConceptSetLineItem(
sectionLabel = "CKD: Baseline",
domain = "condition_occurrence",
statistic = anyPresenceStat(),
conceptSet = ckd_cs,
timeInterval = tw1
)
The section label specifies how we want to identify the line item within the table as a section. The domain must specify which domain table in the CDM we should check for events. Because CKD is a condition, we want to check the condition_occurrence table. If we were characterizing drugs we would check the drug_exposure table. Review the OMOP CDM to become familiar with the different types of domains.
The statistic determines how to summarize the event; here are examples:
PresenceStat
: determine who had an occurrence of an
event during an intervalCountCtsStat
: determine the continuous distribution of
an event during an intervalCountBreaksStat
: convert the count to breaks of an
event during an intervalTimeToFirst
: determine the continuous distribution of
the time to the occurrence of the first eventTo distinguish between any and
observed, add the keyword as a prefix
(i.e. anyPresenceStat()
or
observedPresenceStat()
).
Let’s use the following details for an example table shell, assuming a cohort for type 2 diabetes has been created:
library(ClinicalCharacteristics)
# Define Execution Settings
executionSettings <- createExecutionSettings(
connectionDetails = connectionDetails,
cdmDatabaseSchema = "omop_schema",
workDatabaseSchema = "my_schema",
tempEmulationSchema = "my_schema",
cohortTable = "my_cohorts",
cdmSourceName = "my_omop"
)
# create each concept set
ckd_cs <- Capr::cs(Capr::descendants(46271022), name = "ckd")
hf_cs <- Capr::cs(Capr::descendants(316139), name = "hf")
sglt2_cs <- Capr::cs(Capr::descendants(1123627), name = "sglt2")
# make each time window
tw1 <- timeInterval(lb = -365, rb = -1)
tw2 <- timeInterval(lb = 0, rb = 90)
# define table shell
ts <- createTableShell(
title = "Demo 1",
targetCohorts = list(
createCohortInfo(id = 1, name = "Type 2 Diabetes")
),
lineItems = lineItems(
createDemographicLineItem(maleGender()),
createDemographicLineItem(ageCharCts()),
createConceptSetLineItem(
sectionLabel = "CKD: Baseline",
domain = "condition_occurrence",
statistic = anyPresenceStat(),
conceptSet = ckd_cs,
timeInterval = tw1
),
createConceptSetLineItem(
sectionLabel = "HF: Baseline",
domain = "condition_occurrence",
statistic = anyPresenceStat(),
conceptSet = hf_cs,
timeInterval = tw1
),
createConceptSetLineItem(
sectionLabel = "SGLT2: Post Index",
domain = "drug_exposure",
statistic = anyPresenceStat(),
conceptSet = sglt2_cs,
timeInterval = tw2
)
)
)
res <- generateTableShell(ts, executionSettings)
res$categorical
res$continuous
Line items are an important feature of
ClinicalCharacteristics
in order to track the table shell
in a consistent matter. However, from a UI perspective this becomes very
“typing” intensive. So we have added certain convenience functions to
make it easier to design the table shell. Taking the same example from
above, what if we added the following to our table shell:
library(ClinicalCharacteristics)
# Define Execution Settings
executionSettings <- createExecutionSettings(
connectionDetails = connectionDetails,
cdmDatabaseSchema = "omop_schema",
workDatabaseSchema = "my_schema",
tempEmulationSchema = "my_schema",
cohortTable = "my_cohorts",
cdmSourceName = "my_omop"
)
# create each concept set
condition_cs <- list(
Capr::cs(Capr::descendants(46271022), name = "ckd"),
Capr::cs(Capr::descendants(316139), name = "hf")
)
drug_cs <- list(
Capr::cs(Capr::descendants(1123627), name = "sglt2"),
Capr::cs(Capr::descendants(1123618), name = "glp1")
)
# make each time window
tw1 <- list(
timeInterval(lb = -365, rb = -1)
)
tw2 <- list(
timeInterval(lb = 0, rb = 90),
timeInterval(lb = 0, rb = 365)
)
# define table shell
ts2 <- createTableShell(
title = "Demo 1",
targetCohorts = list(
createCohortInfo(id = 1, name = "Type 2 Diabetes")
),
lineItems = lineItems(
addDefaultGenderLineItems(),
addDefaultEthnicityLineItems(),
createDemographicLineItem(raceCategory()),
createDemographicLineItem(ageCharCts()),
createDemographicLineItem(ageCharBreaks(breaks = age5yrGrp())),
createConceptSetLineItemBatch(
sectionLabel = "Baseline Conditions",
domain = "condition_occurrence",
statistic = anyPresenceStat(),
conceptSets = condition_cs,
timeIntervals = tw1
),
createConceptSetLineItemBatch(
sectionLabel = "Post-Index Drugs",
domain = "drug_exposure",
statistic = anyPresenceStat(),
conceptSets = drug_cs,
timeIntervals = tw2
)
)
)
res <- generateTableShell(ts2, executionSettings)
res$categorical
res$continuous
In this version of the table shell we have grouped drugs and conditions as a batch to deploy all combinations of the concept set and time interval combinations. As for the demographics we add a convenience function to include all basic gender items (male and female), ethnicity categories (Hispanic, not Hispanic, and not reported), and race items (“American Indian or Alaska Native”, “Asian”, “Black or African American”, “Native Hawaiian or Other Pacific Islander”, “White”, and “Unknown or Other”). In addition to the continuous age statistic from the example above, we apply our first instance of a breaks strategy, turning age into 5-year age groups. A convenience function is used that automatically sets the breaks for the user.
Note: when a user generates the table shell a log of the shell is printed out for review. Users may also review the table shell meta information to understand the aspects of each line item using the following method of the table shell class, which returns a tibble of information.
ts2$getTableShellMeta()
When using multiple concepts and/or multiple time intervals within a line item, we must use one of the Batch functions:
createCohortLineItemBatch
createConceptSetLineItemBatch
createSourceConceptSetLineItemBatch
The batch functions create combinations of each of the concept sets
and/or time intervals for the corresponding statistic. For example, the
createConceptSetLineItemBatch
for “Post-Index Drugs” in
table shell ts2 above will calculate anyPresenceStat
measures for every combination of the two drug concept sets in drug_cs,
“sglt2” and “glp1”, and the two time intervals in tw2. This results in 4
rows of measures in total for “Post-Index Drugs”.
Sometimes we are interested in cohorts as items to characterize in a table shell. For example, say we wanted to describe persons with heart failure who also have ckd (at baseline). This is more easily done as a cohort rather than a concept set because we have to apply logic to the population.
In order to use cohorts in characterization, users must generate the cohort in a cohort table in their workDatabaseSchema prior to usage. Assuming that is done, adding this feature into the table shell is quite simple.
library(ClinicalCharacteristics)
# Define Execution Settings
executionSettings <- createExecutionSettings(
connectionDetails = connectionDetails,
cdmDatabaseSchema = "omop_schema",
workDatabaseSchema = "my_schema",
tempEmulationSchema = "my_schema",
cohortTable = "my_cohorts",
cdmSourceName = "my_omop"
)
# create each concept set
condition_cs <- list(
Capr::cs(Capr::descendants(46271022), name = "ckd"),
Capr::cs(Capr::descendants(316139), name = "hf")
)
drug_cs <- list(
Capr::cs(Capr::descendants(1123627), name = "sglt2"),
Capr::cs(Capr::descendants(1123618), name = "glp1")
)
covCohorts <- list(
createCohortInfo(id = 3, name = "HF with CKD at baseline"),
createCohortInfo(id = 4, name = "Obesity with CKD at baseline")
)
# make each time window
tw1 <- list(
timeInterval(lb = -365, rb = -1)
)
tw2 <- list(
timeInterval(lb = 0, rb = 90),
timeInterval(lb = 0, rb = 365)
)
# define table shell
ts3 <- createTableShell(
title = "Demo 1",
targetCohorts = list(
createCohortInfo(id = 1, name = "Type 2 Diabetes")
),
lineItems = lineItems(
addDefaultGenderLineItems(),
addDefaultEthnicityLineItems(),
createDemographicLineItem(raceCategory()),
createDemographicLineItem(ageCharCts()),
createDemographicLineItem(ageCharBreaks(breaks = age5yrGrp())),
createConceptSetLineItemBatch(
sectionLabel = "Baseline Conditions",
domain = "condition_occurrence",
statistic = anyPresenceStat(),
conceptSets = condition_cs,
timeIntervals = tw1
),
createConceptSetLineItemBatch(
sectionLabel = "Post-Index Drugs",
domain = "drug_exposure",
statistic = anyPresenceStat(),
conceptSets = drug_cs,
timeIntervals = tw2
),
createCohortLineItemBatch(
sectionLabel = "Baseline Cohorts",
covariateCohorts = covCohorts,
cohortTable = "my_cohort_table",
timeIntervals = tw1,
statistic = anyPresenceStat()
)
)
)
res <- generateTableShell(ts3, executionSettings)
res$categorical
res$continuous
We use createCohortInfo
to make CohortInfo class
objects, then assign the CohortInfo objects to the variable covCohorts
as a list. Then we set the covariateCohorts parameter of
createCohortLineItemBatch
equal to our list of CohortInfo
objects (called covCohorts in the above example). In the
createCohortlineItem
or
CreateCohortLineItemBatch
calls we must specify into which
table these cohorts are instantiated.
Sometimes clinical events can be described in multiple domains. Smoking, for example, has concepts in the observation and procedure occurrence tables. It is easiest to handle multiple domains as a concept set group where we specify each domain in which to look for our concept set. The code below creates a concept set group line item that can be added to a table shell.
smoking_cs <- list(
Capr::cs(Capr::descendants(761009), name = "Smoking Assessment"),
Capr::cs(Capr::descendants(44804450), name = "Smoking Cessation"),
)
csg <- createConceptSetGroupLineItem(
groupLabel = "Smoking",
conceptSets = smoking_cs,
domainTables = c("procedure_occurrence", "observation"),
statistic = anyPresenceStat(),
timeInterval = timeInterval(lb = -365, rb = -1)
)
Notice we specify the domain for each element of the concept set list. When enumerating this will collapse into one category instead of two items.
It is very common to want to convert a continuous summary into a categorical summary. Say we want to know how many persons took 1, 2, 3, or 4+ of a medication like sglt2 or glp1s, instead of how many took at least one of these drugs at any time. To do this we must make a breaks strategy.
drug_cs <- list(
Capr::cs(Capr::descendants(1123627), name = "sglt2"),
Capr::cs(Capr::descendants(1123618), name = "glp1")
)
tw2 <- list(
timeInterval(lb = 0, rb = 90),
timeInterval(lb = 0, rb = 365)
)
br <- newValueBreaks(
name = "Medication Groups",
breaks = list(1,2,3,4),
labels = c("one", "two", "three", "four or more")
)
drugBreaks <- createConceptSetLineItemBatch(
sectionLabel = "Drug CountBreaks",
domain = "drug_exposure",
statistic = anyCountBreaksStat(breaks = br),
conceptSets = drug_cs,
timeIntervals = tw2
)
Once you have built this line item batch it can be added into the
table shell. Instead of providing a continuous summary, these will now
be broken into categories as specified in the
newValueBreaks
function. If we actually wanted the
continuous summary the code would look as follows:
drug_cs <- list(
Capr::cs(Capr::descendants(1123627), name = "sglt2"),
Capr::cs(Capr::descendants(1123618), name = "glp1")
)
tw2 <- list(
timeInterval(lb = 0, rb = 90),
timeInterval(lb = 0, rb = 365)
)
drugCts <- createConceptSetLineItemBatch(
sectionLabel = "Drug Count Continuous",
domain = "drug_exposure",
statistic = anyCountCtsStat(),
conceptSets = drug_cs,
timeIntervals = tw2
)
ClinicalCharacteristics
works by serializing the input
of the TableShell object into a sql script that is run against the
database connection. Sometimes users may want to review the sql to
either check its quality or modify it slightly. We provide a function to
save the script designed by the program as a .sql
that can
be opened in any code or text editor.
reviewTableShellSql(ts, executionSettings, saveName = "my_ts", savePath = here::here("output"))
In certain circumstances, researchers might need to use source concepts instead of standardized codes. However, the recommendation is always to use standardized codes.
On occasion we can find the OHDSI concept id using the function
lookupSourceConcepts
. For instance, we would like to use
the ICD10CM codes: ‘I45.81’, ‘I47’, ‘I49’, ’I49.40 to build a
Cardiac Arrhythmia concept set as follows:
ca_source <- lookupSourceConcepts(
codes = c('I45.81', 'I47', 'I49', 'I49.40'),
vocabulary = "ICD10CM",
executionSettings
) |>
sourceConceptSet(name = "Cardiac Arrhythmia")
Then the table shell will look like the following:
tw1 <- timeInterval(lb = -365, rb = -1)
tssc <- createTableShell(
title = "Demo 1",
targetCohorts = list(
createCohortInfo(id = 1, name = "Type 2 Diabetes")
),
lineItems = lineItems(
addDefaultGenderLineItems(),
addDefaultEthnicityLineItems(),
createDemographicLineItem(ageCharCts()),
createDemographicLineItem(ageCharBreaks(breaks = age5yrGrp())),
createSourceConceptSetLineItem(
sectionLabel = "Source Concept Conditions Cardiac Arrhythmia",
domain = "condition_occurrence",
sourceConceptSets = ca_source,
timeIntervals = tw1,
statistic = anyPresenceStat()
)
)
)
It is also possible to build the source concept set in ATLAS in a Json format. Then, we can read the file using Capr’s functions as follows:
ca_source <- Capr::readConceptSet(path = paste0(inputFolder, "Cardiac Arrhythmia_in_icd.json"), name = "Cardiac Arrhythmia")
ca_source <- Capr::as.data.frame(ca_source) |>
dplyr::select(conceptId, conceptName, conceptCode, vocabularyId) |>
sourceConceptSet(name = "Cardiac Arrhythmia")