
Phenotype diagnostics
Introduction: Run PhenotypeDiagnostics
In this vignette, we show how to run phenotypeDiagnostics(). We will use the following packages and mock data:
library(CohortConstructor)
library(PhenotypeR)
library(dplyr)
con <- DBI::dbConnect(duckdb::duckdb(),
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))
cdm <- CDMConnector::cdmFromCon(con = con,
                                cdmName = "Eunomia Synpuf",
                                cdmSchema = "main",
                                writeSchema = "main",
                                achillesSchema = "main")
cdm
Note that we have included the Achilles tables in our cdm reference; they will be used to speed up some of the analyses.
Create a cohort
First, we are going to use the package CohortConstructor to generate three cohorts of warfarin, acetaminophen and morphine users.
# Create a codelist
codes <- list("warfarin" = c(1310149, 40163554),
              "acetaminophen" = c(1125315, 1127078, 1127433, 40229134, 40231925, 40162522, 19133768),
              "morphine" = c(1110410, 35605858, 40169988))
# Instantiate cohorts with CohortConstructor
cdm$my_cohort <- conceptCohort(cdm = cdm,
                               conceptSet = codes,
                               exit = "event_end_date",
                               overlap = "merge",
                               name = "my_cohort")
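As a quick sanity check on the codelist defined above, the following sketch uses only base R to count the concept IDs behind each cohort name and to look for IDs shared between codelists. It assumes, as the call above suggests, that conceptCohort() creates one cohort per named element of the list. It does not query the database, and the helper names (concept_counts, shared_ids) are illustrative.

```r
# Codelist from the step above: one named element per cohort
codes <- list(
  "warfarin" = c(1310149, 40163554),
  "acetaminophen" = c(1125315, 1127078, 1127433, 40229134, 40231925,
                      40162522, 19133768),
  "morphine" = c(1110410, 35605858, 40169988)
)

# Number of concept IDs behind each cohort name
concept_counts <- lengths(codes)
concept_counts

# Concept IDs that appear in more than one codelist
all_ids <- unlist(codes, use.names = FALSE)
shared_ids <- unique(all_ids[duplicated(all_ids)])
shared_ids
```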
Run PhenotypeDiagnostics
Now we will run phenotypeDiagnostics(). This function runs the following analyses:
- Database diagnostics: This includes information about the size of the data, the time period covered, the number of people in the data, and other metadata of the CDM object. See the Database diagnostics vignette for more details.
- Codelist diagnostics: This includes information on the concepts included in our cohorts' codelists. See the Codelist diagnostics vignette for further details.
- Cohort diagnostics: This summarises the attrition of our cohorts, as well as the overlap between cohorts. See the Cohort diagnostics vignette for further details.
- Matched diagnostics: This matches our study cohorts to people of similar age and sex in the database and performs a large-scale characterisation of both. See the Matched diagnostics vignette for further details.
- Population diagnostics: This calculates the frequency of our study cohorts in the database in terms of their incidence rates and prevalence. See the Population diagnostics vignette for further details.
We can choose which analyses to perform by setting each of the corresponding arguments to TRUE or FALSE:
result <- phenotypeDiagnostics(
  cohort = cdm$my_cohort,
  databaseDiagnostics = TRUE,
  codelistDiagnostics = TRUE,
  cohortDiagnostics = TRUE,
  populationDiagnostics = TRUE,
  populationSample = 1e+06,
  populationDateRange = as.Date(c(NA, NA)),
  matchedDiagnostics = TRUE,
  matchedSample = 1000
)
result |> glimpse()
#> Rows: 492,502
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,…
#> $ cdm_name <chr> "Eunomia Synpuf", "Eunomia Synpuf", "Eunomia Synpuf",…
#> $ group_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "general", "general", "observation_period", "cdm", "g…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name <chr> "snapshot_date", "person_count", "count", "source_nam…
#> $ estimate_type <chr> "date", "integer", "integer", "character", "character…
#> $ estimate_value <chr> "2025-03-28", "1000", "1048", "Synpuf", "v5.0 06-AUG-…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
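The glimpse() output shows a long format in which every estimate is stored as text in estimate_value, with estimate_type recording how it should be read. As a minimal sketch of working with that layout, the example below rebuilds three rows from the output above as a plain data frame and converts the integer estimates back to numbers. The object mock_result is a hand-made stand-in, not a real result object, and it carries only a subset of the columns.

```r
# Mock rows copied from the glimpse() output above (not a full result object)
mock_result <- data.frame(
  result_id      = c(1L, 1L, 1L),
  variable_name  = c("general", "general", "observation_period"),
  estimate_name  = c("snapshot_date", "person_count", "count"),
  estimate_type  = c("date", "integer", "integer"),
  estimate_value = c("2025-03-28", "1000", "1048"),
  stringsAsFactors = FALSE
)

# estimate_value is character; estimate_type says how to parse each row
integer_rows <- mock_result[mock_result$estimate_type == "integer", ]
integer_rows$estimate_value <- as.integer(integer_rows$estimate_value)
integer_rows[, c("variable_name", "estimate_name", "estimate_value")]
```

The same filter-then-convert pattern should carry over to the full result with dplyr::filter() and dplyr::mutate().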
Notice that we have three additional arguments:
- populationSample: The number of people to sample at random from the CDM for the Population diagnostics analysis. If NULL, everyone in the CDM is included. Sampling reduces computation time.
- populationDateRange: The time period over which to perform the Population diagnostics analysis.
- matchedSample: Similar to populationSample, this argument takes a random sample of people on which to perform the Matched diagnostics.
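To illustrate these arguments together, the sketch below runs only the Population diagnostics on a smaller sample and a bounded study period. The dates, the sample size of 500, and the object names (study_period, result_population) are illustrative assumptions rather than recommended values. The call is guarded so that it runs only when PhenotypeR is installed and the cdm object created earlier is available.

```r
# Illustrative study period: a length-2 Date vector (start, end)
study_period <- as.Date(c("2008-01-01", "2009-12-31"))
stopifnot(length(study_period) == 2, study_period[1] <= study_period[2])

# Run only if PhenotypeR is installed and `cdm` from the steps above exists
if (requireNamespace("PhenotypeR", quietly = TRUE) && exists("cdm")) {
  result_population <- PhenotypeR::phenotypeDiagnostics(
    cohort = cdm$my_cohort,
    databaseDiagnostics = FALSE,
    codelistDiagnostics = FALSE,
    cohortDiagnostics = FALSE,
    populationDiagnostics = TRUE,
    populationSample = 500,             # assumed sample size
    populationDateRange = study_period, # assumed date range
    matchedDiagnostics = FALSE
  )
}
```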
Save the results
To save the results, we can use the exportSummarisedResult() function from the omopgenerics R package:
omopgenerics::exportSummarisedResult(result, path = here::here(), minCellCount = 5)
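A saved result can later be read back without rerunning the diagnostics. The following sketch writes to a temporary folder and reloads the files with omopgenerics::importSummarisedResult(). The folder name and the objects out_dir and reloaded are assumptions made for illustration, and the export step runs only when omopgenerics is installed and a result object from the previous section exists.

```r
# Illustrative output folder inside the session's temporary directory
out_dir <- file.path(tempdir(), "phenotype_results")
dir.create(out_dir, showWarnings = FALSE)

# Run only if omopgenerics is installed and `result` is a summarised result
if (requireNamespace("omopgenerics", quietly = TRUE) &&
    exists("result") && inherits(result, "summarised_result")) {
  omopgenerics::exportSummarisedResult(result, path = out_dir, minCellCount = 5)
  reloaded <- omopgenerics::importSummarisedResult(path = out_dir)
}

# CSV files written to the folder (empty if the export was skipped)
list.files(out_dir, pattern = "\\.csv$")
```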
Visualisation of the results
Once we have our Phenotype diagnostics result, we can use shinyDiagnostics() to easily create a shiny app and visualise our results:
result <- shinyDiagnostics(result,
                           directory = tempdir(),
                           minCellCount = 5,
                           open = TRUE)
Notice that we have specified the minimum cell count (minCellCount) below which counts are suppressed in the shiny app, and that we want the shiny app to be launched in a new R session (open). You can see the shiny app generated for this example here. See the Shiny diagnostics vignette for a full explanation of the shiny app.