How to Use the PhenotypeLibrary R Package

Installation

PhenotypeLibrary is an R package that can be installed from GitHub as follows:

install.packages("remotes") # only needed if remotes is not already installed
remotes::install_github("OHDSI/PhenotypeLibrary")

Retrieval

The list of available cohort definitions, together with their metadata, can be retrieved as follows:

PhenotypeLibrary::getPhenotypeLog()

#> # A tibble: 599 × 88
#>    cohortId cohortName        cohortNameAtlas cohortNameFormatted cohortNameLong
#>       <dbl> <chr>             <chr>           <chr>               <chr>         
#>  1        3 Cough or Sputum   [P] Cough or S… Cough or Sputum     Cough or Sput…
#>  2        4 Diarrhea          [P] Diarrhea    Diarrhea            Diarrhea      
#>  3        5 Dyspnea           [P] Dyspnea     Dyspnea             Dyspnea       
#>  4        6 Fever             [P] Fever       Fever               Fever         
#>  5        7 Headache or Head… [P] Headache o… Headache or Headac… Headache or H…
#>  6        8 Altered smell or… [P] Altered sm… Altered smell or t… Altered smell…
#>  7        9 Sore throat       [P] Sore throat Sore throat         Sore throat   
#>  8       10 Nausea or Vomiti… [P] Nausea or … Nausea or Vomiting  Nausea or Vom…
#>  9       11 Malaise and or f… [P] Malaise an… Malaise and or fat… Malaise and o…
#> 10       12 Rhinitis or comm… [P] Rhinitis o… Rhinitis or common… Rhinitis or c…
#> # ℹ 589 more rows
#> # ℹ 83 more variables: librarian <chr>, status <chr>, addedVersion <chr>,
#> #   logicDescription <chr>, hashTag <chr>, isCirceJson <dbl>,
#> #   contributors <chr>, contributorOrcIds <chr>,
#> #   contributorOrganizations <chr>, peerReviewers <chr>,
#> #   peerReviewerOrcIds <dbl>, recommendedReferentConceptIds <chr>,
#> #   ohdsiForumPost <chr>, createdDate <date>, modifiedDate <date>, …
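Because the log has hundreds of rows, it can help to look up cohort ids by name before extracting definitions. The sketch below is a hypothetical helper (findCohortIds is not part of PhenotypeLibrary); it assumes only the cohortId and cohortName columns shown in the output above, and uses a toy data frame built from four of those rows so it runs without the package installed.

```r
# Hypothetical helper (not part of PhenotypeLibrary): returns the cohortId values
# whose cohortName matches a regular expression, ignoring case.
findCohortIds <- function(phenotypeLog, pattern) {
  matches <- grepl(pattern, phenotypeLog$cohortName, ignore.case = TRUE)
  phenotypeLog$cohortId[matches]
}

# Toy stand-in for getPhenotypeLog(), using four rows from the output above.
toyLog <- data.frame(
  cohortId = c(3, 4, 5, 6),
  cohortName = c("Cough or Sputum", "Diarrhea", "Dyspnea", "Fever"),
  stringsAsFactors = FALSE
)

findCohortIds(toyLog, "fever|cough")
```

With the package installed, the same helper could be applied to PhenotypeLibrary::getPhenotypeLog() and its result passed as cohortIds to PhenotypeLibrary::getPlCohortDefinitionSet().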

You can extract one or more cohort definitions into a cohortDefinitionSet object as follows:

cohortDefinitionSet <- PhenotypeLibrary::getPlCohortDefinitionSet(cohortIds = c(1, 2, 3))

cohortDefinitionSet

#> # A tibble: 2 × 4
#>   cohortId cohortName                                   json               sql  
#>      <dbl> <chr>                                        <chr>              <chr>
#> 1        2 COVID-19 diagnosis or SARS-CoV-2 test (1pos) "{\n\t\"cdmVersio… "CRE…
#> 2        3 Cough or Sputum                              "{\n\t\"cdmVersio… "CRE…

cohortDefinitionSet is now a data.frame (a tibble) with one row per returned cohort. Although cohort ids 1, 2 and 3 were requested, the output above contains only cohorts 2 and 3. For cohorts that conform to the OHDSI Circe specification, the json field holds the cohort definition JSON, which may be imported into your Atlas instance, and the sql field holds the SQL rendered from that JSON. For cohorts that do not conform to the Circe specification, only the SQL is provided and the json field is left empty.
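Since only Circe cohorts carry JSON, a common step is to separate them from SQL-only cohorts, for example to save their JSON for use in Atlas. The sketch below is a hypothetical base R approach (isCirceCohort and writeCirceJson are illustrative names, not package functions); it assumes the cohortId and json columns shown above and treats a missing or blank json value as "not Circe". The toy data uses placeholder strings rather than real definitions.

```r
# Hypothetical helper (not part of PhenotypeLibrary): TRUE for rows whose json
# field is non-empty, i.e. cohorts that carry a Circe definition.
isCirceCohort <- function(cohortDefinitionSet) {
  json <- cohortDefinitionSet$json
  !is.na(json) & nzchar(trimws(json))
}

# Hypothetical helper: writes the json of each Circe cohort to
# "<cohortId>.json" in outputDir and returns the file paths invisibly.
writeCirceJson <- function(cohortDefinitionSet, outputDir) {
  dir.create(outputDir, showWarnings = FALSE, recursive = TRUE)
  circe <- cohortDefinitionSet[isCirceCohort(cohortDefinitionSet), , drop = FALSE]
  # sprintf returns character(0) when there are no Circe rows
  paths <- file.path(outputDir, sprintf("%d.json", as.integer(circe$cohortId)))
  for (i in seq_len(nrow(circe))) {
    writeLines(circe$json[[i]], paths[[i]])
  }
  invisible(paths)
}

# Toy cohortDefinitionSet with placeholder json/sql strings; the third row
# mimics a SQL-only (non-Circe) cohort with an empty json field.
toySet <- data.frame(
  cohortId = c(2, 3, 999),
  cohortName = c("Circe cohort A", "Circe cohort B", "SQL-only cohort"),
  json = c("{}", "{}", ""),
  sql = c("SELECT 1;", "SELECT 2;", "SELECT 3;"),
  stringsAsFactors = FALSE
)

isCirceCohort(toySet)
writeCirceJson(toySet, file.path(tempdir(), "circeJson"))
```

The same functions should work unchanged on the tibble returned by getPlCohortDefinitionSet(), since tibble column access and row subsetting behave like a data.frame here.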

Use

You can instantiate (generate) the cohorts in your database using OHDSI CohortGenerator (https://github.com/OHDSI/CohortGenerator) as follows:

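# Connection to the database holding your OMOP CDM (replace with your own details)
connectionDetails <-
  DatabaseConnector::createConnectionDetails(
    dbms = "postgresql",
    server = "some.server.com/ohdsi",
    user = "joe",
    password = "secret"
  )
# Schema containing the CDM data, and a schema where you have write access
cdmDatabaseSchema <- "cdm_synpuf"
cohortDatabaseSchema <- "scratch.dbo"
# Default names of the cohort table and its companion tables
cohortTables <- CohortGenerator::getCohortTableNames()
# Create the (empty) cohort tables before generating cohorts into them
CohortGenerator::createCohortTables(
  connectionDetails = connectionDetails,
  cohortDatabaseSchema = cohortDatabaseSchema,
  cohortTableNames = cohortTables
)
# Instantiate every cohort in cohortDefinitionSet
CohortGenerator::generateCohortSet(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = cdmDatabaseSchema,
  cohortDatabaseSchema = cohortDatabaseSchema,
  cohortTableNames = cohortTables,
  cohortDefinitionSet = cohortDefinitionSet
)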

After the cohorts have been generated, you can also run OHDSI CohortDiagnostics (https://github.com/OHDSI/CohortDiagnostics) on this cohortDefinitionSet object as follows:

databaseId <- "synpuf"

databaseName <-
  "Medicare Claims Synthetic Public Use Files (SynPUFs)"

databaseDescription <-
  "Medicare Claims Synthetic Public Use Files (SynPUFs) were created to allow interested parties to gain familiarity using Medicare claims data while protecting beneficiary privacy. These files are intended to promote development of software and applications that utilize files in this format, train researchers on the use and complexities of Centers for Medicare and Medicaid Services (CMS) claims, and support safe data mining innovations. The SynPUFs were created by combining randomized information from multiple unique beneficiaries and changing variable values. This randomization and combining of beneficiary information ensures privacy of health information."

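# Folder where the diagnostics results will be written (any writable path)
outputFolder <- file.path(getwd(), "cohortDiagnosticsOutput")

CohortDiagnostics::executeDiagnostics(
  cohortDefinitionSet = cohortDefinitionSet,
  exportFolder = outputFolder,
  databaseId = databaseId,
  databaseName = databaseName,
  databaseDescription = databaseDescription,
  cohortDatabaseSchema = cohortDatabaseSchema,
  cdmDatabaseSchema = cdmDatabaseSchema,
  connectionDetails = connectionDetails,
  # cohort table names created in the CohortGenerator step above
  cohortTableNames = cohortTables
)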

Gowtham A. Rao

2024-10-02
