Introduction

In observational research, where data are often heterogeneous and complex, assessing the characteristics of cohorts is fundamental to the reliability and credibility of findings, and it is an essential step in phenotype development. The OHDSI (Observational Health Data Sciences and Informatics) community has developed an R package, CohortDiagnostics, that gives researchers a systematic way to examine cohorts: to identify potential biases, assess data completeness, and validate whether a cohort is suitable for analysis. It does this through a detailed examination of incidence rates, cohort characteristics, and the specific codes that trigger cohort inclusion criteria. CohortDiagnostics streamlines the process of cohort evaluation by:

  1. Generating a broad set of diagnostics against a database in the OMOP Common Data Model (CDM); see Features and Functionalities below.
  2. Providing an interactive R Shiny application, bundled with the package, for intuitive exploration and visualization of these diagnostics.

Features and Functionalities

CohortDiagnostics offers a suite of features designed to deepen the understanding of cohort dynamics and the intricacies of cohort definitions, including:

  1. Cohort Definition: Facilitates the examination and validation of the logic behind cohort definitions, ensuring they accurately capture the intended population.
  2. Concepts in Data Source: Identifies the specific concepts present within the data source that are relevant to the cohort definitions, enabling a deeper understanding of data coverage and content.
  3. Orphan Concepts: Highlights concepts that, despite their relevance, are not captured within a cohort’s definition. This helps in refining concept sets and cohort criteria to ensure comprehensiveness and relevance.
  4. Cohort Counts: Provides counts of individuals and records within cohorts, offering a basic measure of cohort size and scope.
  5. Incidence Rate: Calculates cohort incidence rates, stratified by demographic and temporal factors such as age, sex, and calendar year, to show how frequently patients or records enter the cohort and whether that frequency varies across these strata.
  6. Time Distributions: Examines the distribution of time-related variables within cohorts, such as observation time before and after the cohort index date and cohort duration, offering insights into cohort dynamics over time and the observation time available.
  7. Index Event Breakdown: Breaks down the specific events that qualify individuals for cohort inclusion, providing clarity on how inclusion criteria are met.
  8. Visit Context: Analyzes the healthcare context (e.g., inpatient, outpatient) of the index events, offering insights into where and how cohort members are identified within the healthcare system.
  9. Cohort Overlap: Assesses the degree of overlap between cohorts, which can reveal potential biases, errors, or redundancies in cohort construction, as well as characteristics shared between cohorts of patients.
  10. Cohort Characterization: Characterizes cohorts by detailing prevalent conditions, medication use, procedures, and more, to understand the clinical profile of cohort members over various time periods relative to the index date.
  11. Compare Cohort Characterization: Enables the direct comparison of characteristics between cohorts, facilitating the identification of unique or shared features across different cohorts and across time points.
  12. Meta Data: Provides meta-information about the data and analyses conducted, ensuring transparency and reproducibility of the cohort diagnostics process.

Together, these features equip researchers with the tools necessary for a thorough examination of cohort definitions, enhancing the quality and reliability of observational health research.
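
As a rough illustration of what three of the diagnostics above measure (cohort counts, cohort overlap, and incidence rate by calendar year), the following minimal Python sketch works on a handful of in-memory rows shaped like the CDM cohort table. It is not how CohortDiagnostics is implemented: the package is written in R and runs against a CDM database. The sample rows, the person-year denominators, and all function names here are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import date

# Toy rows shaped like the CDM cohort table:
# (cohort_definition_id, subject_id, cohort_start_date, cohort_end_date)
COHORT_ROWS = [
    (1, 101, date(2019, 3, 1), date(2019, 6, 1)),
    (1, 101, date(2020, 1, 15), date(2020, 2, 15)),
    (1, 102, date(2020, 5, 10), date(2020, 9, 10)),
    (1, 103, date(2020, 7, 4), date(2020, 7, 30)),
    (2, 102, date(2020, 6, 1), date(2020, 8, 1)),
    (2, 104, date(2019, 11, 20), date(2020, 1, 20)),
]

# Assumption: made-up person-years at risk per calendar year.
# In a real analysis these denominators come from the database.
PERSON_YEARS = {2019: 500.0, 2020: 400.0}


def cohort_counts(rows, cohort_id):
    """Return (distinct subjects, total records) for one cohort."""
    subjects = [r[1] for r in rows if r[0] == cohort_id]
    return len(set(subjects)), len(subjects)


def cohort_overlap(rows, cohort_a, cohort_b):
    """Count subjects in both cohorts, only in A, and only in B."""
    in_a = {r[1] for r in rows if r[0] == cohort_a}
    in_b = {r[1] for r in rows if r[0] == cohort_b}
    return {
        "both": len(in_a & in_b),
        "only_a": len(in_a - in_b),
        "only_b": len(in_b - in_a),
    }


def incidence_rate_by_year(rows, cohort_id, person_years, per=1000):
    """Rate of first cohort entry per `per` person-years, by calendar year."""
    # Keep only each subject's earliest cohort start date.
    first_entry = {}
    for cid, subject, start, _end in rows:
        if cid != cohort_id:
            continue
        if subject not in first_entry or start < first_entry[subject]:
            first_entry[subject] = start

    # Count first entries per calendar year.
    new_cases = defaultdict(int)
    for start in first_entry.values():
        new_cases[start.year] += 1

    return {year: per * new_cases[year] / py for year, py in person_years.items()}


if __name__ == "__main__":
    print(cohort_counts(COHORT_ROWS, 1))
    print(cohort_overlap(COHORT_ROWS, 1, 2))
    print(incidence_rate_by_year(COHORT_ROWS, 1, PERSON_YEARS))
```

On this toy data, cohort 1 has 3 subjects and 4 records, shares 1 subject with cohort 2, and has incidence rates of 2.0 and 5.0 per 1,000 person-years in 2019 and 2020. The sketch stratifies by calendar year only; adding age or sex would mean keying both the case counts and the denominators by those strata as well.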

Utility and Application

CohortDiagnostics significantly contributes to the field of observational health research by providing a robust framework for the evaluation and validation of cohort definitions. Its utility spans several critical areas:

  1. Enhancing Cohort Definition Confidence: By offering detailed diagnostics, CohortDiagnostics helps researchers refine their cohort definitions, ensuring they accurately capture the intended population. This is a critical step in phenotype development, which is a cornerstone of modern observational health data research.
  2. Identifying Data Quality Issues: Through the identification of orphan concepts and the detailed breakdown of index events, researchers can pinpoint data quality issues or gaps in cohort definitions. Iterating over multiple potential cohort definitions after analyzing these diagnostics is an encouraged and common practice.
  3. Informing Comparative Analyses: The package’s ability to characterize and compare cohorts, and to analyze cohort overlap, helps researchers understand the nuances and dynamics of their study populations. These diagnostics can inform future comparative studies once the cohorts and phenotypes have been refined and finalized.
  4. Supporting Transparent Research: By listing source codes and data source information, and by providing a platform for detailed exploration of diagnostics, CohortDiagnostics fosters a culture of transparency and reproducibility in observational research.