Cohorts.Rmd
Cohort identification is a foundational step in real-world evidence (RWE) analyses of data that adhere to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), and it underpins the study of healthcare outcomes and treatment effects. Observational health data, drawn from diverse sources including Electronic Health Records (EHRs), health insurance claims, registries, and patient-generated data, offer valuable insights into patient health status and healthcare delivery. However, these data were not originally collected for research purposes, so sophisticated methods are needed to infer the clinical information relevant to a research objective. This summary explores the importance of cohort identification, the methods endorsed by the Observational Health Data Sciences and Informatics (OHDSI) community, and the tools available for creating cohorts within OMOP CDM datasets.
Cohort identification is crucial for conducting meaningful analyses in healthcare research. A cohort, defined as a set of individuals who meet one or more inclusion criteria for a duration of time, serves as the basis for studying healthcare outcomes, treatment effects, and disease epidemiology. In the context of OHDSI research, cohorts are fundamental building blocks used for answering research questions, developing phenotypes, and conducting comparative effectiveness studies. OHDSI’s approach emphasizes the independence and reusability of cohort definitions, allowing researchers to define cohorts tailored to specific research questions while ensuring consistency and reproducibility across studies.
OHDSI endorses standardized methods for cohort definition, ensuring transparency and reproducibility in research. Two main approaches are employed for constructing cohorts:
Rule-based cohort definitions rely on explicitly stated inclusion criteria to define cohort membership. These criteria are typically based on domain expertise and clinical knowledge, allowing researchers to specify cohort attributes such as clinical conditions, procedures, medications, and temporal relationships. OHDSI provides standardized components for assembling these criteria, including domains, concept sets, domain-specific attributes, and temporal logic.
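As an illustration of how concept sets and temporal logic combine into a rule-based definition, the following is a minimal sketch in plain Python, not OHDSI tooling. The `Event` record, the concept IDs, and the 30-day window are all hypothetical placeholders chosen for the example; a real definition would use concept sets from the OMOP vocabulary and run against CDM tables.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Event:
    """A simplified clinical event (hypothetical stand-in for a CDM row)."""
    person_id: int
    domain: str       # "condition" or "drug" in this sketch
    concept_id: int
    event_date: date


# Hypothetical concept sets; these IDs are placeholders, not real OMOP concepts.
CONDITION_SET = {1001, 1002}
DRUG_SET = {2001}


def rule_based_cohort(events, window_days=30):
    """Return {person_id: cohort_start_date}.

    Entry event: the person's earliest condition in CONDITION_SET.
    Inclusion rule: a drug in DRUG_SET on or within `window_days`
    after that entry event (the temporal logic).
    """
    first_condition = {}
    for e in events:
        if e.domain == "condition" and e.concept_id in CONDITION_SET:
            current = first_condition.get(e.person_id)
            if current is None or e.event_date < current:
                first_condition[e.person_id] = e.event_date

    cohort = {}
    window = timedelta(days=window_days)
    for e in events:
        start = first_condition.get(e.person_id)
        if start is None:
            continue
        if (e.domain == "drug" and e.concept_id in DRUG_SET
                and timedelta(0) <= e.event_date - start <= window):
            cohort[e.person_id] = start
    return cohort


example_events = [
    Event(1, "condition", 1001, date(2020, 1, 1)),
    Event(1, "drug", 2001, date(2020, 1, 15)),   # within the window
    Event(2, "condition", 1002, date(2020, 1, 1)),
    Event(2, "drug", 2001, date(2020, 3, 1)),    # outside the window
    Event(3, "drug", 2001, date(2020, 1, 5)),    # no qualifying condition
]
example_cohort = rule_based_cohort(example_events)
```

In this toy data only person 1 satisfies both the entry event and the temporal inclusion rule, so only person 1 enters the cohort, with the date of the first qualifying condition as the cohort start date.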
Probabilistic cohort definitions leverage machine learning techniques to compute the probability of cohort inclusion based on patient characteristics and clinical events. These models are trained on example data to automatically identify relevant patient characteristics predictive of cohort membership. The resulting probabilities can be used to classify patients into cohorts or as inputs for certain study designs.
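The idea of computing a probability of cohort inclusion can be sketched with a small logistic regression trained by gradient descent. This is a generic stand-in written for illustration, not the model or tooling OHDSI uses; the three binary features, the toy labels, and the 0.5 threshold are assumptions made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def membership_probability(model, x):
    """Probability that a patient with features `x` belongs to the cohort."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features per patient:
# [has_diagnosis_code, has_drug_exposure, has_abnormal_lab]
train_X = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
train_y = [1, 1, 1, 0, 0, 0]  # toy labels: 1 = known cohort member

model = train_logistic(train_X, train_y)
p_likely = membership_probability(model, [1, 1, 1])
p_unlikely = membership_probability(model, [0, 0, 0])

# One possible use: threshold the probability to classify patients.
in_cohort = p_likely >= 0.5
```

The probabilities can be thresholded, as in the last line, or passed on unchanged to a study design that accepts them as inputs.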
OHDSI offers a suite of open-source tools to support cohort identification within OMOP CDM datasets, such as:
In the Cohorts tab of the OHDSI Analysis Viewer, there are three main sections, each with its own tab, that the user can explore: