vignettes/SettingKeeperParameters.Rmd
This vignette describes how one sets input parameters for Keeper to generate patient summaries.
Keeper extracts and summarizes patient-level data for patients in a given cohort so that patient summaries can be reviewed. Such review can be used to iteratively develop a phenotype (cohort definition) of a disease or, as the primary use case, to determine whether patients have the disease and subsequently calculate positive predictive value. The review should be done by a person familiar with the disease of interest. Alternatively, it can be done by an LLM, which additionally enables calculation of sensitivity and specificity. Please refer to the Using Keeper with LLMs vignette for the latter use case.
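As a minimal sketch of the calculation mentioned above, positive predictive value is the share of reviewed patients whom the reviewer confirms as true cases. The review labels below are invented for illustration and are not Keeper output.

```r
# Hypothetical reviewer decisions for ten sampled patients (illustrative only)
review <- c("yes", "yes", "no", "yes", "yes", "yes", "no", "yes", "yes", "yes")

true_positives  <- sum(review == "yes")  # reviewer confirmed the disease
false_positives <- sum(review == "no")   # reviewer ruled the disease out

# PPV = TP / (TP + FP)
ppv <- true_positives / (true_positives + false_positives)
ppv
```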
The first step is to create a phenotype of interest and execute it against your database(s). Cohorts can be created using ATLAS, R or SQL. The cohort table should contain cohort_id (cohort number), subject_id (patient identifier), cohort_start_date and cohort_end_date, and will subsequently be fed into KEEPER as input. More information on creating cohorts can be found here. One can then specify a sample size to randomly select a given number of patients from the cohort (parameter sample_size) or input a comma-separated vector of person_ids to select specific patients (parameter personIds). One can further de-identify patients by replacing the OMOP personId with new random ids (parameter assignNewId = TRUE).
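To make the sampling and de-identification ideas concrete, here is a small base R sketch on a toy cohort table. It only illustrates the concept; it is not how Keeper implements these parameters, and the table values and the `new_id` column name are made up for the example.

```r
# Toy cohort table with the four columns described above (values are made up)
cohort <- data.frame(
  cohort_id = 1,
  subject_id = c(101, 102, 103, 104, 105, 106),
  cohort_start_date = as.Date("2020-01-01") + c(0, 10, 20, 30, 40, 50),
  cohort_end_date = as.Date("2020-12-31")
)

set.seed(42)  # make the random draw reproducible

# Randomly select a fixed number of patients from the cohort
sample_size <- 3
picked <- sample(cohort$subject_id, sample_size)
sampled <- cohort[cohort$subject_id %in% picked, ]

# Replace the original identifiers with new random ids (hypothetical column)
sampled$new_id <- sample(seq_len(nrow(sampled)))
sampled$subject_id <- NULL
sampled
```

Dropping the original identifier column after assigning the new one means the review table can be shared without exposing the source ids.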
KEEPER extracts data based on user input. If a code is found in patient data, KEEPER will extract it along with the date relative to the index date (cohort_start_date). Therefore, code selection is very important.
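The idea of a date relative to the index date can be sketched in base R as a simple day offset. The dates below are invented, and the variable names are illustrative rather than Keeper's own.

```r
# cohort_start_date for one hypothetical patient
index_date <- as.Date("2021-03-15")

# Dates on which a selected code was recorded for that patient
event_dates <- as.Date(c("2021-03-01", "2021-03-15", "2021-04-14"))

# Negative values fall before the index date, positive values after it
relative_day <- as.integer(event_dates - index_date)
relative_day
```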
We will use the example of Type I Diabetes Mellitus (T1DM) to illustrate a strategy for code selection. The first step is to create a clinical definition. For this exercise, we will use a brief version of it. T1DM is an autoimmune condition characterized by decreased production of insulin by the pancreas. Onset is common in childhood or adolescence but can occur in adults. Symptoms include weight loss, polyuria and polydipsia, fatigue and others. Common differential diagnoses (those that need to be ruled out) include type II diabetes, pancreatic disorders such as cystic fibrosis and pancreatic necrosis, steroid-induced diabetes, renal glucosuria and other conditions. Diagnostic procedures include glucose measurements, C-peptide, pancreatic and insulin antibodies as well as HbA1C. It is primarily treated with insulin. Complications include hypo- and hyperglycemia, neuropathy, nephropathy, cerebrovascular disease and peripheral artery disease. We will use this definition to construct our inputs. The notion of differential diagnosis is important: for each input except the disease of interest, we will consider both T1DM and its differential diagnoses so that we can see evidence for T1DM and rule out other diagnoses.
The full input we will use as an example looks as follows:
Doi (disease of interest) is the disease or state of interest itself. Here, we select two concepts with their descendants:
The first code is the code for T1DM itself, and the second code denotes diseases occurring due to T1DM, which implies that the patients have T1DM as well. A common strategy is to select the codes used in the index event criteria of the phenotype. If useAncestor is set to TRUE (default behaviour), KEEPER will use the hierarchy to pull in descendants of the selected concepts.
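The effect of pulling in descendants can be illustrated with a toy table shaped like the OMOP CONCEPT_ANCESTOR table. This is a sketch under assumptions: the concept ids are invented, `expand_concepts` and `use_ancestor` are hypothetical names, and the text above does not specify how Keeper performs this step internally.

```r
# Toy slice of a CONCEPT_ANCESTOR-style table; concept ids are invented
concept_ancestor <- data.frame(
  ancestor_concept_id   = c(1, 1, 1, 2, 2, 3),
  descendant_concept_id = c(1, 11, 12, 2, 21, 3)
)

selected <- c(1, 2)  # concepts the user entered

# Hypothetical helper: optionally add all descendants of the selected concepts
expand_concepts <- function(selected, concept_ancestor, use_ancestor = TRUE) {
  if (!use_ancestor) {
    return(selected)
  }
  hits <- concept_ancestor$ancestor_concept_id %in% selected
  sort(unique(c(selected, concept_ancestor$descendant_concept_id[hits])))
}

with_descendants    <- expand_concepts(selected, concept_ancestor)
without_descendants <- expand_concepts(selected, concept_ancestor, use_ancestor = FALSE)
```

With the hierarchy switched on, two broad concepts stand in for every more specific code beneath them, which is why selecting broad ancestor concepts is usually enough.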
The doi is looked up in the CONDITION_OCCURRENCE table.
Here we input symptoms typically occurring in T1DM and its differential diagnoses. These are signs and symptoms occurring in a short time window before the disease onset.
Based on our clinical definition, we selected the following codes:
These are broad SNOMED codes for the symptoms we are interested in; the source codes of the corresponding conditions map either to them directly or to their descendants. The descendants will be pulled in because we set useAncestor to TRUE. A good approach for selecting codes for this section, as well as for the subsequent sections, is to enter your term in ATLAS Search and click on the green shopping cart (Phoebe initial code selection) to get a starting point, then use Phoebe (Recommend tab in the ATLAS Concept Set tab) to explore recommendations. Instructions on how to use Phoebe can be found here. Alternatively, you can explore your data to find appropriate SNOMED codes for symptoms using string search. Note that with local data exploration you are more likely to miss relevant codes.
Symptoms are looked up in the OBSERVATION and CONDITION_OCCURRENCE tables within 30 days prior to the index date.
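A lookup window of this kind can be sketched as a filter on the day offset relative to the index date. The records below are invented, and treating the index day itself (day 0) as inside the window is an assumption of this sketch, not something stated above.

```r
# Hypothetical symptom records for one patient
events <- data.frame(
  concept_name = c("Polyuria", "Fatigue", "Weight loss", "Polydipsia"),
  relative_day = c(-45, -10, 0, 5)  # days relative to the index date
)

# Keep records from the 30 days before the index date
# (assumption: day 0 counts as inside the window)
keep <- events$relative_day >= -30 & events$relative_day <= 0
symptoms_in_window <- events[keep, ]
symptoms_in_window
```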
For this and subsequent categories we want to select the codes relevant to the doi as well as to alternative (competing, differential) diagnoses. For example, cough is a symptom of cystic fibrosis. From here on, observing symptoms and other categories consistent with T1DM will increase our confidence in the diagnosis, while observing those consistent with differential diagnoses will decrease our confidence in the diagnosis of T1DM.
Comorbidities are conditions associated with the disease of interest or differential diagnoses. As opposed to symptoms, they can occur within a longer time period before the disease onset.
For T1DM we selected the following comorbidities:
Comorbidities are looked up in the OBSERVATION and CONDITION_OCCURRENCE tables any time prior to the index date.
We selected drugs (ancestor terms with descendants) that are used to treat T1DM as well as differential diagnoses:
Drugs are looked up in the DRUG_ERA table any time prior to and any time after the index date (displayed as two different columns).
Diagnostic procedures are procedure codes used for diagnosis of the disease of interest or alternative disease(s). For T1DM we did not select any procedures. One could think of relevant procedures such as ultrasound of the pancreas for pancreatic necrosis or CT of the lungs for cystic fibrosis.
Diagnostic procedures are looked up in the PROCEDURE_OCCURRENCE table within 30 days prior to and after the index date.
Measurements are lab tests used to diagnose T1DM and differential diagnoses:
Alternative diagnosis is where we put the diagnoses we want to rule out. As we already discussed, the differential diagnoses for T1DM are the following conditions:
Alternative diagnosis codes are looked up in the CONDITION_OCCURRENCE table within 90 days before and after the index date.
Treatment procedures are procedure codes corresponding to treatment of the disease of interest or alternative disease(s). We selected the following code denoting a broad group of procedures for partial or full removal of the pancreas for pancreatic necrosis:
We did not identify any treatment procedures for T1DM.
Treatment procedures are looked up in the PROCEDURE_OCCURRENCE table any time after the index date.
Complications are other conditions occurring due to the disease. We selected the following codes with descendants:
Complications are looked up in the CONDITION_OCCURRENCE table any time before or after the index date (displayed as two separate columns).
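The lookup rules stated in this vignette can be collected into a small reference table. The sketch below is a recap in base R, not Keeper code: `lookup` and `in_lookup_window` are hypothetical names, `Inf` stands for "any time", and including the index day (day 0) in every window is an assumption. Categories for which no window is stated above are left out.

```r
# Recap of the tables and time windows described in the text
lookup <- data.frame(
  category = c("symptoms", "comorbidities", "drugs", "diagnostic procedures",
               "alternative diagnoses", "treatment procedures", "complications"),
  source_table = c("OBSERVATION, CONDITION_OCCURRENCE",
                   "OBSERVATION, CONDITION_OCCURRENCE",
                   "DRUG_ERA",
                   "PROCEDURE_OCCURRENCE",
                   "CONDITION_OCCURRENCE",
                   "PROCEDURE_OCCURRENCE",
                   "CONDITION_OCCURRENCE"),
  days_before = c(30, Inf, Inf, 30, 90, 0, Inf),   # Inf = any time prior
  days_after  = c(0, 0, Inf, 30, 90, Inf, Inf)     # Inf = any time after
)

# Hypothetical helper: is a record's day offset inside a category's window?
in_lookup_window <- function(category, relative_day) {
  row <- lookup[lookup$category == category, ]
  relative_day >= -row$days_before & relative_day <= row$days_after
}

alt_dx_hits    <- in_lookup_window("alternative diagnoses", c(-120, -60, 45))
treatment_hits <- in_lookup_window("treatment procedures", c(-5, 200))
```

Keeping the windows in one table makes it easy to check, for any record, whether it would be expected to show up in a given column of the patient summary.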
The output looks as follows: