Export person-level data from OMOP CDM tables for eligible persons in the cohort.

Use useAncestor to switch between a verbatim string of concept_ids (useAncestor = FALSE) and ancestors (useAncestor = TRUE). In the latter case, the app will take your concept_ids and include them along with their descendants.
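To make the difference concrete, the following is a minimal base R sketch of what expanding a list of concept_ids through CONCEPT_ANCESTOR amounts to. The toy table, its invented concept_ids, and the helper expandConcepts() are hypothetical illustrations and not part of the package, which performs this lookup in the database.

```r
# Toy stand-in for the CDM CONCEPT_ANCESTOR table (invented concept_ids).
# As in the CDM, each concept is also listed as its own ancestor.
conceptAncestor <- data.frame(
  ancestor_concept_id   = c(100, 100, 100, 200, 101),
  descendant_concept_id = c(100, 101, 102, 200, 101)
)

# Hypothetical helper: return the ids verbatim, or the ids plus their descendants.
expandConcepts <- function(conceptIds, conceptAncestor, useAncestor = TRUE) {
  if (!useAncestor) {
    return(sort(unique(conceptIds)))
  }
  hits <- conceptAncestor$ancestor_concept_id %in% conceptIds
  sort(unique(c(conceptIds, conceptAncestor$descendant_concept_id[hits])))
}

verbatim <- expandConcepts(c(100), conceptAncestor, useAncestor = FALSE)
expanded <- expandConcepts(c(100), conceptAncestor, useAncestor = TRUE)
```

In this toy example, verbatim holds only concept 100, while expanded also picks up its descendants 101 and 102.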

Use sampleSize to specify the desired number of patients to select.

Use assignNewId = TRUE to replace person_id with a new sequence.
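As a rough illustration of the two options above, the sketch below samples persons from a toy cohort and then replaces person_id with a new sequence. It assumes the sample is drawn at random and the new ids run from 1 upwards; the data are invented and the package's own implementation may differ in detail.

```r
# Toy cohort: five eligible persons with invented person_ids.
cohort <- data.frame(person_id = c(9001, 9002, 9003, 9004, 9005))

set.seed(42)  # only to make the illustration reproducible
sampleSize <- 3

# Randomly select up to sampleSize persons (what sampleSize does conceptually).
n <- min(sampleSize, nrow(cohort))
sampled <- cohort[sample(nrow(cohort), n), , drop = FALSE]

# Replace person_id with a new sequence 1..n (what assignNewId = TRUE does conceptually).
sampled$new_id <- seq_len(nrow(sampled))
exported <- data.frame(person_id = sampled$new_id)
```

After this step, exported no longer carries the source person_id values, only the new sequence.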

Explanation of categories:

an instantiated cohort with the patients of interest in the COHORT table or in another table that has the same fields as COHORT;
doi: string for the disease of interest (ex.: diabetes type I). From here on, each category is assumed to be a string of concept_ids;
symptoms: symptoms of disease of interest or alternative/competing diagnoses (those that you want to see to be able to distinguish your doi from another close disease, ex.: polyuria, weight gain or loss, vision disturbances);
comorbidities: relevant diseases that co-occur with doi or alternative/competing diagnoses (ex.: obesity, metabolic syndrome, pancreatic disorders, pregnancy);
drugs: drugs, relevant to the disease of interest or those that can be used to treat alternative/competing diagnoses (ex.: insulin, oral glucose lowering drugs);
diagnosticProcedures: relevant diagnostic procedures (ex.: ultrasound of pancreas);
measurements: relevant lab tests (ex.: islet cell ab, HbA1C, glucose measurement in blood, insulin ab);
alternativeDiagnosis: alternative/competing diagnoses (ex.: diabetes type 2, cystic fibrosis, gestational diabetes, renal failure, pancreonecrosis);
treatmentProcedures: relevant treatment procedures (ex.: operative procedures on pancreas);
complications: relevant complications (ex.: retinopathy, CKD).

*note: if no suitable concept_ids exist for an input string, pass c(0)
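One way to apply this note consistently is to route every category through a small fallback before calling the function. The helper orZero() below is a hypothetical convenience, not part of the package; the concept_ids are taken from the example at the end of this page.

```r
# Hypothetical helper: fall back to c(0) when a category has no concept_ids.
orZero <- function(conceptIds) {
  if (is.null(conceptIds) || length(conceptIds) == 0) c(0) else conceptIds
}

conceptSets <- list(
  doi = c(435216, 201254),
  diagnosticProcedures = orZero(NULL),       # nothing suitable found
  treatmentProcedures = orZero(c(4242748))   # left unchanged
)
```

The list elements can then be passed on as arguments, for example diagnosticProcedures = conceptSets$diagnosticProcedures.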

createKeeper(
  connectionDetails = NULL,
  connection = NULL,
  cohortDatabaseSchema = NULL,
  cdmDatabaseSchema,
  tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
  cohortTable = "cohort",
  cohortDefinitionId,
  cohortName = NULL,
  sampleSize = 20,
  personIds = NULL,
  databaseId,
  assignNewId = FALSE,
  useAncestor = TRUE,
  doi,
  comorbidities,
  symptoms,
  alternativeDiagnosis,
  drugs,
  diagnosticProcedures,
  measurements,
  treatmentProcedures,
  complications
)

Arguments

connectionDetails: An R object of type connectionDetails created using the DatabaseConnector::createConnectionDetails() function. Not required if connection is provided.
connection: The connection to the database server created using DatabaseConnector::connect(). Not required if connectionDetails is provided.
cohortDatabaseSchema: The name of the database schema that is the location where the cohort to review is stored.
cdmDatabaseSchema: The name of the database schema that contains the OMOP CDM instance. Requires read permissions to this database. On SQL Server, this should specify both the database and the schema, so for example 'cdm_instance.dbo'.
tempEmulationSchema: Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created.
cohortTable: The name of the table that contains the cohort to review.
cohortDefinitionId: The id of the cohort to extract records for.
cohortName: (Optional) The cohort name.
sampleSize: (Optional, default = 20) The number of persons to randomly sample. Ignored if personIds is given.
personIds: (Optional) A vector of person ids to look for in the cohort table and the CDM.
databaseId: A short string for identifying the database (e.g. 'Synpuf'). This will be displayed in the Shiny app to toggle between databases. Should not contain spaces or underscores (_).
assignNewId: (Default = FALSE) Whether to assign a newId to persons. This will replace the personId in the source with a randomly assigned newId.
useAncestor: keeperOutput: a switch for using concept_ancestor to retrieve relevant terms vs using verbatim strings of codes
doi: keeperOutput: input vector of concept_ids for disease of interest
comorbidities: keeperOutput: input vector of concept_ids for comorbidities associated with the disease of interest (such as smoking or hyperlipidemia for diabetes)
symptoms: keeperOutput: input vector of concept_ids for symptoms associated with the disease of interest (such as weight gain or loss for diabetes)
alternativeDiagnosis: keeperOutput: input vector of concept_ids for competing diagnosis within a month after the index date
drugs: keeperOutput: input vector of concept_ids for drug exposures relevant to the disease of interest, to be used for prior exposures and treatment after the index date. You may input drugs that are used to treat disease of interest and drugs used to treat alternative diagnosis
diagnosticProcedures: keeperOutput: input vector of concept_ids for diagnostic procedures relevant to the condition of interest within a month prior and after the index date
measurements: keeperOutput: input vector of concept_ids for lab tests relevant to the disease of interest within a month prior and after the index date
treatmentProcedures: keeperOutput: input vector of concept_ids for treatment procedures relevant to the disease of interest within a month after the index date
complications: keeperOutput: input vector of concept_ids for complications of the disease of interest within a year after the index date
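Because databaseId should not contain spaces or underscores, it can be worth checking the value before starting a long extraction. The function isValidDatabaseId() below is a hypothetical pre-flight check written for illustration; the package is not known to expose such a helper.

```r
# Hypothetical pre-flight check; not part of the package.
isValidDatabaseId <- function(databaseId) {
  is.character(databaseId) &&
    length(databaseId) == 1 &&
    !is.na(databaseId) &&
    nzchar(databaseId) &&
    !grepl("[ _]", databaseId)   # reject spaces and underscores
}

isValidDatabaseId("Synpuf")        # TRUE
isValidDatabaseId("my_database")   # FALSE
```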

Value

Output is a data frame with one row per patient, with the following information per patient:

demographics (age, gender);
visit_context: information about visits overlapping with the index date (day 0) formatted as the type of visit and its duration;
observation_period: information about overlapping OBSERVATION_PERIOD formatted as days prior - days after the index date;
presentation: all records in CONDITION_OCCURRENCE on day 0 with corresponding type and status;
comorbidities: records in CONDITION_ERA and OBSERVATION that were selected as comorbidities and risk factors within all time prior excluding day 0. The list does not include symptoms, disease of interest and complications;
symptoms: records in CONDITION_ERA that were selected as symptoms 30 days prior excluding day 0. The list does not include disease of interest and complications. If you want to see symptoms outside of this window, please place them in complications;
prior_disease: records in CONDITION_ERA that were selected as disease of interest or complications all time prior excluding day 0;
prior_drugs: records in DRUG_ERA that were selected as drugs of interest all time prior excluding day 0 formatted as day of era start and length of drug era;
prior_treatment_procedures: records in PROCEDURE_OCCURRENCE that were selected as treatments of interest within all time prior excluding day 0;
diagnostic_procedures: records in PROCEDURE_OCCURRENCE that were selected as diagnostic procedures within all time prior excluding day 0;
measurements: records in MEASUREMENT that were selected as measurements (lab tests) of interest within 30 days before and 30 days after day 0 formatted as value and unit (if exists) and assessment compared to the reference range provided in MEASUREMENT table (normal, abnormal high and abnormal low);
alternative_diagnosis: records in CONDITION_ERA that were selected as alternative (competing) diagnosis within 90 days before and 90 days after day 0. The list does not include disease of interest;
after_disease: same as prior_disease but after day 0;
after_drugs: same as prior_drugs but after day 0;
after_treatment_procedures: same as prior_treatment_procedures but after day 0;
death: death record any time after day 0.
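Since the result is an ordinary data frame, it can be checked and saved with base R. The sketch below uses a toy stand-in for the output; the identifier column name personId, the demographic column names, and all values are assumptions made for illustration, so inspect names(keeper) on a real result first.

```r
# Toy stand-in for the returned data frame; column names and values are assumptions.
keeper <- data.frame(
  personId = c(1, 2),
  age = c(34, 57),
  gender = c("Female", "Male"),
  presentation = c("Type 1 diabetes mellitus", "Type 1 diabetes mellitus"),
  stringsAsFactors = FALSE
)

# One row per patient: person identifiers should not repeat.
stopifnot(!anyDuplicated(keeper$personId))

# Save the profiles for review outside R, then read them back.
outputFile <- file.path(tempdir(), "keeper.csv")
write.csv(keeper, outputFile, row.names = FALSE)
reloaded <- read.csv(outputFile, stringsAsFactors = FALSE)
```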

Examples

if (FALSE) {
# Connection settings for the database that hosts the CDM and the cohort table.
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  server = "ohdsi.com",
  port = 5432,
  user = "me",
  password = "secure"
)

# Extract a random sample of 100 persons from cohort 1234.
# Categories without suitable concept_ids are passed as c(0).
keeper <- createKeeper(
  connectionDetails = connectionDetails,
  databaseId = "Synpuf",
  cdmDatabaseSchema = "dbo",
  cohortDatabaseSchema = "results",
  cohortTable = "cohort",
  cohortDefinitionId = 1234,
  cohortName = "DM type I",
  sampleSize = 100,
  assignNewId = TRUE,
  useAncestor = TRUE,
  doi = c(435216, 201254),
  symptoms = c(79936, 432454, 4232487, 4229881, 254761),
  comorbidities = c(141253, 432867, 436670, 433736, 255848),
  drugs = c(21600712, 21602728, 21603531),
  diagnosticProcedures = c(0),
  measurements = c(3004410, 3005131, 3005673, 3010084, 3033819, 4149519, 4229110),
  alternativeDiagnosis = c(192963, 201826, 441267, 40443308),
  treatmentProcedures = c(4242748),
  complications = c(201820, 375545, 380834, 433968, 442793, 4016045, 4209145, 4299544)
)
}