Compute cohort operating characteristics based on a Reference Cohort — computeCohortOperatingCharacteristics • Keeper

Computes operating characteristics (sensitivity, specificity, positive predictive value, negative predictive value, AUC, and Cohen's kappa) of a cohort definition by comparing it against a reference cohort created from LLM review of KEEPER profiles. Metrics are computed separately for high-certainty reviews, low-certainty reviews, and all reviews combined.

computeCohortOperatingCharacteristics(
  connectionDetails = NULL,
  connection = NULL,
  cohortDatabaseSchema,
  cohortTable,
  cohortDefinitionId,
  referenceCohortDatabaseSchema,
  referenceCohortTableNames,
  referenceCohortDefinitionId,
  type = "incident",
  washoutPeriod = 0,
  stratifyByCertainty = FALSE
)

Arguments

connectionDetails

An R object of type connectionDetails created using the DatabaseConnector::createConnectionDetails() function. Not required if connection is provided.

connection

The connection to the database server created using DatabaseConnector::connect(). Not required if connectionDetails is provided.

cohortDatabaseSchema

The name of the database schema containing the cohort to evaluate.

cohortTable

The table name containing the cohort to evaluate.

cohortDefinitionId

The cohort definition ID of the cohort to evaluate.

referenceCohortDatabaseSchema

The name of the database schema containing the reference cohort (as uploaded by uploadReferenceCohort()).

referenceCohortTableNames

The table names where the reference cohort and metadata are stored. Should be created using createReferenceCohortTableNames().

referenceCohortDefinitionId

The cohort definition ID of the reference cohort.

type

If type = "incident", the evaluation also checks whether the phenotype assigns the correct cohort start date. If type = "prevalent", the evaluation only asks whether the phenotype correctly classified a person as a case or non-case.

washoutPeriod

The minimum required continuous observation time prior to index date for a person to be included in the cohort. People with a cohort start date within the washout period will be removed before computing performance metrics.
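
The washout rule can be illustrated on a small in-memory table. This is a minimal sketch, not the package's implementation: the data frame, its column names, the assumption that washoutPeriod is measured in days, and the choice of an inclusive boundary are all hypothetical.

```r
# Hypothetical cohort entries; column names are illustrative only.
cohort <- data.frame(
  subjectId = c(1, 2, 3),
  cohortStartDate = as.Date(c("2020-03-01", "2020-01-15", "2021-06-30")),
  observationPeriodStartDate = as.Date(c("2019-01-01", "2020-01-01", "2021-06-30"))
)

# Assumption: the washout period is expressed in days.
washoutPeriod <- 365

# Days of continuous observation before each cohort start date.
priorObservation <- as.numeric(
  difftime(cohort$cohortStartDate, cohort$observationPeriodStartDate, units = "days")
)

# Keep only people with at least washoutPeriod days of prior observation
# (the inclusive boundary is an assumption).
kept <- cohort[priorObservation >= washoutPeriod, ]
```

Under these assumptions only the first person is kept; the other two enter the cohort within the washout period and would be dropped before any metric is computed.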

stratifyByCertainty

Stratify the output by LLM certainty level?

Value

A tibble with one row per certainty level ("high", "low", "all") and columns for true positives, false positives, true negatives, false negatives, sensitivity, specificity, PPV, NPV (each with lower and upper confidence bounds), AUC, kappa, disease prevalence and certainty.
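
As a reminder of how the point estimates relate to the four counts, the sketch below applies the standard textbook definitions to made-up numbers. It is not taken from the package source, and it leaves out the confidence bounds and AUC because this page does not say how those are computed.

```r
# Hypothetical confusion counts; not taken from any real evaluation.
tp <- 80  # true positives
fp <- 10  # false positives
fn <- 20  # false negatives
tn <- 90  # true negatives
n <- tp + fp + fn + tn

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
ppv <- tp / (tp + fp)
npv <- tn / (tn + fn)
prevalence <- (tp + fn) / n

# Cohen's kappa: observed agreement corrected for chance agreement.
observedAgreement <- (tp + tn) / n
expectedAgreement <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2
kappa <- (observedAgreement - expectedAgreement) / (1 - expectedAgreement)
```

With these counts, sensitivity is 0.8, specificity is 0.9, and kappa is 0.7.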

Specificity, NPV, and prevalence are computed both within the reference cohort and, based on the prevalence of the highly-sensitive cohort, in the overall population.
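
A call might look like the sketch below. The wrapper function evaluateCohort, the schema and table names, and the cohort IDs are placeholders invented for illustration. The referenceCohortTableNames object is assumed to have been created beforehand with createReferenceCohortTableNames(), whose arguments are not documented on this page.

```r
# Sketch only: schema names, table names, and IDs below are placeholders.
evaluateCohort <- function(connectionDetails, referenceCohortTableNames) {
  Keeper::computeCohortOperatingCharacteristics(
    connectionDetails = connectionDetails,
    cohortDatabaseSchema = "results",            # placeholder schema
    cohortTable = "cohort",                      # placeholder table
    cohortDefinitionId = 1,                      # placeholder ID
    referenceCohortDatabaseSchema = "results",   # placeholder schema
    referenceCohortTableNames = referenceCohortTableNames,
    referenceCohortDefinitionId = 2,             # placeholder ID
    type = "incident",
    washoutPeriod = 365,                         # placeholder value
    stratifyByCertainty = TRUE
  )
}
```

Passing connectionDetails rather than an open connection follows the argument descriptions above, which say only one of the two needs to be supplied.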