R/SensitiveCohort.R
computeCohortOperatingCharacteristics.Rd
Computes operating characteristics (sensitivity, specificity, positive predictive value, AUC, and Cohen's kappa) of a cohort definition by comparing it against a reference cohort created from LLM review of KEEPER profiles. Metrics are computed separately for high-certainty reviews, low-certainty reviews, and all reviews combined.
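The point estimates behind these metrics follow from a 2x2 table of the evaluated cohort against the reference labels. The following is a minimal base-R sketch that assumes simple counts as input. `operatingCharacteristics()` is a hypothetical helper, not part of the package, and the package's own calculations (for example, how it derives AUC) may differ:

```r
# Hypothetical helper (not the package's API): point estimates from a 2x2 table
# comparing the evaluated cohort (test) with the reference cohort (truth).
operatingCharacteristics <- function(tp, fp, tn, fn) {
  n <- tp + fp + tn + fn
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  ppv <- tp / (tp + fp)
  # A binary classifier gives a single ROC point, so the trapezoidal AUC
  # through (0,0), (1 - specificity, sensitivity), (1,1) reduces to this mean.
  auc <- (sensitivity + specificity) / 2
  # Cohen's kappa: observed agreement corrected for agreement expected by chance.
  observed <- (tp + tn) / n
  expected <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
  kappa <- (observed - expected) / (1 - expected)
  data.frame(sensitivity, specificity, ppv, auc, kappa)
}

operatingCharacteristics(tp = 40, fp = 10, tn = 45, fn = 5)
```

With these counts the sketch gives sensitivity 40/45, specificity 45/55, PPV 0.8, and kappa 0.7 (observed agreement 0.85 against 0.5 expected by chance).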
computeCohortOperatingCharacteristics(
  connectionDetails = NULL,
  connection = NULL,
  cohortDatabaseSchema,
  cohortTable,
  cohortDefinitionId,
  referenceCohortDatabaseSchema,
  referenceCohortTableNames,
  referenceCohortDefinitionId,
  type = "incident",
  washoutPeriod = 0,
  stratifyByCertainty = FALSE
)
connectionDetails: An R object of type connectionDetails created using the DatabaseConnector::createConnectionDetails() function. Not required if connection is provided.
connection: The connection to the database server, created using DatabaseConnector::connect(). Not required if connectionDetails is provided.
cohortDatabaseSchema: The name of the database schema containing the cohort to evaluate.
cohortTable: The name of the table containing the cohort to evaluate.
cohortDefinitionId: The cohort definition ID of the cohort to evaluate.
referenceCohortDatabaseSchema: The name of the database schema containing the reference cohort (as uploaded by uploadReferenceCohort()).
referenceCohortTableNames: The table names where the reference cohort and its metadata are stored. Should be created using createReferenceCohortTableNames().
referenceCohortDefinitionId: The cohort definition ID of the reference cohort.
type: If type = "incident", phenotypes are also evaluated on whether they assign the correct cohort start date. If type = "prevalent", the evaluation only asks whether the phenotype correctly classified a person as a case or non-case.
washoutPeriod: The minimum required continuous observation time prior to the index date for a person to be included in the cohort. People whose cohort start date falls within the washout period are removed before performance metrics are computed.
stratifyByCertainty: Stratify the output by LLM certainty level?
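The washout rule can be illustrated on an in-memory data frame. This is a hypothetical sketch: `applyWashout()` and the columns `cohortStartDate` and `observationPeriodStartDate` are assumptions rather than the package's schema, and the washout period is assumed to be measured in days:

```r
# Hypothetical sketch: keep only people with at least `washoutPeriod` days
# (assumed unit) of continuous observation before their cohort start date.
# Column names are illustrative assumptions, not the package's schema.
applyWashout <- function(cohort, washoutPeriod = 0) {
  priorObservationDays <- as.numeric(
    cohort$cohortStartDate - cohort$observationPeriodStartDate
  )
  cohort[priorObservationDays >= washoutPeriod, , drop = FALSE]
}

cohort <- data.frame(
  personId = 1:3,
  observationPeriodStartDate = as.Date(c("2020-01-01", "2020-01-01", "2020-06-01")),
  cohortStartDate = as.Date(c("2021-01-01", "2020-02-01", "2020-07-01"))
)
applyWashout(cohort, washoutPeriod = 365)
```

With washoutPeriod = 365 only person 1 (366 days of prior observation) is kept; the other two start within the washout period and are removed.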
A tibble with one row per certainty level ("high", "low", "all") and columns for the counts of true positives, false positives, true negatives, and false negatives; sensitivity, specificity, and PPV, each with lower and upper confidence bounds; AUC; kappa; disease prevalence; and certainty level.
Specificity and prevalence are computed both within the reference cohort and, based on the prevalence of the highly sensitive cohort, in the overall population.
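Stratification by certainty and the confidence bounds can be sketched as follows. The sketch assumes one row per reviewed person, with hypothetical columns `certainty`, `isCase` (reference label), and `inCohort`. Using the exact Clopper-Pearson interval from `stats::binom.test()` is also an assumption; the package may use a different interval. The sketch computes specificity only within the reference cohort and omits the population-level adjustment:

```r
# Hypothetical sketch (not the package's implementation): a proportion with an
# exact Clopper-Pearson 95% confidence interval from stats::binom.test().
proportionWithCi <- function(x, n) {
  if (n == 0) {
    return(c(estimate = NA_real_, lb = NA_real_, ub = NA_real_))
  }
  ci <- stats::binom.test(x, n)$conf.int
  c(estimate = x / n, lb = ci[1], ub = ci[2])
}

# `reviews` is assumed to hold one row per reviewed person with columns
# `certainty` ("high"/"low"), `isCase` (reference label) and `inCohort`
# (whether the evaluated cohort contains the person).
summarizeByCertainty <- function(reviews) {
  strata <- list(
    high = reviews[reviews$certainty == "high", ],
    low = reviews[reviews$certainty == "low", ],
    all = reviews
  )
  rows <- lapply(names(strata), function(level) {
    s <- strata[[level]]
    tp <- sum(s$inCohort & s$isCase)
    fp <- sum(s$inCohort & !s$isCase)
    tn <- sum(!s$inCohort & !s$isCase)
    fn <- sum(!s$inCohort & s$isCase)
    # Names become e.g. "sensitivity.estimate", "sensitivity.lb", "sensitivity.ub".
    metrics <- c(
      sensitivity = proportionWithCi(tp, tp + fn),
      specificity = proportionWithCi(tn, tn + fp),
      ppv = proportionWithCi(tp, tp + fp)
    )
    data.frame(certainty = level, tp = tp, fp = fp, tn = tn, fn = fn,
               as.list(metrics))
  })
  do.call(rbind, rows)
}

reviews <- data.frame(
  certainty = c("high", "high", "high", "low", "low", "low"),
  isCase = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  inCohort = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)
)
summarizeByCertainty(reviews)
```

The "all" row pools both certainty levels, so its counts are the sums of the "high" and "low" rows.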