Build a concept set from a clinical condition and an initial list of concept ids.

createConceptSet(
  conceptName,
  originalConceptList,
  excludedConditions = "none",
  llmClient,
  connectionDetails,
  cdmDatabaseSchema,
  tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
  minCount = 0,
  belowMinimumCountApproach = "TEST ALL",
  outputDirectory,
  tries = 1,
  successes = 1,
  additionalInformation = "",
  excludedVocabularies = c("ICDO3"),
  clinicalContext = ""
)

Arguments

conceptName

Character. Name of the clinical condition that the concept set represents.

originalConceptList

Integer or character vector. List of concept ids to use as a starting point.

excludedConditions

Character. Names of conditions to be excluded from the concept set.

llmClient

Connection object for the LLM client (see the ellmer package for object details).

connectionDetails

An R object of type connectionDetails created using the function createConnectionDetails in the DatabaseConnector package.

cdmDatabaseSchema

The name of the database schema that contains the OMOP CDM instance. Requires read permissions to this database. On SQL Server, this should specify both the database and the schema, so for example 'cdm_instance.dbo'.

tempEmulationSchema

Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created.

minCount

Integer. Minimum subject count required for a concept.

belowMinimumCountApproach

Character. How to treat concepts below the minimum count. One of "TEST ALL", "TEST PHOEBE", "EXCLUDE ALL", "INCLUDE ALL". "TEST ALL" = test all concepts below the minimum count; "TEST PHOEBE" = test only the concepts below the minimum count that were recommended by PHOEBE; "EXCLUDE ALL" = exclude from the concept set any concept below the minimum count; "INCLUDE ALL" = include all concepts below the minimum count.

outputDirectory

Character. Directory to save output artifacts.

tries

Integer. Number of attempts for each concept. The package allows multiple runs of the same concept to get a consensus vote from multiple LLM iterations.

successes

Integer. Number of successes required to include a concept. The package allows multiple runs of the same concept to get a consensus vote from multiple LLM iterations.

additionalInformation

Character. Additional information for concept development. This may include any specific details desired for the concepts, for example, "only in women".

excludedVocabularies

Character vector. Vocabularies to exclude from the condensing function.

clinicalContext

Character. Optional clinical context the LLM uses to judge whether a concept is appropriate. For example, "following surgery" would include concepts whose names indicate that the condition occurred post-surgery.
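
The interplay of minCount, belowMinimumCountApproach, tries, and successes can be sketched in base R. This is only an illustration of the rules as described above, not the package's implementation. The helper names triageConcepts and meetsConsensus are hypothetical, and three points are assumptions: "below" means strictly less than minCount, concepts at or above minCount are always tested, and under "TEST PHOEBE" the low-count concepts not recommended by PHOEBE are excluded.

```r
# Hypothetical helper (not part of the package): decide what happens to each
# candidate concept before any LLM call, based on its subject count.
triageConcepts <- function(counts, phoebeRecommended, minCount = 0,
                           approach = "TEST ALL") {
  approach <- match.arg(
    approach,
    c("TEST ALL", "TEST PHOEBE", "EXCLUDE ALL", "INCLUDE ALL")
  )
  # Assumption: "below the minimum count" means strictly less than minCount.
  belowMin <- counts < minCount
  # Assumption: concepts at or above minCount are always sent for testing.
  action <- rep("test", length(counts))
  action[belowMin] <- switch(
    approach,
    "TEST ALL" = "test",
    # Assumption: low-count concepts without a PHOEBE recommendation are dropped.
    "TEST PHOEBE" = ifelse(phoebeRecommended[belowMin], "test", "exclude"),
    "EXCLUDE ALL" = "exclude",
    "INCLUDE ALL" = "include"
  )
  action
}

# Hypothetical helper: a concept is kept when at least `successes` of its
# LLM runs (one logical vote per try) voted to include it.
meetsConsensus <- function(votes, successes = 1) {
  sum(votes) >= successes
}

# Four made-up concepts: subject counts and PHOEBE recommendation flags.
counts <- c(150, 3, 0, 7)
phoebe <- c(FALSE, TRUE, FALSE, FALSE)
actions <- triageConcepts(counts, phoebe, minCount = 5, approach = "TEST PHOEBE")

# With tries = 3 and successes = 2, two "include" votes out of three suffice.
kept <- meetsConsensus(c(TRUE, FALSE, TRUE), successes = 2)
```

Under these assumptions, setting successes equal to tries requires a unanimous vote, while successes = 1 keeps a concept if any single run accepts it.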

Value

The final result set as a list of two elements: (1) a data frame of the LLM results for each tested concept and (2) a JSON object ready for import into ATLAS if successful, FALSE if unsuccessful.

Details

This function creates a concept set starting from a clinical condition and an initial list of concept ids.
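
A call might look like the sketch below. It assumes the package that exports createConceptSet is already attached and that the ellmer and DatabaseConnector packages are installed. The wrapper name runConceptSetExample, the model name, the server string, the schema name, the environment variable names, and the concept id and condition names are all illustrative placeholders, not values taken from this documentation.

```r
# Wrapped in a function so that nothing contacts a database or an LLM
# provider until runConceptSetExample() is actually called.
runConceptSetExample <- function() {
  # LLM client built with ellmer; the model name is a placeholder.
  llmClient <- ellmer::chat_openai(model = "gpt-4o")

  # Connection settings are placeholders; credentials are read from
  # hypothetical environment variables.
  connectionDetails <- DatabaseConnector::createConnectionDetails(
    dbms = "postgresql",
    server = "localhost/ohdsi",
    user = Sys.getenv("CDM_USER"),
    password = Sys.getenv("CDM_PASSWORD")
  )

  createConceptSet(
    conceptName = "type 2 diabetes mellitus",      # illustrative condition
    originalConceptList = c(201826),               # illustrative starting concept id
    excludedConditions = "type 1 diabetes mellitus",
    llmClient = llmClient,
    connectionDetails = connectionDetails,
    cdmDatabaseSchema = "cdm",                     # placeholder schema name
    minCount = 5,
    belowMinimumCountApproach = "TEST PHOEBE",
    outputDirectory = tempdir(),
    tries = 3,                                     # three LLM runs per concept
    successes = 2                                  # at least two must agree
  )
}
```

Arguments left out here (tempEmulationSchema, additionalInformation, excludedVocabularies, clinicalContext) fall back to the defaults shown in the usage signature.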