Build a concept set from a clinical condition and an initial list of concept ids.
createConceptSet(
  conceptName,
  originalConceptList,
  excludedConditions = "none",
  llmClient,
  connectionDetails,
  cdmDatabaseSchema,
  tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
  minCount = 0,
  belowMinimumCountApproach = "TEST ALL",
  outputDirectory,
  tries = 1,
  successes = 1,
  additionalInformation = "",
  excludedVocabularies = c("ICDO3"),
  clinicalContext = ""
)

conceptName: Character. Name of the concept that identifies the clinical condition.
originalConceptList: Integer or character vector. Concept ids to use as a starting point.
excludedConditions: Character. Names of conditions to be excluded from the concept set.
llmClient: Connection object for the LLM client (see the ellmer package for details of the object).
connectionDetails: An R object of type connectionDetails created using the function createConnectionDetails in the DatabaseConnector package.
cdmDatabaseSchema: The name of the database schema that contains the OMOP CDM instance. Requires read permissions to this database. On SQL Server, this should specify both the database and the schema, for example 'cdm_instance.dbo'.
tempEmulationSchema: Some database platforms, such as Oracle and Impala, do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created.
minCount: Integer. Minimum subject count for a concept. Concepts below this count are handled according to belowMinimumCountApproach.
belowMinimumCountApproach: Character. How to treat concepts below the minimum count. One of "TEST ALL" (test all concepts below the minimum count), "TEST PHOEBE" (test only the concepts below the minimum count that were recommended by PHOEBE), "EXCLUDE ALL" (exclude every concept below the minimum count from the concept set), or "INCLUDE ALL" (include all concepts below the minimum count).
outputDirectory: Character. Directory in which to save output artifacts.
tries: Integer. Number of LLM runs to attempt for each concept. Running the same concept several times allows a consensus vote across LLM iterations.
successes: Integer. Number of successful runs required to include a concept, counted across the runs set by tries.
additionalInformation: Character. Additional information for concept development. This may include any specific details desired for the concepts, for example "only in women".
excludedVocabularies: Character vector. Vocabularies to exclude from the condensing function.
clinicalContext: Character. Optional clinical context the LLM uses to judge whether a concept is appropriate. For example, "following surgery" would include concepts whose names indicate that the condition occurred after surgery.
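The belowMinimumCountApproach, tries, and successes arguments describe decision rules that may be easier to follow in code. The sketch below is a minimal base R illustration, not the package's implementation. belowMinimumAction(), passesConsensus(), and the recommendedByPhoebe flag are hypothetical names. The treatment of concepts that PHOEBE did not recommend under "TEST PHOEBE" is an assumption, since the documentation only states which concepts are tested.

```r
# Illustrative helpers only: hypothetical, NOT part of the package.

# Decide what to do with one concept whose subject count is below minCount.
# `recommendedByPhoebe` is a hypothetical logical flag for that concept.
belowMinimumAction <- function(approach, recommendedByPhoebe = FALSE) {
  approach <- match.arg(
    approach,
    c("TEST ALL", "TEST PHOEBE", "EXCLUDE ALL", "INCLUDE ALL")
  )
  switch(approach,
    "TEST ALL" = "test",
    # Assumption: concepts not recommended by PHOEBE are excluded here.
    "TEST PHOEBE" = if (recommendedByPhoebe) "test" else "exclude",
    "EXCLUDE ALL" = "exclude",
    "INCLUDE ALL" = "include"
  )
}

# Consensus vote for one concept. `votes` holds one logical per LLM run
# (TRUE = that run judged the concept appropriate), so length(votes)
# plays the role of `tries`.
passesConsensus <- function(votes, successes = 1) {
  stopifnot(
    is.logical(votes), !anyNA(votes),
    successes >= 1, successes <= length(votes)
  )
  sum(votes) >= successes
}

belowMinimumAction("TEST PHOEBE", recommendedByPhoebe = TRUE)  # "test"
passesConsensus(c(TRUE, FALSE, TRUE), successes = 2)           # TRUE
passesConsensus(c(TRUE, FALSE, FALSE), successes = 2)          # FALSE
```

With the defaults tries = 1 and successes = 1, the vote reduces to a single LLM judgement per concept. Raising both values trades extra LLM calls for a stricter inclusion rule.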
A list of two elements: (1) a data frame of the LLM results for each tested concept, and (2) a JSON object ready for import into ATLAS if successful, or FALSE if unsuccessful.
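Because the JSON may be replaced by FALSE, callers will probably want to check the result before exporting it. The following is a minimal sketch under stated assumptions: saveConceptSetJson() is a hypothetical helper, the mock results and their column names are invented for illustration, and the JSON element is assumed to be coercible to a character string. It also tolerates the whole return value being FALSE, since the description leaves that possibility open.

```r
# Hypothetical helper, NOT part of the package: write the ATLAS JSON to
# disk only when the run succeeded.
saveConceptSetJson <- function(result, path) {
  # Tolerate a bare FALSE in case the whole call signals failure that way.
  if (isFALSE(result)) return(invisible(FALSE))
  stopifnot(is.list(result), length(result) == 2)

  json <- result[[2]]  # second element: JSON on success, FALSE on failure
  if (isFALSE(json)) return(invisible(FALSE))

  # Assumption: the JSON element can be coerced to a character string.
  writeLines(as.character(json), path)
  invisible(TRUE)
}

# Mock results standing in for real createConceptSet() output; the column
# names and JSON content are invented for this example.
ok <- list(
  data.frame(conceptId = 201826L, include = TRUE),
  '{"items": []}'
)
failed <- list(data.frame(), FALSE)

outFile <- tempfile(fileext = ".json")
saveConceptSetJson(ok, outFile)         # writes the JSON to outFile
saveConceptSetJson(failed, tempfile())  # writes nothing
```

Positional indexing is used because the description gives the order of the two elements but not their names.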
This function creates a concept set starting from a clinical condition and an initial list of concept ids.