The Phenelope package supports creating a health condition concept set with the help of a large language model (LLM). The output of the process is a JSON file that can be imported directly into ATLAS as a concept set.
This vignette describes how to run the Phenelope process from start to end.
Only a small amount of information is needed to create a concept set:
* the health condition name and a starting concept ID
* the database connection details
* the LLM client details
Each of these is described in detail below. For this vignette, we will walk through a request for a concept set for type 2 diabetes mellitus (T2DM).
The health condition name used for the process would be something like “Type 2 Diabetes Mellitus”. After examining the database, an appropriate SNOMED concept ID is 201826 for “Type 2 diabetes mellitus”.
Quick Tip: Phenelope uses PHOEBE to find other possible concepts that are not direct children of the selected concept. You could, for instance, select 201820 for “Diabetes mellitus”, the parent of 201826. This will expand the number of concepts that the LLM will test. Depending on your health condition, this may or may not be desirable. The only cost of using a parent concept is the larger number of concepts that are eventually tested, which could increase the time needed to create the concept set.
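One way to examine the database for candidate concept IDs is to query the vocabulary directly. The sketch below is only an illustration: lookupConcepts() is a hypothetical helper that is not part of Phenelope, and it assumes the vocabulary tables live in the same schema as the CDM.

```r
# Sketch: look up concept names in the CDM vocabulary before running Phenelope.
# lookupConcepts() is a hypothetical helper, not part of the Phenelope package.
# Assumption: the concept table is in the same schema as the CDM tables.
lookupConcepts <- function(connectionDetails, cdmDatabaseSchema, conceptIds) {
  connection <- DatabaseConnector::connect(connectionDetails)
  # Close the connection even if the query fails
  on.exit(DatabaseConnector::disconnect(connection))

  sql <- "
    SELECT concept_id, concept_name, vocabulary_id, standard_concept
    FROM @cdm_database_schema.concept
    WHERE concept_id IN (@concept_ids);
  "
  # renderTranslateQuerySql() fills in the @parameters and translates the
  # SQL to the dialect of the connected database before running it
  DatabaseConnector::renderTranslateQuerySql(
    connection = connection,
    sql = sql,
    cdm_database_schema = cdmDatabaseSchema,
    concept_ids = conceptIds,
    snakeCaseToCamelCase = TRUE
  )
}

# Example call (requires a live database connection):
# lookupConcepts(connectionDetails, "yourCDMSchema", c(201826, 201820))
```

Comparing the rows returned for 201826 and 201820 can help you decide whether the narrower concept or its parent is the better starting point.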
To connect to the database, you will need to gather the connection information for your database server, such as the DBMS type, server, user name, and password.
Below is an example of creating the database connection details:
```r
# create database connection details
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  server = "localhost/ohdsi",
  user = "joe",
  password = "supersecret"
)
```
The information you will need to create the LLM client is documented in the ellmer package. See, for example, the ellmer function chat_azure_openai. Required fields are LLM dependent, but, as an example, you may need:
* endpoint
* api_version
* model
* credentials
For example:
```r
# create LLM client for ellmer
llmClient <- ellmer::chat_azure_openai(
  # strip the deployment path so that only the base endpoint URL remains
  endpoint = gsub("/openai/deployments.*", "", keyring::key_get("genai_gpt4o_endpoint")),
  api_version = "2023-03-15-preview",
  model = "gpt-4o",
  # read the API key from the system keyring instead of hard-coding it
  credentials = function() keyring::key_get("genai_api_gpt4_key")
)
```
The example above creates an ellmer object that will connect to an Azure OpenAI instance using the GPT-4o model. We recommend using a non-reasoning model for best results.
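The gsub() call in the LLM client example trims a stored endpoint URL down to its base address. The short base R sketch below shows the effect on a made-up URL. The URL is a hypothetical placeholder, and the value stored in your own keyring may look different.

```r
# Hypothetical stored endpoint URL, used only to illustrate the gsub() call
storedEndpoint <- "https://example-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions"

# Remove "/openai/deployments" and everything after it
baseEndpoint <- gsub("/openai/deployments.*", "", storedEndpoint)

print(baseEndpoint)
```

If your stored endpoint is already the base address, the pattern does not match and gsub() returns the value unchanged.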
Once you have the necessary information, you are ready to create your concept set by calling the createConceptSet() function as shown below:
```r
# run the Phenelope process to create the concept set
finalSet <- Phenelope::createConceptSet(
  conceptName = "Type 2 Diabetes Mellitus",
  originalConceptList = c(201826),
  llmClient = llmClient,
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "yourCDMSchema",
  outputDirectory = "c:/llmConceptSets/type2Dm"
)
```
The following are additional parameters that may be set but aren’t required:
The function returns a list with 2 elements:
* Element 1: a data frame that has all the concepts that were tested by the LLM and whether the LLM included them in the final concept set or decided they should not be included.
* Element 2: a JSON object for a concept set definition that may be imported into ATLAS.
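As a rough illustration of working with the returned list, the sketch below pulls out the two elements by position. To keep it runnable on its own, it uses a hypothetical stand-in for finalSet. The column names, the JSON content, and the assumption that the second element is a JSON string are all placeholders, not the package's documented output.

```r
# Hypothetical stand-in for the list returned by createConceptSet(), used so
# this sketch runs without a database or LLM. The column names and the JSON
# content are assumptions, not the package's documented output.
finalSet <- list(
  data.frame(
    conceptId = c(201826, 201820),
    included = c(TRUE, FALSE)
  ),
  '{"items": []}'
)

# Element 1: the concepts tested by the LLM
testedConcepts <- finalSet[[1]]
str(testedConcepts)

# Element 2: the concept set definition, assumed here to be a JSON string
conceptSetJson <- finalSet[[2]]

# Write the JSON to a temporary file, for example to paste into ATLAS later
jsonFile <- file.path(tempdir(), "type2DmConceptSet.json")
writeLines(conceptSetJson, jsonFile)
```

With a real result, inspect the data frame with str() or names() first to see which columns the package actually returns.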
In your output folder you will have the following artifacts:
The csv file contains the following: