Start with a Task Group: a set of work orders related to a component of oncology research. If you’d like to contribute to a specific Task Group, add yourself as an assignee and change the status from Outstanding to In Progress.
Within the top comment on the Task Group issue, the Problem Space section contextualizes the problem and the Tasks section links to subissues that outline the steps for solving it.
Within each Task Group, there are typically five subissues: Resolve decision points, Investigate & create tasks, Complete outstanding work, Validate and ingest, and Document.
This subissue lists the decision points that need to be finalized before work can begin on the Task Group. Decision points must be finalized by the OHDSI Oncology WG and often require input from oncologists and informaticists before they are introduced as discussion topics at a WG meeting. Generally, all decision point issues should be finalized and closed before work begins on a Task Group.
The next task in the Task Group involves scoping the work necessary to solve the problems laid out in the Problem Space. In the context of vocabulary development, this task may be broken into two main steps. The first involves outlining the preparatory work necessary to enable the Task Group’s main work. This means identifying vocabularies with problematic terms and creating a checklist for the preparatory work required:
After outlining the preparatory work, the second step of this task involves compiling the lists of terms from problematic vocabularies that will be addressed and repaired in the following task. Each of these lists of terms should become its own issue, which is then linked to the next task (Complete outstanding work):
These issues, which contain the first actual work orders, should be structured as shown above: a brief description of the work that needs to be done and a set of tables containing all of the concepts involved. In the task pictured above, the tables involved are a comprehensive “source” table of the concepts that need to be altered (likely destandardized and mapped to standard concepts), and the table of “target” concepts that should be mapped to.
Notice that checkboxes for task lists are created using “- [ ]” (note the space between the dash and open bracket, and the space between brackets). Issues are linked to checkboxes by referencing their issue number.
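As a minimal sketch of the checkbox syntax described above, the following Python helper builds a task list that links one checkbox to each issue. The issue numbers are hypothetical placeholders, and the helper name task_list is introduced here only for illustration.

```python
def task_list(issue_numbers, done=()):
    """Return GitHub task-list lines, one checkbox per linked issue."""
    lines = []
    for number in issue_numbers:
        # "- [ ]" is an open checkbox; "- [x]" is a checked one.
        mark = "x" if number in done else " "
        lines.append(f"- [{mark}] #{number}")
    return "\n".join(lines)


# Hypothetical issue numbers, for illustration only.
print(task_list([101, 102, 103], done={101}))
```

Pasting the printed lines into an issue comment should render them as checkboxes, each linked to the referenced issue.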
After all work order issues are created, linked to the “Complete outstanding work” issue, and referenced in a comment on this issue, all checkboxes in this issue can be checked and the issue closed:
In the last task, you worked to add content to the top comment in this issue:
At this point, you are ready to start preparing the tables to create changes in the OHDSI Development Vocabulary.
The example above demonstrated the structure of the work order: a description of work to be done and a set of tables necessary to do the work. To complete the work order, we will download the linked CSV files, create empty concept and concept_relationship files, and start destandardizing.
It’s a good idea to adopt a naming convention that works for your data management style. Here, concept_SNOMEDtoCM_sg indicates that these are the SNOMED stage group concepts that are being destandardized and will be mapped to the Cancer Modifier stage group concepts.
The delta concept table (bottom) is fairly easy to create in this case. Since we are destandardizing all of these SNOMED concepts (left), we simply need to copy all of the terms to our new concept table and remove the ‘S’ from the standard_concept column.
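The destandardizing step can be sketched in Python as follows. This is an illustration under assumptions, not the project's tooling: the two rows are made-up placeholders rather than real SNOMED concepts, only a few of the concept table's columns are shown, and writing an empty string for a non-standard concept is an assumption about what the ingestion scripts expect.

```python
import csv
import io


def destandardize(rows):
    """Copy concept rows, blanking standard_concept wherever it is 'S'."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the source table is left untouched
        if new_row.get("standard_concept") == "S":
            new_row["standard_concept"] = ""
        out.append(new_row)
    return out


# Hypothetical excerpt of a "source" table (IDs and names are placeholders).
SOURCE_CSV = """concept_id,concept_name,vocabulary_id,standard_concept
9000001,Example stage group A,SNOMED,S
9000002,Example stage group B,SNOMED,S
"""

rows = list(csv.DictReader(io.StringIO(SOURCE_CSV)))
delta = destandardize(rows)

# Write the delta concept table back out as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
writer.writeheader()
writer.writerows(delta)
print(buffer.getvalue())
```

Doing the same edit in a spreadsheet works equally well for small tables; a script mainly helps when the source table is long.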
Now that the SNOMED concepts are destandardized, they need to be mapped to the standard Cancer Modifier concepts. This step involves combing through the table of Cancer Modifier concepts (bottom left) to find the “matching” term and adding a row to the delta concept_relationship table (right). Remember that every “Maps to” relationship should have a reciprocal “Mapped from” relationship (not shown).
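The reciprocal pairing can be sketched as a small Python helper that emits both rows for one mapping. The concept IDs and the validity dates below are placeholder assumptions, and mapping_rows is a hypothetical name; the column names follow the OMOP concept_relationship table.

```python
def mapping_rows(source_id, target_id,
                 valid_start="1970-01-01", valid_end="2099-12-31"):
    """Return the "Maps to" row and its reciprocal "Mapped from" row."""
    # Placeholder dates; use whatever your work order specifies.
    shared = {
        "valid_start_date": valid_start,
        "valid_end_date": valid_end,
        "invalid_reason": "",
    }
    return [
        {"concept_id_1": source_id, "concept_id_2": target_id,
         "relationship_id": "Maps to", **shared},
        {"concept_id_1": target_id, "concept_id_2": source_id,
         "relationship_id": "Mapped from", **shared},
    ]


# Hypothetical IDs: a destandardized source concept and its standard target.
for row in mapping_rows(9000001, 9100001):
    print(row)
```

Generating both rows from one call makes it harder to forget the reciprocal relationship.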
As you work, it can be very useful to leave comments on your thought process or anything out of the ordinary directly on the work order ticket:
After you’ve completed the work for a work order ticket, attach your output files to a comment and close the issue:
Closing the tickets will automatically check the boxes in the Task issue. Once all of the listed tasks have been completed, the entire Task issue can be closed:
A small number of community members who are familiar with this process will handle the majority of validation and ingestion requests, so most contributors do not need to understand these processes fully. For more information, see the Validation tooling readme.
You will need the vocabTools directory and its contents downloaded to your computer to make use of the tools in the following steps. The easiest way to download vocabTools (and subsequently push Oncology Development Vocabulary changes back to GitHub) is by cloning the OncologyWG repository:
git clone https://github.com/OHDSI/OncologyWG.git
Part of the validate and ingest process involves standing up a “vocab” database. The tools presented here expect and require a PostgreSQL database with schemas “prod” (an “official” OMOP Vocabulary) and “dev” (a copy of prod that will be altered and used for staging). You can assume these tools will work for you if you have a prod and dev schema set up in a Postgres database, or follow detailed instructions for setting them up here.
After your database is set up with a prod and a dev schema, update the config.txt file to the correct connection information.
In the last task, you created concept and concept_relationship table fragments to complete the vocabulary work orders. In this step, we will validate these locally by integrating them into the dev schema of the database we have set up and comparing the augmented dev schema and prod schema.
First, move all of the concept_*.csv and concept_relationship_*.csv files to the concept and concept_relationship directories in your local copy of vocabTools:
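If you prefer to script this step, the following Python sketch sorts the files into the two directories. It assumes the files follow the concept_*.csv and concept_relationship_*.csv naming patterns; the function name sort_fragments and the temporary demo folders are hypothetical. Note that the pattern concept_*.csv also matches the relationship files, so the longer prefix is checked first.

```python
import shutil
import tempfile
from pathlib import Path


def sort_fragments(download_dir, vocabtools_dir):
    """Move table fragments into vocabTools/concept and vocabTools/concept_relationship."""
    download_dir = Path(download_dir)
    vocabtools_dir = Path(vocabtools_dir)
    moved = []
    for path in sorted(download_dir.glob("concept_*.csv")):
        # Relationship files also match "concept_*.csv": test the longer prefix first.
        if path.name.startswith("concept_relationship_"):
            target_dir = vocabtools_dir / "concept_relationship"
        else:
            target_dir = vocabtools_dir / "concept"
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
        moved.append((path.name, target_dir.name))
    return moved


# Demo in a throwaway directory so nothing on disk is touched.
with tempfile.TemporaryDirectory() as tmp:
    downloads = Path(tmp) / "downloads"
    downloads.mkdir()
    for name in ("concept_SNOMEDtoCM_sg.csv",
                 "concept_relationship_SNOMEDtoCM_sg.csv"):
        (downloads / name).write_text("")
    moved = sort_fragments(downloads, Path(tmp) / "vocabTools")
    print(moved)
```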
With your concept and concept_relationship CSV files in the appropriate folders, you can start ingesting them using the batch files in the vocabTools directory. Start with updateConcept.bat to update the concept table in the dev schema. After that has run, execute updateConceptRelationship.bat to update the concept_relationship table. You can visually verify that these scripts ran correctly by executing and examining the output from the getConceptDiffs.bat and getConceptRelationshipDiffs.bat scripts. These scripts output all rows from the concept and concept_relationship tables in the dev schema that don’t exist in the prod schema version.
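To illustrate what "rows in dev that don't exist in prod" means, here is a conceptual sketch using an in-memory SQLite database. It is not the SQL used by the vocabTools scripts, which run against PostgreSQL: the table names dev_concept and prod_concept stand in for the dev and prod schemas, the rows are placeholders, and only three columns are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prod_concept (concept_id INTEGER, concept_name TEXT, standard_concept TEXT);
CREATE TABLE dev_concept  (concept_id INTEGER, concept_name TEXT, standard_concept TEXT);

-- Placeholder rows: prod holds two standard concepts.
INSERT INTO prod_concept VALUES
    (9000001, 'Example stage group A', 'S'),
    (9000002, 'Example concept B', 'S');

-- In dev, the first concept has been destandardized.
INSERT INTO dev_concept VALUES
    (9000001, 'Example stage group A', NULL),
    (9000002, 'Example concept B', 'S');
""")

# EXCEPT keeps the dev rows that have no identical row in prod.
diff_rows = conn.execute(
    "SELECT * FROM dev_concept EXCEPT SELECT * FROM prod_concept ORDER BY concept_id"
).fetchall()
print(diff_rows)
conn.close()
```

Only the altered row appears in the result; the unchanged concept is identical in both tables and is filtered out.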
Once the concept and concept_relationship tables have been updated from the CSV files you created, you can update the concept_ancestor hierarchy by running updateConceptAncestor.bat. This process may take an hour or more to complete.
After you’ve updated the concept_ancestor table, you can visually verify that the changes were made by running getConceptAncestorDiffs.bat. Again, this script may take a few minutes to run.
The Oncology Development Vocabulary is persisted as two CSV files that represent the set of changes that should be made to the concept and concept_relationship tables. These files are stored on the OncologyWG GitHub repository in the deltaVocab directory. More information on “deltaVocab” can be found here.
To “ingest” the changes that you have made to the Oncology Development Vocabulary, you will need to:
1. Add your changes to the deltaConcept.csv and deltaConceptRelationship.csv files
2. Add a deltaSummary file to summarize the changes that your additions make to the official OMOP Vocabulary
3. Create a pull request from a fork or branch of the OncologyWG GitHub repository to the master branch
These steps are detailed below.
Changes can be easily incorporated into the deltaVocab files by running updateDelta.bat. This script will look for, and ingest, any files in the vocabTools/concept and vocabTools/concept_relationship directories. It will also ensure that the most up-to-date deltaVocab is used as a starting point, and that only new changes are being added. Once this script has run successfully, the deltaVocab files on your computer will reflect your changes to the official OMOP Vocabulary, as well as all previously ingested changes.
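The idea of adding only new changes to an existing delta can be sketched as follows. This illustrates the concept only and is not a description of how updateDelta.bat is implemented; the function name merge_delta, the choice of key columns, and the placeholder rows are all assumptions.

```python
def merge_delta(existing_rows, new_rows, key_fields):
    """Append rows from new_rows whose key is not already present in existing_rows."""
    seen = {tuple(row[f] for f in key_fields) for row in existing_rows}
    merged = list(existing_rows)
    for row in new_rows:
        key = tuple(row[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            merged.append(row)
    return merged


# Assumed key: a relationship row is identified by both concepts and the relationship.
KEY = ("concept_id_1", "concept_id_2", "relationship_id")

# Placeholder rows, for illustration only.
existing = [
    {"concept_id_1": 1, "concept_id_2": 2, "relationship_id": "Maps to"},
]
incoming = [
    {"concept_id_1": 1, "concept_id_2": 2, "relationship_id": "Maps to"},      # already ingested
    {"concept_id_1": 2, "concept_id_2": 1, "relationship_id": "Mapped from"},  # new
]

merged = merge_delta(existing, incoming, KEY)
print(len(merged))
```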
A standardized script is used to create a summary of the changes made. You can use getSummary.bat to generate a table that summarizes the difference between what is in your dev schema and your prod schema across the concept, concept_relationship, and concept_ancestor tables. This script will output the summary table as a deltaSummary file.
Note: The summary table will not be accurate if you have made any additional changes to the dev or prod schema since running the updateConcept*.bat scripts.
You can also choose to update the main deltaSummary file (deltaSummary.txt with no date in the filename), though this requires you to run the updateConcept*.bat scripts on the updated deltaVocab files. To update the main deltaSummary file:
1. Copy the deltaConcept and deltaConceptRelationship files to the vocabTools/concept and vocabTools/concept_relationship folders, respectively.
2. Rerun each update script (updateConcept.bat, updateConceptRelationship.bat, updateConceptAncestor.bat), thus updating your dev schema to the most up-to-date version of the Oncology Development Vocabulary (as of your changes).
3. Run getSummary.bat full. Specifying the argument “full” will update the main deltaSummary.txt file in deltaVocab.
Changes should be “ingested” via a Pull Request from your branch or fork of the OncologyWG to the master branch of OncologyWG:
If you’ve been working on your own branch/fork of the repo, simply commit and push your changes, select your branch/fork on a PR to OncologyWG GitHub, and click Create Pull Request.
If you’ve been working directly on the master branch, you will need to first move your changes to a new branch, commit, and push the branch to GitHub:
git checkout -b your_branch_name
git add .
git commit -m "Your commit message here"
git push origin your_branch_name
After that, you can select your new branch on a PR to OncologyWG GitHub, and click Create Pull Request.
Note: Keep an eye out for feedback from the Oncology Development Vocabulary maintainers in the form of comments on your PR. They will let you know if any changes need to be made before the PR can be accepted.
There are three methods for contributing to our documentation.
Option 1) Standard approach: Use RStudio
# start in rmd/ directory
setwd(".../rmd")
# generate html from rmd
rmarkdown::render_site()
Option 2) Use Github.Dev or other markdown editor
Steps:
Tips:
Option 3) If neither of the above options is feasible, you can write the documentation in whatever format you’re comfortable with and submit the content as a comment in the GitHub ticket. One of the repository maintainers will convert it into the proper format.