Git and GitHub offer a collaborative environment for proposing, discussing, and implementing changes to a reference vocabulary such as the OMOP Vocabulary.
However, due to licensing and volume issues, it is not possible to maintain and develop the entire OMOP vocabulary in a GitHub repository as flat files.
To work around this, a group of collaborators can maintain and contribute to a growing list of edits to the OMOP Vocabulary. We call this list of edits the “delta vocab”.
The delta vocab is a collection of concept and concept_relationship records, stored exactly as they would be represented in the corresponding OMOP Vocabulary tables. It provides a lightweight representation of every deviation from the official OMOP Vocabulary. From these two tables, the concept_ancestor table is then programmatically generated.
Maintaining only the difference between the official OMOP Vocabulary release and the Oncology Development Vocabulary allows for rapid development of OHDSI Oncology studies that are untethered from the official OMOP Vocabulary release cadence. By preserving only the changed elements, instead of the entire Oncology Development Vocabulary, this method provides a lightweight, GitHub-friendly solution that also respects the licensed vocabulary terms by not redistributing them.
Maintaining as little of the vocabulary as possible and using scripted logic to apply changes to the existing vocabulary makes this method easy to implement and well suited to the core use case: establishing standard concepts and remapping newly destandardized terms.
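The idea of applying a small set of delta records on top of an existing vocabulary can be illustrated with a minimal sketch. It is not the logic used by the scripts in vocabTools. It assumes that delta rows are matched to existing rows by concept_id (for concept) and by the triple concept_id_1, concept_id_2, relationship_id (for concept_relationship), and that a matching row is overwritten. The concept IDs, names, and the helper functions `apply_concept_delta` and `apply_relationship_delta` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
RelKey = Tuple[int, int, str]


def apply_concept_delta(concepts: Dict[int, Row], delta: List[Row]) -> Dict[int, Row]:
    """Return a copy of `concepts` with delta rows inserted or overwritten by concept_id."""
    updated = dict(concepts)
    for row in delta:
        updated[row["concept_id"]] = dict(row)
    return updated


def apply_relationship_delta(
    relationships: Dict[RelKey, Row], delta: List[Row]
) -> Dict[RelKey, Row]:
    """Return a copy of `relationships` with delta rows inserted or overwritten.

    Rows are keyed by (concept_id_1, concept_id_2, relationship_id); this key
    is an assumption made for the sketch.
    """
    updated = dict(relationships)
    for row in delta:
        key = (row["concept_id_1"], row["concept_id_2"], row["relationship_id"])
        updated[key] = dict(row)
    return updated


# Hypothetical "official" content: one standard concept, no relationships.
prod_concepts: Dict[int, Row] = {
    1001: {"concept_id": 1001, "concept_name": "Old term", "standard_concept": "S"},
}
prod_relationships: Dict[RelKey, Row] = {}

# Hypothetical delta: destandardize 1001, add a new standard concept, and remap.
delta_concept: List[Row] = [
    {"concept_id": 1001, "concept_name": "Old term", "standard_concept": None},
    {"concept_id": 2000000001, "concept_name": "New term", "standard_concept": "S"},
]
delta_concept_relationship: List[Row] = [
    {"concept_id_1": 1001, "concept_id_2": 2000000001, "relationship_id": "Maps to"},
]

dev_concepts = apply_concept_delta(prod_concepts, delta_concept)
dev_relationships = apply_relationship_delta(prod_relationships, delta_concept_relationship)

print(len(dev_concepts), "concepts and", len(dev_relationships), "relationships in dev")
```

Because both helpers return new dictionaries, the "official" inputs are left unchanged, which mirrors keeping an untouched copy of the official vocabulary next to the development copy.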
Three steps are necessary to deploy the delta vocabularies to your local database:
1. Download source vocab data and tools
2. Configure your local database
3. Ingest delta vocabulary files
To create the Oncology Development Vocabulary, you must download the vocabTools and deltaVocab folders from the OHDSI/OncologyWG repository. It may be simplest to clone the OHDSI/OncologyWG repository and work from there:
git clone https://github.com/OHDSI/OncologyWG.git
These methods assume you have the latest official release of the OMOP Vocabulary in two identical schemas in a Postgres database:
- prod: The prod schema contains the official (“production”) OMOP Vocabulary. This vocabulary will not be changed but can be used to refresh the dev schema.
- dev: The dev schema begins as an exact copy of the official OMOP Vocabulary, but will be transformed into the Oncology Development Vocabulary using the deltaVocab files and the scripts in vocabTools.
To enable the scripts in vocabTools, enter your database connection details into the config.txt file.
Create two folders in the vocabTools folder: concept and concept_relationship.
Move the deltaConcept and deltaConceptRelationship files to the new concept and concept_relationship folders, respectively.
Run updateConcept.bat to implement the changes from deltaConcept to the dev schema in your database.
Run updateConceptRelationship.bat to implement the changes from deltaConceptRelationship to the dev schema in your database.
Run updateConceptAncestor.bat to rebuild concept_ancestor based on the new concept and concept_relationship tables in the dev schema.
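To make the last step more concrete, the following is a minimal sketch of how ancestor rows can be derived from parent-to-child relationships. It is not the logic of updateConceptAncestor.bat. It assumes that the hierarchical relationships have already been selected from concept_relationship as (parent, child) pairs and that they contain no cycles. The function name `build_concept_ancestor` and the concept IDs are hypothetical. Each output row has the shape (ancestor_concept_id, descendant_concept_id, min_levels_of_separation, max_levels_of_separation), and the sketch includes a zero-distance row for each concept as its own ancestor.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

AncestorRow = Tuple[int, int, int, int]


def build_concept_ancestor(edges: Iterable[Tuple[int, int]]) -> List[AncestorRow]:
    """Derive ancestor rows from (parent, child) pairs of an acyclic hierarchy."""
    children: Dict[int, Set[int]] = defaultdict(set)
    nodes: Set[int] = set()
    for parent, child in edges:
        children[parent].add(child)
        nodes.update((parent, child))

    rows: List[AncestorRow] = []
    for ancestor in nodes:
        # levels[d] = (shortest, longest) path length from ancestor to d
        levels: Dict[int, Tuple[int, int]] = {ancestor: (0, 0)}
        frontier: Set[int] = {ancestor}
        depth = 0
        while frontier:
            depth += 1
            # Every node reachable by a path of exactly `depth` edges.
            next_frontier: Set[int] = set()
            for node in frontier:
                next_frontier.update(children.get(node, ()))
            for child in next_frontier:
                shortest, _ = levels.get(child, (depth, depth))
                # First depth seen is the minimum; latest depth seen is the maximum.
                levels[child] = (min(shortest, depth), depth)
            frontier = next_frontier
        for descendant, (shortest, longest) in levels.items():
            rows.append((ancestor, descendant, shortest, longest))
    return sorted(rows)


# Hypothetical hierarchy: 1 -> 2 -> 3, plus a direct edge 1 -> 3.
ancestor_rows = build_concept_ancestor([(1, 2), (2, 3), (1, 3)])
for row in ancestor_rows:
    print(row)
```

The level-by-level expansion terminates only because the hierarchy is assumed to be acyclic; a cycle in the input would make the loop run forever, so real data would need a cycle check first.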
Using the delta vocab and helper scripts, a developer with an official OMOP Vocabulary database can quickly create a full, working version of the OMOP Vocabulary with all proposed changes implemented, allowing for advanced testing and use of existing OHDSI tools with a development version of the vocabulary.
See the README in the vocabTools directory for instructions on contributing to the Oncology Delta Vocabulary.
A GitHub Project has been created and customized to enable collaborative and dynamic project management. Notably, this project exists at the organization level rather than the repository level, which enables extended functionality such as issue triage across multiple repositories.
Orientation and Onramp: GitHub Project Orientation
GitHub Project: Oncology Maturity Sprint
We use the RMarkdown R package to write content in Rmd files and render it as HTML. Through GitHub Pages, these HTML files can be easily deployed as a project website. There are several ways to contribute to this documentation, varying in technical complexity.
See here for more details
Provide a semi-automated and extensible framework for generating, visualizing, and sharing an assessment of an OMOP-shaped database’s adherence to the OHDSI Oncology Standard (tables, vocabulary) and the availability and types of oncology data it contains.
Assessments can be executed against an OMOP-shaped database to create a characterization and quality report. They are created using specifications.
Specifications are JSON files that describe an assessment. They are composed by compiling analyses together with threshold values.
Analyses execute a query and return a row count or proportion describing the contents in the database. For example, analysis_id=1234 returns “the number of cancer diagnosis records derived from Tumor Registry source data”.
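As an illustration of an analysis that returns a row count and a proportion, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for an OMOP-shaped database. It is not the framework's implementation. The query, the helper `run_count_analysis`, and the type concept ID 999 used to mark tumor registry records are hypothetical placeholders; only the table and column names follow the OMOP CDM.

```python
import sqlite3

# Hypothetical placeholder, not a real OMOP type concept_id.
REGISTRY_TYPE_CONCEPT_ID = 999


def run_count_analysis(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> int:
    """Execute a query that returns a single count and return it as an int."""
    return int(conn.execute(sql, params).fetchone()[0])


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE condition_occurrence ("
    "condition_occurrence_id INTEGER PRIMARY KEY, "
    "condition_type_concept_id INTEGER)"
)
# Four toy records, two of which are flagged as registry-derived.
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?)",
    [(1, REGISTRY_TYPE_CONCEPT_ID), (2, REGISTRY_TYPE_CONCEPT_ID), (3, 1), (4, 1)],
)

registry_count = run_count_analysis(
    conn,
    "SELECT COUNT(*) FROM condition_occurrence WHERE condition_type_concept_id = ?",
    (REGISTRY_TYPE_CONCEPT_ID,),
)
total_count = run_count_analysis(conn, "SELECT COUNT(*) FROM condition_occurrence")

# Guard against division by zero on an empty table.
registry_proportion = registry_count / total_count if total_count else 0.0

print("count:", registry_count, "proportion:", registry_proportion)
conn.close()
```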
Thresholds provide study-specific context to the results of analyses. An analysis asks how many cancer diagnoses derived from tumor registry data are in the database. Using thresholds, an assessment author can give ranges for “bad”, “questionable”, and “good” analysis results as they pertain to their study. An example threshold, which would be encoded as JSON, could express the sentiment “A database with 0-200 diagnoses from tumor registry data would be unfit for this study, 201-500 diagnoses may be suitable, and over 500 diagnoses will be more than enough.”
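The example threshold above could be encoded and evaluated roughly as follows. This is a sketch only: the JSON layout (the keys `analysis_id`, `thresholds`, `label`, `min`, `max`) and the function `classify_result` are assumptions made for illustration and do not reflect the actual specification schema used by the framework.

```python
import json
from typing import Any, Dict

# Hypothetical specification fragment; a null "max" means "no upper bound".
SPEC_JSON = """
{
  "analysis_id": 1234,
  "thresholds": [
    {"label": "bad", "min": 0, "max": 200},
    {"label": "questionable", "min": 201, "max": 500},
    {"label": "good", "min": 501, "max": null}
  ]
}
"""


def classify_result(spec: Dict[str, Any], value: int) -> str:
    """Return the label of the first threshold band that contains `value`."""
    for band in spec["thresholds"]:
        upper = band["max"]
        if value >= band["min"] and (upper is None or value <= upper):
            return band["label"]
    return "unclassified"


spec = json.loads(SPEC_JSON)
labels = {count: classify_result(spec, count) for count in (150, 201, 500, 501)}
print(labels)
```

Keeping the bands in the specification rather than in code lets different studies reuse the same analysis with different expectations.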
The R package provides functionality for the four major processes involved in the framework:
See the README in the validationScripts directory for instructions on contributing to the Oncology Validation Framework.