The OHDSI Oncology Development Subgroup has created a standardized ETL to ingest NAACCR data into the Oncology CDM Extension. The ETL is a SQL script that assumes your NAACCR data has been transformed into a common entity-attribute-value (EAV) input format. The SQL script uses the common NAACCR data dictionary input format in conjunction with the ICDO-3, NAACCR, and Hemonc.org vocabularies within the OMOP vocabulary tables to perform the following tasks:
The North American Association of Central Cancer Registries (NAACCR) is the organization that governs the format used to standardize the encoding and transmission of cancer registry data in the United States. All healthcare facilities in the United States that diagnose or treat cancer patients are mandated by law to track and collect cancer data and to submit it in the NAACCR data dictionary format for all first-course diagnosed/treated primary neoplasms.
The NAACCR data dictionary standard is used by multiple cancer registry aggregators:
The NAACCR data dictionary format most importantly covers the following areas:
NAACCR data is generally considered a gold-standard source for the following areas:
NAACCR data is generally considered to be a valuable but not gold-standard source for the following area:
The NAACCR data dictionary format is a question/answer or EAV format that mixes:
* De novo definition of data points.
* Sourcing of data points from existing standards bodies: cancer diagnosis (site/histology) from ICDO-3 via WHO; staging variables and values from AJCC.
The NAACCR data dictionary format and the source ICDO-3 vocabulary have been ingested into the OMOP vocabulary.
* See here for details of how the NAACCR dictionary format has been ingested into the OMOP vocabulary tables.
* See here for details of how the ICDO-3 vocabulary has been ingested into the OMOP vocabulary tables.
* Presently, only the source AJCC Staging Edition 7 vocabulary has been ingested into the OMOP vocabulary tables. The OHDSI vocabulary team is working with AJCC to cover AJCC Edition 8 and prior editions.
The Hemonc.org oncology drug regimen ontology has been ingested into the OMOP vocabulary. Some treatment NAACCR item coded values are mapped to Hemonc.org ‘Modality’ concepts.
* See here for details of how the Hemonc.org oncology drug regimen ontology has been ingested into the OMOP vocabulary tables.
NAACCR data is natively in a flat (pivoted) format: one record per tumor with one field per NAACCR item. It is typically available to ETL developers in the native NAACCR fixed-width file format, in XML, or in a custom relational structure determined by local tumor registry software.
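To make the difference between the flat source shape and the EAV input shape concrete, here is a minimal Python sketch of unpivoting one flat record into one row per data point. It is not the subgroup's conversion code. The function name `unpivot_record`, the output keys `record_id`, `naaccr_item_number`, and `naaccr_item_value`, and the sample values are assumptions made for illustration and may not match the actual NAACCR_DATA_POINTS columns.

```python
from typing import Dict, List, Optional


def unpivot_record(record_id: str, flat_record: Dict[str, Optional[str]]) -> List[dict]:
    """Turn one flat record (item number -> value) into EAV rows.

    Blank or missing values are skipped, so the EAV output only
    contains data points that were actually recorded.
    """
    rows = []
    for item_number, value in flat_record.items():
        if value is None or value.strip() == "":
            continue
        rows.append(
            {
                "record_id": record_id,             # hypothetical key name
                "naaccr_item_number": item_number,  # hypothetical key name
                "naaccr_item_value": value.strip(), # hypothetical key name
            }
        )
    return rows


# Made-up sample record keyed by NAACCR item number; the last value is blank.
flat = {"390": "20170630", "400": "C509", "522": "8500", "410": "  "}
eav_rows = unpivot_record("rec-1", flat)
for row in eav_rows:
    print(row)
```

Each non-blank field becomes its own row, which is what lets a single generic SQL script join data points against the NAACCR vocabulary without knowing the layout of the source file.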
Currently the OHDSI Oncology Development Subgroup supports two methods to convert and populate the NAACCR_DATA_POINTS input format from native NAACCR data.
All methods of transforming the NAACCR data to the NAACCR_DATA_POINTS input format need to include a way to populate the NAACCR_DATA_POINTS.person_id column. Normally, this is done by mapping NAACCR item 2300, ‘Medical Record Number’, to a medical record number in a local EHR or Enterprise Master Patient Index (EMPI). The aforementioned R package contains a function to populate the person identifier; it assumes that a database table exists that maps MRN to person_id.
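The MRN-to-person_id step can be sketched as follows. This is a hedged Python illustration, not the R package's function: `assign_person_ids`, the `record_id` grouping key, the row key names, and the in-memory `mrn_to_person_id` dictionary (standing in for the mapping table) are all assumptions.

```python
from typing import Dict, List, Optional

MRN_ITEM_NUMBER = "2300"  # NAACCR item 2300, 'Medical Record Number'


def assign_person_ids(
    data_points: List[dict], mrn_to_person_id: Dict[str, int]
) -> List[dict]:
    """Return copies of the EAV rows with a person_id attached.

    The MRN for each record is read from its item 2300 data point and
    looked up in the mapping. Rows whose record has no MRN, or whose MRN
    is unknown to the mapping, get person_id None so they can be reviewed.
    """
    # First pass: find the MRN recorded for each source record.
    mrn_by_record: Dict[str, str] = {
        row["record_id"]: row["naaccr_item_value"]
        for row in data_points
        if row["naaccr_item_number"] == MRN_ITEM_NUMBER
    }
    # Second pass: attach the mapped person_id to every row of that record.
    result = []
    for row in data_points:
        mrn: Optional[str] = mrn_by_record.get(row["record_id"])
        person_id = mrn_to_person_id.get(mrn) if mrn is not None else None
        result.append({**row, "person_id": person_id})
    return result


# Made-up sample data: rec-1 has a known MRN, rec-2 has no MRN data point.
points = [
    {"record_id": "rec-1", "naaccr_item_number": "2300", "naaccr_item_value": "MRN001"},
    {"record_id": "rec-1", "naaccr_item_number": "400", "naaccr_item_value": "C509"},
    {"record_id": "rec-2", "naaccr_item_number": "400", "naaccr_item_value": "C341"},
]
mapped = assign_person_ids(points, {"MRN001": 42})
```

Leaving unmatched rows with an empty person_id, rather than dropping them, keeps the unmapped records visible for follow-up; whether that suits a given site is a local design choice.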
The NAACCR ETL SQL, which translates the EAV input into OMOP, is written in vanilla SQL so that it can run on multiple database platforms. Currently, the OHDSI Oncology Development Subgroup uses the OHDSI SqlRender package to translate the NAACCR ETL to the four supported database platforms (PostgreSQL, SQL Server, Oracle, and Redshift). The NAACCR ETL SQL is wrapped in a database transaction to support the complete rollback of data changes. To execute it, download the NAACCR SQL ETL from the OncologyWG GitHub repository and find the SQL script for your database platform. See NAACCR SQL ETL folder here.
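The all-or-nothing behavior that the transaction wrapper provides can be illustrated with a small, self-contained Python sketch. It uses an in-memory SQLite database purely as a stand-in (SQLite is not one of the supported platforms), and the helper `run_in_transaction`, the simplified table definition, and the statements are assumptions for illustration rather than part of the NAACCR ETL.

```python
import sqlite3
from typing import List


def run_in_transaction(conn: sqlite3.Connection, statements: List[str]) -> bool:
    """Run all statements in one transaction; roll everything back on error."""
    try:
        # As a context manager, the connection commits if the block
        # succeeds and rolls back if any statement raises.
        with conn:
            for statement in statements:
                conn.execute(statement)
    except sqlite3.Error:
        return False
    return True


conn = sqlite3.connect(":memory:")
# Simplified stand-in table, not the real OMOP definition.
conn.execute(
    "CREATE TABLE condition_occurrence ("
    "condition_occurrence_id INTEGER PRIMARY KEY, "
    "person_id INTEGER NOT NULL)"
)
conn.commit()

ok = run_in_transaction(
    conn,
    [
        "INSERT INTO condition_occurrence VALUES (1, 42)",
        "INSERT INTO condition_occurrence VALUES (2, NULL)",  # violates NOT NULL
    ],
)
remaining = conn.execute("SELECT COUNT(*) FROM condition_occurrence").fetchone()[0]
print(ok, remaining)
```

Because the second insert fails, the first insert is rolled back as well, so the table is left exactly as it was before the run.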
The NAACCR ETL SQL has a full-coverage unit test suite. See here to inspect the NAACCR ETL’s unit tests. The test suite uses a dummy Ruby on Rails application to set up the unit testing environment. If you would like to help develop the NAACCR SQL ETL by making pull requests and writing unit tests to cover your changes, please set up the unit testing environment locally first. See here for instructions on setting up the NAACCR ETL unit testing environment.