Summary

Level: FIELD
Context: Verification
Category: Completeness
Subcategory:
Severity: CDM convention ⚠

Description

The number and percent of distinct source values in the @cdmFieldName field of the @cdmTableName table mapped to 0.

Definition

  • Numerator: Distinct _source_value entries where the corresponding standard _concept_id field is 0.
  • Denominator: Total distinct _source_value entries, including NULL, in the respective event table.
  • Related CDM Convention(s): The OMOP Common Data Model specifies that codes that are present in a native database should be mapped to standard concepts using either the intrinsic mappings defined in the standard vocabularies or extrinsic mappings defined by the data owner or ETL development team. Note also that variations of this check logic are also used in the EHDEN CDM Inspection Report package, as well as the AresIndexer package for generating indices of unmapped codes.
  • CDM Fields/Tables: Runs on all event tables that have _source_value fields.
  • Default Threshold Value:
    • 10% for _source_value fields in condition, measurement, procedure, drug, visit.
    • 100% for all other fields

User Guidance

This check will look at all distinct source values in the specified field and calculate how many are mapped to a standard concept of 0. This check should be used in conjunction with the standardConceptRecordCompleteness check to identify potential mapping issues in the ETL.

This check is a good measure of the overall mapping rate within each domain. For example, a table may have high standardConceptRecordCompleteness (that is, a large percentage of records with a non-zero standard concept) but a low score on this check. This would indicate that the “long tail” of rarer codes have not been mapped while more common codes have good mapping coverage. It is always important to interrogate the results of these two checks together to ensure complete understanding of vocabulary mapping in your CDM.

The following SQL can be used to summarize unmapped source values by record count in a given CDM table:

Violated rows query

SELECT DISTINCT 
  cdmTable.@cdmFieldName,
  COUNT(*)
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
WHERE cdmTable.@standardConceptFieldName = 0
GROUP BY 1
ORDER BY 2 DESC

ETL Developers

Fails of this check are (most often) related directly to semantic mapping. First, the ETL developer should investigate if a source vocabulary is present in the native data that was not accounted for in the ETL document and/or code. This is most likely if the unmapped source values are codes rather than text values. Second, the source-to-concept-map file or table should be updated to link the unmapped source values with domain-appropriate concepts.

Data Users

When this check fails, source data granularity is being lost; not all of the information related to a particular event or modifier is being captured in OMOP CDM format. Although the information about an event may exist in the source value field, it cannot easily be used in downstream analytics processes that rely on standard OMOP concepts.

Please see the standardConceptRecordCompleteness page for a much more detailed overview of handling mapping quality issues in your OMOP CDM.