vignettes/checks/sourceValueCompleteness.Rmd
sourceValueCompleteness.Rmd
Level: FIELD
Context: Verification
Category: Completeness
Subcategory:
Severity: CDM convention ⚠
The number and percent of distinct source values in the @cdmFieldName field of the @cdmTableName table mapped to 0.
_source_value
entries
where the corresponding standard _concept_id
field is
0._source_value
entries, including NULL, in the respective event table._source_value
fields._source_value
fields in condition, measurement,
procedure, drug, visit.This check will look at all distinct source values in the specified
field and calculate how many are mapped to a standard concept of 0. This
check should be used in conjunction with the standardConceptRecordCompleteness check
to identify potential mapping issues in the ETL.
This check is a good measure of the overall mapping rate within each domain. For example, a table may have high standardConceptRecordCompleteness (that is, a large percentage of records with a non-zero standard concept) but a low score on this check. This would indicate that the “long tail” of rarer codes have not been mapped while more common codes have good mapping coverage. It is always important to interrogate the results of these two checks together to ensure complete understanding of vocabulary mapping in your CDM.
The following SQL can be used to summarize unmapped source values by record count in a given CDM table:
Fails of this check are (most often) related directly to semantic mapping. First, the ETL developer should investigate if a source vocabulary is present in the native data that was not accounted for in the ETL document and/or code. This is most likely if the unmapped source values are codes rather than text values. Second, the source-to-concept-map file or table should be updated to link the unmapped source values with domain-appropriate concepts.
When this check fails, source data granularity is being lost; not all of the information related to a particular event or modifier is being captured in OMOP CDM format. Although the information about an event may exist in the source value field, it cannot easily be used in downstream analytics processes that rely on standard OMOP concepts.
Please see the standardConceptRecordCompleteness page for a much more detailed overview of handling mapping quality issues in your OMOP CDM.