vignettes/checks/standardConceptRecordCompleteness.Rmd
standardConceptRecordCompleteness.Rmd
Level: FIELD
Context: Verification
Category: Completeness
Subcategory:
Severity: CDM convention ⚠
The number and percent of records with a value of 0 in the standard concept field @cdmFieldName in the @cdmTableName table.
_concept_id
standard concept field. In the case of
MEASUREMENT.unit_concept_id
and
OBSERVATION.unit_concept_id
, the number of rows with a
value of 0 in the _concept_id
standard concept field AND a
non-NULL value_as_number
.MEASUREMENT.unit_concept_id
and
OBSERVATION.unit_concept_id
, the number of rows with a
non-NULL value_as_number
._concept_id
) columns in all event tables.place_of_service_concept_id
,
modifier_concept_id
)Standard concept mapping is one of the most fundamental conventions of the OMOP CDM. It enables standardized analysis across diverse data sources and allows users to abstract away the tedium of traversing source vocabularies when building phenotypes. As such, it is highly recommended to map as many concepts in your source as possible. Failures of this check should be well-understood and documented so that data users can plan accordingly in the case missing data might impact their analysis.
A failure of this check usually indicates a failure to map a source value to a standard OMOP concept. In some cases, such a failure can and should be remediated in the concept-mapping step of the ETL. In other cases, it may represent a mapping that currently is not possible to implement.
To investigate the failure, run the following query:
SELECT
concept_name,
cdmTable._source_concept_id, -- source concept ID field for the table
cdmTable._source_value, -- source value field for the table
COUNT(*)
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
LEFT JOIN @vocabDatabaseSchema.concept ON concept.concept_id = cdmTable._source_concept_id
WHERE cdmTable.@cdmFieldName = 0
-- AND cdmTable.value_as_number IS NOT NULL -- uncomment for unit_concept_id checks
GROUP BY 1,2,3
ORDER BY 4 DESC
This will give you a summary of the source codes which failed to map to an OMOP standard concept. Inspecting this data should give you an initial idea of what might be going on.
SELECT
concept_id AS standard_concept_mapping
FROM @vocabDatabaseSchema.concept_relationship
JOIN @vocabDatabaseSchema.concept ON concept.concept_id = c oncept_relationship.concept_id_2
AND relationship_id = ‘Maps to’
WHERE concept_relationship.concept_id_1 = <source concept ID>
If no results are returned, consider whether the source concept ID is part of the OMOP vocabularies. If it is, then there is likely a vocabulary issue which should be reported. If it is not (i.e., it is a local concept), then there is likely an issue with your local source-to-concept mapping
If the investigation query returns a source value and source concept ID but no concept name, this indicates the source concept ID does not exist in your concept table. This may be expected if your ETL includes local source-to-concept mappings. If not, then your ETL has assigned a malformed source concept ID and will need to be debugged
If the investigation query returns a source value but no source concept ID (or a source concept ID of 0), run the following query to search for the source value in the OMOP vocabulary (note that if your ETL includes local mappings and the code in question is known not to exist in OMOP, you should search your local mapping table/config instead):
-- may return false positives if the same value exists in multiple vocabularies
-- only applicable in the case where the source value column is populated only with a vocabulary code
SELECT
*
FROM @vocabDatabaseSchema.concept
WHERE concept_code = <source value>
It is important to note that records with a 0 standard concept ID field will be unusable in standard OHDSI analyses and thus should only be preserved if there is truly no standard concept ID for a given record. Depending on the significance of the records in question, one should consider removing them from the dataset; however, this choice will depend on a variety of context-specific factors and should be made carefully. Either way, the presence/absence of these unmappable records and an explanation for why they could not be mapped should be clearly documented in the ETL documentation.
Since unmapped records will not be picked up in standard OHDSI analytic workflows, this is an important check failure to understand. Utilize the investigation queries above to understand the scope and impact of the mapping failures on your specific analytic use case. If none of the affected codes seem to be relevant for your analysis, it may be acceptable to ignore the failure. However, since it is not always possible to understand exactly what a given source value represents, you should proceed with caution and confirm any findings with your ETL provider if possible.
In the case where the source concept ID column is populated with a legitimate OMOP concept, it will be possible to query this column instead of the standard concept column in your analyses. However, doing so will require building source concept sets and as such losing the power of the OMOP standard vocabularies in defining comprehensive, generalizable cohort definitions.