measureValueCompleteness.Rmd
Level: FIELD
Context: Verification
Category: Completeness
Subcategory:
Severity: Characterization ✔
The number and percent of records with a NULL value in the @cdmFieldName of the @cdmTableName.
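As a rough illustration of what this check measures, the sketch below counts NULL values in one field of an in-memory SQLite table. The table `drug_exposure`, the field `quantity`, the sample rows, and the helper `null_completeness` are illustrative assumptions standing in for @cdmTableName and @cdmFieldName. This is not the SQL that the check itself runs.

```python
import sqlite3


def null_completeness(conn, table, field):
    """Return (num_null, num_rows, pct_null) for one field of one table.

    Hypothetical helper for illustration; `table` and `field` stand in
    for @cdmTableName and @cdmFieldName.
    """
    # Identifiers cannot be bound as query parameters, so they are
    # interpolated here. Only pass trusted table and field names.
    query = f"""
        SELECT
            SUM(CASE WHEN {field} IS NULL THEN 1 ELSE 0 END) AS num_null,
            COUNT(*) AS num_rows
        FROM {table}
    """
    num_null, num_rows = conn.execute(query).fetchone()
    num_null = num_null or 0  # SUM yields NULL when the table is empty
    pct_null = 100.0 * num_null / num_rows if num_rows else 0.0
    return num_null, num_rows, pct_null


# Made-up sample data: two of four rows have no quantity.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drug_exposure (drug_exposure_id INTEGER, quantity REAL)")
conn.executemany(
    "INSERT INTO drug_exposure VALUES (?, ?)",
    [(1, 30.0), (2, None), (3, 10.0), (4, None)],
)

result = null_completeness(conn, "drug_exposure", "quantity")
print(result)
```

With the sample rows above, half of the records have a NULL `quantity`.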
This check’s primary purpose is to characterize completeness of non-required fields in the OMOP CDM. It is most useful when the failure threshold for each non-required field is customized to expectations based on the source data being transformed into OMOP. In this case, the check can be used to catch unexpected missingness due to ETL errors. However, in all cases, this check will serve as a useful characterization to help data users understand if a CDM contains the right data for a given analysis.
While the failure threshold is set to 0 for required fields, note that this is duplicative with the isRequired check - and fixing one failure will resolve the other!
Failures of this check on required fields are redundant with failures of isRequired. See isRequired documentation for more information.
ETL developers have 2 main options for the use of this check on non-required fields:
Unexpectedly missing values should be investigated for a potential root cause in the ETL. If a threshold has been adjusted to account for expected missingness, this should be clearly communicated to data users so that they can know when and when not to expect data to be present in each field.
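The effect of a customized threshold can be sketched as follows. The field names, threshold values, observed percentages, and the `evaluate` function are all hypothetical, and in practice thresholds are set in the tool's own configuration rather than in Python. The sketch assumes that a check fails when the percent of NULL values exceeds the field's threshold, so that a threshold of 100 can never fail.

```python
# Hypothetical per-field thresholds: maximum percent of rows allowed to be NULL.
THRESHOLDS = {
    "drug_exposure.quantity": 100.0,    # characterize only; can never fail
    "drug_exposure.days_supply": 20.0,  # source assumed to be mostly populated
}


def evaluate(field, pct_null, thresholds=THRESHOLDS, default=100.0):
    """Return 'FAIL' if pct_null exceeds the field's threshold, else 'PASS'.

    Assumption: failure means the observed percent is strictly greater
    than the threshold. Fields without an entry fall back to `default`.
    """
    threshold = thresholds.get(field, default)
    return "FAIL" if pct_null > threshold else "PASS"


# Made-up observed percentages of NULL values.
observed = {
    "drug_exposure.quantity": 97.5,
    "drug_exposure.days_supply": 35.0,
}
statuses = {field: evaluate(field, pct) for field, pct in observed.items()}
print(statuses)
```

Here `quantity` passes despite being almost entirely missing, because its threshold was left at 100, while `days_supply` fails because its missingness exceeds what was declared as expected.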
This check informs you of the level of missing data in each column of the CDM. If data is missing in a required column, see the isRequired documentation for more information.
The interpretation of a check failure on a non-required column will depend on the context. In some cases, the threshold for this check will have been set deliberately, and any failure should be cause for concern unless justified and explained by your ETL provider. In other cases, a failure may not be worrisome if the check result is in line with your expectations given the source of the data. When in doubt, use the inspection query above to ensure you can explain the missing values.
Of course, if there is a failure on a non-required field you know that you will not need in your analysis (for example, missing drug quantity in an analysis not utilizing drug data), the check failure may be safely ignored.
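One way to judge whether missing values are explainable is to see where they concentrate. The sketch below uses an in-memory SQLite table with made-up rows and breaks down NULL `quantity` values by `drug_type_concept_id`. The sample data, the placeholder concept IDs 101 and 202, and the choice of grouping column are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drug_exposure ("
    "drug_exposure_id INTEGER, drug_type_concept_id INTEGER, quantity REAL)"
)
# Made-up rows; 101 and 202 are placeholder IDs, not real concepts.
conn.executemany(
    "INSERT INTO drug_exposure VALUES (?, ?, ?)",
    [
        (1, 101, 30.0),
        (2, 101, 10.0),
        (3, 202, None),
        (4, 202, None),
        (5, 202, 5.0),
    ],
)

# Count NULL quantities per record type, worst group first.
rows = conn.execute(
    """
    SELECT
        drug_type_concept_id,
        SUM(CASE WHEN quantity IS NULL THEN 1 ELSE 0 END) AS num_null,
        COUNT(*) AS num_rows
    FROM drug_exposure
    GROUP BY drug_type_concept_id
    ORDER BY num_null DESC, drug_type_concept_id
    """
).fetchall()
print(rows)
```

If all of the missingness falls in one record type, as it does in this sample, that often points to a property of a single source feed rather than a general ETL error.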