This section contains detailed descriptions of the data quality checks included in the DataQualityDashboard package. Each check is described on its own page; click on the check name in the list below or in the dropdown menu above to navigate to the check’s documentation page.
The DataQualityDashboard works by applying over 20 parameterized check types to a CDM instance, resulting in thousands of resolved, executed, and evaluated individual data quality checks. For example, one check type might be written as:
The number and percent of records with a value in the cdmFieldName field of the cdmTableName table less than plausibleValueLow.
This would be considered an atemporal plausibility verification check because we are looking for implausibly low values in some field based on internal knowledge. We can substitute values for cdmFieldName, cdmTableName, and plausibleValueLow into this check type to create a unique data quality check. If we apply it to PERSON.year_of_birth, here is how that might look:
The number and percent of records with a value in the year_of_birth field of the PERSON table less than 1850.
And, since it is parameterized, we can similarly apply it to DRUG_EXPOSURE.days_supply:
The number and percent of records with a value in the days_supply field of the DRUG_EXPOSURE table less than 0.
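The substitution described above can be sketched in a few lines of Python. This is only an illustration of the idea of resolving one parameterized check type into several individual checks; it is not how the package itself is implemented. The names CHECK_TEMPLATE, PARAMETER_SETS, and resolve_checks are hypothetical, and the thresholds are the two values used in the examples above.

```python
from string import Template

# Hypothetical template mirroring the check type description in the text.
CHECK_TEMPLATE = Template(
    "The number and percent of records with a value in the $cdmFieldName "
    "field of the $cdmTableName table less than $plausibleValueLow."
)

# Hypothetical parameter sets; the values come from the two examples above.
PARAMETER_SETS = [
    {"cdmTableName": "PERSON", "cdmFieldName": "year_of_birth", "plausibleValueLow": 1850},
    {"cdmTableName": "DRUG_EXPOSURE", "cdmFieldName": "days_supply", "plausibleValueLow": 0},
]


def resolve_checks(template, parameter_sets):
    """Return one resolved check description per parameter set."""
    return [template.substitute(params) for params in parameter_sets]


resolved = resolve_checks(CHECK_TEMPLATE, PARAMETER_SETS)
for description in resolved:
    print(description)
```

Each additional parameter set yields another individual check, which is how a small number of check types can expand into thousands of checks.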
Version 1 of the tool includes over 20 different check types organized into Kahn contexts and categories (link to paper). Additionally, each data quality check type is considered a table-level, field-level, or concept-level check.
Table-level checks evaluate a table at a high level without reference to individual fields, or span multiple event tables. These include checks that required tables are present and that at least some of the people in the PERSON table have records in the event tables.
Field-level checks relate to specific fields in a table. The majority of the check types in version 1 are field-level checks. These include checks evaluating primary key relationships and checks investigating whether the concepts in a field conform to the specified domain.
Concept-level checks relate to individual concepts. These include checks looking for gender-specific concepts in persons of the wrong gender and checks of plausible values for measurement-unit pairs.
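To make the difference between check levels concrete, here is a minimal sketch that computes a "number and percent" result for one table-level style check and one field-level style check against a tiny in-memory SQLite database. The table contents, the helper number_and_percent, and the exact SQL are assumptions made for illustration; the SQL generated by the package will differ. A concept-level example is omitted because it would need vocabulary data.

```python
import sqlite3

# Toy CDM-like tables; the rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
    CREATE TABLE drug_exposure (
        drug_exposure_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        days_supply INTEGER
    );
    INSERT INTO person VALUES (1, 1950), (2, 1980), (3, 1845), (4, 2001);
    INSERT INTO drug_exposure VALUES (10, 1, 30), (11, 1, -5), (12, 2, 14);
    """
)


def number_and_percent(connection, violated_sql, denominator_sql):
    """Run two COUNT queries and return (violated count, percent violated)."""
    violated = connection.execute(violated_sql).fetchone()[0]
    total = connection.execute(denominator_sql).fetchone()[0]
    percent = 100.0 * violated / total if total else 0.0
    return violated, percent


# Table-level style: persons with no record in an event table.
table_level = number_and_percent(
    conn,
    "SELECT COUNT(*) FROM person p WHERE NOT EXISTS "
    "(SELECT 1 FROM drug_exposure d WHERE d.person_id = p.person_id)",
    "SELECT COUNT(*) FROM person",
)

# Field-level style: values in one field below a plausible low threshold.
field_level = number_and_percent(
    conn,
    "SELECT COUNT(*) FROM drug_exposure WHERE days_supply < 0",
    "SELECT COUNT(*) FROM drug_exposure",
)

print("table-level:", table_level)
print("field-level:", field_level)
```

In this toy data, two of the four persons have no drug exposure records, and one of the three drug exposure rows has a negative days_supply.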
This article will detail each check type, its name, check level, description, definition, and to which Kahn context, category, and subcategory it belongs.
To inspect rows that failed a given check, extract the portion of the query between /*violatedRowsBegin*/ and /*violatedRowsEnd*/ from the SQL query displayed in the DQD results viewer for that check.
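A minimal sketch of pulling out the text between the /*violatedRowsBegin*/ and /*violatedRowsEnd*/ markers is shown below. The function extract_violated_rows_query is a hypothetical helper, not part of the package, and example_sql is an invented query that only assumes the two markers wrap the row-level SELECT; the real query shown in the results viewer may be structured differently.

```python
import re

# Non-greedy match of everything between the two comment markers.
_VIOLATED_ROWS = re.compile(
    r"/\*violatedRowsBegin\*/(.*?)/\*violatedRowsEnd\*/", re.DOTALL
)


def extract_violated_rows_query(sql):
    """Return the SQL found between the violated-rows markers."""
    match = _VIOLATED_ROWS.search(sql)
    if match is None:
        raise ValueError("violated-rows markers not found in query")
    return match.group(1).strip()


# Invented example query; only the marker comments are taken from the text.
example_sql = """
SELECT COUNT(*) AS num_violated_rows
FROM (
  /*violatedRowsBegin*/
  SELECT cdmTable.*
  FROM person cdmTable
  WHERE cdmTable.year_of_birth < 1850
  /*violatedRowsEnd*/
) violated_rows;
"""

inner = extract_violated_rows_query(example_sql)
print(inner)
```

The extracted text can then be run on its own against the CDM database to list the offending rows rather than only counting them.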