NEWS.md
This release includes:
There is now a parameter, `checkSeverity`, which can be used to limit the execution of DQD to `fatal`, `convention`, and/or `characterization` checks. Fatal checks are checks that should never fail, under any circumstance, as they relate to the relational integrity of the CDM. Convention checks are checks on critical OMOP CDM conventions for which failures should be resolved whenever possible; however, some level of failure is unavoidable (e.g., standard concept mapping for source concepts with no suitable standard concept). Characterization checks provide users with an understanding of the quality of the underlying data and generally will need their thresholds modified to match expectations of the source.
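As a rough illustration of what restricting a run by severity means, the sketch below filters a list of checks down to the requested severity levels. It is written in Python purely for illustration: the check records, their names, and the `select_checks` helper are hypothetical and are not part of the DQD API, which exposes this behavior through the `checkSeverity` parameter.

```python
# Hypothetical check records; the names are placeholders, not real DQD checks.
CHECKS = [
    {"name": "exampleIntegrityCheck", "severity": "fatal"},
    {"name": "exampleConventionCheck", "severity": "convention"},
    {"name": "exampleThresholdCheck", "severity": "characterization"},
]

# The three severity levels named in the release note above.
VALID_SEVERITIES = {"fatal", "convention", "characterization"}


def select_checks(checks, check_severity):
    """Return only the checks whose severity is in check_severity."""
    requested = set(check_severity)
    unknown = requested - VALID_SEVERITIES
    if unknown:
        raise ValueError(f"Unknown severity level(s): {sorted(unknown)}")
    return [check for check in checks if check["severity"] in requested]


# Keep fatal and convention checks, skipping characterization checks.
selected = select_checks(CHECKS, ["fatal", "convention"])
selected_names = [check["name"] for check in selected]
print(selected_names)
```

Validating the requested levels up front is a design choice of this sketch: a misspelled level raises an error instead of silently selecting nothing.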
This release includes:
- `plausibleStartBeforeEnd` was failing if SOURCE_RELEASE_DATE was before CDM_RELEASE_DATE in the CDM_SOURCE table. This is the opposite of the correct logic! The check is now updated to fail if the CDM_RELEASE_DATE is before the SOURCE_RELEASE_DATE
- `plausibleTemporalAfter` was throwing a syntax error in BigQuery due to the format of a hardcoded date in the SQL query. This query has now been updated to be compliant with SqlRender and the issue has been resolved
- An issue was causing `viewDqDashboard` to error out in newer versions of R. This has now been resolved
- `SqlOnly` mode was failing due to the format of the new check `plausibleGenderUseDescendants`, which takes multiple concepts as an input. This has now been fixed
- A new field, `executionTimeSeconds`, has been added. This field stores the execution time in seconds of each check in numeric format. (The existing `executionTime` field stores execution time as a string, making it difficult to use in analysis.)

The default thresholds for 2 checks were discovered to be inconsistently populated and occasionally set to illogical levels. These have now been fixed as detailed below.

- The default thresholds for `sourceValueCompleteness` have been updated as follows:
  - `_source_value` columns in condition_occurrence, measurement, procedure_occurrence, drug_exposure, and visit_occurrence tables
  - `_source_value` columns
- The default thresholds for `sourceConceptRecordCompleteness` have been updated as follows:
  - `_source_concept_id` columns in condition_occurrence, drug_exposure, measurement, procedure_occurrence, device_exposure, and observation tables
  - `_source_concept_id` columns

We have continued (and nearly completed) our initiative to add more comprehensive user documentation at the data quality check level. A dedicated documentation page is being created for each check type. Each check’s page includes detailed information about how its result is generated and what to do if it fails. Guidance is provided for both ETL developers and data users.
Check out the newly added pages here and please reach out with feedback as we continue improving our documentation!
This release includes:
4 new data quality check types have been added in this release:

- `plausibleStartBeforeEnd`: The number and percent of records with a value in the cdmFieldName field of the cdmTableName that occurs after the date in the plausibleStartBeforeEndFieldName.
- `plausibleAfterBirth`: The number and percent of records with a date value in the cdmFieldName field of the cdmTableName table that occurs prior to birth.
- `plausibleBeforeDeath`: The number and percent of records with a date value in the cdmFieldName field of the cdmTableName table that occurs after death.
- `plausibleGenderUseDescendants`: For descendants of CONCEPT_ID conceptId (conceptName), the number and percent of records associated with patients with an implausible gender (correct gender = plausibleGenderUseDescendants).

The 3 temporal plausibility checks are intended to replace `plausibleTemporalAfter` and `plausibleDuringLife`, for a more comprehensive and clear approach to various temporality scenarios. `plausibleGenderUseDescendants` is intended to replace `plausibleGender`, to enhance readability of the DQD results and improve performance. The replaced checks are still available and enabled by default in DQD; however, in a future major release, these checks will be deprecated. Please plan accordingly.
For more information on the new checks, please check the Check Type Definitions documentation page. If you’d like to disable the deprecated checks, please see the suggested check exclusion workflow in our Getting Started code here.
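The "number and percent of records" wording used in the new check descriptions can be made concrete with a small sketch. The Python below counts records whose start date falls after their end date, in the spirit of `plausibleStartBeforeEnd`. It is an illustration only: the helper name, the sample rows, and the choice to skip records missing either date are assumptions of this sketch, not the SQL that DQD actually runs.

```python
from datetime import date


def start_before_end_violations(records, start_field, end_field):
    """Return (number, percent) of records whose start date is after the end date."""
    # Assumption of this sketch: records missing either date are not evaluated.
    comparable = [
        r for r in records
        if r.get(start_field) is not None and r.get(end_field) is not None
    ]
    violated = sum(1 for r in comparable if r[start_field] > r[end_field])
    percent = 100.0 * violated / len(comparable) if comparable else 0.0
    return violated, percent


# Made-up rows for illustration; only the second row starts after it ends.
rows = [
    {"visit_start_date": date(2020, 1, 1), "visit_end_date": date(2020, 1, 5)},
    {"visit_start_date": date(2020, 3, 10), "visit_end_date": date(2020, 3, 1)},
    {"visit_start_date": date(2020, 6, 2), "visit_end_date": date(2020, 6, 2)},
    {"visit_start_date": date(2021, 2, 14), "visit_end_date": date(2021, 2, 20)},
]

num_violated, pct_violated = start_before_end_violations(
    rows, "visit_start_date", "visit_end_date"
)
print(num_violated, pct_violated)
```

The same pattern would apply to the birth and death comparisons by swapping in the relevant reference date for each record.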
- The number of measurements checked in `plausibleUnitConceptIds` has been reduced, and the lists of plausible units for those measurements have been re-reviewed and updated for accuracy. This change is intended to improve performance and reliability of this check. Please file an issue if you would like to contribute additional measurements + plausible units to be checked in the future
- `plausibleValueLow` thresholds have been corrected to prevent false positive failures from occurring

We have begun an initiative to add more comprehensive user documentation at the data quality check level. A dedicated documentation page is being created for each check type. Each check’s page will include detailed information about how its result is generated and what to do if it fails. Guidance is provided for both ETL developers and data users.
9 pages have been added so far, and the rest will come in a future release. Check them out here and please reach out with feedback as we continue improving our documentation!
This release includes:
A new function, `writeDBResultsToJson`, which can be used to write DQD results previously written to a database table (by setting `writeToTable` = TRUE in `executeDqChecks` or by using the `writeJsonResultsToTable` function) into a JSON file in the standard DQD JSON format.

- `vocabDatabaseSchema` is now used where appropriate

This release includes:

- `vroom`
This release includes:
The following changes involve updates to the default data quality check threshold files. If you are currently using an older version of DQD and update to v2.4.0, you may see changes in your DQD results. The failure threshold changes are fixes to incorrect thresholds in the v5.4 files and thus should result in more accurate, easier to interpret results. The unit concept ID changes ensure that long-invalid concepts will no longer be accepted as plausible measurement units.
- Failure thresholds for `measurePersonCompleteness` and `measureValueCompleteness` were fixed in the v5.4 table & field level threshold files. This issue has existed since v5.4 support was initially added in March 2022
  - Some `measurePersonCompleteness` checks had a threshold of 0 when it should have been 95 or 100
  - Some `measureValueCompleteness` checks had a threshold of 100 when it should have been 0, and many had no threshold (defaulting to 0) when it should have been 100
- `measurePersonCompleteness` for the DEATH table has been toggled to `Yes`, with a threshold of 100
- References to concept 9117 in `plausibleUnitConceptIds` have been updated to 720870. Concept 9117 became non-standard and was replaced with concept 720870 on 28-Mar-2022
- Deprecated concepts in `plausibleUnitConceptIds` have been removed. These concepts were deprecated on 05-May-2022
- The call to `convertJsonResultsFileCase` in the Shiny app is now prefixed with `DataQualityDashboard::`. This prevents potential issues related to package loading and function naming conflicts

Some minor refactoring of testthat files and package build configuration and some minor documentation updates were also added in this release.
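To see why an incorrect threshold changes results, here is a minimal Python sketch of threshold evaluation. It assumes a check fails when the percentage of violating records exceeds its threshold; that rule, the `check_status` helper, and the numbers used are assumptions for illustration rather than code taken from DQD.

```python
def check_status(num_violated, num_denominator, threshold_pct):
    """Return "FAILED" if the percent of violating records exceeds the threshold."""
    # Assumed rule: strictly greater than the threshold means failure.
    if num_denominator:
        pct_violated = 100.0 * num_violated / num_denominator
    else:
        pct_violated = 0.0
    return "FAILED" if pct_violated > threshold_pct else "PASSED"


# Made-up example: 3 of 1000 records violate the check (0.3%).
strict = check_status(3, 1000, threshold_pct=0)      # any violation fails
lenient = check_status(3, 1000, threshold_pct=100)   # can never fail
print(strict, lenient)
```

Under this assumed rule, a threshold of 100 can never be exceeded, so a check that should have had a threshold of 0 would pass no matter how many records violated it.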
This release includes:
- Setting `sqlOnly` and `sqlOnlyIncrementalInsert` to TRUE in `executeDqChecks` will return (but not run) a set of SQL queries that, when executed, will calculate the results of the DQ checks and insert them into a database table. Additionally, `sqlOnlyUnionCount` can be used to specify a number of SQL queries to union for each check type, allowing for parallel execution of these queries and potentially large performance gains. See the SqlOnly vignette for details
- `convertJsonResultsFileCase` can be used to convert the keys in a DQD results JSON file between snakecase and camelcase. This allows reading of v2.1.0+ JSON files in older DQD versions, and other conversions which may be necessary for secondary use of the DQD results file. See function documentation for details
- `viewDqDashboard` will now automatically convert the case of pre-v2.1.0 results files to camelcase so that older results files may be viewed in v2.3.0+

This release includes:
- `cohortTableName` parameter added to `executeDqChecks`. Allows user to specify the name of the cohort table when running DQD on a cohort. Defaults to `"cohort"`
- Date format updated to `YYYYMMDD` to conform to SqlRender standard
- `vocabDatabaseSchema` and `cohortDatabaseSchema` are now used where appropriate
- Removed `outputFile` parameter from DQD setup vignette (variable not set in script)

And some minor documentation updates for clarity/accuracy.
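The key-case conversion that `convertJsonResultsFileCase` performs on results files (described in the notes above) can be sketched in a few lines. The Python below is an illustration of the general technique only, not the package's R implementation: the helper names and the sample keys are hypothetical, and producing lowercase snake case is an assumption of this sketch.

```python
import re


def snake_to_camel(key):
    """Convert a snake case key (any letter case) to camel case."""
    head, *rest = key.split("_")
    return head.lower() + "".join(part.capitalize() for part in rest)


def camel_to_snake(key):
    """Convert a camel case key to lowercase snake case."""
    # Insert an underscore before every capital letter except a leading one.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()


def convert_keys(obj, convert):
    """Recursively apply convert to every dictionary key in a parsed JSON value."""
    if isinstance(obj, dict):
        return {convert(k): convert_keys(v, convert) for k, v in obj.items()}
    if isinstance(obj, list):
        return [convert_keys(item, convert) for item in obj]
    return obj


# Hypothetical results fragment; the keys are made up for illustration.
old_style = {"CHECK_RESULTS": [{"CHECK_NAME": "exampleCheck", "NUM_VIOLATED_ROWS": 3}]}
new_style = convert_keys(old_style, snake_to_camel)
round_trip = convert_keys(new_style, camel_to_snake)
print(new_style)
print(round_trip)
```

Only keys are rewritten; values such as check names pass through unchanged, which is what makes the conversion safe to apply to a whole results file.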
This release includes:
- Updated the `offset` column name in the v5.4 thresholds file so that this column is skipped by DQD in all cases (use of reserved word causes failures in some SQL dialects)

This release includes:

- The `outputFolder` parameter for the `executeDqChecks` function is now REQUIRED and no longer has a default value. This may be a breaking change for users who have not specified this parameter in their script to run DQD.
No material changes from v1.4; this adds a correct `DESCRIPTION` file with the correct DQD version.
This release provides support for CDM v5.4 and incorporates minor bug fixes related to incorrectly assigned checks in the control files.
This fixes a small bug and removes a duplicate record in the concept level checks that was throwing an error.
This release includes additional concept level checks to support the OHDSI Symposium 2020 study-a-thon and bug fixes to the `writeJSONToTable` function. This is the release that study-a-thon data partners should use.
This is a bug fix release that updates how notes are viewed in the UI and adds CDM table, field, and check name to the final table.