vignettes/checks/plausibleAfterBirth.Rmd
plausibleAfterBirth.Rmd
Level: Field check
Context: Verification
Category: Plausibility
Subcategory: Temporal
Severity: Characterization ✔
The number and percent of records with a date value in the
cdmFieldName field of the cdmTableName
table that occurs prior to birth. Note that this check replaces the
previous plausibleTemporalAfter
check.
This check verifies that events happen after birth. The birthdate is
taken from the person
table, either the
birth_datetime
or composed from year_of_birth
,
month_of_birth
, day_of_birth
(taking 1st
month/1st day if missing).
There might be valid reasons why a record has a date value that occurs prior to birth. For example, prenatal observations might be captured or procedures on the mother might be added to the file of the child. Therefore, some failing records are expected and the default threshold of 1% accounts for that.
However, if more records violate this check, there might be an issue with incorrect birthdates or events with a default date. It is recommended to investigate the records that fail this check to determine the cause of the error and set proper dates. If it is impossible to fix, then implement one of these:
Make sure to clearly document the choices in your ETL specification.
You may also use the “violated rows” SQL query to inspect the violating rows and help diagnose the potential root cause of the issue:
SELECT
p.birth_datetime,
cdmTable.*
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
JOIN @cdmDatabaseSchema.person p ON cdmTable.person_id = p.person_id
WHERE cdmTable.@cdmFieldName < p.birth_datetime,
or, when birth_datetime is missing change to year, month, day columns:
SELECT
p.year_of_birth,
p.month_of_birth,
p.day_of_birth,
cdmTable.*
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
JOIN @cdmDatabaseSchema.person p ON cdmTable.person_id = p.person_id
WHERE cdmTable.@cdmFieldName < CAST(CONCAT(
p.year_of_birth,
COALESCE(
RIGHT('0' + CAST(p.month_of_birth AS VARCHAR), 2),
'01'
),
COALESCE(
RIGHT('0' + CAST(p.day_of_birth AS VARCHAR), 2),
'01'
)
) AS DATE)
Also, the length of the time interval between these dates might give you a hint of why the problem appears.
select
date_difference,
COUNT(*)
FROM (
SELECT DATEDIFF(
DAY,
@cdmFieldName,
COALESCE(
CAST(p.birth_datetime AS DATE),
CAST(CONCAT(p.year_of_birth,'-01-01') AS DATE))
) AS date_difference
FROM @cdmTableName ct
JOIN person p ON ct.person_id = p.person_id
) cte
WHERE date_difference > 0
GROUP BY date_difference
ORDER BY COUNT(*) DESC
;