
Cohort-specific measurement diagnostics
Source:vignettes/a02_summariseCohortMeasurementUse.Rmd
a02_summariseCohortMeasurementUse.RmdIntroduction
This vignette demonstrates how to use
summariseCohortMeasurementUse() from
MeasurementDiagnostics to perform measurement
diagnostics restricted to a cohort.
The function computes the same three diagnostic checks available for
full-dataset summaries (measurement_summary,
measurement_value_as_number, and
measurement_value_as_concept) but limits the analysis to
measurements recorded for subjects in a specified cohort, and optionally
to specific times relative to cohort entry.
We use package provided mock data for the examples.
Basic usage
We begin by running diagnostics for a simple measurement codelist
within an example cohort. Diagnostics are performed on the measurement
concepts provided in codes, restricted to measurement
records observed among subjects while they are part of the cohort.
result <- summariseCohortMeasurementUse(
codes = list("measurement_codelist" = c(3001467L, 45875977L)),
cohort = cdm$my_cohort
)
# Inspect structure
result |> glimpse()
#> Rows: 12,436
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "mock database", "mock database", "mock database", "m…
#> $ group_name <chr> "cohort_name &&& codelist_name", "cohort_name &&& cod…
#> $ group_level <chr> "cohort_1 &&& measurement_codelist", "cohort_1 &&& me…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "cohort_records", "cohort_subjects", "cohort_records"…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value <chr> "100", "62", "100", "61", "21", "10", "8", "67", "50"…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…Results are returned as a summarised_result object (see
omopgenerics
package).
As an example, the table below shows the
measurement_value_as_concept results.
From this output, we can see that for this codelist and subject in our cohort, some measurement values are recorded using concepts for “Low” and “High”, while others are missing a concept value.
tableMeasurementValueAsConcept(result)| CDM name | Cohort name | Concept name | Concept ID | Source concept name | Source concept ID | Domain ID | Value as concept name | Value as concept ID | Estimate name | Estimate value |
|---|---|---|---|---|---|---|---|---|---|---|
| measurement_codelist | ||||||||||
| mock database | cohort_1 | overall | overall | overall | overall | overall | Low | 4267416 | N (%) | 5 (18.52%) |
| cohort_2 | overall | overall | overall | overall | overall | Low | 4267416 | N (%) | 4 (30.77%) | |
| cohort_1 | overall | overall | overall | overall | overall | High | 4328749 | N (%) | 12 (44.44%) | |
| cohort_2 | overall | overall | overall | overall | overall | High | 4328749 | N (%) | 3 (23.08%) | |
| cohort_1 | overall | overall | overall | overall | overall | NA | NA | N (%) | 10 (37.04%) | |
| cohort_2 | overall | overall | overall | overall | overall | NA | NA | N (%) | 6 (46.15%) | |
| cohort_1 | Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma | 3001467 | NA | NA | Measurement | Low | 4267416 | N (%) | 5 (18.52%) | |
| cohort_2 | Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma | 3001467 | NA | NA | Measurement | Low | 4267416 | N (%) | 4 (30.77%) | |
| cohort_1 | Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma | 3001467 | NA | NA | Measurement | High | 4328749 | N (%) | 12 (44.44%) | |
| cohort_2 | Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma | 3001467 | NA | NA | Measurement | High | 4328749 | N (%) | 3 (23.08%) | |
| cohort_1 | Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma | 3001467 | NA | NA | Measurement | NA | NA | N (%) | 10 (37.04%) | |
| cohort_2 | Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma | 3001467 | NA | NA | Measurement | NA | NA | N (%) | 6 (46.15%) | |
Next, we examine the measurement_value_as_number
results. This table shows the range of numeric measurement values for
the overall codelist and for each individual concept, stratified by unit
where available.
In the following results we see some numeric values referring to kilograms (unit concept), while other are not associated with any unit, and lastly there are 4 records with missing values as numbers.
The table shows results for the overall codelist, and for each concept separately.
tableMeasurementValueAsNumber(result)| CDM name | Cohort name | Concept name | Concept ID | Source concept name | Source concept ID | Domain ID | Unit concept name | Unit concept ID | Estimate name | Estimate value |
|---|---|---|---|---|---|---|---|---|---|---|
| measurement_codelist | ||||||||||
| mock database | cohort_1 | overall | overall | overall | overall | overall | kilogram | 9529 | N | 16 |
| Median [Q25 – Q75] | 7.83 [6.78 – 9.79] | |||||||||
| Q05 – Q95 | 5.47 – 11.67 | |||||||||
| Q01 – Q99 | 5.38 – 11.85 | |||||||||
| Range | 5.36 to 11.89 | |||||||||
| Missing value, N (%) | 0 (0.00%) | |||||||||
| NA | - | N | 11 | |||||||
| Median [Q25 – Q75] | 7.18 [6.60 – 8.77] | |||||||||
| Q05 – Q95 | 5.84 – 10.34 | |||||||||
| Q01 – Q99 | 5.52 – 10.71 | |||||||||
| Range | 5.44 to 10.80 | |||||||||
| Missing value, N (%) | 2 (18.18%) | |||||||||
| Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma | 3001467 | NA | - | Measurement | kilogram | 9529 | N | 16 | ||
| Median [Q25 – Q75] | 7.83 [6.78 – 9.79] | |||||||||
| Q05 – Q95 | 5.47 – 11.67 | |||||||||
| Q01 – Q99 | 5.38 – 11.85 | |||||||||
| Range | 5.36 to 11.89 | |||||||||
| Missing value, N (%) | 0 (0.00%) | |||||||||
| NA | - | N | 11 | |||||||
| Median [Q25 – Q75] | 7.18 [6.60 – 8.77] | |||||||||
| Q05 – Q95 | 5.84 – 10.34 | |||||||||
| Q01 – Q99 | 5.52 – 10.71 | |||||||||
| Range | 5.44 to 10.80 | |||||||||
| Missing value, N (%) | 2 (18.18%) | |||||||||
| cohort_2 | overall | overall | overall | overall | overall | kilogram | 9529 | N | 7 | |
| Median [Q25 – Q75] | 7.10 [6.67 – 8.12] | |||||||||
| Q05 – Q95 | 5.81 – 9.51 | |||||||||
| Q01 – Q99 | 5.57 – 9.79 | |||||||||
| Range | 5.51 to 9.86 | |||||||||
| Missing value, N (%) | 0 (0.00%) | |||||||||
| NA | - | N | 6 | |||||||
| Median [Q25 – Q75] | 9.57 [7.39 – 10.88] | |||||||||
| Q05 – Q95 | 6.27 – 11.27 | |||||||||
| Q01 – Q99 | 6.07 – 11.36 | |||||||||
| Range | 6.02 to 11.38 | |||||||||
| Missing value, N (%) | 0 (0.00%) | |||||||||
| Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma | 3001467 | NA | - | Measurement | kilogram | 9529 | N | 7 | ||
| Median [Q25 – Q75] | 7.10 [6.67 – 8.12] | |||||||||
| Q05 – Q95 | 5.81 – 9.51 | |||||||||
| Q01 – Q99 | 5.57 – 9.79 | |||||||||
| Range | 5.51 to 9.86 | |||||||||
| Missing value, N (%) | 0 (0.00%) | |||||||||
| NA | - | N | 6 | |||||||
| Median [Q25 – Q75] | 9.57 [7.39 – 10.88] | |||||||||
| Q05 – Q95 | 6.27 – 11.27 | |||||||||
| Q01 – Q99 | 6.07 – 11.36 | |||||||||
| Range | 6.02 to 11.38 | |||||||||
| Missing value, N (%) | 0 (0.00%) | |||||||||
Timing options
The timing argument controls which measurement records
are considered:
"any"— any measurement record for subjects in the cohort (no timing restriction)"during"— measurements while the subject is in the cohort (default)"cohort_start_date"— measurements recorded on the cohort start date
The following example shows measurement summary results when using
timing = "any" and
timing = "cohort_start_date". As expected, when using “any”
timing we get much more measurements than when restricting to
measurements occurring on “cohort_start_date”.
result_any <- summariseCohortMeasurementUse(
codes = list("measurement_codelist" = c(3001467L, 45875977L)),
cohort = cdm$my_cohort,
timing = "any"
)
result_cohort_start <- summariseCohortMeasurementUse(
codes = list("measurement_codelist" = c(3001467L, 45875977L)),
cohort = cdm$my_cohort,
timing = "cohort_start_date"
)
tableMeasurementSummary(result_any)| CDM name | Cohort name | Codelist name | Variable name | Estimate name | Estimate value |
|---|---|---|---|---|---|
| mock database | cohort_1 | measurement_codelist | Cohort records | N | 100 |
| Cohort subjects | N | 62 | |||
| Number subjects | N (%) | 59 (95.16%) | |||
| Days between measurements | Median [Q25 – Q75] | 198 [58 – 921] | |||
| Range | 8 to 2,886 | ||||
| Measurements per subject | Median [Q25 – Q75] | 1.00 [1.00 – 2.00] | |||
| Range | 1.00 to 4.00 | ||||
| cohort_2 | measurement_codelist | Cohort records | N | 100 | |
| Cohort subjects | N | 61 | |||
| Number subjects | N (%) | 37 (60.66%) | |||
| Days between measurements | Median [Q25 – Q75] | 71 [41 – 710] | |||
| Range | 8 to 2,098 | ||||
| Measurements per subject | Median [Q25 – Q75] | 1.00 [1.00 – 1.00] | |||
| Range | 1.00 to 4.00 |
tableMeasurementSummary(result_cohort_start)| CDM name | Cohort name | Codelist name | Variable name | Estimate name | Estimate value |
|---|---|---|---|---|---|
| mock database | cohort_1 | measurement_codelist | Cohort records | N | 100 |
| Cohort subjects | N | 62 | |||
| Number subjects | N (%) | 1 (1.61%) | |||
| Days between measurements | Median [Q25 – Q75] | – | |||
| Range | – | ||||
| Measurements per subject | Median [Q25 – Q75] | 1.00 [1.00 – 1.00] | |||
| Range | 1.00 to 1.00 | ||||
| cohort_2 | measurement_codelist | Number subjects | N (%) | 0 (0.00%) |
Measurement cohorts
If no explicit codelist is provided (codes = NULL), the
function will use the concept set associated with the cohort (if exists)
to perform diagnostics.
For example, using CohortConstructor, we can create a cohort based on measurement concepts. This cohort stores the codelist used to define it as an attribute.
cdm$measurement_cohort <- conceptCohort(
cdm = cdm,
conceptSet = list("measurement_codelist" = c(3001467L, 45875977L)),
name = "measurement_cohort"
)
cohortCodelist(cdm$measurement_cohort)
#>
#> - measurement_codelist (2 codes)We can then call summariseCohortMeasurementUse() without
specifying codes. In this case, the function automatically uses the
codelist associated with the cohort. The example below runs diagnostics
on the measurement records used to define cohort entry:
result <- summariseCohortMeasurementUse(
cohort = cdm$measurement_cohort,
timing = "cohort_start_date"
)
tableMeasurementValueAsNumber(result)| CDM name | Cohort name | Concept name | Concept ID | Source concept name | Source concept ID | Domain ID | Unit concept name | Unit concept ID | Estimate name | Estimate value |
|---|---|---|---|---|---|---|---|---|---|---|
| measurement_codelist | ||||||||||
| mock database | measurement_codelist | overall | overall | overall | overall | overall | kilogram | 9529 | N | 50 |
| Median [Q25 – Q75] | 8.77 [7.07 – 10.48] | |||||||||
| Q05 – Q95 | 5.70 – 11.84 | |||||||||
| Q01 – Q99 | 5.43 – 12.11 | |||||||||
| Range | 5.36 to 12.18 | |||||||||
| Missing value, N (%) | 2 (4.00%) | |||||||||
| NA | - | N | 50 | |||||||
| Median [Q25 – Q75] | 8.77 [7.10 – 10.44] | |||||||||
| Q05 – Q95 | 5.77 – 11.77 | |||||||||
| Q01 – Q99 | 5.50 – 12.04 | |||||||||
| Range | 5.44 to 12.11 | |||||||||
| Missing value, N (%) | 3 (6.00%) | |||||||||
| Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma | 3001467 | NA | - | Measurement | kilogram | 9529 | N | 50 | ||
| Median [Q25 – Q75] | 8.77 [7.07 – 10.48] | |||||||||
| Q05 – Q95 | 5.70 – 11.84 | |||||||||
| Q01 – Q99 | 5.43 – 12.11 | |||||||||
| Range | 5.36 to 12.18 | |||||||||
| Missing value, N (%) | 2 (4.00%) | |||||||||
| NA | - | N | 50 | |||||||
| Median [Q25 – Q75] | 8.77 [7.10 – 10.44] | |||||||||
| Q05 – Q95 | 5.77 – 11.77 | |||||||||
| Q01 – Q99 | 5.50 – 12.04 | |||||||||
| Range | 5.44 to 12.11 | |||||||||
| Missing value, N (%) | 3 (6.00%) | |||||||||
Stratifications
In the following example, we restrict diagnostics to the
measurement_summary check and stratify results by
sex. The resulting table shows, for each stratum, the
number of subjects with measurements, the number of measurements per
subject, and the time between measurements.
*Note that the percentage of subjects with measurements
(Number subjects) is calculated relative to the total
number of subjects in the cohort, independent of stratification
variables such as sex, age, or year.
result <- summariseCohortMeasurementUse(
cohort = cdm$measurement_cohort,
bySex = TRUE,
byConcept = FALSE,
timing = "any",
checks = "measurement_summary"
)
tableMeasurementSummary(result)| CDM name | Cohort name | Codelist name | Variable name | Estimate name |
Sex
|
||
|---|---|---|---|---|---|---|---|
| overall | Female | Male | |||||
| mock database | measurement_codelist | measurement_codelist | Cohort records | N | 100 | – | – |
| Cohort subjects | N | 67 | – | – | |||
| Number subjects | N (%) | 67 (100.00%) | 40 (59.70%) | 27 (40.30%) | |||
| Days between measurements | Median [Q25 – Q75] | 249 [67 – 645] | 240 [53 – 1,133] | 267 [81 – 415] | |||
| Range | 8 to 2,886 | 8 to 2,886 | 8 to 2,743 | ||||
| Measurements per subject | Median [Q25 – Q75] | 1.00 [1.00 – 2.00] | 1.00 [1.00 – 2.00] | 1.00 [1.00 – 2.00] | |||
| Range | 1.00 to 4.00 | 1.00 to 4.00 | 1.00 to 3.00 | ||||
Other arguments
Additional arguments allow users to further stratify results,
restrict the date range of measurement records, customise the set of
summary estimates, and obtain counts to plot histograms These options
behave in the same way as in summariseMeasurementUse(), and
are described in more detail in the “Summarising measurement use in
a dataset” vignette.