Cohort-specific measurement diagnostics

Introduction

This vignette demonstrates how to use summariseCohortMeasurementUse() from MeasurementDiagnostics to perform measurement diagnostics restricted to a cohort.

The function computes the same three diagnostic checks available for full-dataset summaries (measurement_summary, measurement_value_as_number, and measurement_value_as_concept) but limits the analysis to measurements recorded for subjects in a specified cohort, and optionally to specific times relative to cohort entry.

We use the mock data provided by the package for the examples.

library(MeasurementDiagnostics)
library(dplyr)
library(omopgenerics) 
library(CohortConstructor)

cdm <- mockMeasurementDiagnostics()
cdm

Basic usage

We begin by running diagnostics for a simple measurement codelist within an example cohort. Diagnostics are performed on the measurement concepts provided in codes, restricted to measurement records observed among subjects while they are part of the cohort.

result <- summariseCohortMeasurementUse(
  codes = list("measurement_codelist" = c(3001467L, 45875977L)),
  cohort = cdm$my_cohort
)

# Inspect structure
result |> glimpse()
#> Rows: 12,436
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "mock database", "mock database", "mock database", "m…
#> $ group_name       <chr> "cohort_name &&& codelist_name", "cohort_name &&& cod…
#> $ group_level      <chr> "cohort_1 &&& measurement_codelist", "cohort_1 &&& me…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "cohort_records", "cohort_subjects", "cohort_records"…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name    <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value   <chr> "100", "62", "100", "61", "21", "10", "8", "67", "50"…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…

Results are returned as a summarised_result object (see omopgenerics package).
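
The group_name and group_level columns in the output above pack several name and level pairs into one string, joined by " &&& ". The sketch below unpacks them in base R. It runs on a toy data frame that only mimics those two columns, and split_pairs() is a hypothetical helper written for this illustration, not a package function. For real results, omopgenerics offers splitGroup() for this purpose.

```r
# Toy rows mimicking the group_name / group_level columns shown above
toy <- data.frame(
  group_name  = c("cohort_name &&& codelist_name",
                  "cohort_name &&& codelist_name"),
  group_level = c("cohort_1 &&& measurement_codelist",
                  "cohort_2 &&& measurement_codelist"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one name/level pair into a named list
split_pairs <- function(name, level, sep = " &&& ") {
  keys   <- strsplit(name,  sep, fixed = TRUE)[[1]]
  values <- strsplit(level, sep, fixed = TRUE)[[1]]
  stats::setNames(as.list(values), keys)
}

# One row per original row, one column per packed name
wide <- do.call(rbind, lapply(seq_len(nrow(toy)), function(i) {
  as.data.frame(
    split_pairs(toy$group_name[i], toy$group_level[i]),
    stringsAsFactors = FALSE
  )
}))
wide
```

The strata and additional columns follow the same name/level convention, so the same idea applies to them.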

As an example, the table below shows the measurement_value_as_concept results.

From this output, we can see that for this codelist and the subjects in our cohorts, some measurement values are recorded using concepts for “Low” and “High”, while others are missing a concept value.

tableMeasurementValueAsConcept(result)

CDM name	Cohort name	Concept name	Concept ID	Source concept name	Source concept ID	Domain ID	Value as concept name	Value as concept ID	Estimate name	Estimate value
measurement_codelist
mock database	cohort_1	overall	overall	overall	overall	overall	Low	4267416	N (%)	5 (18.52%)
	cohort_2	overall	overall	overall	overall	overall	Low	4267416	N (%)	4 (30.77%)
	cohort_1	overall	overall	overall	overall	overall	High	4328749	N (%)	12 (44.44%)
	cohort_2	overall	overall	overall	overall	overall	High	4328749	N (%)	3 (23.08%)
	cohort_1	overall	overall	overall	overall	overall	NA	NA	N (%)	10 (37.04%)
	cohort_2	overall	overall	overall	overall	overall	NA	NA	N (%)	6 (46.15%)
	cohort_1	Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma	3001467	NA	NA	Measurement	Low	4267416	N (%)	5 (18.52%)
	cohort_2	Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma	3001467	NA	NA	Measurement	Low	4267416	N (%)	4 (30.77%)
	cohort_1	Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma	3001467	NA	NA	Measurement	High	4328749	N (%)	12 (44.44%)
	cohort_2	Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma	3001467	NA	NA	Measurement	High	4328749	N (%)	3 (23.08%)
	cohort_1	Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma	3001467	NA	NA	Measurement	NA	NA	N (%)	10 (37.04%)
	cohort_2	Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma	3001467	NA	NA	Measurement	NA	NA	N (%)	6 (46.15%)

Next, we examine the measurement_value_as_number results. This table shows the range of numeric measurement values for the overall codelist and for each individual concept, stratified by unit where available.

In the following results, some numeric values are recorded in kilograms (unit concept 9529), while others are not associated with any unit. Among the records without a unit in cohort_1, 2 have a missing numeric value.

The table shows results for the overall codelist, and for each concept separately.

tableMeasurementValueAsNumber(result)

CDM name	Cohort name	Concept name	Concept ID	Source concept name	Source concept ID	Domain ID	Unit concept name	Unit concept ID	Estimate name	Estimate value
measurement_codelist
mock database	cohort_1	overall	overall	overall	overall	overall	kilogram	9529	N	16
									Median [Q25 – Q75]	7.83 [6.78 – 9.79]
									Q05 – Q95	5.47 – 11.67
									Q01 – Q99	5.38 – 11.85
									Range	5.36 to 11.89
									Missing value, N (%)	0 (0.00%)
							NA	-	N	11
									Median [Q25 – Q75]	7.18 [6.60 – 8.77]
									Q05 – Q95	5.84 – 10.34
									Q01 – Q99	5.52 – 10.71
									Range	5.44 to 10.80
									Missing value, N (%)	2 (18.18%)
		Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma	3001467	NA	-	Measurement	kilogram	9529	N	16
									Median [Q25 – Q75]	7.83 [6.78 – 9.79]
									Q05 – Q95	5.47 – 11.67
									Q01 – Q99	5.38 – 11.85
									Range	5.36 to 11.89
									Missing value, N (%)	0 (0.00%)
							NA	-	N	11
									Median [Q25 – Q75]	7.18 [6.60 – 8.77]
									Q05 – Q95	5.84 – 10.34
									Q01 – Q99	5.52 – 10.71
									Range	5.44 to 10.80
									Missing value, N (%)	2 (18.18%)
	cohort_2	overall	overall	overall	overall	overall	kilogram	9529	N	7
									Median [Q25 – Q75]	7.10 [6.67 – 8.12]
									Q05 – Q95	5.81 – 9.51
									Q01 – Q99	5.57 – 9.79
									Range	5.51 to 9.86
									Missing value, N (%)	0 (0.00%)
							NA	-	N	6
									Median [Q25 – Q75]	9.57 [7.39 – 10.88]
									Q05 – Q95	6.27 – 11.27
									Q01 – Q99	6.07 – 11.36
									Range	6.02 to 11.38
									Missing value, N (%)	0 (0.00%)
		Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma	3001467	NA	-	Measurement	kilogram	9529	N	7
									Median [Q25 – Q75]	7.10 [6.67 – 8.12]
									Q05 – Q95	5.81 – 9.51
									Q01 – Q99	5.57 – 9.79
									Range	5.51 to 9.86
									Missing value, N (%)	0 (0.00%)
							NA	-	N	6
									Median [Q25 – Q75]	9.57 [7.39 – 10.88]
									Q05 – Q95	6.27 – 11.27
									Q01 – Q99	6.07 – 11.36
									Range	6.02 to 11.38
									Missing value, N (%)	0 (0.00%)
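
To clarify what each row of the table above represents, the following base R sketch computes comparable statistics per unit on toy data. It is an illustration under stated assumptions rather than the package's implementation: the data frame and summarise_values() are hypothetical, and quantiles use R's default method, which may differ from what a database backend computes. As in the table, N counts all records, including those with a missing numeric value.

```r
# Toy measurement values; NA in unit_concept_name means "no unit recorded"
toy <- data.frame(
  unit_concept_name = c("kilogram", "kilogram", "kilogram", "kilogram",
                        NA, NA, NA),
  value_as_number   = c(5, 7, 9, 11, 6, NA, 8),
  stringsAsFactors  = FALSE
)

# Give missing units an explicit label so they form their own group
toy$unit_label <- ifelse(
  is.na(toy$unit_concept_name), "no unit", toy$unit_concept_name
)

# Hypothetical helper: summary statistics for one group of values
summarise_values <- function(v) {
  c(
    n       = length(v),
    missing = sum(is.na(v)),
    min     = min(v, na.rm = TRUE),
    q25     = unname(quantile(v, 0.25, na.rm = TRUE)),
    median  = median(v, na.rm = TRUE),
    q75     = unname(quantile(v, 0.75, na.rm = TRUE)),
    max     = max(v, na.rm = TRUE)
  )
}

# One row per unit, one column per statistic
by_unit <- t(sapply(split(toy$value_as_number, toy$unit_label),
                    summarise_values))
by_unit
```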

Timing options

The timing argument controls which measurement records are considered:

"any" — any measurement record for subjects in the cohort (no timing restriction)
"during" — measurements while the subject is in the cohort (default)
"cohort_start_date" — measurements recorded on the cohort start date
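
To make the three options concrete, the sketch below applies them to toy data in base R. It only illustrates the filtering logic and is not the package's implementation: filter_by_timing() is a hypothetical helper, the column names follow OMOP conventions, and treating the cohort end date as inclusive for "during" is an assumption.

```r
# Toy cohort: one entry per subject
cohort <- data.frame(
  subject_id        = c(1L, 2L),
  cohort_start_date = as.Date(c("2020-01-10", "2020-03-01")),
  cohort_end_date   = as.Date(c("2020-06-30", "2020-03-31"))
)

# Toy measurement records; person 3 is not in the cohort
measurement <- data.frame(
  person_id        = c(1L, 1L, 1L, 2L, 2L, 3L),
  measurement_date = as.Date(c("2019-12-01", "2020-01-10", "2020-05-05",
                               "2020-03-15", "2021-01-01", "2020-02-02"))
)

# Hypothetical helper mirroring the three timing options
filter_by_timing <- function(measurement, cohort, timing = "during") {
  timing <- match.arg(timing, c("any", "during", "cohort_start_date"))
  # Inner join: keeps only records of subjects present in the cohort
  x <- merge(measurement, cohort, by.x = "person_id", by.y = "subject_id")
  keep <- switch(
    timing,
    any = rep(TRUE, nrow(x)),
    during = x$measurement_date >= x$cohort_start_date &
      x$measurement_date <= x$cohort_end_date,
    cohort_start_date = x$measurement_date == x$cohort_start_date
  )
  x[keep, c("person_id", "measurement_date")]
}

n_any    <- nrow(filter_by_timing(measurement, cohort, "any"))
n_during <- nrow(filter_by_timing(measurement, cohort, "during"))
n_start  <- nrow(filter_by_timing(measurement, cohort, "cohort_start_date"))
c(any = n_any, during = n_during, cohort_start_date = n_start)
#> any 5, during 3, cohort_start_date 1
```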

The following example shows measurement summary results when using timing = "any" and timing = "cohort_start_date". As expected, “any” timing returns many more measurements than restricting to measurements recorded on “cohort_start_date”.

result_any <- summariseCohortMeasurementUse(
  codes = list("measurement_codelist" = c(3001467L, 45875977L)),
  cohort = cdm$my_cohort,
  timing = "any"
)

result_cohort_start <- summariseCohortMeasurementUse(
  codes = list("measurement_codelist" = c(3001467L, 45875977L)),
  cohort = cdm$my_cohort,
  timing = "cohort_start_date"
)

tableMeasurementSummary(result_any)

CDM name	Cohort name	Codelist name	Variable name	Estimate name	Estimate value
mock database	cohort_1	measurement_codelist	Cohort records	N	100
			Cohort subjects	N	62
			Number subjects	N (%)	59 (95.16%)
			Days between measurements	Median [Q25 – Q75]	198 [58 – 921]
				Range	8 to 2,886
			Measurements per subject	Median [Q25 – Q75]	1.00 [1.00 – 2.00]
				Range	1.00 to 4.00
	cohort_2	measurement_codelist	Cohort records	N	100
			Cohort subjects	N	61
			Number subjects	N (%)	37 (60.66%)
			Days between measurements	Median [Q25 – Q75]	71 [41 – 710]
				Range	8 to 2,098
			Measurements per subject	Median [Q25 – Q75]	1.00 [1.00 – 1.00]
				Range	1.00 to 4.00

tableMeasurementSummary(result_cohort_start)

CDM name	Cohort name	Codelist name	Variable name	Estimate name	Estimate value
mock database	cohort_1	measurement_codelist	Cohort records	N	100
			Cohort subjects	N	62
			Number subjects	N (%)	1 (1.61%)
			Days between measurements	Median [Q25 – Q75]	–
				Range	–
			Measurements per subject	Median [Q25 – Q75]	1.00 [1.00 – 1.00]
				Range	1.00 to 1.00
	cohort_2	measurement_codelist	Number subjects	N (%)	0 (0.00%)
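
The "Measurements per subject" and "Days between measurements" rows can be understood with a small base R sketch on toy data. This is an assumed reading of those variables (records counted per person, gaps taken between consecutive measurement dates of the same person), not the package's code. It also shows why no gap can be reported when a subject has a single measurement, as in the cohort_start_date table above.

```r
# Toy measurement records for three subjects
measurement <- data.frame(
  person_id        = c(1L, 1L, 1L, 2L, 3L, 3L),
  measurement_date = as.Date(c("2020-01-01", "2020-01-09", "2020-03-09",
                               "2020-02-01", "2020-05-01", "2020-05-31"))
)

# Number of measurement records per subject
per_subject <- table(measurement$person_id)

# Days between consecutive measurements within each subject;
# subjects with a single record contribute no gap
gaps <- unname(unlist(lapply(
  split(measurement$measurement_date, measurement$person_id),
  function(d) as.numeric(diff(sort(d)))
)))

median(as.integer(per_subject))  # measurements per subject
gaps                             # 8 and 60 (person 1), 30 (person 3)
median(gaps)
```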

Measurement cohorts

If no explicit codelist is provided (codes = NULL), the function uses the concept set associated with the cohort, if one exists, to perform diagnostics.

For example, using CohortConstructor, we can create a cohort based on measurement concepts. This cohort stores the codelist used to define it as an attribute.

cdm$measurement_cohort <- conceptCohort(
  cdm = cdm,
  conceptSet = list("measurement_codelist" = c(3001467L, 45875977L)),
  name = "measurement_cohort"
)
cohortCodelist(cdm$measurement_cohort)
#> 
#> - measurement_codelist (2 codes)

We can then call summariseCohortMeasurementUse() without specifying codes. In this case, the function automatically uses the codelist associated with the cohort. The example below runs diagnostics on the measurement records used to define cohort entry:

result <- summariseCohortMeasurementUse(
  cohort = cdm$measurement_cohort,
  timing = "cohort_start_date"
)
tableMeasurementValueAsNumber(result)

CDM name	Cohort name	Concept name	Concept ID	Source concept name	Source concept ID	Domain ID	Unit concept name	Unit concept ID	Estimate name	Estimate value
measurement_codelist
mock database	measurement_codelist	overall	overall	overall	overall	overall	kilogram	9529	N	50
									Median [Q25 – Q75]	8.77 [7.07 – 10.48]
									Q05 – Q95	5.70 – 11.84
									Q01 – Q99	5.43 – 12.11
									Range	5.36 to 12.18
									Missing value, N (%)	2 (4.00%)
							NA	-	N	50
									Median [Q25 – Q75]	8.77 [7.10 – 10.44]
									Q05 – Q95	5.77 – 11.77
									Q01 – Q99	5.50 – 12.04
									Range	5.44 to 12.11
									Missing value, N (%)	3 (6.00%)
		Alkaline phosphatase.bone [Enzymatic activity/volume] in Serum or Plasma	3001467	NA	-	Measurement	kilogram	9529	N	50
									Median [Q25 – Q75]	8.77 [7.07 – 10.48]
									Q05 – Q95	5.70 – 11.84
									Q01 – Q99	5.43 – 12.11
									Range	5.36 to 12.18
									Missing value, N (%)	2 (4.00%)
							NA	-	N	50
									Median [Q25 – Q75]	8.77 [7.10 – 10.44]
									Q05 – Q95	5.77 – 11.77
									Q01 – Q99	5.50 – 12.04
									Range	5.44 to 12.11
									Missing value, N (%)	3 (6.00%)

Stratifications

In the following example, we restrict diagnostics to the measurement_summary check and stratify results by sex. The resulting table shows, for each stratum, the number of subjects with measurements, the number of measurements per subject, and the time between measurements.

Note that the percentage of subjects with measurements (Number subjects) is calculated relative to the total number of subjects in the cohort, independent of stratification variables such as sex, age, or year.

result <- summariseCohortMeasurementUse(
  cohort = cdm$measurement_cohort,
  bySex = TRUE,
  byConcept = FALSE,
  timing = "any",
  checks = "measurement_summary"
)
tableMeasurementSummary(result)

CDM name	Cohort name	Codelist name	Variable name	Estimate name	Sex: overall	Sex: Female	Sex: Male
mock database	measurement_codelist	measurement_codelist	Cohort records	N	100	–	–
			Cohort subjects	N	67	–	–
			Number subjects	N (%)	67 (100.00%)	40 (59.70%)	27 (40.30%)
			Days between measurements	Median [Q25 – Q75]	249 [67 – 645]	240 [53 – 1,133]	267 [81 – 415]
				Range	8 to 2,886	8 to 2,886	8 to 2,743
			Measurements per subject	Median [Q25 – Q75]	1.00 [1.00 – 2.00]	1.00 [1.00 – 2.00]	1.00 [1.00 – 2.00]
				Range	1.00 to 4.00	1.00 to 4.00	1.00 to 3.00
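
The percentages in the "Number subjects" row can be reproduced from the counts in the table above. The short check below uses the reported 67 cohort subjects as the denominator for every stratum, which is how the note on stratified percentages reads; the variable names are only illustrative.

```r
# Counts taken from the table above
cohort_subjects <- 67L
subjects_with_measurement <- c(overall = 67L, Female = 40L, Male = 27L)

# Every stratum is divided by the total cohort size, not by its own size
pct <- 100 * subjects_with_measurement / cohort_subjects

formatted <- stats::setNames(
  sprintf("%d (%.2f%%)", subjects_with_measurement, pct),
  names(subjects_with_measurement)
)
formatted
#> overall "67 (100.00%)", Female "40 (59.70%)", Male "27 (40.30%)"
```

With this denominator, the stratum percentages add up to the overall percentage rather than each describing coverage within its own stratum.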

Other arguments

Additional arguments allow users to further stratify results, restrict the date range of measurement records, customise the set of summary estimates, and obtain counts to plot histograms. These options behave in the same way as in summariseMeasurementUse(), and are described in more detail in the “Summarising measurement use in a dataset” vignette.