
Summarise observation period
Source: vignettes/summarise_observation_period.Rmd
Introduction
In this vignette, we will explore the OmopSketch functions
designed to provide an overview of the observation_period
table. Specifically, there are six key functions that facilitate
this:
- summariseObservationPeriod(), plotObservationPeriod() and tableObservationPeriod(): use them to get overall statistics describing the observation_period table.
- summariseInObservation(), plotInObservation() and tableInObservation(): use them to summarise the trend in the number of records, individuals, person-days and females in observation during specific intervals of time, and how the median age varies.
Create a mock cdm
Let’s see an example of their functionality. To start with, we will load the essential packages and create a mock cdm using the mockOmopSketch() function.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(OmopSketch)
# Connect to mock database
cdm <- mockOmopSketch()
Summarise observation periods
Let’s now use the summariseObservationPeriod() function
from the OmopSketch package to get an overview of the
observation_period table, including statistics such as
the Number subjects and Duration in days
for each observation period (e.g., 1st, 2nd).
summarisedResult <- summariseObservationPeriod(cdm = cdm)
summarisedResult
#> # A tibble: 3,126 × 13
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 mockOmopSketch observation_pe… all overall overall
#> 2 1 mockOmopSketch observation_pe… all overall overall
#> 3 1 mockOmopSketch observation_pe… all overall overall
#> 4 1 mockOmopSketch observation_pe… all overall overall
#> 5 1 mockOmopSketch observation_pe… all overall overall
#> 6 1 mockOmopSketch observation_pe… all overall overall
#> 7 1 mockOmopSketch observation_pe… all overall overall
#> 8 1 mockOmopSketch observation_pe… all overall overall
#> 9 1 mockOmopSketch observation_pe… all overall overall
#> 10 1 mockOmopSketch observation_pe… all overall overall
#> # ℹ 3,116 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr>
Notice that the output is in the summarised result format.
We can use the arguments to specify which statistics we want to
compute. For example, use the estimates argument to
indicate which estimates you are interested in for the
Duration in days of the observation period.
summarisedResult <- summariseObservationPeriod(
cdm = cdm,
estimates = c("mean", "sd", "q05", "q95")
)
summarisedResult |>
filter(variable_name == "Duration in days") |>
select(group_level, variable_name, estimate_name, estimate_value)
#> # A tibble: 8 × 4
#> group_level variable_name estimate_name estimate_value
#> <chr> <chr> <chr> <chr>
#> 1 all Duration in days mean 3459.34
#> 2 all Duration in days sd 3586.96925956871
#> 3 all Duration in days q05 45
#> 4 all Duration in days q95 9766
#> 5 1st Duration in days mean 3459.34
#> 6 1st Duration in days sd 3586.96925956871
#> 7 1st Duration in days q05 45
#> 8 1st Duration in days q95 9766
Additionally, you can stratify the results by sex and age groups, and specify a date range of interest:
summarisedResult <- summariseObservationPeriod(
cdm = cdm,
estimates = c("mean", "sd", "q05", "q95"),
sex = TRUE,
ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)),
dateRange = as.Date(c("1970-01-01", "2010-01-01"))
)
Notice that, by default, the “overall” group will also be included, as well as crossed strata (for example, sex == “Female” and ageGroup == “>=35”).
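Crossed strata are reported as a single label in strata_level, joined by " &&& " (for example, Male &&& <35 in the printed results further down). The following is a minimal base R sketch of how such labels can be told apart and split. split_strata() is a hypothetical helper written for illustration, not an OmopSketch function, and the labels are typed in by hand rather than read from a result.

```r
# Hypothetical helper: split a crossed strata label into its components.
# The " &&& " separator is the one seen in labels such as "Male &&& <35".
split_strata <- function(level, sep = " &&& ") {
  strsplit(level, sep, fixed = TRUE)[[1]]
}

# Hand-written labels of the kind produced with sex = TRUE and an ageGroup
strata_levels <- c("overall", "Female", ">=35", "Female &&& >=35")

# Flag the crossed strata and split them into their parts
is_crossed <- grepl(" &&& ", strata_levels, fixed = TRUE)
strata_levels[is_crossed]
split_strata("Female &&& >=35")
```

Single-stratum labels such as "Female" pass through split_strata() unchanged, so the helper can be applied to every label.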
Tidy the summarised object
tableObservationPeriod() will help you to create a table
(see supported types with: visOmopResults::tableType()). By default it
creates a gt table.
summarisedResult <- summariseObservationPeriod(
cdm = cdm,
estimates = c("mean", "sd", "q05", "q95"),
sex = TRUE
)
summarisedResult |>
tableObservationPeriod()
| Observation period ordinal | Variable name | Variable level | Estimate name | CDM name: mockOmopSketch |
|---|---|---|---|---|
| overall | | | | |
| all | Number records | - | N | 100 |
| | Number subjects | - | N | 100 |
| | Subjects not in person table | - | N (%) | 0 (0.00%) |
| | Records per person | - | Mean (SD) | 1.00 (0.00) |
| | Duration in days | - | Mean (SD) | 3,459.34 (3,586.97) |
| | Type concept id | Unknown type concept: na | N (%) | 100 (100.00%) |
| | Start date before birth date | - | N (%) | 0 (0.00%) |
| | End date before start date | - | N (%) | 0 (0.00%) |
| | Column name | Observation period end date | N missing data (%) | 0 (0.00%) |
| | | Observation period id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Observation period start date | N missing data (%) | 0 (0.00%) |
| | | Period type concept id | N missing data (%) | 100 (100.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Person id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| 1st | Number subjects | - | N | 100 |
| | Duration in days | - | Mean (SD) | 3,459.34 (3,586.97) |
| Female | | | | |
| all | Number records | - | N | 57 |
| | Number subjects | - | N | 57 |
| | Records per person | - | Mean (SD) | 1.00 (0.00) |
| | Duration in days | - | Mean (SD) | 3,886.02 (3,922.09) |
| | Type concept id | Unknown type concept: na | N (%) | 57 (100.00%) |
| | Column name | Observation period end date | N missing data (%) | 0 (0.00%) |
| | | Observation period id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Observation period start date | N missing data (%) | 0 (0.00%) |
| | | Period type concept id | N missing data (%) | 57 (100.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Person id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| 1st | Number subjects | - | N | 57 |
| | Duration in days | - | Mean (SD) | 3,886.02 (3,922.09) |
| Male | | | | |
| all | Number records | - | N | 43 |
| | Number subjects | - | N | 43 |
| | Records per person | - | Mean (SD) | 1.00 (0.00) |
| | Duration in days | - | Mean (SD) | 2,893.74 (3,040.21) |
| | Type concept id | Unknown type concept: na | N (%) | 43 (100.00%) |
| | Column name | Observation period end date | N missing data (%) | 0 (0.00%) |
| | | Observation period id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Observation period start date | N missing data (%) | 0 (0.00%) |
| | | Period type concept id | N missing data (%) | 43 (100.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Person id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| 1st | Number subjects | - | N | 43 |
| | Duration in days | - | Mean (SD) | 2,893.74 (3,040.21) |
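The table shows formatted cells such as Mean (SD), while the summarised result stores each estimate separately as text. As a rough illustration of how the two relate, the base R sketch below rebuilds the overall Duration in days cell from the mean and sd values printed earlier. The fmt() helper is hypothetical and only mimics the display format; it is not how tableObservationPeriod() is implemented.

```r
# Values copied from the printed summarised result (estimate_value is character)
mean_days <- as.numeric("3459.34")
sd_days <- as.numeric("3586.96925956871")

# Hypothetical formatter: two decimals with a thousands separator
fmt <- function(x) formatC(x, format = "f", digits = 2, big.mark = ",")

# Combine both estimates into one "Mean (SD)" cell
mean_sd <- paste0(fmt(mean_days), " (", fmt(sd_days), ")")
print(mean_sd)
# [1] "3,459.34 (3,586.97)"
```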
Visualise the results
Finally, we can visualise the result using
plotObservationPeriod().
summarisedResult <- summariseObservationPeriod(cdm = cdm)
plotObservationPeriod(
result = summarisedResult,
variableName = "Number subjects",
plotType = "barplot"
)
Note that either Number subjects or
Duration in days can be plotted. For
Number subjects, the plot type can be
barplot, whereas for Duration in days, the
plot type can be barplot, boxplot, or
densityplot.
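The combinations listed above can be written down as a small lookup, which is handy for checking a variableName and plotType pair before calling plotObservationPeriod(). This is only a sketch based on the note above: valid_plot_types and is_valid_plot() are hypothetical names, not part of OmopSketch, and the package performs its own validation.

```r
# Plot types per variable, as listed in the note above
valid_plot_types <- list(
  "Number subjects" = "barplot",
  "Duration in days" = c("barplot", "boxplot", "densityplot")
)

# Hypothetical check: TRUE only when the pair is listed above
is_valid_plot <- function(variableName, plotType) {
  isTRUE(plotType %in% valid_plot_types[[variableName]])
}

is_valid_plot("Duration in days", "boxplot")
is_valid_plot("Number subjects", "densityplot")
```

The first call returns TRUE and the second FALSE, since only a barplot is listed for Number subjects.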
Additionally, if results were stratified by sex or age group, we can
further use the facet or colour arguments to
highlight the different results in the plot. To help us identify
the variables we can colour or facet by, we can use the visOmopResults
package.
summarisedResult <- summariseObservationPeriod(cdm = cdm, sex = TRUE)
plotObservationPeriod(
result = summarisedResult,
variableName = "Duration in days",
plotType = "boxplot",
facet = "sex"
)
summarisedResult <- summariseObservationPeriod(
cdm = cdm,
sex = TRUE,
ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)
plotObservationPeriod(summarisedResult,
colour = "sex",
facet = "age_group"
)
Summarise in observation
OmopSketch can also help you to summarise the number of records in observation during specific intervals of time.
summarisedResult <- summariseInObservation(cdm$observation_period,
interval = "years"
)
summarisedResult |>
select(variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 132 × 5
#> variable_name estimate_name estimate_value additional_name additional_level
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Number of reco… count 1 time_interval 1955-01-01 to 1…
#> 2 Number of reco… percentage 1.00 time_interval 1955-01-01 to 1…
#> 3 Number of reco… count 2 time_interval 1956-01-01 to 1…
#> 4 Number of reco… percentage 2.00 time_interval 1956-01-01 to 1…
#> 5 Number of reco… count 3 time_interval 1957-01-01 to 1…
#> 6 Number of reco… percentage 3.00 time_interval 1957-01-01 to 1…
#> 7 Number of reco… count 4 time_interval 1958-01-01 to 1…
#> 8 Number of reco… percentage 4.00 time_interval 1958-01-01 to 1…
#> 9 Number of reco… count 4 time_interval 1959-01-01 to 1…
#> 10 Number of reco… percentage 4.00 time_interval 1959-01-01 to 1…
#> # ℹ 122 more rows
Note that you can adjust the time interval period using the
interval argument, which can be set to either "years",
"quarters", "months" or "overall" (default value).
summarisedResult <- summariseInObservation(cdm$observation_period,
interval = "months"
)
Along with the number of records in observation, you can also
calculate the number of person-days by setting the output
argument to c("record", "person-days").
summarisedResult <- summariseInObservation(cdm$observation_period,
output = c("record", "person-days")
)
summarisedResult |>
select(variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 4 × 5
#> variable_name estimate_name estimate_value additional_name additional_level
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Number of recor… count 100 overall overall
#> 2 Person-days count 345934 overall overall
#> 3 Number of recor… percentage 100.00 overall overall
#> 4 Person-days percentage 100.00 overall overall
We can further stratify our counts by sex (setting argument
sex = TRUE) or by age (providing an age group). Notice that
in both cases, the function will automatically create a group called
overall with all the sex groups and all the age groups. We can
also define a date range of interest to filter the
observation_period table accordingly.
summarisedResult <- summariseInObservation(cdm$observation_period,
output = c("record", "person-days"),
interval = "quarters",
sex = TRUE,
ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)),
dateRange = as.Date(c("1970-01-01", "2010-01-01"))
)
summarisedResult |>
select(strata_level, variable_name, estimate_name, estimate_value, additional_name, additional_level)
You can include additional output metrics by adding them to the output argument:
If output = "person", the trend in the number of
individuals in observation is returned.
summarisedResult <- summariseInObservation(cdm$observation_period,
output = c("person"),
interval = "years",
sex = TRUE,
ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)
summarisedResult |>
select(strata_level, variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 968 × 6
#> strata_level variable_name estimate_name estimate_value additional_name
#> <chr> <chr> <chr> <chr> <chr>
#> 1 overall Number of subjects count 1 time_interval
#> 2 <35 Number of subjects count 1 time_interval
#> 3 Male Number of subjects count 1 time_interval
#> 4 Male &&& <35 Number of subjects count 1 time_interval
#> 5 overall Number of subjects percentage 1.00 time_interval
#> 6 <35 Number of subjects percentage 1.00 time_interval
#> 7 Male Number of subjects percentage 1.00 time_interval
#> 8 Male &&& <35 Number of subjects percentage 1.00 time_interval
#> 9 overall Number of subjects count 2 time_interval
#> 10 <35 Number of subjects count 2 time_interval
#> # ℹ 958 more rows
#> # ℹ 1 more variable: additional_level <chr>
If output = "sex", the trend in the number of females in
observation is returned. If sex = TRUE is specified, this
stratification is ignored.
summarisedResult <- summariseInObservation(cdm$observation_period,
output = c("sex"),
interval = "years",
sex = TRUE,
ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)
summarisedResult |>
select(strata_level, variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 314 × 6
#> strata_level variable_name estimate_name estimate_value additional_name
#> <chr> <chr> <chr> <chr> <chr>
#> 1 overall Number of females count 1 time_interval
#> 2 <35 Number of females count 1 time_interval
#> 3 overall Number of females percentage 1.00 time_interval
#> 4 <35 Number of females percentage 1.00 time_interval
#> 5 overall Number of females count 1 time_interval
#> 6 <35 Number of females count 1 time_interval
#> 7 overall Number of females percentage 1.00 time_interval
#> 8 <35 Number of females percentage 1.00 time_interval
#> 9 overall Number of females count 1 time_interval
#> 10 <35 Number of females count 1 time_interval
#> # ℹ 304 more rows
#> # ℹ 1 more variable: additional_level <chr>
If output = "age", the trend in the median age of the
population in observation is calculated. If ageGroup and
interval are both specified, the age is computed at the
beginning of the interval or of the observation period, whichever is
later.
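The reference-date rule described above can be sketched in base R. This only illustrates the rule as stated, not OmopSketch's actual implementation: reference_date() and age_at() are hypothetical helpers, and the dates are made up for the example.

```r
# Hypothetical helper: the later of the interval start and the
# observation period start is used as the reference date
reference_date <- function(interval_start, observation_start) {
  max(as.Date(interval_start), as.Date(observation_start))
}

# Hypothetical helper: age in completed years at the reference date
age_at <- function(birth_date, ref_date) {
  b <- as.POSIXlt(as.Date(birth_date))
  r <- as.POSIXlt(as.Date(ref_date))
  had_birthday <- r$mon > b$mon || (r$mon == b$mon && r$mday >= b$mday)
  (r$year - b$year) - if (had_birthday) 0L else 1L
}

# Observation starts on 1990-03-01, so for the 1990 interval the
# observation start is the later date; for 1991 the interval start is
ref_1990 <- reference_date("1990-01-01", "1990-03-01")
ref_1991 <- reference_date("1991-01-01", "1990-03-01")

age_at("1950-06-15", ref_1990) # 39
age_at("1950-06-15", ref_1991) # 40
```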
summarisedResult <- summariseInObservation(
observationPeriod = cdm$observation_period,
output = c("age"),
interval = "years",
ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)
summarisedResult |>
select(strata_level, variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 165 × 6
#> strata_level variable_name estimate_name estimate_value additional_name
#> <chr> <chr> <chr> <chr> <chr>
#> 1 overall Age median 1 time_interval
#> 2 <35 Age median 1 time_interval
#> 3 overall Age median 1 time_interval
#> 4 <35 Age median 1 time_interval
#> 5 overall Age median 2 time_interval
#> 6 <35 Age median 2 time_interval
#> 7 overall Age median 2 time_interval
#> 8 <35 Age median 2 time_interval
#> 9 overall Age median 3 time_interval
#> 10 <35 Age median 3 time_interval
#> # ℹ 155 more rows
#> # ℹ 1 more variable: additional_level <chr>
Tidy the summarised object
tableInObservation() will help you to create a table of
type gt, reactable or datatable. By default it
creates a gt table.
summarisedResult <- summariseInObservation(cdm$observation_period,
output = c("person", "person-days", "sex"),
sex = TRUE
)
summarisedResult |>
tableInObservation(type = "gt")
| Variable name | Sex | Estimate name | Database name: mockOmopSketch |
|---|---|---|---|
| episode; observation_period | | | |
| Number of females | overall | N (%) | 57 (57.00%) |
| Number of subjects | overall | N (%) | 100 (100.00%) |
| | Female | N (%) | 57 (57.00%) |
| | Male | N (%) | 43 (43.00%) |
| Person-days | overall | N (%) | 345,934 (100.00%) |
| | Female | N (%) | 221,503 (64.03%) |
| | Male | N (%) | 124,431 (35.97%) |
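As a quick sanity check on the table above, the person-days percentages can be reproduced by hand. The base R sketch below uses the counts copied from the table and the mean duration printed earlier; it only illustrates the arithmetic and makes no claim about how OmopSketch computes these values internally.

```r
# Person-days copied from the table above
person_days <- c(overall = 345934, Female = 221503, Male = 124431)

# Percentage of the overall person-days contributed by each group
pct <- sprintf("%.2f", 100 * person_days / person_days[["overall"]])
names(pct) <- names(person_days)
pct

# The sex-specific person-days add up to the overall count
person_days[["Female"]] + person_days[["Male"]] == person_days[["overall"]]

# In this mock data the overall person-days also match
# 100 records times the mean duration of 3459.34 days
all.equal(100 * 3459.34, person_days[["overall"]])
```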
Visualise the results
Finally, we can visualise the trend using
plotInObservation().
summarisedResult <- summariseInObservation(cdm$observation_period,
interval = "years"
)
plotInObservation(summarisedResult)
Notice that only one output can be plotted at a time. If more outputs have been included in the summarised result, you will have to filter it to include only one variable at a time.
Additionally, if results were stratified by sex or age group, we can
further use the facet or colour arguments to
highlight the different results in the plot. To help us identify
the variables we can colour or facet by, we can use the visOmopResults
package.
summarisedResult <- summariseInObservation(cdm$observation_period,
interval = "years",
output = c("record", "age"),
sex = TRUE,
ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)
plotInObservation(
summarisedResult |>
filter(variable_name == "Age"),
colour = "sex",
facet = "age_group"
)
Finally, disconnect from the cdm
CDMConnector::cdmDisconnect(cdm = cdm)