Introduction

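In this vignette, we will explore the OmopSketch functions designed to provide an overview of the observation_period table. Specifically, six key functions facilitate this: summariseObservationPeriod(), tableObservationPeriod() and plotObservationPeriod(), which summarise, tabulate and plot the observation periods themselves, and summariseInObservation(), tableInObservation() and plotInObservation(), which do the same for what is in observation over time.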

Create a mock cdm

Let’s see an example of their functionality. To start with, we will load the essential packages and create a mock cdm using the mockOmopSketch() function.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(OmopSketch)

# Connect to mock database
cdm <- mockOmopSketch()

Summarise observation periods

Let’s now use the summariseObservationPeriod() function from the OmopSketch package to get an overview of the observation_period table, including statistics such as the Number of subjects and the Duration in days for each observation period (e.g., 1st, 2nd).

summarisedResult <- summariseObservationPeriod(cdm = cdm)

summarisedResult
#> # A tibble: 3,126 × 13
#>    result_id cdm_name       group_name      group_level strata_name strata_level
#>        <int> <chr>          <chr>           <chr>       <chr>       <chr>       
#>  1         1 mockOmopSketch observation_pe… all         overall     overall     
#>  2         1 mockOmopSketch observation_pe… all         overall     overall     
#>  3         1 mockOmopSketch observation_pe… all         overall     overall     
#>  4         1 mockOmopSketch observation_pe… all         overall     overall     
#>  5         1 mockOmopSketch observation_pe… all         overall     overall     
#>  6         1 mockOmopSketch observation_pe… all         overall     overall     
#>  7         1 mockOmopSketch observation_pe… all         overall     overall     
#>  8         1 mockOmopSketch observation_pe… all         overall     overall     
#>  9         1 mockOmopSketch observation_pe… all         overall     overall     
#> 10         1 mockOmopSketch observation_pe… all         overall     overall     
#> # ℹ 3,116 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> #   estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> #   additional_name <chr>, additional_level <chr>

Notice that the output is in the summarised result format.

We can use the arguments to specify which statistics we want to compute. For example, use the estimates argument to indicate which estimates you are interested in for the Duration in days of the observation period.

summarisedResult <- summariseObservationPeriod(
  cdm = cdm,
  estimates = c("mean", "sd", "q05", "q95")
)

summarisedResult |>
  filter(variable_name == "Duration in days") |>
  select(group_level, variable_name, estimate_name, estimate_value)
#> # A tibble: 8 × 4
#>   group_level variable_name    estimate_name estimate_value  
#>   <chr>       <chr>            <chr>         <chr>           
#> 1 all         Duration in days mean          3459.34         
#> 2 all         Duration in days sd            3586.96925956871
#> 3 all         Duration in days q05           45              
#> 4 all         Duration in days q95           9766            
#> 5 1st         Duration in days mean          3459.34         
#> 6 1st         Duration in days sd            3586.96925956871
#> 7 1st         Duration in days q05           45              
#> 8 1st         Duration in days q95           9766
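
Note that estimate_value is stored as character in the output above, so it needs converting before any arithmetic. The following is a minimal base R sketch, not part of OmopSketch: it copies the printed rows into a toy data frame (the object names res and wide are our own) and reshapes them so that each estimate becomes a numeric column.

```r
# Toy copy of the rows printed above (values taken from that output).
res <- data.frame(
  group_level    = rep(c("all", "1st"), each = 4),
  estimate_name  = rep(c("mean", "sd", "q05", "q95"), times = 2),
  estimate_value = rep(c("3459.34", "3586.96925956871", "45", "9766"), times = 2),
  stringsAsFactors = FALSE
)

# estimate_value is character, so convert it before doing arithmetic.
res$estimate_value <- as.numeric(res$estimate_value)

# One row per group, one column per estimate.
wide <- reshape(
  res,
  idvar = "group_level",
  timevar = "estimate_name",
  direction = "wide"
)
names(wide) <- sub("^estimate_value\\.", "", names(wide))
wide
```

On a real summarised result, the estimate_type column indicates the intended type of each value, so a blanket as.numeric() is only appropriate when all selected estimates are numeric, as they are here.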

Additionally, you can stratify the results by sex and age groups, and specify a date range of interest:

summarisedResult <- summariseObservationPeriod(
  cdm = cdm,
  estimates = c("mean", "sd", "q05", "q95"),
  sex = TRUE,
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)),
  dateRange = as.Date(c("1970-01-01", "2010-01-01"))
)

Notice that, by default, the "overall" group will also be included, as well as the crossed strata (for example, sex == "Female" and ageGroup == ">=35").
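
To make the age group definition concrete, here is a small base R sketch of how the ageGroup list in the call above could map ages to labels. The helper assignAgeGroup() is hypothetical and written only for illustration; it assumes both bounds of each range are inclusive, which is our reading of the argument rather than a statement about the package internals. The " &&& " separator for crossed strata is the one that appears in the strata_level column of the results.

```r
# Same age group definition as in the call above.
ageGroup <- list("<35" = c(0, 34), ">=35" = c(35, Inf))

# Hypothetical helper: label each age with the first range that contains it
# (bounds assumed inclusive).
assignAgeGroup <- function(age, groups) {
  vapply(age, function(a) {
    hit <- vapply(groups, function(r) a >= r[1] && a <= r[2], logical(1))
    if (any(hit)) names(groups)[which(hit)[1]] else NA_character_
  }, character(1))
}

labels <- assignAgeGroup(c(0, 34, 35, 80), ageGroup)
labels

# A crossed stratum combines the sex and age group labels.
paste("Female", assignAgeGroup(40, ageGroup), sep = " &&& ")
```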

Tidy the summarised object

tableObservationPeriod() will help you to create a table (see supported types with: visOmopResults::tableType()). By default it creates a gt table.

summarisedResult <- summariseObservationPeriod(
  cdm = cdm,
  estimates = c("mean", "sd", "q05", "q95"),
  sex = TRUE
)

summarisedResult |>
  tableObservationPeriod()
Observation period ordinal  Variable name                 Variable level                 Estimate name       CDM name: mockOmopSketch
overall
all                         Number records                -                              N                   100
all                         Number subjects               -                              N                   100
all                         Subjects not in person table  -                              N (%)               0 (0.00%)
all                         Records per person            -                              Mean (SD)           1.00 (0.00)
all                         Duration in days              -                              Mean (SD)           3,459.34 (3,586.97)
all                         Type concept id               Unknown type concept: na       N (%)               100 (100.00%)
all                         Start date before birth date  -                              N (%)               0 (0.00%)
all                         End date before start date    -                              N (%)               0 (0.00%)
all                         Column name                   Observation period end date    N missing data (%)  0 (0.00%)
all                         Column name                   Observation period id          N missing data (%)  0 (0.00%)
all                         Column name                   Observation period id          N zeros (%)         0 (0.00%)
all                         Column name                   Observation period start date  N missing data (%)  0 (0.00%)
all                         Column name                   Period type concept id         N missing data (%)  100 (100.00%)
all                         Column name                   Period type concept id         N zeros (%)         0 (0.00%)
all                         Column name                   Person id                      N missing data (%)  0 (0.00%)
all                         Column name                   Person id                      N zeros (%)         0 (0.00%)
1st                         Number subjects               -                              N                   100
1st                         Duration in days              -                              Mean (SD)           3,459.34 (3,586.97)
Female
all                         Number records                -                              N                   57
all                         Number subjects               -                              N                   57
all                         Records per person            -                              Mean (SD)           1.00 (0.00)
all                         Duration in days              -                              Mean (SD)           3,886.02 (3,922.09)
all                         Type concept id               Unknown type concept: na       N (%)               57 (100.00%)
all                         Column name                   Observation period end date    N missing data (%)  0 (0.00%)
all                         Column name                   Observation period id          N missing data (%)  0 (0.00%)
all                         Column name                   Observation period id          N zeros (%)         0 (0.00%)
all                         Column name                   Observation period start date  N missing data (%)  0 (0.00%)
all                         Column name                   Period type concept id         N missing data (%)  57 (100.00%)
all                         Column name                   Period type concept id         N zeros (%)         0 (0.00%)
all                         Column name                   Person id                      N missing data (%)  0 (0.00%)
all                         Column name                   Person id                      N zeros (%)         0 (0.00%)
1st                         Number subjects               -                              N                   57
1st                         Duration in days              -                              Mean (SD)           3,886.02 (3,922.09)
Male
all                         Number records                -                              N                   43
all                         Number subjects               -                              N                   43
all                         Records per person            -                              Mean (SD)           1.00 (0.00)
all                         Duration in days              -                              Mean (SD)           2,893.74 (3,040.21)
all                         Type concept id               Unknown type concept: na       N (%)               43 (100.00%)
all                         Column name                   Observation period end date    N missing data (%)  0 (0.00%)
all                         Column name                   Observation period id          N missing data (%)  0 (0.00%)
all                         Column name                   Observation period id          N zeros (%)         0 (0.00%)
all                         Column name                   Observation period start date  N missing data (%)  0 (0.00%)
all                         Column name                   Period type concept id         N missing data (%)  43 (100.00%)
all                         Column name                   Period type concept id         N zeros (%)         0 (0.00%)
all                         Column name                   Person id                      N missing data (%)  0 (0.00%)
all                         Column name                   Person id                      N zeros (%)         0 (0.00%)
1st                         Number subjects               -                              N                   43
1st                         Duration in days              -                              Mean (SD)           2,893.74 (3,040.21)

Visualise the results

Finally, we can visualise the result using plotObservationPeriod().

summarisedResult <- summariseObservationPeriod(cdm = cdm)

plotObservationPeriod(
  result = summarisedResult,
  variableName = "Number subjects",
  plotType = "barplot"
)

Note that either Number subjects or Duration in days can be plotted. For Number subjects, the plot type can only be barplot, whereas for Duration in days, the plot type can be barplot, boxplot, or densityplot.

Additionally, if the results were stratified by sex or age group, we can use the facet or colour arguments to highlight the different strata in the plot. The visOmopResults package can help us identify which variables are available to colour or facet by.

summarisedResult <- summariseObservationPeriod(cdm = cdm, sex = TRUE)
plotObservationPeriod(
  result = summarisedResult,
  variableName = "Duration in days",
  plotType = "boxplot",
  facet = "sex"
)


summarisedResult <- summariseObservationPeriod(
  cdm = cdm,
  sex = TRUE,
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)
plotObservationPeriod(summarisedResult,
  colour = "sex",
  facet = "age_group"
)
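
If you only want a quick look at which strata columns are available for facet or colour, one option is to inspect the strata_name column directly. The sketch below uses base R on a hypothetical vector of strata_name values; the exact values are an assumption based on the " &&& " separator seen in strata_level and on the facet names used above, and on a real result you would use unique(summarisedResult$strata_name) instead.

```r
# Hypothetical strata_name values after stratifying by sex and age group
# (assumed; on a real result use unique(summarisedResult$strata_name)).
strataName <- c("overall", "sex", "age_group", "sex &&& age_group")

# Split crossed strata on the separator and keep the distinct column names.
strataCols <- unique(unlist(strsplit(strataName, " &&& ", fixed = TRUE)))
strataCols <- setdiff(strataCols, "overall")
strataCols
```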

Summarise in observation

OmopSketch can also help you to summarise the number of records in observation during specific intervals of time.

summarisedResult <- summariseInObservation(cdm$observation_period,
  interval = "years"
)

summarisedResult |>
  select(variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 132 × 5
#>    variable_name   estimate_name estimate_value additional_name additional_level
#>    <chr>           <chr>         <chr>          <chr>           <chr>           
#>  1 Number of reco… count         1              time_interval   1955-01-01 to 1…
#>  2 Number of reco… percentage    1.00           time_interval   1955-01-01 to 1…
#>  3 Number of reco… count         2              time_interval   1956-01-01 to 1…
#>  4 Number of reco… percentage    2.00           time_interval   1956-01-01 to 1…
#>  5 Number of reco… count         3              time_interval   1957-01-01 to 1…
#>  6 Number of reco… percentage    3.00           time_interval   1957-01-01 to 1…
#>  7 Number of reco… count         4              time_interval   1958-01-01 to 1…
#>  8 Number of reco… percentage    4.00           time_interval   1958-01-01 to 1…
#>  9 Number of reco… count         4              time_interval   1959-01-01 to 1…
#> 10 Number of reco… percentage    4.00           time_interval   1959-01-01 to 1…
#> # ℹ 122 more rows

Note that you can adjust the time interval using the interval argument, which can be set to "years", "quarters", "months" or "overall" (the default value).

summarisedResult <- summariseInObservation(cdm$observation_period,
  interval = "months"
)
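
As an illustration of what counting records "in observation" per interval means, here is a base R sketch on toy data. It assumes that a record contributes to an interval whenever its observation period overlaps that interval; this is our reading of the output, not the package's implementation, and the dates below are made up.

```r
# Toy observation periods (hypothetical dates, not taken from the mock cdm).
obs <- data.frame(
  start = as.Date(c("2000-03-01", "2001-06-15", "1999-01-01")),
  end   = as.Date(c("2002-02-01", "2001-12-31", "2000-01-01"))
)

# Yearly intervals covering the toy data.
years     <- 1999:2002
yearStart <- as.Date(paste0(years, "-01-01"))
yearEnd   <- as.Date(paste0(years, "-12-31"))

# A record is counted in a year when its period overlaps that year.
counts <- vapply(seq_along(years), function(i) {
  sum(obs$start <= yearEnd[i] & obs$end >= yearStart[i])
}, integer(1))
names(counts) <- years
counts
```

A single record can therefore be counted in several intervals, which is why the counts across intervals do not add up to the total number of records.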

Along with the number of records in observation, you can also calculate the number of person-days by setting the output argument to c("record", "person-days").

summarisedResult <- summariseInObservation(cdm$observation_period,
  output = c("record", "person-days")
)


summarisedResult |>
  select(variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 4 × 5
#>   variable_name    estimate_name estimate_value additional_name additional_level
#>   <chr>            <chr>         <chr>          <chr>           <chr>           
#> 1 Number of recor… count         100            overall         overall         
#> 2 Person-days      count         345934         overall         overall         
#> 3 Number of recor… percentage    100.00         overall         overall         
#> 4 Person-days      percentage    100.00         overall         overall
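
The person-days total is consistent with the duration statistics reported earlier in this vignette: 100 records with a mean duration of 3459.34 days give 345934 person-days. The base R sketch below checks that arithmetic and then computes person-days for two toy periods, assuming that both the start and the end date count as days in observation (an assumption on our side, labelled as such in the code).

```r
# Values printed earlier in this vignette.
nRecords     <- 100
meanDuration <- 3459.34
personDays   <- 345934

# Mean duration times number of records recovers the person-days total.
isTRUE(all.equal(nRecords * meanDuration, personDays))

# Toy periods (hypothetical), assuming start and end dates are both counted.
start <- as.Date(c("2000-01-01", "2000-01-10"))
end   <- as.Date(c("2000-01-31", "2000-01-10"))
toyPersonDays <- sum(as.integer(end - start) + 1L)
toyPersonDays
```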

We can further stratify the counts by sex (setting sex = TRUE) or by age (providing ageGroup). In both cases, the function automatically adds an overall stratum that pools all the sex groups and all the age groups. We can also define a date range of interest with dateRange to filter the observation_period table accordingly.

summarisedResult <- summariseInObservation(cdm$observation_period,
  output = c("record", "person-days"),
  interval = "quarters",
  sex = TRUE,
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)),
  dateRange = as.Date(c("1970-01-01", "2010-01-01"))
)


summarisedResult |>
  select(strata_level, variable_name, estimate_name, estimate_value, additional_name, additional_level)

You can include additional output metrics by adding them to the output argument:

If output = "person", the trend in the number of individuals in observation is returned.

summarisedResult <- summariseInObservation(cdm$observation_period,
  output = c("person"),
  interval = "years",
  sex = TRUE,
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)


summarisedResult |>
  select(strata_level, variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 968 × 6
#>    strata_level variable_name      estimate_name estimate_value additional_name
#>    <chr>        <chr>              <chr>         <chr>          <chr>          
#>  1 overall      Number of subjects count         1              time_interval  
#>  2 <35          Number of subjects count         1              time_interval  
#>  3 Male         Number of subjects count         1              time_interval  
#>  4 Male &&& <35 Number of subjects count         1              time_interval  
#>  5 overall      Number of subjects percentage    1.00           time_interval  
#>  6 <35          Number of subjects percentage    1.00           time_interval  
#>  7 Male         Number of subjects percentage    1.00           time_interval  
#>  8 Male &&& <35 Number of subjects percentage    1.00           time_interval  
#>  9 overall      Number of subjects count         2              time_interval  
#> 10 <35          Number of subjects count         2              time_interval  
#> # ℹ 958 more rows
#> # ℹ 1 more variable: additional_level <chr>

If output = "sex", the trend in the number of females in observation is returned. If sex = TRUE is specified, this stratification is ignored.

summarisedResult <- summariseInObservation(cdm$observation_period,
  output = c("sex"),
  interval = "years",
  sex = TRUE,
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)


summarisedResult |>
  select(strata_level, variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 314 × 6
#>    strata_level variable_name     estimate_name estimate_value additional_name
#>    <chr>        <chr>             <chr>         <chr>          <chr>          
#>  1 overall      Number of females count         1              time_interval  
#>  2 <35          Number of females count         1              time_interval  
#>  3 overall      Number of females percentage    1.00           time_interval  
#>  4 <35          Number of females percentage    1.00           time_interval  
#>  5 overall      Number of females count         1              time_interval  
#>  6 <35          Number of females count         1              time_interval  
#>  7 overall      Number of females percentage    1.00           time_interval  
#>  8 <35          Number of females percentage    1.00           time_interval  
#>  9 overall      Number of females count         1              time_interval  
#> 10 <35          Number of females count         1              time_interval  
#> # ℹ 304 more rows
#> # ℹ 1 more variable: additional_level <chr>

If output = "age", the trend in the median age of the population in observation is calculated. If ageGroup and interval are both specified, the age is computed at the beginning of the interval or of the observation period, whichever is more recent.

summarisedResult <- summariseInObservation(
  observationPeriod = cdm$observation_period,
  output = c("age"),
  interval = "years",
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)


summarisedResult |>
  select(strata_level, variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 165 × 6
#>    strata_level variable_name estimate_name estimate_value additional_name
#>    <chr>        <chr>         <chr>         <chr>          <chr>          
#>  1 overall      Age           median        1              time_interval  
#>  2 <35          Age           median        1              time_interval  
#>  3 overall      Age           median        1              time_interval  
#>  4 <35          Age           median        1              time_interval  
#>  5 overall      Age           median        2              time_interval  
#>  6 <35          Age           median        2              time_interval  
#>  7 overall      Age           median        2              time_interval  
#>  8 <35          Age           median        2              time_interval  
#>  9 overall      Age           median        3              time_interval  
#> 10 <35          Age           median        3              time_interval  
#> # ℹ 155 more rows
#> # ℹ 1 more variable: additional_level <chr>
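
The rule for the reference date can be sketched in base R as follows. This is only an illustration of "the beginning of the interval or of the observation period, whichever is more recent", using a made-up person; the helper ageAt() is hypothetical and computes age in completed years, which may differ from how the package computes it.

```r
# Hypothetical person: birth date and observation period start.
birth    <- as.Date("1980-06-15")
obsStart <- as.Date("2000-03-01")

# Start of each yearly interval.
intervalStart <- as.Date(c("2000-01-01", "2001-01-01", "2002-01-01"))

# Reference date: the later of the interval start and the observation start.
refDate <- pmax(intervalStart, obsStart)

# Hypothetical helper: age in completed years at a given date.
ageAt <- function(date, birth) {
  d <- as.POSIXlt(date)
  b <- as.POSIXlt(birth)
  age <- d$year - b$year
  hadBirthday <- (d$mon > b$mon) | (d$mon == b$mon & d$mday >= b$mday)
  age - ifelse(hadBirthday, 0L, 1L)
}

ages <- ageAt(refDate, birth)
ages
```

In the first interval the observation period starts after the interval does, so the age is taken at the observation start; in the later intervals it is taken at the interval start.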

Tidy the summarised object

tableInObservation() will help you to create a table of type gt, reactable or datatable. By default it creates a gt table.

summarisedResult <- summariseInObservation(cdm$observation_period,
  output = c("person", "person-days", "sex"),
  sex = TRUE
)

summarisedResult |>
  tableInObservation(type = "gt")
Variable name       Sex      Estimate name  Database name: mockOmopSketch
episode; observation_period
Number of females   overall  N (%)          57 (57.00%)
Number of subjects  overall  N (%)          100 (100.00%)
Number of subjects  Female   N (%)          57 (57.00%)
Number of subjects  Male     N (%)          43 (43.00%)
Person-days         overall  N (%)          345,934 (100.00%)
Person-days         Female   N (%)          221,503 (64.03%)
Person-days         Male     N (%)          124,431 (35.97%)
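
The percentages in the table can be reproduced from the counts: each stratum is expressed relative to the overall total. A short base R check, using only the numbers shown above (the object names are our own):

```r
# Person-days per stratum, as shown in the table above.
personDays <- c(overall = 345934, Female = 221503, Male = 124431)

# The sex strata add up to the overall total.
sum(personDays[c("Female", "Male")]) == personDays[["overall"]]

# Percentage of the overall person-days in each stratum.
pct <- round(100 * personDays / personDays[["overall"]], 2)
pct
```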

Visualise the results

Finally, we can visualise the trend using plotInObservation().

summarisedResult <- summariseInObservation(cdm$observation_period,
  interval = "years"
)
plotInObservation(summarisedResult)

Notice that only one output can be plotted at a time. If more than one output has been included in the summarised result, you will have to filter it to include only one variable at a time.

Additionally, if the results were stratified by sex or age group, we can use the facet or colour arguments to highlight the different strata in the plot. As before, the visOmopResults package can help us identify which variables are available to colour or facet by.

summarisedResult <- summariseInObservation(cdm$observation_period,
  interval = "years",
  output = c("record", "age"),
  sex = TRUE,
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)
plotInObservation(
  summarisedResult |>
    filter(variable_name == "Age"),
  colour = "sex",
  facet = "age_group"
)

Finally, disconnect from the cdm.

CDMConnector::cdmDisconnect(cdm = cdm)