---
title: "Grouped dates"
output:
html:
meta:
css: ["@default@1.13.67", "@copy-button@1.13.67", "@callout@1.13.67", "@article@1.13.67"]
js: ["@sidenotes@1.13.67", "@center-img@1.13.67", "@copy-button@1.13.67", "@callout@1.13.67", "@toc-highlight@1.13.67"]
options:
toc: true
js_highlight:
version: 1.29.0
vignette: >
%\VignetteEngine{litedown::vignette}
%\VignetteIndexEntry{Introduction}
%\VignetteEncoding{UTF-8}
%\VignetteDepends{outbreaks, ggplot2}
---
```{r, include = FALSE}
litedown::reactor(error = TRUE, message = TRUE, print = NA, fig.height = 5)
```
## Introduction
The goal of grates is to make it easy to group dates across a range of different
time intervals. It defines a collection of classes and associated methods that,
together, formalise the concept of grouped dates and are intuitive to use.
Currently there are classes implemented for [year-week](#yearweek),
[year-month](#yearxxx), [year-quarter](#yearxxx) and [yearly](#yearxxx) groupings
as well as for more flexible groupings across [days](#period) and
[months](#month).
To illustrate are examples we use data on a simulated outbreak of Ebola Virus
Disease from the [outbreaks](https://cran.r-project.org/package=outbreaks)
package. For our purposes we are not concerned with analysing the data, instead
we simply use the **date_of_infection** to help illustrate grates functionality.
To start, let us first look at the daily data.
```{r}
#| fig.alt: >
#| Bar chart of daily incidence (by date of infection) covering 2014-03-19
#| to 2015-04-27 inclusive. The graph peaks somewhere between September and
#| November 2014. The "descent" from the peak tapers off slower than the
#| initial "ascent".
library(grates)
library(outbreaks)
library(ggplot2)
# Pull out the date of infection
x <- ebola_sim_clean$linelist$date_of_infection
# Calculate the daily incidence totals (ignoring missing values)
daily <- aggregate(list(cases = x), by = list(date = x), FUN = length)
# Add explicit zeros for days which aren't present
range <- seq.Date(min(daily$date), max(daily$date), by = "day")
daily <- merge(data.frame(date = range), daily, by = "date", all.x = TRUE)
daily <- within(daily, cases[is.na(cases)] <- 0)
# plot the resulting output
ggplot(daily, aes(date, cases)) + geom_col(width = 1) + theme_bw()
```
## Week groupings {#yearweek}
One of the more common date groupings is to a weekly level and grates defines
three classes for users to work with, ``, `` and
the ``.
The most general of these is the `` class. When creating a
general yearweek object, users must specify an associated `firstday` of the
week. This is a value from 1 to 7 representing Monday through Sunday. These
objects can be constructed directly via the `yearweek()` or with the coercion
function, `as_yearweek()`.
::: callout-note
Internally, yearweek objects are stored as the number of weeks (starting at 0)
from the date of the `firstday` nearest the Unix Epoch (1970-01-01). Put more
simply, the number of seven day periods from:
- 1969-12-29 for `firstday` equal to 1 (Monday)
- 1969-12-30 for `firstday` equal to 2 (Tuesday)
- 1969-12-31 for `firstday` equal to 3 (Wednesday)
- 1970-01-01 for `firstday` equal to 4 (Thursday)
- 1970-01-02 for `firstday` equal to 5 (Friday)
- 1970-01-03 for `firstday` equal to 6 (Saturday)
- 1970-01-04 for `firstday` equal to 7 (Sunday)
We use this anchoring around the Unix Epoch as it allows for very efficient
conversion to, and from, date objects in which themselves anchor on 1970-01-01.
That said, most users should not need to consider this internal representation
and should be able to use grates blissfully unaware.
:::
`` objects are used to represent ISO week dates as defined in
[ISO 8601](https://en.wikipedia.org/wiki/ISO_8601). To expand further, it is
easiest to quote from Wikipedia[^1]
> ISO weeks start with Monday and end on Sunday. Each week's year is the
Gregorian year in which the Thursday falls. The first week of the year, hence,
always contains 4 January. ISO week year numbering therefore usually
deviates by 1 from the Gregorian for some days close to 1 January.
[^1]: Wikipedia contributors. (2025, January 15). ISO week date.
In Wikipedia, The Free Encyclopedia.
Retrieved 12:47, March 6, 2025,
from https://en.wikipedia.org/w/index.php?title=ISO_week_date&oldid=1269568343:
Functionally, a `` is equivalent to a `` object
with an associated `firstday` value of 1 (Monday).
`` objects are similar but instead of starting on a Monday, they
start on a Sunday. They have been commonly used by the CDC in America and are
sometimes called CDC weeks. Functionally they are equivalent to a
`` object with an associated `firstday` value of 7 (Sunday).
Continuing with the Ebola data from earlier we can now calculate weekly case
counts:
```{r}
#| fig.alt: >
#| Bar chart of incidence (by the ISO week of infection) covering 2014-W12
#| to 2015-W18 inclusive. The graph peaks at 2014-W38. The "descent" from the
#| peak tapers off slower than the initial "ascent". Six labels of the form
#| 'year-week' are evenly spread along the x-axis and centred on the
#| corresponding bars.
# calculate the total number for across each week
week_dat <- with(daily,
aggregate(
list(cases = cases),
by = list(week = as_isoweek(date)),
FUN = sum
)
)
head(week_dat)
# plot the output
(week_plot <-
ggplot(week_dat, aes(week, cases)) +
geom_col(width = 1, colour = "white") +
theme_bw())
```
To assist in formatting plots of grates objects we also provides x-axis scales
that can be to extend the output from
[ggplot2](https://cran.r-project.org/package=ggplot2) output. For example, if
we prefer non-centralised Date labels we can pass an explicit `format` argument
to the associated scale
```{r}
#| fig.alt: >
#| Bar chart of incidence (by the ISO week of infection) covering the time
#| from March 2014 to April 2015 inclusive. The graph peaks around September
#| 2014. The "descent" from the peak tapers off slower than the initial
#| "ascent". Six labels of the form 'year-month-day' are evenly spread along
#| the x-axis and aligned at the start of the corresponding bars.
week_plot + scale_x_grates_epiweek(format = "%Y-%m-%d")
```
## Period grouping {#period}
`` objects represent groupings of `n` consecutive days calculated
relative to an `offset`. It is useful for when you wish to group an arbitrary
number of dates together (e.g. 10 days).
::: callout-note
Internally `` objects are stored as the integer number, starting
at 0, of periods since the Unix Epoch (1970-01-01) and a specified offset. Here
periods are taken to mean groupings of `n` consecutive days. For storage and
calculation purposes, `offset` is scaled relative to `n`, that is
`offset <- offset %% n` and values of stored relative to this scaled offset.
:::
Like yearweek objects, a period object is easily created with the `as_period()`
coercion function. `as_period()` takes 3 arguments; `x`, the vector (normally a
Date or POSIXt) you wish to group, `n`, the integer number of days you wish to
group, and `offset`, the value you wish to start counting groups from relative
to the Unix Epoch. For convenience, `offset` can be given as a date you want
periods to be relative to (internally this date is converted to integer).
In the example below we aggregate by 14 day periods offset from the earliest
case:
```{r}
#| fig.alt: >
#| Bar chart of incidence (by period of infection) covering the time
#| from March 2014 to April 2015 inclusive. The graph peaks around September
#| 2014. The "descent" from the peak tapers off slower than the initial
#| "ascent". Six labels of the form 'year-month-day' are evenly spread along
#| the x-axis and aligned at the start of the corresponding bars.
period_dat <- with(daily,
aggregate(
list(cases = cases),
by = list(period = as_period(date, n = 14, offset = min(date))),
FUN = sum
)
)
head(period_dat)
ggplot(period_dat, aes(period, cases)) +
geom_col(width = 1, colour = "white") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("")
```
## yearmonth, yearquarter and year {#yearxxx}
Unsurprisingly, ``, `` and ``
represent Year-month, year-quarter and year groupings. Little more needs to be
said so let's jump straight to some examples.
::: callout-note
These objects are stored as the integer number of months/quarters/years
(starting at 0) since the Unix Epoch (1970-01-01). To convert efficiently
between dates and months relative to the UNIX Epoch we used an algorithm based
on the approach of Davis Vaughan in the unreleased
[datea](https://github.com/DavisVaughan/datea/) package.
:::
```{r}
#| fig.alt: >
#| Bar chart of monthly incidence (by date of infection) covering the time
#| from March 2014 to April 2015 inclusive. The graph peaks around September
#| 2014. The "descent" from the peak tapers off slower than the initial
#| "ascent". Labels of the form 'year-month' are evenly spread along
#| the x-axis and aligned at the centred of the corresponding bars.
(month_dat <- with(daily,
aggregate(
list(cases = cases),
by = list(month = as_yearmonth(date)),
FUN = sum
)
))
(month_plot <-
ggplot(month_dat, aes(month, cases)) +
geom_col(width = 1, colour = "white") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab(""))
```
Again we can have non-centred date labels by applying the associated scale with
the desired format.
```{r}
#| fig.alt: >
#| Bar chart of monthly incidence (by date of infection) covering the time
#| from March 2014 to April 2015 inclusive. The graph peaks around September
#| 2014. The "descent" from the peak tapers off slower than the initial
#| "ascent". Labels of the form 'year-month-day' are evenly spread along
#| the x-axis aligned to the start of the corresponding bars.
month_plot + scale_x_grates_yearmonth(format = "%Y-%m-%d")
```
yearquarter works similarly
```{r}
#| fig.alt: >
#| Bar chart of quarterly incidence (by date of infection) covering the time
#| from 2014-Q1 to 2015-Q2 inclusive. The graph peaks over quarters 3 and 4
#| in 2014. Labels on the x-axis and of the form 'year-quarter' are centred on
#| the corresponding bars.
(quarter_dat <- with(daily,
aggregate(
list(cases = cases),
by = list(quarter = as_yearquarter(date)),
FUN = sum
)
))
ggplot(quarter_dat, aes(quarter, cases)) +
geom_col(width = 1, colour = "white") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("")
```
As does year
```{r}
#| fig.alt: >
#| Bar chart of yearly incidence (by date of infection) for 2014 and 2015.
#| There were lots more cases in 2014 compared to 2015 (Roughly speaking
#| 3000 v 700).
(year_dat <- with(daily,
aggregate(
list(cases = cases),
by = list(year = as_year(date)),
FUN = sum
)
))
ggplot(year_dat, aes(year, cases)) +
geom_col(width = 1, colour = "white") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("")
```
## month {#month}
Month objects are groupings of 'n consecutive months' stored relative to the
Unix Epoch. More precisely, `grates_month` objects are stored as the integer
number (starting at 0), of n-month groups since the Unix Epoch (1970-01-01).
This fixed anchoring does make them a little unwieldy but I find they can be
useful for bimonthly data.
```{r}
#| fig.alt: >
#| Bar chart of bimonthly incidence (by date of infection) covering the time
#| from March 2014 to April 2015 inclusive. The graph peaks around
#| September/October 2014. Labels of the form 'year-month-day' are evenly
#| spread along the x-axis aligned to the start of the corresponding bars.
# calculate the bimonthly number of cases
(bimonth_dat <- with(daily,
aggregate(
list(cases = cases),
by = list(group = as_month(date, n = 2)),
FUN = sum
)
))
# by default lower date bounds are used for the x axis
(bimonth_plot <-
ggplot(bimonth_dat, aes(group, cases)) +
geom_col(width = 1, colour = "white") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab(""))
```
Note that the default plotting behaviour of non-centred date labels is different
to that of the yearweek, yearmonth, yearquarter and year scales where labels are
centred by default. To obtain centred labels you must explicitly set the format
to NULL in the scale:
```{r}
#| fig.alt: >
#| Bar chart of bimonthly incidence (by date of infection) covering the time
#| from March 2014 to April 2015 inclusive. The graph peaks around
#| September/October 2014. Labels of the form 'year-month to year-month' are
#| evenly spread along the x-axis centred on the corresponding bars.
bimonth_plot + scale_x_grates_month(format = NULL, n = 2L)
```
## Methods and other functionality
For all grates objects we have added many methods and operations to ensure
logical and consistent behaviour.
```{r}
# Choose some dates spread across a few weeks
first <- as.Date("2024-12-18")
dates <- seq.Date(from = first, by = "5 days", length.out = 7)
# add the corresponding ISO week (see later)
dat <- data.frame(date = dates, isoweek = as_isoweek(dates))
```
Some times it is useful to access both the starting dates covered by grates
objects as well as the end dates. To this end we provide functions
`date_start()` and `date_end()`:
```{r}
with(dat, {
weeks <- unique(isoweek)
data.frame(
isoweek = weeks,
start = date_start(weeks),
end = date_end(weeks)
)
})
```
Note that the conversion of grate objects back to dates is analogous to
`date_start()`.
```{r}
with(dat, identical(as.Date(isoweek), date_start(isoweek)))
```
To find out whether a `grate` object spans a particular date we provide a
`%during%` function:
```{r}
with(dat, {
data.frame(
original_date = date,
isoweek,
contains.2025.01.10 = as.Date("2025-01-10") %during% isoweek
)
})
```
`min()`, `max()`, `range()` and `seq()` all work as you would expect
```{r}
weeks <- dat$isoweek
(minw <- min(weeks))
(maxw <- max(weeks))
(rangew <- range(weeks))
# seq method works if both `from` and `to` are epiweeks
seq(from = minw, to = maxw, by = 6L)
# but will error informatively if `to` is a different class
seq(from = minw, to = 999, by = 6L)
```
Addition (subtraction) of whole numbers will add (subtract) the corresponding
number of weeks to (from) the object
```{r}
(dat <- transform(dat, plus4 = isoweek + 4L, minus4 = isoweek - 4L))
```
Addition of two yearweek objects will error as the intention is unclear.
```{r}
transform(dat, willerror = isoweek + isoweek)
```
Subtraction of two yearweek objects gives the difference in weeks between them
```{r}
transform(dat, difference = plus4 - minus4)
```
epiweek objects can be combined with themselves but not other classes
(assuming an epiweek object is the first entry).
```{r}
c(minw, maxw)
identical(c(minw, maxw), rangew)
c(minw, 1L)
```
## Acknowledgements
The underlying implementation for these objects build upon ideas of Davis
Vaughan and the unreleased [datea](https://github.com/DavisVaughan/datea/)
package as well as those of Zhian Kamvar and the
[aweek](https://cran.r-project.org/package=aweek) package.