---
title: "Grouped dates"
output:
    html:
        meta:
            css: ["@default@1.13.67", "@copy-button@1.13.67", "@callout@1.13.67", "@article@1.13.67"]
            js: ["@sidenotes@1.13.67", "@center-img@1.13.67", "@copy-button@1.13.67", "@callout@1.13.67", "@toc-highlight@1.13.67"]
        options:
            toc: true
            js_highlight:
                version: 1.29.0
vignette: >
  %\VignetteEngine{litedown::vignette}
  %\VignetteIndexEntry{Introduction}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{outbreaks, ggplot2}
---

```{r, include = FALSE}
litedown::reactor(error = TRUE, message = TRUE, print = NA, fig.height = 5)
```

## Introduction

The goal of grates is to make it easy to group dates across a range of
different time intervals. It defines a collection of classes and associated
methods that, together, formalise the concept of grouped dates and are
intuitive to use. Currently there are classes implemented for
[year-week](#yearweek), [year-month](#yearxxx), [year-quarter](#yearxxx) and
[yearly](#yearxxx) groupings as well as for more flexible groupings across
[days](#period) and [months](#month).

In our examples we use data on a simulated outbreak of Ebola Virus Disease
from the [outbreaks](https://cran.r-project.org/package=outbreaks) package.
For our purposes we are not concerned with analysing the data; instead we
simply use the **date_of_infection** to help illustrate grates functionality.
To start, let us first look at the daily data.

```{r}
#| fig.alt: >
#|     Bar chart of daily incidence (by date of infection) covering 2014-03-19
#|     to 2015-04-27 inclusive. The graph peaks somewhere between September and
#|     November 2014. The "descent" from the peak tapers off slower than the
#|     initial "ascent".
library(grates)
library(outbreaks)
library(ggplot2)

# Pull out the date of infection
x <- ebola_sim_clean$linelist$date_of_infection

# Calculate the daily incidence totals (ignoring missing values)
daily <- aggregate(list(cases = x), by = list(date = x), FUN = length)

# Add explicit zeros for days which aren't present
range <- seq.Date(min(daily$date), max(daily$date), by = "day")
daily <- merge(data.frame(date = range), daily, by = "date", all.x = TRUE)
daily <- within(daily, cases[is.na(cases)] <- 0)

# plot the resulting output
ggplot(daily, aes(date, cases)) + geom_col(width = 1) + theme_bw()
```

## Week groupings {#yearweek}

One of the more common date groupings is to a weekly level and grates defines
three classes for users to work with: yearweek, isoweek and epiweek.

The most general of these is the yearweek class. When creating a general
yearweek object, users must specify an associated `firstday` of the week. This
is a value from 1 to 7 representing Monday through Sunday. These objects can
be constructed directly via `yearweek()` or with the coercion function
`as_yearweek()`.

::: callout-note
Internally, yearweek objects are stored as the number of weeks (starting at 0)
from the date of the `firstday` nearest the Unix Epoch (1970-01-01). Put more
simply, the number of seven day periods from:

- 1969-12-29 for `firstday` equal to 1 (Monday)
- 1969-12-30 for `firstday` equal to 2 (Tuesday)
- 1969-12-31 for `firstday` equal to 3 (Wednesday)
- 1970-01-01 for `firstday` equal to 4 (Thursday)
- 1970-01-02 for `firstday` equal to 5 (Friday)
- 1970-01-03 for `firstday` equal to 6 (Saturday)
- 1970-01-04 for `firstday` equal to 7 (Sunday)

We use this anchoring around the Unix Epoch as it allows for very efficient
conversion to, and from, Date objects, which are themselves anchored on
1970-01-01. That said, most users should not need to consider this internal
representation and should be able to use grates blissfully unaware.
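
The anchor dates listed above follow a simple pattern that can be sketched
with base R date arithmetic. This is an illustrative sketch only, not the
package's implementation: `week_anchor()`, `week_index()` and `week_start()`
are hypothetical helpers that are not part of grates, and they assume
`firstday` is coded 1 (Monday) to 7 (Sunday) as described above.

```r
# Hypothetical helpers for illustration only (not part of grates).

# Anchor date for a given firstday: 1969-12-29 (a Monday) plus firstday - 1 days
week_anchor <- function(firstday = 1L) {
    as.Date("1969-12-29") + (as.integer(firstday) - 1L)
}

# Number of whole seven day periods from the anchor to each date
week_index <- function(date, firstday = 1L) {
    days <- as.integer(as.Date(date)) - as.integer(week_anchor(firstday))
    days %/% 7L
}

# First day of the week with a given index
week_start <- function(index, firstday = 1L) {
    week_anchor(firstday) + 7L * as.integer(index)
}

d <- as.Date("2014-09-17")        # a Wednesday
i <- week_index(d, firstday = 1L)
week_start(i, firstday = 1L)      # the Monday on or before `d`
```

Dates before the anchor give negative indices, because `%/%` in R rounds
towards minus infinity.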
:::

Objects of the isoweek class are used to represent ISO week dates as defined
in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601). To expand further, it
is easiest to quote from Wikipedia[^1]:

> ISO weeks start with Monday and end on Sunday. Each week's year is the
> Gregorian year in which the Thursday falls. The first week of the year,
> hence, always contains 4 January. ISO week year numbering therefore usually
> deviates by 1 from the Gregorian for some days close to 1 January.

[^1]: Wikipedia contributors. (2025, January 15). ISO week date. In Wikipedia,
    The Free Encyclopedia. Retrieved 12:47, March 6, 2025, from
    https://en.wikipedia.org/w/index.php?title=ISO_week_date&oldid=1269568343

Functionally, an isoweek object is equivalent to a yearweek object with an
associated `firstday` value of 1 (Monday).

Objects of the epiweek class are similar but, instead of starting on a Monday,
they start on a Sunday. They have been commonly used by the CDC in America and
are sometimes called CDC weeks. Functionally they are equivalent to a yearweek
object with an associated `firstday` value of 7 (Sunday).

Continuing with the Ebola data from earlier we can now calculate weekly case
counts:

```{r}
#| fig.alt: >
#|     Bar chart of incidence (by the ISO week of infection) covering 2014-W12
#|     to 2015-W18 inclusive. The graph peaks at 2014-W38. The "descent" from the
#|     peak tapers off slower than the initial "ascent". Six labels of the form
#|     'year-week' are evenly spread along the x-axis and centred on the
#|     corresponding bars.

# calculate the total number of cases across each week
week_dat <- with(daily,
    aggregate(
        list(cases = cases),
        by = list(week = as_isoweek(date)),
        FUN = sum
    )
)

head(week_dat)

# plot the output
(week_plot <-
    ggplot(week_dat, aes(week, cases)) +
    geom_col(width = 1, colour = "white") +
    theme_bw())
```

To assist in formatting plots of grates objects we also provide x-axis scales
that can be used to extend [ggplot2](https://cran.r-project.org/package=ggplot2)
output.
For example, if we prefer non-centred date labels we can pass an explicit
`format` argument to the associated scale:

```{r}
#| fig.alt: >
#|     Bar chart of incidence (by the ISO week of infection) covering the time
#|     from March 2014 to April 2015 inclusive. The graph peaks around September
#|     2014. The "descent" from the peak tapers off slower than the initial
#|     "ascent". Six labels of the form 'year-month-day' are evenly spread along
#|     the x-axis and aligned at the start of the corresponding bars.

# week_plot holds ISO weeks, so we use the matching isoweek scale
week_plot + scale_x_grates_isoweek(format = "%Y-%m-%d")
```

## Period grouping {#period}

Period objects represent groupings of `n` consecutive days calculated relative
to an `offset`. They are useful when you wish to group an arbitrary number of
days together (e.g. 10 days).

::: callout-note
Internally, period objects are stored as the integer number, starting at 0, of
periods since the Unix Epoch (1970-01-01) and a specified offset. Here periods
are taken to mean groupings of `n` consecutive days. For storage and
calculation purposes, `offset` is scaled relative to `n`, that is
`offset <- offset %% n`, and values are stored relative to this scaled offset.
:::

Like yearweek objects, a period object is easily created with the
`as_period()` coercion function. `as_period()` takes 3 arguments: `x`, the
vector (normally a Date or POSIXt) you wish to group; `n`, the integer number
of days you wish to group; and `offset`, the value you wish to start counting
groups from relative to the Unix Epoch. For convenience, `offset` can be given
as a date you want periods to be relative to (internally this date is
converted to an integer).

In the example below we aggregate by 14 day periods offset from the earliest
case:

```{r}
#| fig.alt: >
#|     Bar chart of incidence (by period of infection) covering the time
#|     from March 2014 to April 2015 inclusive. The graph peaks around September
#|     2014. The "descent" from the peak tapers off slower than the initial
#|     "ascent".
#|     Six labels of the form 'year-month-day' are evenly spread along
#|     the x-axis and aligned at the start of the corresponding bars.

period_dat <- with(daily,
    aggregate(
        list(cases = cases),
        by = list(period = as_period(date, n = 14, offset = min(date))),
        FUN = sum
    )
)

head(period_dat)

ggplot(period_dat, aes(period, cases)) +
    geom_col(width = 1, colour = "white") +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("")
```

## yearmonth, yearquarter and year {#yearxxx}

Unsurprisingly, yearmonth, yearquarter and year objects represent year-month,
year-quarter and year groupings. Little more needs to be said so let's jump
straight to some examples.

::: callout-note
These objects are stored as the integer number of months/quarters/years
(starting at 0) since the Unix Epoch (1970-01-01). To convert efficiently
between dates and months relative to the Unix Epoch we use an algorithm based
on the approach of Davis Vaughan in the unreleased
[datea](https://github.com/DavisVaughan/datea/) package.
:::

```{r}
#| fig.alt: >
#|     Bar chart of monthly incidence (by date of infection) covering the time
#|     from March 2014 to April 2015 inclusive. The graph peaks around September
#|     2014. The "descent" from the peak tapers off slower than the initial
#|     "ascent". Labels of the form 'year-month' are evenly spread along
#|     the x-axis and aligned at the centre of the corresponding bars.

(month_dat <- with(daily,
    aggregate(
        list(cases = cases),
        by = list(month = as_yearmonth(date)),
        FUN = sum
    )
))

(month_plot <-
    ggplot(month_dat, aes(month, cases)) +
    geom_col(width = 1, colour = "white") +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab(""))
```

Again we can have non-centred date labels by applying the associated scale
with the desired format.

```{r}
#| fig.alt: >
#|     Bar chart of monthly incidence (by date of infection) covering the time
#|     from March 2014 to April 2015 inclusive. The graph peaks around September
#|     2014.
#|     The "descent" from the peak tapers off slower than the initial
#|     "ascent". Labels of the form 'year-month-day' are evenly spread along
#|     the x-axis aligned to the start of the corresponding bars.

month_plot + scale_x_grates_yearmonth(format = "%Y-%m-%d")
```

yearquarter works similarly:

```{r}
#| fig.alt: >
#|     Bar chart of quarterly incidence (by date of infection) covering the time
#|     from 2014-Q1 to 2015-Q2 inclusive. The graph peaks over quarters 3 and 4
#|     in 2014. Labels on the x-axis and of the form 'year-quarter' are centred on
#|     the corresponding bars.

(quarter_dat <- with(daily,
    aggregate(
        list(cases = cases),
        by = list(quarter = as_yearquarter(date)),
        FUN = sum
    )
))

ggplot(quarter_dat, aes(quarter, cases)) +
    geom_col(width = 1, colour = "white") +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("")
```

As does year:

```{r}
#| fig.alt: >
#|     Bar chart of yearly incidence (by date of infection) for 2014 and 2015.
#|     There were lots more cases in 2014 compared to 2015 (Roughly speaking
#|     3000 v 700).

(year_dat <- with(daily,
    aggregate(
        list(cases = cases),
        by = list(year = as_year(date)),
        FUN = sum
    )
))

ggplot(year_dat, aes(year, cases)) +
    geom_col(width = 1, colour = "white") +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("")
```

## month {#month}

Month objects are groupings of 'n consecutive months' stored relative to the
Unix Epoch. More precisely, `grates_month` objects are stored as the integer
number (starting at 0) of n-month groups since the Unix Epoch (1970-01-01).
This fixed anchoring does make them a little unwieldy but I find they can be
useful for bimonthly data.

```{r}
#| fig.alt: >
#|     Bar chart of bimonthly incidence (by date of infection) covering the time
#|     from March 2014 to April 2015 inclusive. The graph peaks around
#|     September/October 2014. Labels of the form 'year-month-day' are evenly
#|     spread along the x-axis aligned to the start of the corresponding bars.
# calculate the bimonthly number of cases
(bimonth_dat <- with(daily,
    aggregate(
        list(cases = cases),
        by = list(group = as_month(date, n = 2)),
        FUN = sum
    )
))

# by default lower date bounds are used for the x axis
(bimonth_plot <-
    ggplot(bimonth_dat, aes(group, cases)) +
    geom_col(width = 1, colour = "white") +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab(""))
```

Note that the default plotting behaviour of non-centred date labels is
different to that of the yearweek, yearmonth, yearquarter and year scales,
where labels are centred by default. To obtain centred labels you must
explicitly set the format to NULL in the scale:

```{r}
#| fig.alt: >
#|     Bar chart of bimonthly incidence (by date of infection) covering the time
#|     from March 2014 to April 2015 inclusive. The graph peaks around
#|     September/October 2014. Labels of the form 'year-month to year-month' are
#|     evenly spread along the x-axis centred on the corresponding bars.

bimonth_plot + scale_x_grates_month(format = NULL, n = 2L)
```

## Methods and other functionality

For all grates objects we have added many methods and operations to ensure
logical and consistent behaviour.

```{r}
# Choose some dates spread across a few weeks
first <- as.Date("2024-12-18")
dates <- seq.Date(from = first, by = "5 days", length.out = 7)

# add the corresponding ISO week (see the week groupings section above)
dat <- data.frame(date = dates, isoweek = as_isoweek(dates))
```

Sometimes it is useful to access both the starting dates covered by grates
objects and the end dates. To this end we provide the functions
`date_start()` and `date_end()`:

```{r}
with(dat, {
    weeks <- unique(isoweek)
    data.frame(
        isoweek = weeks,
        start = date_start(weeks),
        end = date_end(weeks)
    )
})
```

Note that the conversion of grates objects back to dates is analogous to
`date_start()`:
```{r}
with(dat, identical(as.Date(isoweek), date_start(isoweek)))
```

To find out whether a grates object spans a particular date we provide a
`%during%` operator:

```{r}
with(dat, {
    data.frame(
        original_date = date,
        isoweek,
        contains.2025.01.10 = as.Date("2025-01-10") %during% isoweek
    )
})
```

`min()`, `max()`, `range()` and `seq()` all work as you would expect:

```{r}
weeks <- dat$isoweek

(minw <- min(weeks))
(maxw <- max(weeks))
(rangew <- range(weeks))

# seq method works if both `from` and `to` are isoweeks
seq(from = minw, to = maxw, by = 6L)

# but will error informatively if `to` is a different class
seq(from = minw, to = 999, by = 6L)
```

Addition (subtraction) of whole numbers will add (subtract) the corresponding
number of weeks to (from) the object:

```{r}
(dat <- transform(dat, plus4 = isoweek + 4L, minus4 = isoweek - 4L))
```

Addition of two yearweek objects will error as the intention is unclear.

```{r}
transform(dat, willerror = isoweek + isoweek)
```

Subtraction of two yearweek objects gives the difference in weeks between
them:

```{r}
transform(dat, difference = plus4 - minus4)
```

isoweek objects can be combined with themselves but not with other classes
(assuming an isoweek object is the first entry).

```{r}
c(minw, maxw)
identical(c(minw, maxw), rangew)
c(minw, 1L)
```

## Acknowledgements

The underlying implementation for these objects builds upon ideas of Davis
Vaughan and the unreleased [datea](https://github.com/DavisVaughan/datea/)
package as well as those of Zhian Kamvar and the
[aweek](https://cran.r-project.org/package=aweek) package.