Introduction

The Spatiotemporal Observation Annotation Tool (STOAT) is a tool for the automated and flexible environmental annotation of biodiversity datasets with gridded environmental data. rstoat provides a portal to initiate, check the progress of, and retrieve data from annotation requests. This introduction vignette will demonstrate the basics of data annotation, retrieval, and analysis, all in the R environment.

ANNOTATION SUBMISSION AND MANAGEMENT

library(rstoat)

The rstoat package provides a wrapper to STOAT. To use rstoat, you first need to authenticate using a personal Map of Life (MOL) account. If you are using RStudio, you can leave the password argument blank to fill via pop-up.

mol_login("your.email@address.com", "password")

NOTE: This vignette demonstrates code for an account preloaded with appropriate data (the sample species occurrence datasets below, which are additionally included for download using download_sample_data()). Your Map of Life account starts out empty and thus certain lines of code will not work right out of the box.

Sample Data

Occurrence data originate from GBIF, for the two species the powerful owl and the budgerigar, filtered to records from January 1, 2010 - May 21, 2020, limited to continental Australia. DOIs: https://doi.org/10.15468/dl.rx9t53 https://doi.org/10.15468/dl.48e2yv

Powerful owl data consisted of 8,821 records. Budgerigar data consisted of 18,947 records from which 9000 were sampled to match the powerful owl data. Data were reformatted for the Map of Life uploader, and provided here. Successfully annotated data are also provided, in the STOAT result directory format. Access sample data using the following function:

Annotation

rstoat provides two different functions for annotation. start_annotation_batch() is the primary annotation function and requires the uploading of a biodiversity dataset to the user’s MOL account via: https://mol.org/upload/.

Alternatively, users can use start_annotation_simple() for small jobs or pilot jobs before a full annotation. start_annotation_simple() accepts occurrence records directly through the console as a dataframe, though it annotates fewer records than a full annotation job.

For both types of annotations, users must provide an environmental layer configuration code that represents the product, layer, spatial and temporal buffers for annotation. See https://mol.org/stoat/sources for a full list of available layers and an explanation of buffers. Multiple layers can be requested for annotation in a single job.

The environmental layer code takes the form: “product-variable-spatialbuffer-temporalbuffer”. Use get_products() to view the full list of product and variable codes, and to check that the requested spatial and temporal buffers are within the ranges allowed (for computational reasons). The units of the spatial and temporal buffers are meters, and days, respectively.

Please see ?start_annotation_simple() for definitions of the output columns

head(get_products())
#>    variable      product min_spatial max_spatial min_temporal max_temporal spatial_resolution temporal_resolution start_date   end_date
#> 1 landcover       esacci         300        5000          365          365                300                 365 1992-01-01 2018-12-31
#> 2 elevation         srtm          30         100            0            0                 30                   0       <NA>       <NA>
#> 3      ndvi        modis         250        5000            1           30                250                   1 2000-02-24       <NA>
#> 4       evi        modis         250        5000            1           30                250                   1 2000-02-24       <NA>
#> 5       evi  landsat7_rt          30         250            1           64                 30                  16 2017-05-02       <NA>
#> 6       evi landsat7_pre          30         250            1           64                 30                  16 1999-05-01 2017-04-21
powerful_owl_sample <- read.csv("sample_data/powerful_owl_vignette.csv")
head(powerful_owl_sample)
#>   scientificName decimalLatitude decimalLongitude  eventDate
#> 1  Ninox strenua       -27.52915         152.9573 2020-03-03
#> 2  Ninox strenua       -26.67722         153.1176 2020-04-28
#> 3  Ninox strenua       -27.53015         153.2494 2020-05-03
#> 4  Ninox strenua       -27.74489         152.9122 2020-05-05
#> 5  Ninox strenua       -27.50000         153.0000 2020-03-03
#> 6  Ninox strenua       -33.71084         151.0459 2020-04-25
simple_results <- start_annotation_simple(powerful_owl_sample[1:50,], "modis-lst_day-1000-1",
                                          coords = c("decimalLongitude", "decimalLatitude"),
                                          date = "eventDate")
head(simple_results)
#>   event_id scientificName decimalLatitude decimalLongitude  eventDate product variable s_buff t_buff    value      stdev valid_pixel_count
#> 1        1  Ninox strenua       -27.52915         152.9573 2020-03-03   modis  lst_day   1000      1       NA         NA          0.000000
#> 2        2  Ninox strenua       -26.67722         153.1176 2020-04-28   modis  lst_day   1000      1 296.2425 0.15084945          2.164706
#> 3        3  Ninox strenua       -27.53015         153.2494 2020-05-03   modis  lst_day   1000      1 297.1021 0.08640988          3.564706
#> 4        4  Ninox strenua       -27.74489         152.9122 2020-05-05   modis  lst_day   1000      1 296.6948 0.51000000          3.545098
#> 5        5  Ninox strenua       -27.50000         153.0000 2020-03-03   modis  lst_day   1000      1       NA         NA          0.000000
#> 6        6  Ninox strenua       -33.71084         151.0459 2020-04-25   modis  lst_day   1000      1       NA         NA          0.000000

Proceeding with full annotation, once a dataset has been uploaded, the user can view their uploaded datasets using my_datasets(). On this account, we have loaded point occurrence datasets from two species, the Powerful Owl (Ninox strenua) and the Budgerigar (Melopsittacus undulatus). Both datasets are small subsets of the species’ GBIF data, over the range of the two species in Australia. my_datasets() will importantly return the dataset_id.

Now, a user can submit a custom annotation job using start_annotation_batch(). The user must submit the dataset_id to be annotated, a name for the job, and a environmental layer code as with the simple annotation.

In this example, we annotate the Powerful Owl dataset with MODIS Land Surface Temperature data with a 1000-meter spatial buffer radius, and two different temporal buffers: 1-day (i.e. only day-of data) and 30-day.

We also annotate the Budgerigar dataset with MODIS Land Surface Temperature data (1000-meter spatial buffer, 1-day temporal buffer) for a species comparison. STOAT supports the combination of multiple species into a single dataset and thus the annotation of multiple species concurrently, but here the species are separated into two jobs for clarity.

NOTE This code will not run successfully unless you have a dataset uploaded to your Map of Life account

A user can get a list of all of their past and present jobs and their statuses using my_jobs().

Then, they can check in on the status of a submitted annotation job using job_details(). Additionally, they can view the species annotated for a particular successfully completed job using job_species().

After a short while, the annotation will complete, and an email notification will be sent to the user’s login address. The user can then pull the results into R using download_annotation(). The function will return the directory name to which the file has been downloaded, which can be saved for easy reference.

For those interested in following the analysis on their own machine, see Sample Data section above for annotated datasets

DATA VISUALIZATION

Begin data visualization by opening the respective output folder and loading in the appropriate csvs. Records are split across multiple files (one for each layer annotated), which must be first merged. STOAT provides a convenience function, read_output(), that automates this procedure and loads annotated data into a single dataframe.

#dataframe_name <- read_output("path/to/your/downloaded/data/directory")

# This it the code you would use if you directly downloaded a successful annotation
#ninox <- read_output(ninox_result_dir)

# Instead, here we read the equivalent output from the sample data directory
ninox <- read_output(paste0(samples_directory, "/powerful_owl_results"))
head(ninox)
#>                                  molid scientificname               date.x    lng.x     lat.x uncertainty     lat.y    lng.y     date.y modis_lst_day_1000_1 modis_lst_day_1000_30
#> 1 25096d2d-3dc0-48ac-9ef9-48d259a5a75a  Ninox strenua 2010-10-03T00:00:00Z 152.8816 -27.38746       25000 -27.38746 152.8816 2010-10-03                   NA              299.4025
#> 2 4775b07b-8249-478e-86be-aedada4b0569  Ninox strenua 2010-06-04T00:00:00Z 150.6471 -34.74158       25000 -34.74158 150.6471 2010-06-04                   NA              289.5026
#> 3 19d95d25-d86e-4c76-8667-361e5f7cd321  Ninox strenua 2010-08-07T00:00:00Z 150.6471 -34.74158       25000 -34.74158 150.6471 2010-08-07                   NA              286.1000
#> 4 af37ae9f-95c1-4611-9841-be21b3bee68f  Ninox strenua 2010-08-07T00:00:00Z 150.6471 -34.74158       25000 -34.74158 150.6471 2010-08-07                   NA              286.1000
#> 5 80285a52-2e54-425c-813c-1e3213296036  Ninox strenua 2010-08-07T00:00:00Z 150.6471 -34.74158       25000 -34.74158 150.6471 2010-08-07                   NA              286.1000
#> 6 a3a68f1a-c038-421d-81ee-f365e55fbbb2  Ninox strenua 2010-09-20T00:00:00Z 152.3598 -32.48022       25000 -32.48022 152.3598 2010-09-20                   NA              292.8160

Now we can visualize the environmental data for the Powerful Owl with a basic histogram and scatterplot.

hist(ninox$modis_lst_day_1000_1-273.15, breaks = 50, xlab = "MODIS LST Day (degrees C)", ylab = "Count",
     main = "Ninox strenua: MODIS LST Day")

plot(as.Date(ninox$date.y), (ninox$modis_lst_day_1000_1-273.15),
     xlim = as.Date(c("2010-01-01", "2019-01-01")),
     xlab = "Year", ylab = "MODIS LST Day (degrees C)",
     main = "Ninox strenua: MODIS LST Day")
axis.Date(1, ninox$date, at=seq(as.Date("2009-01-01"), as.Date("2019-01-01"), by="years"))

We can compare the environmental characterizations of different species to gain insight into their environmental niches. Here, we compare the Powerful Owl with the Budgerigar, annotated with the same spatial and temporal buffers.

melopsittacus <- read_output(paste0(samples_directory, "/budgerigar_results"))
two_species <- rbind(melopsittacus[,1:10], ninox[,1:10])
plot((modis_lst_day_1000_1-273.15) ~ as.Date(date.y), data = two_species,
     col = as.numeric(factor(two_species$scientificname)),
     xlim = as.Date(c("2010-01-01", "2019-01-01")),
     pch = as.numeric(factor(two_species$scientificname)), cex = 0.4,
     xlab = "Year", ylab = "MODIS LST Day (degrees C)",
     main = "Powerful Owl vs Budgerigar MODIS LST Day")
axis.Date(1, two_species$date, at=seq(as.Date("2009-01-01"), as.Date("2019-01-01"), by="years"))
legend(x="topright", legend = as.character(levels(factor(two_species$scientificname))),
       pch = 1:length(levels(factor(two_species$scientificname))),
       col=1:length(levels(factor(two_species$scientificname))))

We can also start to explore the impacts of spatial and temporal buffers on the resultant environmental characterizations of species.

plot(density(na.omit(ninox$modis_lst_day_1000_30-273.15)),col = "red",
     xlab = "MODIS LST Day (degrees C)",
     main = "Powerful Owl MODIS LST Day 1 vs 30-day t_buff")
lines(density(na.omit(ninox$modis_lst_day_1000_1-273.15)), col = "black")
legend(x="topright", legend = c("1-day t_buff", "30-day t_buff"),
       pch = 20,
       col=c("black", "red"))

options(old)