In this vignette, you can see what a codebook generated from a
dataset with rich metadata looks like. This dataset includes mock data
for a short German Big Five personality inventory and an age variable.
The dataset follows the format created when importing data from formr.org. Data imported with the haven package carries similar metadata. You can also add such metadata yourself, or use the codebook package on unannotated datasets.
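The kind of metadata mentioned above is stored as attributes on each vector. Variable labels go in a `label` attribute and value labels in a `labels` attribute. This is a minimal sketch, assuming the labelled package is installed. The item vector, its wording, and its response labels are hypothetical and are not taken from the bfi dataset.

```r
library(labelled)

# Hypothetical 5-point Likert item, stored as plain numbers
item <- c(1, 4, 5, 2, 3)

# Value labels: map numeric codes to response options (hypothetical labels)
val_labels(item) <- c("strongly disagree" = 1, "strongly agree" = 5)

# Variable label: a human-readable description of the item (hypothetical wording)
var_label(item) <- "I see myself as someone who is reserved"

# Both pieces of metadata live in attributes on the vector itself
attr(item, "label")
attr(item, "labels")
```

Because the metadata travels with the vector, it survives subsetting into a data frame. Tools such as codebook can then read it without any separate documentation file.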
As you can see below, the codebook package automatically computes reliabilities for multi-item inventories, generates nicely labelled plots, and outputs summary statistics. The same information is also stored in a table, which you can export to various formats.
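The vignette does not say which reliability coefficients codebook reports. To illustrate the idea, the sketch below computes one common coefficient, Cronbach's alpha, in base R. It is not the package's own implementation, and the toy item data are hypothetical.

```r
# Cronbach's alpha for a k-item scale (rows = respondents, columns = items):
# alpha = k / (k - 1) * (1 - sum of item variances / variance of the sum score)
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  # Listwise deletion: keep only respondents who answered every item
  items <- items[stats::complete.cases(items), , drop = FALSE]
  k <- ncol(items)
  item_var <- apply(items, 2, stats::var)
  total_var <- stats::var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

# Two hypothetical items answered by four respondents
cronbach_alpha(cbind(c(1, 2, 3, 4), c(2, 1, 4, 3)))
```

For these two items the function returns 0.75. Three identical items would give exactly 1, because the items then share all of their variance.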
Additionally, codebook can show you different kinds of (labelled) missing values and common missingness patterns. Although you cannot see it on the page, the codebook package also generates JSON-LD metadata for the dataset, which search engines can read. If you share your codebook as an HTML file online, this metadata should make it easier for others to find your data. See what Google sees here.
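JSON-LD is plain JSON that uses schema.org vocabulary and is embedded in the HTML page. The minimal sketch below builds a schema.org `Dataset` description from values that this vignette sets with `metadata()`. It assumes the jsonlite package and is only an illustration of the format, not the exact markup that codebook emits.

```r
library(jsonlite)

# schema.org "Dataset" description built from fields set later in this vignette
dataset_jsonld <- list(
  "@context"    = "https://schema.org/",
  "@type"       = "Dataset",
  name          = "MOCK Big Five Inventory dataset (German metadata demo)",
  description   = "a small mock Big Five Inventory dataset",
  identifier    = "doi:10.5281/zenodo.1326520",
  datePublished = "2016-06-01"
)

# auto_unbox = TRUE writes length-one vectors as scalars instead of arrays
json <- toJSON(dataset_jsonld, auto_unbox = TRUE, pretty = TRUE)

# Web pages embed JSON-LD inside a script tag of this type
cat('<script type="application/ld+json">\n', as.character(json), "\n</script>\n", sep = "")
```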
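# Heuristic: treat the vignette as built by pkgdown if the fig.retina chunk option is set
knit_by_pkgdown <- !is.null(knitr::opts_chunk$get("fig.retina"))
knitr::opts_chunk$set(warning = FALSE, message = TRUE, error = FALSE)
ggplot2::theme_set(ggplot2::theme_bw())
library(codebook)
data("bfi", package = "codebook")
# When not built by pkgdown, drop the BFIK_extra, BFIK_open and BFIK_consc items
if (!knit_by_pkgdown) {
  library(dplyr)
  bfi <- bfi %>% select(-starts_with("BFIK_extra"),
                        -starts_with("BFIK_open"),
                        -starts_with("BFIK_consc"))
}
# Add a simulated age variable (Poisson-distributed with mean 30)
set.seed(1)
bfi$age <- rpois(nrow(bfi), 30)
library(labelled)
# German variable label ("Alter" = "age")
var_label(bfi$age) <- "Alter"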
By default, we only set the required metadata attributes name and description to sensible values. However, there are a number of other attributes you can set to describe the data better. Find out more.
# Describe the dataset using schema.org Dataset properties
metadata(bfi)$name <- "MOCK Big Five Inventory dataset (German metadata demo)"
metadata(bfi)$description <- "a small mock Big Five Inventory dataset"
metadata(bfi)$identifier <- "doi:10.5281/zenodo.1326520"
metadata(bfi)$datePublished <- "2016-06-01"
metadata(bfi)$creator <- list(
  "@type" = "Person",
  givenName = "Ruben", familyName = "Arslan",
  email = "ruben.arslan@gmail.com",
  affiliation = list("@type" = "Organization",
                     name = "MPI Human Development, Berlin"))
metadata(bfi)$citation <- "Arslan (2016). Mock BFI data."
metadata(bfi)$url <- "https://rubenarslan.github.io/codebook/articles/codebook.html"
metadata(bfi)$temporalCoverage <- "2016"
metadata(bfi)$spatialCoverage <- "Goettingen, Germany"
# We don't want to look at the code in the codebook.
knitr::opts_chunk$set(warning = TRUE, message = TRUE, echo = FALSE)
Dataset name: MOCK Big Five Inventory dataset (German metadata demo)
a small mock Big Five Inventory dataset
Temporal Coverage: 2016
Spatial Coverage: Goettingen, Germany
Citation: Arslan (2016). Mock BFI data.
URL: https://rubenarslan.github.io/codebook/articles/codebook.html
Identifier: doi:10.5281/zenodo.1326520
Date published: 2016-06-01
Creator:
name | value |
---|---|
@type | Person |
givenName | Ruben |
familyName | Arslan |
email | ruben.arslan@gmail.com |
affiliation | Organization, MPI Human Development, Berlin |
28 completed rows, 28 who entered any information, 0 only viewed the first page. There are 0 expired rows (people who did not finish filling out in the requested time frame). In total, there are 28 rows including unfinished and expired rows.
There were 28 unique participants, of which 28 finished filling out at least one survey.
This survey was not repeated.
The first session started on 2016-07-08 09:54:16, the last session on 2016-11-02 21:19:50.
People took on average 127.36 minutes (median 1.48) to answer the survey.
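The large gap between the mean (127.36 minutes) and the median (1.48 minutes) indicates a right-skewed distribution: a few very long sessions pull the mean up, while the median reflects a typical respondent. The sketch below reproduces this effect on hypothetical session timestamps. The column names `created` and `ended` are assumptions for illustration, not confirmed formr column names.

```r
# Hypothetical start and end times for four survey sessions
sessions <- data.frame(
  created = as.POSIXct(c("2016-07-08 09:00:00", "2016-07-08 10:00:00",
                         "2016-07-09 12:00:00", "2016-07-10 08:00:00"), tz = "UTC"),
  ended   = as.POSIXct(c("2016-07-08 09:01:30", "2016-07-08 10:01:30",
                         "2016-07-09 12:02:00", "2016-07-10 16:00:00"), tz = "UTC")
)

# Duration of each session in minutes
duration_min <- as.numeric(difftime(sessions$ended, sessions$created, units = "mins"))

mean(duration_min)    # inflated by the one 8-hour session
median(duration_min)  # robust to that outlier
```

Here three respondents take 1.5 to 2 minutes and one leaves the survey open for 8 hours. The mean becomes 121.25 minutes, while the median stays at 1.75 minutes.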