Introduction to the capl R package

Joel D. Barnes, M.Sc. and Michelle D. Guerrero, Ph.D.

March 13, 2021


Introduction

The Canadian Assessment of Physical Literacy (CAPL) is the first comprehensive protocol that can accurately and reliably assess a broad spectrum of skills and abilities that contribute to and characterize the physical literacy level of a participating child.

Physical literacy moves beyond just fitness, motor skill or motivation in isolation. The CAPL is unique in that it can assess the multiple aspects of physical literacy: physical competence, daily behaviour, motivation and confidence, and knowledge and understanding.

The domains of physical literacy are summarized in figure 1 of the CAPL-2 manual on page 6:

[Figure: domains of physical literacy]

The Healthy Active Living and Obesity Research Group (HALO) has been responsible for the systematic development of the CAPL since 2008. HALO’s test development efforts have been informed by the assessment of more than 10,000 children and by input from well over 100 researchers and practitioners in related fields of study.

The capl package contains tools enabling users to compute and visualize CAPL-2 (Canadian Assessment of Physical Literacy, Second Edition) scores and interpretations from raw data, all within the R environment without having to use the CAPL-2 website.

Installation

GitHub

Users can download and install the most recent version of the capl package directly from GitHub (www.github.com/barnzilla/capl) using the devtools R package.

devtools::install_github("barnzilla/capl", upgrade = "never", build_vignettes = TRUE, force = TRUE)
library(capl)

Once the capl package is loaded, any available tutorials for the package, such as this vignette, can be accessed by calling the browseVignettes() function.

browseVignettes("capl")

Getting started

Importing raw data

Users must first import their raw data before using the capl package to compute CAPL-2 scores and interpretations. The import_capl_data() function enables users to import data from an Excel workbook into the R global environment.

data <- import_capl_data(
  file_path = "c:/path/to/raw-data.xlsx",
  sheet_name = "Sheet1"
)

Required variables

The capl package requires 60 variables in order to compute CAPL-2 scores and interpretations. Users can use the get_missing_capl_variables() function to retrieve a list of the required variables. The required variables are outlined in the Details section of the documentation.

?get_missing_capl_variables

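The capl package looks for 60 variables by name. The full list of names appears in the colnames(capl_demo_data) output in the "Loading the pre-installed dataset" section below.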

Loading the pre-installed dataset

The capl package comes with a demo (fake) dataset of raw data, capl_demo_data, which contains 500 rows of participant data on the 60 variables that are required by the capl package. Users can load the demo dataset and start exploring.

data("capl_demo_data")

The base R str() function allows users to get a sense of how the CAPL-2 raw data should be structured and named for downstream use in the capl package.

str(capl_demo_data)
#> 'data.frame':    500 obs. of  60 variables:
#>  $ age               : int  8 9 9 8 12 10 12 10 12 9 ...
#>  $ gender            : chr  "Male" "Female" "Male" "f" ...
#>  $ pacer_lap_distance: num  15 20 20 15 20 15 15 15 15 NA ...
#>  $ pacer_laps        : int  23 31 169 50 63 15 32 143 43 182 ...
#>  $ plank_time        : int  274 282 9 228 252 110 21 185 6 41 ...
#>  $ camsa_skill_score1: int  14 5 6 13 2 9 4 11 5 11 ...
#>  $ camsa_time1       : int  34 27 13 35 21 NA NA 16 20 14 ...
#>  $ camsa_skill_score2: int  14 5 13 11 14 14 0 4 0 4 ...
#>  $ camsa_time2       : int  35 23 14 35 23 23 33 30 29 18 ...
#>  $ steps1            : int  30627 27788 8457 8769 14169 9610 29459 17112 30008 18270 ...
#>  $ time_on1          : chr  "5:13am" "6:13" "6:07" "6:13" ...
#>  $ time_off1         : chr  "22:00" NA "21:00" "22:00" ...
#>  $ non_wear_time1    : int  25 31 33 25 83 67 20 10 49 64 ...
#>  $ steps2            : int  14905 24750 30111 21077 15786 23828 24735 2621 20690 19652 ...
#>  $ time_on2          : chr  "06:00" "5:13am" "6:13" "6:13" ...
#>  $ time_off2         : chr  "21:00" "23:00" "11:13pm" "23:00" ...
#>  $ non_wear_time2    : int  20 82 4 55 1 53 65 47 82 79 ...
#>  $ steps3            : int  21972 15827 14130 13132 18022 12817 14065 26352 27090 10226 ...
#>  $ time_on3          : chr  "07:00" "05:00" "07:48am" NA ...
#>  $ time_off3         : chr  "11:57pm" NA "08:30pm" NA ...
#>  $ non_wear_time3    : int  6 79 23 65 34 15 72 76 60 40 ...
#>  $ steps4            : int  28084 27369 14315 9963 6993 10092 10774 3208 2878 9055 ...
#>  $ time_on4          : chr  "05:00" "6:13" "6:07" NA ...
#>  $ time_off4         : chr  "08:30pm" "10:57 pm" "22:00" "11:13pm" ...
#>  $ non_wear_time4    : int  32 38 74 20 75 22 84 59 42 22 ...
#>  $ steps5            : int  14858 21112 16880 11707 20917 30200 20220 17995 18712 25336 ...
#>  $ time_on5          : chr  "6:07" "6:13" "06:00" "05:00" ...
#>  $ time_off5         : chr  "11:57pm" "23:00" "8:17pm" "8:17pm" ...
#>  $ non_wear_time5    : int  61 64 73 23 82 42 66 38 55 18 ...
#>  $ steps6            : int  17705 5564 16459 12235 27766 26099 15763 7202 2746 3895 ...
#>  $ time_on6          : chr  "06:00" "06:00" NA "6:07" ...
#>  $ time_off6         : chr  "21:00" NA "10:57 pm" "08:30pm" ...
#>  $ non_wear_time6    : int  33 24 89 8 27 56 66 21 14 7 ...
#>  $ steps7            : int  11067 13540 12106 18795 15039 9082 3733 4029 20791 28499 ...
#>  $ time_on7          : chr  "6:07" "6:07" "8:00am" "06:00" ...
#>  $ time_off7         : chr  "08:30pm" "11:13pm" "8:17pm" "10:57 pm" ...
#>  $ non_wear_time7    : int  8 72 4 38 9 32 49 36 34 43 ...
#>  $ self_report_pa    : int  NA 2 2 4 3 5 NA 7 6 7 ...
#>  $ csappa1           : int  1 2 4 2 2 2 3 2 2 3 ...
#>  $ csappa2           : int  3 2 1 1 1 1 4 1 4 3 ...
#>  $ csappa3           : int  2 3 2 1 NA 1 3 3 4 4 ...
#>  $ csappa4           : int  4 1 1 3 4 4 4 4 4 1 ...
#>  $ csappa5           : int  4 2 3 2 1 2 2 2 4 1 ...
#>  $ csappa6           : int  3 4 1 4 2 2 2 3 4 4 ...
#>  $ why_active1       : int  4 3 5 3 1 5 4 1 1 2 ...
#>  $ why_active2       : int  5 3 4 2 5 3 5 NA 5 NA ...
#>  $ why_active3       : int  3 3 1 4 2 3 4 4 5 3 ...
#>  $ feelings_about_pa1: int  4 3 2 2 1 1 3 4 4 2 ...
#>  $ feelings_about_pa2: int  5 2 2 3 4 2 4 4 2 5 ...
#>  $ feelings_about_pa3: int  2 5 2 5 3 2 2 1 3 5 ...
#>  $ pa_guideline      : int  2 3 4 1 2 4 3 2 2 2 ...
#>  $ crf_means         : int  1 4 4 2 2 1 2 1 4 1 ...
#>  $ ms_means          : int  3 2 1 2 3 1 1 2 4 2 ...
#>  $ sports_skill      : int  2 4 4 1 3 1 3 1 4 3 ...
#>  $ pa_is             : int  10 1 1 1 1 1 2 1 3 1 ...
#>  $ pa_is_also        : int  5 1 4 4 1 7 2 7 2 8 ...
#>  $ improve           : int  3 3 9 3 9 9 3 3 3 6 ...
#>  $ increase          : int  2 8 3 8 8 1 3 3 8 8 ...
#>  $ when_cooling_down : int  4 2 4 2 2 2 2 5 2 2 ...
#>  $ heart_rate        : int  5 6 4 4 4 9 4 8 7 4 ...

The 60 required variables can also be quickly accessed by calling the base R colnames() function.

colnames(capl_demo_data)
#>  [1] "age"                "gender"             "pacer_lap_distance"
#>  [4] "pacer_laps"         "plank_time"         "camsa_skill_score1"
#>  [7] "camsa_time1"        "camsa_skill_score2" "camsa_time2"       
#> [10] "steps1"             "time_on1"           "time_off1"         
#> [13] "non_wear_time1"     "steps2"             "time_on2"          
#> [16] "time_off2"          "non_wear_time2"     "steps3"            
#> [19] "time_on3"           "time_off3"          "non_wear_time3"    
#> [22] "steps4"             "time_on4"           "time_off4"         
#> [25] "non_wear_time4"     "steps5"             "time_on5"          
#> [28] "time_off5"          "non_wear_time5"     "steps6"            
#> [31] "time_on6"           "time_off6"          "non_wear_time6"    
#> [34] "steps7"             "time_on7"           "time_off7"         
#> [37] "non_wear_time7"     "self_report_pa"     "csappa1"           
#> [40] "csappa2"            "csappa3"            "csappa4"           
#> [43] "csappa5"            "csappa6"            "why_active1"       
#> [46] "why_active2"        "why_active3"        "feelings_about_pa1"
#> [49] "feelings_about_pa2" "feelings_about_pa3" "pa_guideline"      
#> [52] "crf_means"          "ms_means"           "sports_skill"      
#> [55] "pa_is"              "pa_is_also"         "improve"           
#> [58] "increase"           "when_cooling_down"  "heart_rate"
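
Before running any computations, it can help to compare the column names of your own data against the required names. The following is a minimal base R sketch, not the package's own implementation: the required vector is a shortened, illustrative subset of the 60 names, and my_data is a made-up data frame.

```r
# Illustrative subset of the required variable names (the full list has 60)
required <- c("age", "gender", "steps1", "steps2")

# Hypothetical raw data that lacks two of the required variables
my_data <- data.frame(age = c(9, 10), steps1 = c(12000, 8500))

# Names that are required but not present in the data
missing_vars <- setdiff(required, colnames(my_data))
missing_vars

# Add the missing variables as NA-filled columns (mirrors the behaviour
# described for get_missing_capl_variables(), without calling it)
my_data[missing_vars] <- NA
colnames(my_data)
```

With the full list of names, required could instead be taken from colnames(capl_demo_data).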

Generating demo raw data

The capl package is also equipped with the get_capl_demo_data() function. This function allows users to randomly generate demo raw data and takes parameter n (set to 500 by default). This parameter is used to specify how many rows of demo raw data to generate and must, therefore, be an integer greater than zero. Users, for example, can randomly generate demo raw data for 10,000 participants by executing a single line of code:

capl_demo_data2 <- get_capl_demo_data(n = 10000)

The base R str() function can be called to verify how many rows and columns of data were created.

str(capl_demo_data2)
#> 'data.frame':    10000 obs. of  60 variables:
#>  $ age               : int  10 12 10 8 8 7 NA 12 8 10 ...
#>  $ gender            : chr  "f" "b" "Girl" "b" ...
#>  $ pacer_lap_distance: num  15 15 20 20 20 NA 20 20 20 20 ...
#>  $ pacer_laps        : int  133 190 149 35 187 143 82 39 51 112 ...
#>  $ plank_time        : int  120 2 1 135 45 177 188 219 202 272 ...
#>  $ camsa_skill_score1: int  13 NA 6 9 8 9 11 7 12 8 ...
#>  $ camsa_time1       : int  30 15 14 18 25 25 19 NA 11 22 ...
#>  $ camsa_skill_score2: int  6 14 7 4 5 10 NA 9 4 13 ...
#>  $ camsa_time2       : int  NA 23 23 22 14 29 25 13 10 15 ...
#>  $ steps1            : int  4068 16900 28907 24158 30523 8038 22496 23784 29889 19841 ...
#>  $ time_on1          : chr  "07:00" "06:00" "8:00am" "05:00" ...
#>  $ time_off1         : chr  "8:17pm" "08:30pm" NA "8:17pm" ...
#>  $ non_wear_time1    : int  76 35 63 NA 16 61 53 85 43 6 ...
#>  $ steps2            : int  1063 1897 17853 28541 22759 21121 7124 30927 28035 15053 ...
#>  $ time_on2          : chr  "06:00" "6:13" "06:00" "06:00" ...
#>  $ time_off2         : chr  "21:00" "8:17pm" NA "08:30pm" ...
#>  $ non_wear_time2    : int  16 53 25 66 81 19 45 36 32 51 ...
#>  $ steps3            : int  8340 15720 3856 1282 8814 9122 28609 6683 17781 29380 ...
#>  $ time_on3          : chr  "6:07" "6:07" "07:00" "6:07" ...
#>  $ time_off3         : chr  NA "22:00" "08:30pm" "8:17pm" ...
#>  $ non_wear_time3    : int  78 56 NA 10 34 5 76 89 52 19 ...
#>  $ steps4            : int  30738 20319 30162 12190 14649 7253 14465 26852 21257 2884 ...
#>  $ time_on4          : chr  "06:00" "8:00am" "8:00am" "6:07" ...
#>  $ time_off4         : chr  "21:00" "23:00" "10:57 pm" "22:00" ...
#>  $ non_wear_time4    : int  76 58 6 7 61 66 10 28 42 49 ...
#>  $ steps5            : int  28494 17733 7180 3916 1480 4430 8578 20947 30940 23786 ...
#>  $ time_on5          : chr  NA "5:13am" "5:13am" "07:48am" ...
#>  $ time_off5         : chr  "08:30pm" NA "22:00" "23:00" ...
#>  $ non_wear_time5    : int  89 15 81 15 89 57 35 47 42 30 ...
#>  $ steps6            : int  18555 29477 18135 24160 15221 3946 18621 12294 16166 26659 ...
#>  $ time_on6          : chr  "07:00" "6:07" "05:00" NA ...
#>  $ time_off6         : chr  "23:00" "23:00" "23:00" "11:57pm" ...
#>  $ non_wear_time6    : int  34 68 37 80 51 10 45 35 86 41 ...
#>  $ steps7            : int  9610 29657 11875 26228 27851 6942 23744 3010 26184 22988 ...
#>  $ time_on7          : chr  "07:00" NA "07:48am" "8:00am" ...
#>  $ time_off7         : chr  "11:57pm" "08:30pm" "23:00" "11:13pm" ...
#>  $ non_wear_time7    : int  47 18 8 74 72 65 25 46 39 81 ...
#>  $ self_report_pa    : int  2 6 4 NA 6 3 3 1 4 6 ...
#>  $ csappa1           : int  2 2 2 3 1 1 1 2 4 1 ...
#>  $ csappa2           : int  3 1 2 2 3 1 1 1 4 2 ...
#>  $ csappa3           : int  2 1 2 4 2 4 3 4 4 3 ...
#>  $ csappa4           : int  1 2 2 1 1 1 3 3 3 2 ...
#>  $ csappa5           : int  1 2 3 1 1 1 4 3 1 2 ...
#>  $ csappa6           : int  4 4 2 1 1 4 2 1 4 3 ...
#>  $ why_active1       : int  2 5 2 4 2 5 4 4 5 4 ...
#>  $ why_active2       : int  2 3 3 1 2 2 3 5 4 1 ...
#>  $ why_active3       : int  3 5 4 5 5 4 NA NA 2 4 ...
#>  $ feelings_about_pa1: int  2 1 1 4 3 5 3 NA 2 3 ...
#>  $ feelings_about_pa2: int  2 4 1 3 5 4 5 3 3 1 ...
#>  $ feelings_about_pa3: int  1 4 5 1 3 5 4 3 1 3 ...
#>  $ pa_guideline      : int  2 3 4 4 3 1 2 2 3 2 ...
#>  $ crf_means         : int  1 1 1 3 3 3 2 4 3 1 ...
#>  $ ms_means          : int  4 4 4 1 3 1 2 4 2 1 ...
#>  $ sports_skill      : int  3 3 1 4 2 3 3 1 3 2 ...
#>  $ pa_is             : int  7 1 7 7 7 7 1 1 8 1 ...
#>  $ pa_is_also        : int  1 7 7 6 5 1 7 7 7 7 ...
#>  $ improve           : int  10 10 9 3 9 3 3 3 3 3 ...
#>  $ increase          : int  8 6 8 8 8 8 8 8 7 9 ...
#>  $ when_cooling_down : int  2 9 10 2 2 5 2 10 6 3 ...
#>  $ heart_rate        : int  10 4 4 4 7 4 4 3 4 4 ...
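
The constraint on n described above (an integer greater than zero) can be checked before a large dataset is generated. The helper below, is_valid_n(), is a hypothetical name used only for illustration. It is a sketch of one way to express the rule in base R and is not part of the capl package.

```r
# Hypothetical helper: TRUE only for a single, finite, positive whole number
is_valid_n <- function(n) {
  is.numeric(n) && length(n) == 1 && is.finite(n) && n > 0 && n == floor(n)
}

is_valid_n(10000)  # a positive whole number
is_valid_n(0)      # not greater than zero
is_valid_n(2.5)    # not a whole number
is_valid_n("500")  # a character string, not a number
```

Only the first call returns TRUE; the other three return FALSE.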

Exporting data to Excel

If users prefer to examine the CAPL demo raw data in a workbook, the export_capl_data() function allows them to export data objects to Excel.

export_capl_data(capl_demo_data2, "c:/path/to/store/capl_demo_data2.xlsx")

Renaming variables

If users have imported their own raw data and plan to use the main function, get_capl(), in the capl package to compute CAPL-2 scores and interpretations, they must ensure their variable names match the names of the 60 required variables. Users can rename their variables by calling the rename_variable() function. This function takes three parameters: x, search, and replace. The x parameter must be the raw data object, the search parameter must be a character vector representing the variable name(s) to be renamed, and the replace parameter must be a character vector representing the new names for the variables specified in the search parameter. Below we show how to rename variables using a fake dataset called raw_data.

# Create fake data
raw_data <- data.frame(
  age_years = sample(8:12, 100, replace = TRUE),
  genders = sample(c("girl", "boy"), 100, replace = TRUE, prob = c(0.51, 0.49)),
  step_counts1 = sample(1000:30000, 100, replace = TRUE),
  step_counts2 = sample(1000:30000, 100, replace = TRUE),
  step_counts3 = sample(1000:30000, 100, replace = TRUE),
  step_counts4 = sample(1000:30000, 100, replace = TRUE),
  step_counts5 = sample(1000:30000, 100, replace = TRUE),
  step_counts6 = sample(1000:30000, 100, replace = TRUE),
  step_counts7 = sample(1000:30000, 100, replace = TRUE)
)

# Examine the structure of this data
str(raw_data)
#> 'data.frame':    100 obs. of  9 variables:
#>  $ age_years   : int  8 9 10 10 9 10 10 9 8 9 ...
#>  $ genders     : chr  "boy" "girl" "girl" "girl" ...
#>  $ step_counts1: int  28465 23893 26441 18531 3992 14289 19427 17798 11786 20054 ...
#>  $ step_counts2: int  4947 14387 13901 29226 13026 17162 18235 20177 16793 17982 ...
#>  $ step_counts3: int  4762 29848 1643 10834 22820 6205 5615 28781 4293 19236 ...
#>  $ step_counts4: int  1508 10239 12398 14529 15403 3570 2928 9260 9325 12508 ...
#>  $ step_counts5: int  28565 7615 8165 23957 4230 29746 23593 15227 5170 9223 ...
#>  $ step_counts6: int  21803 4207 16257 25273 3005 10600 10941 20869 14111 2210 ...
#>  $ step_counts7: int  24422 17877 1027 3712 15786 1242 12955 4567 12093 27485 ...

# Rename the variables
raw_data <- rename_variable(
  x = raw_data,
  search = c(
    "age_years", 
    "genders", 
    "step_counts1", 
    "step_counts2", 
    "step_counts3", 
    "step_counts4", 
    "step_counts5", 
    "step_counts6", 
    "step_counts7"
  ),
  replace = c(
    "age", 
    "gender", 
    "steps1", 
    "steps2", 
    "steps3", 
    "steps4", 
    "steps5", 
    "steps6", 
    "steps7"
    )
)

# Examine the structure of this data
str(raw_data)
#> 'data.frame':    100 obs. of  9 variables:
#>  $ age   : int  8 9 10 10 9 10 10 9 8 9 ...
#>  $ gender: chr  "boy" "girl" "girl" "girl" ...
#>  $ steps1: int  28465 23893 26441 18531 3992 14289 19427 17798 11786 20054 ...
#>  $ steps2: int  4947 14387 13901 29226 13026 17162 18235 20177 16793 17982 ...
#>  $ steps3: int  4762 29848 1643 10834 22820 6205 5615 28781 4293 19236 ...
#>  $ steps4: int  1508 10239 12398 14529 15403 3570 2928 9260 9325 12508 ...
#>  $ steps5: int  28565 7615 8165 23957 4230 29746 23593 15227 5170 9223 ...
#>  $ steps6: int  21803 4207 16257 25273 3005 10600 10941 20869 14111 2210 ...
#>  $ steps7: int  24422 17877 1027 3712 15786 1242 12955 4567 12093 27485 ...
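
For readers curious about what a search-and-replace rename involves, the same effect can be approximated in base R with match(). This is a sketch under stated assumptions, not the source of rename_variable(): the function name rename_columns() is hypothetical, and silently skipping names that are not found is an assumed behaviour chosen to match the package's "quiet" style.

```r
# Hypothetical base R stand-in for a search/replace rename
rename_columns <- function(x, search, replace) {
  # Position of each search name among the existing column names (NA if absent)
  idx <- match(search, names(x))
  found <- !is.na(idx)
  # Rename only the columns that were actually found
  names(x)[idx[found]] <- replace[found]
  x
}

df <- data.frame(age_years = c(8, 9), genders = c("girl", "boy"))
df <- rename_columns(
  x = df,
  search = c("age_years", "genders", "not_there"),
  replace = c("age", "gender", "ignored")
)
names(df)
```

The unmatched name "not_there" is ignored, so the data frame keeps its two columns under the new names age and gender.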

Eliminating noisy errors with validation

One of the coding philosophies behind the capl package is to create a “quiet” user experience by suppressing “noisy” error and warning messages via validation. That is, the capl package returns missing or invalid values as NA values instead of throwing “noisy” errors that halt code execution. If any variable is missing, for example, the get_capl() function will continue to execute without throwing error or warning messages. The get_missing_capl_variables() function will create required variables that are missing and populate these variables with NA values. In order to implement the validation philosophy, every capl function enlists helper functions to validate the data. If a given value is not of the correct class or out of range, an NA will be returned.
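
The "quiet" approach can be illustrated with a small base R sketch. The function quiet_in_range() is a hypothetical helper, not one of the package's validation functions. It coerces its input to numeric, suppresses the coercion warning, and returns NA for anything that is missing, non-numeric, or outside an assumed valid range.

```r
# Hypothetical helper: return NA instead of raising warnings or errors
quiet_in_range <- function(x, lower, upper) {
  # Non-numeric strings become NA; the usual coercion warning is suppressed
  x <- suppressWarnings(as.numeric(x))
  # Missing and out-of-range values are replaced with NA
  x[is.na(x) | x < lower | x > upper] <- NA_real_
  x
}

quiet_in_range(c(1, 3, 5, 9, "abc", NA), lower = 1, upper = 5)
#> [1]  1  3  5 NA NA NA
```

Downstream code can then continue to run and handle NA values explicitly rather than stopping at the first invalid entry.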

Validation functions in the capl package

There are eight functions included in the capl package (displayed in alphabetical order) to help provide a “quiet” user experience:

  • validate_age()
  • validate_character()
  • validate_domain_score()
  • validate_gender()
  • validate_integer()
  • validate_number()
  • validate_scale()
  • validate_steps()

Users can learn more about these functions by accessing the documentation within the R environment.

?validate_age
?validate_character
?validate_domain_score
?validate_gender
?validate_integer
?validate_number
?validate_scale
?validate_steps

Validation of age

The CAPL-2 is currently validated with 8- to 12-year-old children. Accordingly, when a function requires the age variable to execute a computation (e.g., get_capl_interpretation()), the age variable is first validated via the validate_age() function.

validated_age <- validate_age(c(7, 8, 9, 10, 11, 12, 13, "", NA, "12", 8.5))

Notice the NA values in the results.

validated_age
#>  [1] NA  8  9 10 11 12 NA NA NA 12  8

The first element is NA because the original value is 7. The next five elements are identical to their original values because they are integers between 8 and 12. The seventh element is NA because the original value is 13. The next two elements are NA because the original values ("" and NA) are obviously invalid. The tenth element is 12 because the original value, the character string "12", can be read as a valid integer age. The last element is 8, but notice that the original value is a decimal. Because 8.5 is between 8 and 12, it is considered valid, but the floor of the value is returned since CAPL performs age-specific computations based on integer age.
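
The rules just described can be written out as a short base R sketch. The function sketch_validate_age() is a hypothetical name and is not the package's validate_age(). It assumes that values are coerced to numeric, that the valid range is 8 to 12 inclusive, and that valid values are floored. How the package itself treats boundary cases such as 12.5 is not covered here.

```r
# Hypothetical re-creation of the age rules described above
sketch_validate_age <- function(x) {
  # Coerce to numeric; "" and other non-numeric strings become NA
  x <- suppressWarnings(as.numeric(x))
  # Assumed rule: valid ages lie between 8 and 12 inclusive
  valid <- !is.na(x) & x >= 8 & x <= 12
  out <- rep(NA_integer_, length(x))
  # Valid ages are floored to an integer
  out[valid] <- as.integer(floor(x[valid]))
  out
}

sketch_validate_age(c(7, 8, 9, 10, 11, 12, 13, "", NA, "12", 8.5))
#>  [1] NA  8  9 10 11 12 NA NA NA 12  8
```

Note that c() converts the whole input to a character vector because it contains strings, which is why the sketch coerces to numeric first.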

Validation of gender

The CAPL-2 is currently validated for children who identify as boys or girls. When a function requires the gender variable to execute a computation (e.g., get_capl_interpretation()), the gender variable is validated via the validate_gender() function.

validated_gender <- validate_gender(c("Girl", "GIRL", "g", "G", "Female", "f", "F", "", NA, 1))

validated_gender
#>  [1] "girl" "girl" "girl" "girl" "girl" "girl" "girl" NA     NA     "girl"

Notice the results again. This function accepts a number of case-insensitive options (e.g., “Girl”, “G”, “Female”, “F”, 1) for the female gender and returns a standardized “girl” value. The only two elements that are returned as NA have original values that are obviously invalid ("" and NA). The validate_gender() function behaves in a similar fashion for the male gender; it also accepts a number of case-insensitive options and returns a standardized “boy” value.

validated_gender <- validate_gender(c("Boy", "BOY", "b", "B", "Male", "m", "M", "", NA, 0))

validated_gender
#>  [1] "boy" "boy" "boy" "boy" "boy" "boy" "boy" NA    NA    "boy"
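
The standardization shown in the two examples above amounts to a case-insensitive lookup. The sketch below uses a hypothetical function, sketch_validate_gender(), and accepts only the spellings that appear in the examples. The package's validate_gender() may accept other options.

```r
# Hypothetical lookup-based standardization of gender values
sketch_validate_gender <- function(x) {
  # Lowercase and trim so that matching is case-insensitive
  key <- tolower(trimws(as.character(x)))
  out <- rep(NA_character_, length(key))
  # Accepted spellings are taken from the examples above (an assumption)
  out[key %in% c("girl", "g", "female", "f", "1")] <- "girl"
  out[key %in% c("boy", "b", "male", "m", "0")] <- "boy"
  out
}

sketch_validate_gender(c("Girl", "F", 1, "BOY", "m", 0, "", NA))
#> [1] "girl" "girl" "girl" "boy"  "boy"  "boy"  NA     NA
```

Anything that does not match one of the listed spellings, including "" and NA, is returned as NA.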

Computing CAPL-2 scores and interpretations

The CAPL-2 scoring system is nicely summarized in figure 2 of the CAPL-2 manual on page 7: