R Packages for Data Analysis

By the end of this topic, students are expected to be able to build a basic R package, primarily using the devtools package. Specifically:

  • Write a DESCRIPTION file
  • Carefully curate package dependencies
  • Document functions and data using roxygen2 comments and tags
  • Include tests with testthat, following the standard R package infrastructure
  • Add a license
  • Update an R package using semantic versioning and a NEWS file (changelog)
  • Develop and build informative vignettes and a package README

Resources

Video Lecture:

Written material:

Why R Packages

Why make an R package? As mentioned in the “functions” topic, your analysis will probably benefit from homemade functions: making functions forces you to think about your analysis in terms of its key computational parts, and makes for robust and readable code. Here are a few benefits that result from bundling these functions into an R package:

  • Function documentation is easy to access
  • The code is easy to share
  • Built-in checks help ensure your package is working
  • The package structure provides a template for organizing your work

The alternative is keeping the functions stored in separate files, and source()ing them into your analysis scripts, but this can become unwieldy.
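To make the comparison concrete, here is a small sketch of the source() approach. The helper file and its contents are hypothetical; a temporary file stands in for a script you would keep alongside your analysis.

```r
# Write a hypothetical helper script to a temporary location.
helper_file <- tempfile(fileext = ".R")
writeLines("square <- function(x) x^2", helper_file)

# An analysis script would then pull the function in by file path.
source(helper_file)
square(4)
#> [1] 16
```

Every analysis script needs the correct path to every helper file, which is the part that gets unwieldy. With a package, a single library() call replaces all of those paths.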

Plus, if your package becomes really nice, you might want to share it with the world! At least one tidyverse package supposedly has origins in code that Hadley Wickham bundled up during his PhD.

Finally, packages are a nice way to share datasets – think about the gapminder package or the palmerpenguins package.

Agenda

For this topic, we’ll be making an R package like the toy square package, by following along with “The Whole Game” chapter of “R Packages”. Here’s a checklist based on that chapter. If you miss class, try following along using the book, and come to office hours or ask questions on Slack if you get stuck.

First, make a minimal viable product:

  • Load the devtools package in the console (do this every time you go to work on your package).
  • Run create_package() in the console (all devtools functions should be run in the console, not saved in a script).
    • This gives better defaults than creating the package through the File menu.
  • Initialize a Git repository with use_git().
  • Make a new R script using use_r(), and write a function there: start with square(). It should take in a single argument and square it.
  • Test drive after using load_all().
  • Check that the package is intact: run check().
  • Edit the DESCRIPTION file. For the license, run use_mit_license("Your Name").
  • Document the function: “Code > Insert roxygen skeleton”, then run document().
  • Install and Restart, or run install(). Try loading the package and using it!
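As a minimal sketch, the script created by use_r("square") might look like the following once the roxygen skeleton has been filled in. Only the function name square() and its single argument come from the checklist; the documentation wording and the example call are illustrative assumptions.

```r
# R/square.R

#' Square a number
#'
#' @param x A numeric vector.
#'
#' @return A numeric vector containing the square of each element of `x`.
#' @export
#'
#' @examples
#' square(3)
square <- function(x) {
  x^2
}

# Test drive (in the package workflow, do this in the console after load_all()).
square(1:3)
#> [1] 1 4 9
```

Running document() turns the `#'` comments into a help file and, because of the @export tag, adds square() to the NAMESPACE file.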

Now, expand:

  • Set up testing for your package with use_testthat(), and make an R script containing tests.
    • Check all tests with test(). This also happens with check().
  • We don’t really need to use functions from another package here, but practice declaring your general intention to use some functions from the dplyr namespace with use_package("dplyr").
  • Connect your local package to GitHub with use_github().
  • Add a package README with use_readme_rmd(), and re-render it with build_readme() every time you edit it.
    • Once you successfully sync your package to GitHub, other people can install the package with devtools::install_github("your_username/package_name") – this should go in the README.
  • Make a vignette with use_vignette(). Build all vignettes with build_vignettes().
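  • Include data with the R package with use_data(R_OBJECT_HERE). Then document the dataset in a new R script: put roxygen comments above a string containing the object’s name. Datasets use a different set of roxygen tags (such as @format and @source) than functions do.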
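Two of the steps above, writing a test and documenting a dataset, can be sketched as follows. The file names, the dataset squares_table, and its contents are hypothetical and chosen only for illustration. A stand-in square() is defined so the sketch runs on its own, and it assumes testthat is installed.

```r
# --- tests/testthat/test-square.R (hypothetical file name) ---
library(testthat)  # only needed when running this sketch outside a package

# Stand-in definition so the sketch is self-contained; inside the package,
# test() or check() makes the real square() available instead.
square <- function(x) x^2

test_that("square() squares its input", {
  expect_equal(square(3), 9)
  expect_equal(square(c(-2, 0, 2)), c(4, 0, 4))
})

# --- Console: create an illustrative dataset ---
# In the package workflow this would be followed by use_data(squares_table).
squares_table <- data.frame(n = 1:10, n_squared = (1:10)^2)

# --- R/data.R (hypothetical file name) ---
# The roxygen block sits above a string naming the dataset, not above code.

#' Squares of the first ten positive integers
#'
#' @format A data frame with 10 rows and 2 columns:
#' \describe{
#'   \item{n}{An integer from 1 to 10.}
#'   \item{n_squared}{The square of `n`.}
#' }
#' @source Generated for illustration.
"squares_table"
```

Note that the dataset documentation has no @param, @return, or @export tags: exported data is made available through the data/ directory rather than the NAMESPACE file.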

Release your package:

  • Make a NEWS.md file with use_news_md() and add the main development notes.
  • Choose a version number, put it in the DESCRIPTION file, and tag a release on GitHub.
    • You should also prepare your package for the next version – I suggest doing this on a new branch.
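When choosing version numbers, it may help to see how R orders them. The sketch below uses base R's package_version() with made-up version numbers; the major.minor.patch reading and the .9000 suffix for development versions are common conventions, not requirements stated in this checklist. The usethis helpers use_version() and use_dev_version(), mentioned only in comments, can edit the DESCRIPTION file for you.

```r
# Version numbers compare component by component, not as text.
released <- package_version("0.1.0")       # the version you tag on GitHub
dev      <- package_version("0.1.0.9000")  # in development after the release
patch    <- package_version("0.1.1")       # e.g. a bug fix
minor    <- package_version("0.2.0")       # e.g. a new, compatible feature
major    <- package_version("1.0.0")       # e.g. a breaking change

released < dev
#> [1] TRUE
dev < patch
#> [1] TRUE
package_version("0.10.0") > package_version("0.9.0")
#> [1] TRUE

# In the console, use_version("minor") would bump 0.1.0 to 0.2.0, and
# use_dev_version() would move a released version to its .9000 form.
```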

Test Your Understanding

It might help to browse a completed package directory to answer these questions, like the powers package.

  1. True or False: The mandatory files in an R package are a DESCRIPTION file, a NAMESPACE file, and functions in the R/ directory – everything else is just helpful.
  2. So far we’ve been using devtools functions in the console (instead of saved in a script). True or False: if you’re making a bigger R package, you should start putting those devtools functions in one or more R scripts.
  3. True or False: Add any package name to the Imports field in the DESCRIPTION file, and we can now use functions from that package using ::.
  4. True or False: We can’t add exports to the NAMESPACE file both manually and with devtools::document().