From this topic, students are expected to be able to:
- Start getting a sense of when to make a function in a data analysis (we will build on this next week).
- Workflow for building a function: start interactively, then wrap it as a function. return(). Argument names.
- Fortify a function:
  - generalize the function and use smart defaults; NA handling, and the ellipsis package https://ellipsis.r-lib.org/
  - provide useful error messages; sidebar: if statements
  - unit tests, and (sidebar) assertions
- Data masking in a function.
- Documenting a function
Resources
Video lecture:
Written resources:
- Basic function syntax in R: https://swcarpentry.github.io/r-novice-inflammation/02-func-R/
- When to use functions in your data analysis:
Our own R functions
At this point in the course, we’ve used lots of functions, like mean(), mutate(), and pivot_longer(). But it can be really useful to write your own function. For example, the ability to write your own functions can supercharge your group_by() %>% summarize() workflow: you can write your own function to use inside summarize(), instead of relying solely on functions built into R or available in packages!
Here’s a simple example of a function I wrote to simulate rolling a user-inputted number of D10s (a 10-sided die used for tabletop gaming).
roll_d10 <- function(num_dice) {
  # Roll num_dice ten-sided dice and return their total
  sum(sample(1:10, num_dice, replace = TRUE))
}
roll_d10(2)
## [1] 3
(Sidebar: this is not reproducible code, as the output depends on the random seed, which R made up for us and won’t tell us. If I wanted to make this reproducible, then I would set the seed to (say) 123 before running my function, with set.seed(123).)
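To illustrate the sidebar, here is a minimal sketch of making the roll reproducible. The seed value 123 comes from the text above; resetting the seed a second time is only there to show that the same seed gives the same roll.

```r
roll_d10 <- function(num_dice) {
  sum(sample(1:10, num_dice, replace = TRUE))
}

# Fix the seed, then roll
set.seed(123)
first_roll <- roll_d10(2)

# Resetting to the same seed replays the same random draws
set.seed(123)
second_roll <- roll_d10(2)

identical(first_roll, second_roll)
```

Without the calls to set.seed(), the two rolls would usually differ.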
Why Functions?
In short, writing a function avoids duplicating code. This is helpful because:
1. It shortens your code, crucially without losing interpretability, making it easier and faster to read through and process its overall intent.
2. If your needs change, then you only need to change your code in one place (the function definition) rather than a bunch of places.
Points 1 and 2 mean that using functions typically leads to fewer bugs and fewer headaches.
A good rule of thumb: whenever you find yourself repeating code more than a few times, consider writing a function.
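As a small illustration of the duplication problem, the sketch below rescales two made-up vectors to the range 0 to 1, first by copying the formula and then with a function. The name rescale01() and the data are hypothetical, chosen only for this example.

```r
# Made-up data for illustration
heights <- c(150, 160, 170)
weights <- c(50, 65, 80)

# Duplicated version: the same formula is typed out twice,
# so a typo or a change of plan has to be fixed in two places
heights_scaled <- (heights - min(heights)) / (max(heights) - min(heights))
weights_scaled <- (weights - min(weights)) / (max(weights) - min(weights))

# Function version: the formula lives in exactly one place
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

heights_scaled_fn <- rescale01(heights)
weights_scaled_fn <- rescale01(weights)
```

If the rescaling rule changes later (say, to ignore missing values), only the body of rescale01() needs editing.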
Testing
When you’re using other people’s functions, like those in packages, they often work. However, as you have probably discovered by this point, it is very easy to inadvertently write code, and therefore functions, that does not work. Because of this, it’s important to test the functions we write to make sure they behave as intended.
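Here is a minimal sketch of what such tests could look like for roll_d10(), assuming the testthat package is installed (it is the package that provides expect_known_hash(), mentioned later in these notes). The specific checks are illustrative choices, not the worksheet's tests.

```r
library(testthat)

roll_d10 <- function(num_dice) {
  sum(sample(1:10, num_dice, replace = TRUE))
}

test_that("roll_d10() returns one number within the possible range", {
  result <- roll_d10(3)
  expect_length(result, 1)
  # Three D10s can total anywhere from 3 to 30
  expect_gte(result, 3)
  expect_lte(result, 30)
})

test_that("roll_d10() with a single die stays between 1 and 10", {
  rolls <- replicate(100, roll_d10(1))
  expect_true(all(rolls %in% 1:10))
})
```

Because the output is random, these tests check properties that must hold for every roll rather than one exact value.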
Documentation
You should have also noticed by now that other people’s functions in packages are documented, with information about:
- what the function does, at a high level
- the objects it expects you to input
- the object that the function outputs
You can document your own function in the same way with roxygen2 tags, placed immediately above the function definition. Although roxygen2 tags are designed for use when creating R packages, they provide a standardized way to document a function, and they make it easy to migrate your function to an R package if need be!
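As a sketch of what this could look like, here is roll_d10() with roxygen2-style comments covering the three pieces of information listed above. The wording of the descriptions is only a suggestion.

```r
#' Roll ten-sided dice
#'
#' Simulates rolling a number of D10s and adds up the results.
#'
#' @param num_dice A single positive whole number: how many dice to roll.
#'
#' @return A single number: the sum of the values rolled.
#'
#' @examples
#' roll_d10(2)
roll_d10 <- function(num_dice) {
  sum(sample(1:10, num_dice, replace = TRUE))
}
```

The lines starting with #' are ordinary comments as far as R is concerned, so the function runs the same with or without them.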
Your turn: functions and tests, the basics
We think working through Worksheet B1 is a great place to go from here to learn the basics of how to define your own functions and how to test them.
Class 1
- Haven’t attempted all of the questions on Worksheet B1? Then attempt the remaining questions.
- Put any questions you have about the worksheet questions or about functions in general in the Google Doc posted to Canvas.
If frequently asked questions emerge in the Google Doc, then we will discuss them together.
Once you’re done with Worksheet B1, tackle these follow-up challenge questions to check your understanding. If there are no questions to be answered about the worksheet in the Google Doc, then we will discuss these challenge questions.
Naming
- Will R do anything to stop you from doing this?
cube_num <- function(num) {
num^2
}
- Will R do anything to stop you from writing a function where the input argument is named blahblahblah?
- What happens if you do this? Can you think of any adverse consequences?
mean <- function(num) {
num^2
}
Syntax
- There are at least two other ways (structurally) to write roll_d10(). What are they? (Hint: one is showcased in the worksheet.)
- What would the function return if I added this line of code before the sum() call in roll_d10()?
sum(sample(1:4, num_dice, replace=TRUE))
- There’s one function on the worksheet test cells that you haven’t used yet: expect_known_hash(). What is it, and when would it be useful?
Class 2
Agenda
We will learn about a few advanced topics:
- Ellipses
- Curly-curly
- Default values
These topics are covered in the R4DS Functions book chapter as well. So if you miss this class, then the R4DS Functions reading is a good alternative.
Counting missing values by group
Let’s start by loading some libraries.
library(palmerpenguins)
library(tidyverse)
library(gapminder)
Here’s some code that:
- groups penguins by species, then summarizes the number of missing values in each variable.
- groups gapminder by continent, then summarizes the number of missing values in each variable.
penguins %>% group_by(species) %>%
  summarize(across(everything(), ~ sum(is.na(.x))))
## # A tibble: 3 × 8
##   species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>      <int>          <int>         <int>             <int>       <int>
## 1 Adelie         0              1             1                 1           1
## 2 Chinstrap      0              0             0                 0           0
## 3 Gentoo         0              1             1                 1           1
## # ℹ 2 more variables: sex <int>, year <int>
gapminder %>% group_by(continent) %>%
  summarize(across(everything(), ~ sum(is.na(.x))))
## # A tibble: 5 × 6
##   continent country  year lifeExp   pop gdpPercap
##   <fct>       <int> <int>   <int> <int>     <int>
## 1 Africa          0     0       0     0         0
## 2 Americas        0     0       0     0         0
## 3 Asia            0     0       0     0         0
## 4 Europe          0     0       0     0         0
## 5 Oceania         0     0       0     0         0
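The counting step in both pipelines is sum(is.na(.x)), applied to each column within each group. A minimal sketch on a made-up vector shows why this counts missing values: is.na() returns TRUE or FALSE for each element, and sum() treats each TRUE as 1.

```r
# Made-up vector with two missing values
x <- c(4, NA, 7, NA)

# One logical value per element: TRUE where the value is missing
missing_flags <- is.na(x)

# Summing the logical vector counts the TRUEs
n_missing <- sum(missing_flags)
n_missing
```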
Your turn: turn this code into a function
By yourself or in small groups, try to turn the code above into a function. Slack react to tell us either “I’m stuck!” or “I’m done!”
Instructor demo: curly-curly
We will construct a solution to the exercise.
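One possible solution is sketched below; it is not necessarily the one built in class. The function name count_missing() and its argument names are hypothetical. The sketch assumes the dplyr and palmerpenguins packages are installed (dplyr is loaded as part of the tidyverse).

```r
library(dplyr)
library(palmerpenguins)

# data:  a data frame
# group: an unquoted column name to group by
count_missing <- function(data, group) {
  data %>%
    # {{ group }} (curly-curly) passes the user's column name through to
    # group_by(), which looks it up inside data rather than as a variable
    group_by({{ group }}) %>%
    summarize(across(everything(), ~ sum(is.na(.x))))
}

penguin_na <- count_missing(penguins, species)
penguin_na
```

Writing group_by(group) without the curly braces would fail, because group_by() would look for a column literally called group.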
Your turn: curly-curly practice
Make a modification to our function: allow the user to also pass in which variables they want to summarize. (Right now it just summarizes all of them.) Slack react to tell us either “I’m stuck!” or “I’m done!”
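After attempting the exercise, you can compare with the sketch below, which is one possible approach among several. It assumes a function with the hypothetical name count_missing() and adds a hypothetical vars argument that is embraced inside across().

```r
library(dplyr)
library(palmerpenguins)

# data:  a data frame
# group: an unquoted column name to group by
# vars:  a tidyselect specification of the columns to summarize,
#        for example c(bill_length_mm, sex)
count_missing <- function(data, group, vars) {
  data %>%
    group_by({{ group }}) %>%
    summarize(across({{ vars }}, ~ sum(is.na(.x))))
}

bill_sex_na <- count_missing(penguins, species, c(bill_length_mm, sex))
bill_sex_na
```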
Instructor demo: ellipses
We’ll modify our function using ellipses to get extra functionality: we’ll allow the user to group by more than one variable.
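A minimal sketch of this idea follows, again using the hypothetical function name count_missing(). Here the ellipsis (...) collects any number of grouping columns and hands them to group_by() unchanged, so no curly-curly is needed for them.

```r
library(dplyr)
library(palmerpenguins)

# data: a data frame
# ...:  any number of unquoted column names to group by
count_missing <- function(data, ...) {
  data %>%
    group_by(...) %>%
    summarize(across(everything(), ~ sum(is.na(.x))))
}

# Group by two variables at once
species_island_na <- count_missing(penguins, species, island)
species_island_na
```

With more than one grouping variable, dplyr may print a message about how the result is grouped; that behaviour is what the .groups argument of summarize() controls.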
Instructor demo: defaults
We’ll talk about when you might conceptually want to set a default value for function arguments, and then make a new argument for our function called .groups that makes it default to dropping the groups in the output.
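The sketch below shows one way such a default could be written, using the hypothetical function name count_missing(). The new .groups argument defaults to "drop" and is forwarded to the .groups argument of summarize(), so the output is ungrouped unless the user asks otherwise.

```r
library(dplyr)
library(palmerpenguins)

# data:    a data frame
# ...:     any number of unquoted column names to group by
# .groups: passed on to summarize(); "drop" returns an ungrouped result
count_missing <- function(data, ..., .groups = "drop") {
  data %>%
    group_by(...) %>%
    summarize(across(everything(), ~ sum(is.na(.x))), .groups = .groups)
}

# Default: the result is not grouped
dropped <- count_missing(penguins, species, island)
is_grouped_df(dropped)

# Overriding the default keeps the grouping
kept <- count_missing(penguins, species, island, .groups = "keep")
is_grouped_df(kept)
```

Because .groups comes after the ellipsis, it has to be supplied by name, which keeps it from being mistaken for another grouping column.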
Attribution
Some of these notes were originally compiled by previous iterations of the instructional staff, including Vincenzo Coia.