From today’s class, students are anticipated to be able to:
- Read and write a delimited file, like a csv, from R using the
readr
package. - Make relative paths using the
here::here()
function. - Read data from a spreadsheet
- Read and write R binary files (rds files) from R.
Formats
Here are a few formats that you might want to read and write data to:
- Spreadsheets: Excel, Google Sheets, Numbers
- Delimited files: Plaintext files containing data, e.g. csv, tsv
- R binary: A serialization of an R object to a binary file. Basically, that means that it can be loaded in and out of R, but it can’t be opened by anything but R.
Thumb rules:
- csv are the most “one-size-fits-all”: you can open them in spreadsheet software, but they are also plaintext, so are lightweight, can be opened in any text editor, and can be “diff”ed.
- Spreadsheets are nice for human interaction.
- R binary is VERY niche. Don’t reach for it unless none of the other options suit your purposes.
Reading and writing data in R
We use the readr
package for this, because we think it has the most “work right out of the box” experience.
Main functions of note are read_csv()
and write_csv()
: tidyverse equivalents of the base R functions read.csv()
and write.csv()
.
Want to read and write to an Excel file? The readxl
package in the tidyverse is for you!
For the very niche option of R binary: read_rds()
and write_rds()
.
Test your Understanding
- True or False: if you want to be deliberate about where
here::here()
points to on your computer, you need to ensure you have an .Rproj file. - True or False: Suppose you have an .Rproj file in the same folder as your R script. Running
here::here()
from that R script will always point to that folder.
Your turn: try it out!
Open RStudio. Go to Session => Set Working Directory => Choose Directory and then pick a folder you would like to read and write data into. Then, run the following piece of code:
library(tidyverse)
library(gapminder)
gap_asia_2007 <- gapminder %>%
filter(year == 2007, continent == "Asia")
head(gap_asia_2007)
Write gap_asia_2007
to a comma-separated value (csv) file named exported_file.csv
with just one command:
write_csv(FILL_THIS_IN, "exported_file.csv")
Check out your files after executing this line!
Now, let’s practice reading csvs by reading the file we just wrote back into R:
gap_asia_2007_in <- read_csv("FILL_THIS_IN")
Check out your R environment after executing this line!
Also notice the output of running read_csv
. This tells us about the types of variables that were read in. It’s a good habit to check this every time you run a read_
function. Sometimes we might want to change how these variable types are specified…
Main idea behind here::here()
We just wrote and read files to our current directory. If we wanted to use a different folder on our computer, we could specify something like:
Documents/STAT545/exported_file.csv
- Mac uses forward slashesDocuments\STAT545\exported_file.csv
- Windows uses backward slashes
However, if you wanted to make your Rproj more portable and accessible to more users in a cross-platform (between Mac, Unix, Windows users), rather than specifying every path explicitly, here::here()
allows you to set relative paths much more easily.
For a quick start, check out this rant by Jenny Bryan, the founder of STAT 545,
For a longer written tutorial, read the above noted STAT 545 chapter on “Writing and Reading files”.