Reading and Writing Data

From today’s class, students are anticipated to be able to:

  • Read and write a delimited file, like a csv, from R using the readr package.
  • Make relative paths using the here::here() function.
  • Read data from a spreadsheet
  • Read and write R binary files (rds files) from R.

Resources

Video lecture:

Tutorial:

Formats

Here are a few formats that you might want to read and write data to:

  • Spreadsheets: Excel, Google Sheets, Numbers
  • Delimited files: Plaintext files containing data, e.g. csv, tsv
  • R binary: A serialization of an R object to a binary file. Basically, that means that it can be loaded in and out of R, but it can’t be opened by anything but R.

Thumb rules:

  • csv are the most “one-size-fits-all”: you can open them in spreadsheet software, but they are also plaintext, so are lightweight, can be opened in any text editor, and can be “diff”ed. 
  • Spreadsheets are nice for human interaction.
  • R binary is VERY niche. Don’t reach for it unless none of the other options suit your purposes.

Reading and writing data in R

We use the readr package for this, because we think it has the most “work right out of the box” experience.

Main functions of note are read_csv() and write_csv(): tidyverse equivalents of the base R functions read.csv() and write.csv().

Want to read and write to an Excel file? The readxl package in the tidyverse is for you!

For the very niche option of R binary: read_rds() and write_rds().

Test your Understanding

  1. True or False: if you want to be deliberate about where here::here() points to on your computer, you need to ensure you have an .Rproj file.
  2. True or False: Suppose you have an .Rproj file in the same folder as your R script. Running here::here() from that R script will always point to that folder.

Your turn: try it out!

Open RStudio. Go to Session => Set Working Directory => Choose Directory and then pick a folder you would like to read and write data into. Then, run the following piece of code:

library(tidyverse) 
library(gapminder)

gap_asia_2007 <- gapminder %>% 
  filter(year == 2007, continent == "Asia")
head(gap_asia_2007)

Write gap_asia_2007 to a comma-separated value (csv) file named exported_file.csv with just one command:

write_csv(FILL_THIS_IN, "exported_file.csv")

Check out your files after executing this line!

Now, let’s practice reading csvs by reading the file we just wrote back into R:

gap_asia_2007_in <- read_csv("FILL_THIS_IN")

Check out your R environment after executing this line!

Also notice the output of running read_csv. This tells us about the types of variables that were read in. It’s a good habit to check this every time you run a read_ function. Sometimes we might want to change how these variable types are specified…

Main idea behind here::here()

We just wrote and read files to our current directory. If we wanted to use a different folder on our computer, we could specify something like:

  • Documents/STAT545/exported_file.csv - Mac uses forward slashes
  • Documents\STAT545\exported_file.csv - Windows uses backward slashes

However, if you wanted to make your Rproj more portable and accessible to more users in a cross-platform (between Mac, Unix, Windows users), rather than specifying every path explicitly, here::here() allows you to set relative paths much more easily.

For a quick start, check out this rant by Jenny Bryan, the founder of STAT 545,

For a longer written tutorial, read the above noted STAT 545 chapter on “Writing and Reading files”.