NYC Flights Case Study: Dates/Times, With Solutions

NYC Flights Data

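The NYC Flights data set contains (among many other things) on-time performance data for all flights that departed from a New York City airport (JFK, LGA, or EWR) in 2013. Let’s load it from the package nycflights13. Let’s also load the tidyverse; the key package we will be using today is lubridate. lubridate is installed with the tidyverse, but only tidyverse 2.0.0 and later attach it automatically, so we load it explicitly as well.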

There’s lots to explore in this data set, and lots of variables! We’ll work with a super pared-down version.

library(tidyverse)
library(lubridate)    # attached by tidyverse >= 2.0.0; loading it explicitly is harmless
library(nycflights13)

flights_demo <- flights %>% 
  select(year, month, day, hour, minute, flight, carrier)

head(flights_demo)
## # A tibble: 6 × 7
##    year month   day  hour minute flight carrier
##   <int> <int> <int> <dbl>  <dbl>  <int> <chr>  
## 1  2013     1     1     5     15   1545 UA     
## 2  2013     1     1     5     29   1714 UA     
## 3  2013     1     1     5     40   1141 AA     
## 4  2013     1     1     5     45    725 B6     
## 5  2013     1     1     6      0    461 DL     
## 6  2013     1     1     5     58   1696 UA

This currently contains the scheduled departure time of every flight, split across the year, month, day, hour, and minute columns, as well as its carrier and flight number.
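Because the scheduled departure is spread over five columns, it can be handy to assemble it into a single date-time. Here is a minimal sketch using lubridate's make_datetime(), applied to the first three rows of flights_demo copied from the printed output above (so it runs without nycflights13). The column name sched_dep is our own choice, and treating the times as New York local time is an assumption that matches the time zone used later in this case study.

```r
library(dplyr)
library(lubridate)

# First three rows of flights_demo, copied from the printed output above
demo_rows <- tribble(
  ~year, ~month, ~day, ~hour, ~minute, ~flight, ~carrier,
  2013L,     1L,   1L,     5,      15,   1545L, "UA",
  2013L,     1L,   1L,     5,      29,   1714L, "UA",
  2013L,     1L,   1L,     5,      40,   1141L, "AA"
)

# make_datetime() combines numeric components into one POSIXct column;
# sched_dep is a hypothetical name for the new column
demo_rows <- demo_rows %>%
  mutate(sched_dep = make_datetime(year, month, day, hour, minute,
                                   tz = "America/New_York"))

demo_rows$sched_dep
```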

Exercises

Date-Time Creation and Extraction

I want to add a fake flight to this data set: AC 123, scheduled to depart at 9:00am on Oct 1 2013.

We can use a family of functions whose names are permutations of “y”, “m”, and “d” to convert character input into Date objects.

mdy("Oct 1 2013")
## [1] "2013-10-01"
mdy("October 1st 2013")
## [1] "2013-10-01"

We just need to pick the function whose letters match the order of the date components in the input; lubridate does the rest!
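To see why the order matters, here is a small sketch in which the same string is read two different ways, followed by three differently formatted inputs that all describe Oct 1 2013. The example strings are our own.

```r
library(lubridate)

# The same string means different dates depending on the assumed order
ambiguous <- "01/10/2013"
mdy(ambiguous)  # month first: 2013-01-10
dmy(ambiguous)  # day first:   2013-10-01

# Different layouts of Oct 1 2013, each parsed with the matching function
ymd("2013-10-01")
dmy("1 October 2013")
mdy("10/01/2013")
```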

We can use a similar family of functions to convert character input into special Date-Time objects. Let’s be careful to get the timezone right too, in case it turns out to be important later.

(new_sched_dep_time <- mdy_hm("Oct 1 2013 9:00", tz = "America/New_York"))
## [1] "2013-10-01 09:00:00 EDT"
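Why bother with the time zone? A quick sketch: if we leave out tz, lubridate's parsers default to UTC, and the same clock reading then refers to a different instant. On Oct 1 2013 New York was on EDT (UTC-4), so the two parses are four hours apart. with_tz() shows one instant on another clock without changing it.

```r
library(lubridate)

# Same clock reading, parsed as New York time and with the default tz (UTC)
ny_time  <- mdy_hm("Oct 1 2013 9:00", tz = "America/New_York")
utc_time <- mdy_hm("Oct 1 2013 9:00")

# Different instants: 9:00 EDT is 13:00 UTC
difftime(utc_time, ny_time, units = "hours")  # Time difference of -4 hours

# with_tz() displays the same instant on a UTC clock
with_tz(ny_time, tzone = "UTC")  # "2013-10-01 13:00:00 UTC"
```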

Now let’s make a one-row tibble with the columns we need: year, month, day, hour, minute, flight, and carrier. The key will be lubridate’s year(), month(), day(), hour(), and minute() functions, each of which pulls the corresponding component out of a date-time.

(new_flight <- tribble(~year, ~month, ~day, ~hour, ~minute, ~flight, ~carrier, 
                      year(new_sched_dep_time), month(new_sched_dep_time), 
                      day(new_sched_dep_time), hour(new_sched_dep_time), 
                      minute(new_sched_dep_time), 123, "AC"))
## # A tibble: 1 × 7
##    year month   day  hour minute flight carrier
##   <dbl> <dbl> <int> <int>  <int>  <dbl> <chr>  
## 1  2013    10     1     9      0    123 AC
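Each extractor can also be called on its own. The sketch below runs them on our fake flight's scheduled departure, plus two related members of the same family, wday() and yday(). Components are read in the date-time's own time zone (here New York), which is why hour() returns 9. The wday() value assumes lubridate's default week start, where Sunday is day 1.

```r
library(lubridate)

new_sched_dep_time <- mdy_hm("Oct 1 2013 9:00", tz = "America/New_York")

# Each extractor returns one component, in the date-time's own time zone
year(new_sched_dep_time)    # 2013
month(new_sched_dep_time)   # 10
day(new_sched_dep_time)     # 1
hour(new_sched_dep_time)    # 9
minute(new_sched_dep_time)  # 0

# Related extractors from the same family
wday(new_sched_dep_time)    # 3: Tuesday, counting Sunday as 1 (default)
yday(new_sched_dep_time)    # 274: day of the year
```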

Like magic!!! We can then add it to the flights_demo dataset using bind_rows().

flights_demo <- bind_rows(flights_demo, new_flight)
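Notice that new_flight's columns are not all the same types as flights_demo's (for example, year() returns a double while flights_demo$year is an integer). bind_rows() handles this by matching columns by name and promoting integer columns to double where needed. A minimal sketch with two small hypothetical tibbles, whose columns are even listed in different orders:

```r
library(dplyr)

# Hypothetical tibbles: flight is integer in one, double in the other
old_rows <- tibble(flight = c(1545L, 1714L), carrier = c("UA", "UA"))
new_row  <- tibble(carrier = "AC", flight = 123)

# Columns are matched by name, not position
combined <- bind_rows(old_rows, new_row)

combined
class(combined$flight)  # "numeric": the integer column was promoted to double
```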

Date-Time Math

The full flights dataset has info about the departure delays of these flights. Let’s make another simplified version for demo purposes with that info.

flights_demo2 <- flights %>% 
  select(year, month, day, dep_time, sched_dep_time, dep_delay)

head(flights_demo2)
## # A tibble: 6 × 6
##    year month   day dep_time sched_dep_time dep_delay
##   <int> <int> <int>    <int>          <int>     <dbl>
## 1  2013     1     1      517            515         2
## 2  2013     1     1      533            529         4
## 3  2013     1     1      542            540         2
## 4  2013     1     1      544            545        -1
## 5  2013     1     1      554            600        -6
## 6  2013     1     1      554            558        -4

The dep_delay variable contains the departure delay in minutes: a positive number if the flight departed late, and a negative number if it departed early. How was this variable made?

Here is one way it could have been computed. Let’s make two date-time objects for our fake flight: its scheduled departure, and a made-up actual departure 15 minutes later. If we subtract them, we get a difftime object.

new_sched_dep_time <- ymd_hm("2013 October 1 9:00", tz = "America/New_York")
new_dep_time <- ymd_hm("2013 Oct 1 9:15", tz = "America/New_York")

new_dep_time - new_sched_dep_time 
## Time difference of 15 mins

Beautiful! In this case, this calculation was easy to do by hand, but it would’ve been more annoying if we were calculating the time elapsed between (say) December 11th 2010 3:17am and March 24th 2011 11:51pm.
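Here is that more annoying example done in R. The text does not say which time zone those times are in; this sketch assumes New York local time and, for comparison, repeats the calculation in UTC. Daylight saving time began in the US on March 13 2011, so one clock hour between these two New York times never actually elapsed, and the New York result is one hour shorter.

```r
library(lubridate)

# Assumed: both times are New York local clock times
start_ny <- ymd_hm("2010-12-11 03:17", tz = "America/New_York")
end_ny   <- ymd_hm("2011-03-24 23:51", tz = "America/New_York")

# Same clock readings in UTC, which has no daylight-saving change
start_utc <- ymd_hm("2010-12-11 03:17", tz = "UTC")
end_utc   <- ymd_hm("2011-03-24 23:51", tz = "UTC")

end_ny - start_ny  # units chosen automatically (days here)

# Requesting minutes makes the one-hour DST gap easy to see
difftime(end_utc, start_utc, units = "mins")  # 149554 mins
difftime(end_ny, start_ny, units = "mins")    # 149494 mins
```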

difftime objects produce human-readable output, but R picks their units (seconds, minutes, hours, or days) automatically based on the size of the difference, which can be a little annoying when you want output in consistent units. Duration objects to the rescue: they always measure time in seconds! Let’s do the math again, but this time coerce the result to a duration object.

(duration_delay <- as.duration(new_dep_time - new_sched_dep_time))
## [1] "900s (~15 minutes)"
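To make the contrast concrete, here is a small sketch with two gaps measured from the same (made-up) starting time. The difftime results come back in different units, while the durations are both counted in seconds.

```r
library(lubridate)

t0 <- ymd_hm("2013 Oct 1 9:00", tz = "America/New_York")

short_gap <- ymd_hm("2013 Oct 1 9:15", tz = "America/New_York") - t0
long_gap  <- ymd_hm("2013 Oct 3 9:00", tz = "America/New_York") - t0

# difftime units depend on the size of the gap
units(short_gap)  # "mins"
units(long_gap)   # "days"

# As durations, both are measured in seconds
as.duration(short_gap)  # 900s
as.duration(long_gap)   # 172800s
```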

Finally, we can convert this to minutes by dividing it by a duration that spans one minute, which the convenience function dminutes() creates for us.

duration_delay/dminutes(1)
## [1] 15
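Putting the pieces together, here is a sketch of how dep_delay could be reproduced for the rows of flights_demo2 shown earlier (copied from the printed output, so nycflights13 is not needed). It assumes dep_time and sched_dep_time store clock times as HHMM integers (so 517 means 5:17) and that each flight left on its scheduled calendar day; flights delayed past midnight would need their date shifted, which this sketch does not handle. hhmm_to_datetime() is a hypothetical helper, not part of lubridate.

```r
library(dplyr)
library(lubridate)

# First six rows of flights_demo2, copied from the printed output above
demo2 <- tribble(
  ~year, ~month, ~day, ~dep_time, ~sched_dep_time, ~dep_delay,
  2013L,     1L,   1L,      517L,            515L,          2,
  2013L,     1L,   1L,      533L,            529L,          4,
  2013L,     1L,   1L,      542L,            540L,          2,
  2013L,     1L,   1L,      544L,            545L,         -1,
  2013L,     1L,   1L,      554L,            600L,         -6,
  2013L,     1L,   1L,      554L,            558L,         -4
)

# Hypothetical helper: split HHMM into hour and minute, then build a
# New York date-time on the given day (assumes no overnight delays)
hhmm_to_datetime <- function(year, month, day, hhmm) {
  make_datetime(year, month, day, hhmm %/% 100, hhmm %% 100,
                tz = "America/New_York")
}

demo2 <- demo2 %>%
  mutate(
    dep      = hhmm_to_datetime(year, month, day, dep_time),
    sched    = hhmm_to_datetime(year, month, day, sched_dep_time),
    my_delay = as.duration(dep - sched) / dminutes(1)  # delay in minutes
  )

demo2 %>% select(dep_time, sched_dep_time, dep_delay, my_delay)
```

For these rows, my_delay matches dep_delay exactly.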