FEV Case Study: Graphing

Review

We’ll continue exploring the FEV data set from last week. Let’s start by loading the data and required packages.

library(rigr) 
library(dplyr)
library(ggplot2)

fev_tbl <- as_tibble(fev) %>% mutate(across(sex:smoke, ~ as.factor(.x)))

Last week, we found that the mean FEV in the smoking group was substantially higher than the average FEV in the non-smoking group. That is, the smokers appear to have better lung function than the non-smokers.

We also had a theory as to why: the association between FEV and smoking status may be confounded, eg. by age/size.

We’ll investigate further this week with our newly acquired plotting skills!

Here are some helpful resources for making plots, now that we are moving towards less “guided” exercises:

Smoking and FEV (unadjusted)

Exercise 1

Now that we have plotting tools, let’s make a plot that compares the FEV of smokers to the FEV of non-smokers.

# FILL IN HERE

This broadly tells the same story as the summaries we calculated last week, but conveys more information and is more visually engaging.

Searching for potential confounders

Let’s do some digging to see if we can root out some potential confounders.

Exercise 2

Last week, we found that the youngest patient who smokes is 9, suggesting that there is a difference in the age distribution among smokers and non-smokers. Make a plot that compares these two distributions.

# FILL IN HERE

Again, we see that the smokers are overall older than the non-smokers.

Exercise 3

We think that age should be related to height, which in turn should be related to FEV. Let’s investigate that more systematically now with plotting.

How would you like to investigate that with plotting? Here’s one suggestion, though you don’t have to take it: make a plot with two panels, one that has a scatterplot of height versus age for the female patients, and one that has a scatterplot of height versus age for the male patients.

# FILL IN HERE

We see that the within-sex trend is similar: height is linear-ish at younger ages, and flat-ish at older ages. Boys wind up taller at older ages.

We will then make a scatterplot of FEV versus height.

ggplot(fev_tbl, aes(x = height, y = fev)) + 
  geom_jitter(width=0.2, alpha = 0.75) + 
  ggtitle("FEV versus height") +
  ylab("FEV (l/s)") +
  xlab("Height (inches)") + 
  theme_bw()

We see that taller participants generally have higher FEV.

Smoking and lung function: a more nuanced look

Now that we know that the smokers are older and bigger and have higher FEV, let’s look at the relationship between FEV and smoking status adjusted for height.

Exercise 4

Make a scatterplot of FEV versus height, with points marked by smoking status.

# FILL IN HERE

Based on this plot, it seems like the FEV of smokers and non-smokers of the same height is pretty similar.