This lesson is being piloted (Beta version)

Making basic plots using ggplot

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • How do I make more sophisticated plots in ggplot?

Objectives
  • Become familiar with ggplot and dplyr’s methods to summarize and plot data

  • Explore how ggplot aesthetics and geometry work together

Background

ggplot2 works naturally with tidyverse pipes and chains of data manipulation. It also separates the aesthetics of the plot (what are we plotting?) from its geometry and styling (how should we show it?). Together, these make plotting code readable and malleable.

The general formula is:

ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>()

library(tidyverse) # loads ggplot2, dplyr and the %>% pipe

# Assign plot to a variable
seaTroutplot <- ggplot(data = seaTrout, 
                       mapping = aes(x = lon, y = lat)) # a base plot: data plus aesthetics, but no geom yet

#Draw the plot
seaTroutplot + 
  geom_point(alpha=0.1, 
             color = "blue") # layer whatever geom you want onto the plot template;
                             # it's easy to explore different geoms without retyping.
                             # alpha sets transparency so overlapping points stay visible
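If you want to try the template idea without the seaTrout data, here is a minimal sketch on a small made-up data frame (`toy`, hypothetical, random values) with the same `lon`/`lat` column names. It builds one base plot and layers two different geoms onto it without retyping the data or mappings.

```r
library(ggplot2)

# Hypothetical positions standing in for seaTrout (random values, lon/lat only)
set.seed(123)
toy <- data.frame(lon = runif(200, 10, 11), lat = runif(200, 59, 60))

# Base template: data plus aesthetic mapping, no geom yet
base <- ggplot(data = toy, mapping = aes(x = lon, y = lat))

# The same template reused with two different geoms
p_points <- base + geom_point(alpha = 0.1, colour = "blue")
p_dens   <- base + geom_density_2d()

length(base$layers)      # no layers, so nothing would be drawn
length(p_points$layers)  # one geom layer added
```

Printing `p_points` or `p_dens` draws the plot; `base` on its own only sets up empty axes.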

Exploratory Plots

Let’s start with a practical example. First we’ll plot the basic shape of this data summary, then we’ll look at applying more style choices to improve our plot’s readability.

# monthly longitudinal distribution of salmon smolts and sea trout
library(lubridate) # provides month()

seaTrout %>%
  group_by(m=month(DateTime), tag.ID, Species) %>% # one group per month, individual and species
  summarise(mean=mean(lon)) %>% # mean longitude per group
  ggplot(aes(factor(m), mean, colour=Species, fill=Species))+ # the data is mapped, but no geom says how to show it yet!
  geom_point(size=3, position="jitter")+   # draw data as points, jittered so overlapping points stay visible
  coord_flip()+   # swap the x and y axes
  scale_colour_manual(values=c("grey", "gold"))+  # change the colour palette to reflect the species a bit better
  scale_fill_manual(values=c("grey", "gold"))+ 
  geom_boxplot()+ # another layer
  geom_violin(colour="black") # ...and another layer
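To see exactly what the dplyr half of the pipeline hands to ggplot(), here is a sketch on a tiny hypothetical data frame with the same column names as seaTrout. It uses base R's `format(DateTime, "%m")` for the month number to avoid an extra dependency; lubridate's `month()` returns the same values. The `.groups = "drop"` argument only drops the leftover grouping and silences summarise()'s regrouping message.

```r
library(dplyr)

# Hypothetical toy detections standing in for seaTrout:
# two tagged fish (one per species), each detected in January and February
toy <- data.frame(
  tag.ID   = c("A", "A", "A", "B", "B", "B"),
  Species  = c("Salmon", "Salmon", "Salmon", "Trout", "Trout", "Trout"),
  DateTime = as.POSIXct(c("2020-01-05", "2020-01-20", "2020-02-03",
                          "2020-01-10", "2020-02-11", "2020-02-25"), tz = "UTC"),
  lon      = c(10, 12, 20, 5, 7, 9)
)

monthly <- toy %>%
  # base-R month number; same values as lubridate::month(DateTime)
  group_by(m = as.integer(format(DateTime, "%m")), tag.ID, Species) %>%
  summarise(mean = mean(lon), .groups = "drop")  # one row per month x fish

print(monthly)
```

Each row is one individual in one month, which is why a single fish can contribute several points to the plot above.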

After applying all the styling, the grouped month factor sits on the y axis (thanks to coord_flip()), which highlights the longitudinal change shown on the x axis, and box plots and violins are layered over the ‘raw’ data points to provide additional context. We’ve also made a few style choices so we can tease apart these overlapping layers a bit better.

There are other ways we might have chosen to present a summary like this. geom_density2d() computes a two-dimensional kernel density estimate (KDE) of our data points and draws it as contour lines across the chosen plot axes.

seaTrout_full %>% # doesn't work on the subsetted data, so back to the original dataset for this one
  group_by(m=month(DateTime), tag.ID, Species) %>%
  summarise(mean=mean(lon)) %>%
  ggplot(aes(m, mean, colour=Species, fill=Species))+ # m stays numeric: the density estimate needs continuous axes
  geom_point(size=3, position="jitter")+
  coord_flip()+
  scale_colour_manual(values=c("grey", "gold"))+
  scale_fill_manual(values=c("grey", "gold"))+
  geom_density2d(linewidth=2, lty=1) # contours replace the box plot and violin layers (use size= on ggplot2 < 3.4.0)
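The ggplot2 documentation for geom_density_2d() notes that the 2D kernel density estimate comes from MASS::kde2d(). As a rough sketch of what happens under the hood, the block below runs kde2d() directly on made-up points (hypothetical values standing in for month and mean longitude) and extracts contour lines with base R's contourLines(). ggplot2 may choose its grid size, bandwidth and contour breaks differently, so the exact levels will not match what geom_density2d() draws.

```r
# MASS ships with R; calling MASS::kde2d() without library(MASS)
# avoids masking dplyr::select() in the lesson session.
set.seed(42)
# Hypothetical points standing in for (month, mean longitude) pairs
x <- rnorm(100, mean = 6, sd = 2)
y <- rnorm(100, mean = 15, sd = 3)

# 2D kernel density estimate evaluated on a 50 x 50 grid
kde <- MASS::kde2d(x, y, n = 50)

# Contour lines through the density surface (a list, one entry per line piece)
contours <- contourLines(kde$x, kde$y, kde$z)
length(contours)
range(kde$z)
```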

Here we start to see why we might want a separate plot for each subset, or facet, for our two species: their contours are hard to tell apart when drawn on top of one another. Switching to stat_density_2d() with geom = "polygon" fills in the density levels, which hides anything drawn beneath them, so we add the points afterwards to keep them on top. We’ll also use labs() to label the axes properly.

seaTrout %>% # you could also try this with the full dataset, seaTrout_full
  group_by(m=month(DateTime), tag.ID, Species) %>%
  summarise(mean=mean(lon)) %>%
  ggplot(aes(m, mean))+
  stat_density_2d(aes(fill = after_stat(nlevel)), geom = "polygon")+ # filled density levels
  geom_point(size=3, position="jitter")+
  coord_flip()+
  facet_wrap(~Species)+ # one panel per species (Species survives summarise() as a grouping variable)
  scale_fill_viridis_c() +
  labs(x="Month", y="Mean longitude (UTM 33)") # x and y name the aesthetics, so labels follow them through coord_flip()
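The `nlevel` in `after_stat(nlevel)` is a variable computed by the contour stat, not a column of seaTrout. One way to check what a stat computed is ggplot2's layer_data(); the sketch below uses a hypothetical two-species data frame in place of the monthly summary and inspects the first layer.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the monthly summary: two species, 100 points each
toy <- data.frame(
  Species = rep(c("Salmon", "Trout"), each = 100),
  m       = c(rnorm(100, 5, 1), rnorm(100, 7, 1.5)),
  mean    = c(rnorm(100, 20, 3), rnorm(100, 14, 2))
)

p <- ggplot(toy, aes(m, mean)) +
  # after_stat() reads nlevel, which the contour stat computes per contour level
  stat_density_2d(aes(fill = after_stat(nlevel)), geom = "polygon") +
  facet_wrap(~Species) +
  scale_fill_viridis_c() +
  labs(x = "Month", y = "Mean longitude")

# The data ggplot2 computed for layer 1, one row per polygon vertex
ld <- layer_data(p, 1)
head(ld[, c("PANEL", "level", "nlevel", "piece")])
```

`PANEL` records which facet each vertex belongs to, and `nlevel` is the contour level rescaled so the highest level is 1.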

Facets are a great way to highlight differences across your groups, and the most obvious next grouping is the individual tagged animal. Be aware of how many plots you are going to end up with: one per tag.ID!

# per-individual density contours - lots of facets!
seaTrout %>%
  ggplot(aes(lon, lat))+
  stat_density_2d(aes(fill = after_stat(nlevel)), geom = "polygon")+
  facet_wrap(~tag.ID) # one panel per tagged individual
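Before faceting by individual, it can help to count the panels first. In this minimal sketch the tags are hypothetical: n_distinct() from dplyr gives the number of panels facet_wrap(~tag.ID) will draw, and the square-root rule is only a rough estimate of the grid size, not ggplot2's own layout rule.

```r
library(dplyr)

# Hypothetical detections for four tagged fish
toy <- data.frame(
  tag.ID = c("A", "A", "B", "C", "C", "C", "D"),
  lon    = c(10.1, 10.2, 10.5, 10.9, 11.0, 10.8, 10.3)
)

n_panels <- n_distinct(toy$tag.ID)   # facet_wrap(~tag.ID) draws one panel per tag
n_cols   <- ceiling(sqrt(n_panels))  # rough estimate for a near-square grid
n_rows   <- ceiling(n_panels / n_cols)
c(panels = n_panels, cols = n_cols, rows = n_rows)
```

If the count is large, facet_wrap()'s `ncol` and `nrow` arguments let you control the layout yourself.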

As a reminder, this is all exploratory work so far. Using the big per-individual plot, we could identify interesting individuals to subset for further exploration, or pick out potential non-survivors and separate them from the pack. But so far we’ve mostly worked with summaries of the movement data, relying on what we know about our domain without actually looking at the movements in space. Next we’ll do something a bit more spatially aware.
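Once interesting or suspect individuals have been identified from the per-individual plot, subsetting them is a single filter(). In this sketch the data and the flagged tag IDs are hypothetical placeholders; in practice the IDs would come from inspecting the facets.

```r
library(dplyr)

# Hypothetical detections for four tagged fish
toy <- data.frame(
  tag.ID = c("A", "A", "B", "C", "C", "C", "D"),
  lon    = c(10.1, 10.2, 10.5, 10.9, 11.0, 10.8, 10.3)
)

# Hypothetical tags flagged after looking at the per-individual facets
flagged <- c("B", "D")

flagged_fish <- toy %>% filter(tag.ID %in% flagged)   # only the flagged individuals
the_pack     <- toy %>% filter(!tag.ID %in% flagged)  # everyone else
```

`!tag.ID %in% flagged` is read as `!(tag.ID %in% flagged)`, because `%in%` binds more tightly than `!`.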

Key Points

  • You can feed output from dplyr’s data manipulation functions into ggplot using pipes.

  • Plotting various summaries and groupings of your data is good practice at the exploratory phase, and dplyr and ggplot make iterating different ideas straightforward.