ggplot2

Amelia McNamara

June 8, 2017

The Grammar of Graphics

Layers

Visual dimensions (from this morning)

Graphics in R

There are many ways to make graphics in R.

ggplot2

ggplot2 is an R package by Hadley Wickham that lets you make beautiful R graphics (relatively) easily.

It’s part of the tidyverse, which I recommend everyone get to know (dplyr, stringr, lubridate, broom… and many more).

The “gg” in ggplot2 stands for The Grammar of Graphics: the package is an implementation of Leland Wilkinson’s ideas in R.

Getting started

Let’s start by going through the intro to R and RStudio lab. You’re going to learn lots more about R as the weeks progress, but we need you to have a few basic skills for this seminar.

R packages

R has many “packages,” which are add-ons to the basic functionality of the language. To use a package, you need to install it (once) and load it (in every new R session where you want to use it).

install.packages("ggplot2")
library(ggplot2)
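If you rerun a script often, you probably don't want to reinstall the package every time. Here is a minimal sketch of one common pattern, assuming a working internet connection and a configured CRAN mirror: install only when the package is missing, then load it.

```r
# Install ggplot2 only if it is not already available on this machine.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Loading still has to happen in every session.
library(ggplot2)

# Check which version was loaded; behaviour can differ between versions.
packageVersion("ggplot2")
```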

Data

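To start, I’m going to load some data. The glimpse() function comes from dplyr, so that package needs to be loaded too:

library(dplyr)  # provides glimpse()
arbuthnot <- read.csv("http://www.openintro.org/stat/data/arbuthnot.csv", header = TRUE)
glimpse(arbuthnot)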
## Observations: 82
## Variables: 3
## $ year  <int> 1629, 1630, 1631, 1632, 1633, 1634, 1635, 1636, 1637, 16...
## $ boys  <int> 5218, 4858, 4422, 4994, 5158, 5035, 5106, 4917, 4703, 53...
## $ girls <int> 4683, 4457, 4102, 4590, 4839, 4820, 4928, 4605, 4457, 49...

qplot(): the easy way out

qplot(x = year, y = girls, data = arbuthnot)

qplot syntax

To get qplot() to work, you need to list the variable(s) you want to plot, and then tell R where to “look” for them with “data=”.

Since it’s a quick plot, R will guess what kind of plot you want based on the variables you give it.
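To see the guessing in action, here is a small sketch. The data frame arb9 is my own stand-in, built from the first nine rows shown in the glimpse() output, so the example runs without downloading anything. Recent ggplot2 releases mark qplot() as deprecated, so you may see a warning.

```r
library(ggplot2)

# First nine rows of the Arbuthnot data, copied from the glimpse() output.
# (arb9 is a name made up for this example.)
arb9 <- data.frame(
  year  = 1629:1637,
  girls = c(4683, 4457, 4102, 4590, 4839, 4820, 4928, 4605, 4457)
)

# One numeric variable: qplot() has to guess a one-variable plot.
q_one <- qplot(x = girls, data = arb9)

# Two numeric variables: qplot() guesses a two-variable plot.
q_two <- qplot(x = year, y = girls, data = arb9)

# Look at which geom was picked for each plot.
class(q_one$layers[[1]]$geom)[1]
class(q_two$layers[[1]]$geom)[1]
```

Typing q_one or q_two at the console draws the plot.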

ggplot()

To really harness the power of ggplot2, though, you need to use the more general ggplot() command. The idea of the package is that you “layer” pieces on top of one another, building the plot up step by step.

You always need to use a ggplot() call to initialize the plot. I usually put my dataset in here, and at least some of my “aesthetics.” But, one of the things that can make ggplot2 tough to understand is that there are no hard and fast rules.

p1 <- ggplot(aes(x=year, y=girls), data=arbuthnot)

If you try to show p1 at this point, older versions of ggplot2 give “Error: No layers in plot.” Newer versions draw an empty panel with axes instead. Either way, no data appears, because we haven’t given the plot any geometric objects yet.
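You can check this yourself by peeking inside the plot object. A quick sketch, again using arb9 (my stand-in built from the first nine rows of the glimpse() output) in place of the full dataset:

```r
library(ggplot2)

# First nine rows of the Arbuthnot data, copied from the glimpse() output.
arb9 <- data.frame(
  year  = 1629:1637,
  girls = c(4683, 4457, 4102, 4590, 4839, 4820, 4928, 4605, 4457)
)

p1 <- ggplot(aes(x = year, y = girls), data = arb9)

# How many layers (geoms) have been attached so far?
n_layers <- length(p1$layers)
n_layers

# The aesthetics are already stored, even though nothing is drawn yet.
p1$mapping
```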

geoms

In order to get a plot to work, you need to use “geoms” (geometric objects) to specify the way you want your variables mapped to graphical parameters.

p1 + geom_point()
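Layers stack, so you can add more than one geom to the same plot. A minimal sketch, using arb9 (my stand-in for the first nine rows shown in the glimpse() output) so it runs on its own:

```r
library(ggplot2)

# First nine rows of the Arbuthnot data, copied from the glimpse() output.
arb9 <- data.frame(
  year  = 1629:1637,
  girls = c(4683, 4457, 4102, 4590, 4839, 4820, 4928, 4605, 4457)
)

p1 <- ggplot(aes(x = year, y = girls), data = arb9)

# Points first, then a line drawn through the same x and y values.
p_both <- p1 + geom_point() + geom_line()

# Each geom becomes its own layer, in the order it was added.
length(p_both$layers)
```

Typing p_both at the console draws the plot. Both geoms pick up year and girls from the ggplot() call, so I didn't have to repeat the aesthetics.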

An entire plot

ggplot(aes(x=year, y=girls), data=arbuthnot) + geom_point()

Or

ggplot() + geom_point(aes(x=year, y=girls), data=arbuthnot)

Or

ggplot(arbuthnot) + geom_point(aes(x=year, y=girls))
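If you want to convince yourself that the three forms draw the same thing, here is a sketch that compares the point positions each one computes. The names arb9, form_a, form_b, form_c and same_xy are mine, and arb9 is just the first nine rows from the glimpse() output.

```r
library(ggplot2)

# First nine rows of the Arbuthnot data, copied from the glimpse() output.
arb9 <- data.frame(
  year  = 1629:1637,
  girls = c(4683, 4457, 4102, 4590, 4839, 4820, 4928, 4605, 4457)
)

# The three ways of writing the same scatterplot.
form_a <- ggplot(aes(x = year, y = girls), data = arb9) + geom_point()
form_b <- ggplot() + geom_point(aes(x = year, y = girls), data = arb9)
form_c <- ggplot(arb9) + geom_point(aes(x = year, y = girls))

# layer_data() returns the data a layer will actually draw.
# Compare only the x and y positions of the points.
same_xy <- function(p, q) {
  isTRUE(all.equal(layer_data(p)[, c("x", "y")],
                   layer_data(q)[, c("x", "y")]))
}

same_xy(form_a, form_b)
same_xy(form_a, form_c)
```

The difference shows up when you add more layers. Data and aesthetics given to ggplot() are inherited by every later geom, while anything given inside geom_point() belongs to that layer only.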

Same data, different geom

p1 + geom_bin2d()

Saving your work (or not)

Notice that I haven’t been saving these geom layers. I’m just running

p1 + [something]

to see what happens. But I can save the new version to start building up my plot:

p2 <- p1 + geom_point()
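The reason saving matters is that + never changes the original object. It returns a new plot. A small sketch, using arb9 (my stand-in for the first nine rows of the glimpse() output):

```r
library(ggplot2)

# First nine rows of the Arbuthnot data, copied from the glimpse() output.
arb9 <- data.frame(
  year  = 1629:1637,
  girls = c(4683, 4457, 4102, 4590, 4839, 4820, 4928, 4605, 4457)
)

p1 <- ggplot(aes(x = year, y = girls), data = arb9)

# Adding a geom returns a new plot, which is saved here as p2.
p2 <- p1 + geom_point()

# p1 is untouched; only p2 carries the point layer.
length(p1$layers)
length(p2$layers)
```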

Better labels

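# x is year and y is girls, so label the axes to match
p2 <- p2 + xlab("Year") + ylab("Number of girls born") +
  # this legend title only shows up if a layer maps the fill aesthetic (e.g. geom_bin2d())
  guides(fill = guide_legend(title = "Number of births from Arbuthnot data"))
p2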
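Another option is labs(), which sets the axis labels and a title in one call. This is a sketch rather than the only way to do it. The names arb9 and p3 and the title text are mine, and arb9 holds just the first nine rows from the glimpse() output.

```r
library(ggplot2)

# First nine rows of the Arbuthnot data, copied from the glimpse() output.
arb9 <- data.frame(
  year  = 1629:1637,
  girls = c(4683, 4457, 4102, 4590, 4839, 4820, 4928, 4605, 4457)
)

p2 <- ggplot(aes(x = year, y = girls), data = arb9) + geom_point()

# x is year and y is girls, so the labels follow the same order.
p3 <- p2 + labs(
  x = "Year",
  y = "Number of girls born",
  title = "Girls born per year, first nine rows of the Arbuthnot data"
)

# The labels are stored on the plot object.
p3$labels$x
p3$labels$y
```

Typing p3 at the console draws the labelled plot.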

ATUS data

For your lab, you are going to play with the American Time Use Survey data.

atus <- read.csv("https://raw.githubusercontent.com/AmeliaMN/SummerDataViz/master/IntroToViz/atus.csv", header = TRUE)

The ATUS is a product of the Bureau of Labor Statistics. Each row is a person, and each variable is some information about that person. The first few variables are demographic, and the rest are the number of minutes per day (on average) the person spends on a variety of activities.

Questions

Resources for ggplot2