Introduction to ggplot2
If you are new to ggplot2, welcome! If you are used to base R, it’s probably going to take a while for you to get the hang of the syntax, but trust me, it’s worth it. ggplot2 is the tidyverse package for making graphics, and you can control and customize pretty much every aspect of a plot. So let’s get started.
We are going to be working with the mtcars dataset, an oldie but a goodie. So we’ll take a look at the structure of the dataframe to see what we are working with.
library(ggplot2)
library(dplyr)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
To create any figure with ggplot you need to start off with the ggplot() function. In this function, you include the dataset and the aesthetics, aes(). The aesthetics are where you specify which columns you want as the x and y variables. After that, you need to add the type of plot you want to create. For example, if I want to create a graph using points, I will add geom_point(). All of the plot types follow this same syntax: geom_ followed by the type of graph you want to make. If you want a line chart it’s geom_line(), a box and whisker plot is geom_boxplot(), and so on.
Let’s start off with a simple example, making a scatterplot. I am going to use the weight (wt) as the x-variable and drat as the y-variable. This is what a standard ggplot will look like.
ggplot(data = mtcars, aes(x = wt, y = drat)) + geom_point()
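To show how the geom_ pattern carries over, here is a small sketch that keeps the same ggplot() call and only swaps the geom. The object names (base_plot, scatter_plot, and so on) are just ones I made up for this example, and using mpg by cylinder for the boxplot is my own choice of columns, not something the plots in this post depend on.

```r
library(ggplot2)

# Same data and aes() call each time; only the geom layer changes.
base_plot <- ggplot(data = mtcars, aes(x = wt, y = drat))

scatter_plot <- base_plot + geom_point()  # one point per car
line_plot <- base_plot + geom_line()      # observations connected in order of x

# A box and whisker plot wants a grouping variable on the x-axis,
# so cyl is wrapped in factor() here (an illustrative choice).
box_plot <- ggplot(data = mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

box_plot  # printing a plot object draws it
```

Saving the ggplot() call to an object like this is handy when you want to try several geoms on the same data.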
Color by group
Now we are going to begin to customize it. Let’s start by color coding the points so we can quickly identify which points come from cars with 4, 6, or 8 cylinders. We are first going to set the cylinder column as a factor using the mutate() function. We are then going to create a plot like we did before, with weight as the x-variable but this time with mpg as the y-variable, and we are going to add a third argument, color, set equal to the cylinder column. One of the benefits of ggplot is that it ties in seamlessly with the rest of the tidyverse functions. This makes it really easy to manipulate your data, get it into the format you need, and pipe it straight into a plot, which then uses the manipulated data.
mtcars %>%
  mutate(cyl = as.factor(cyl)) %>%
  ggplot(aes(x = wt, y = mpg, color = cyl)) +
  geom_point()
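If you are wondering why the factor step matters, here is a rough side-by-side sketch. The names numeric_plot and factor_plot are made up for this example. Because cyl is stored as a number, ggplot treats it as a continuous variable unless you convert it.

```r
library(ggplot2)
library(dplyr)

# cyl left as a number: ggplot treats it as continuous and
# uses a color gradient for the legend.
numeric_plot <- ggplot(mtcars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point()

# cyl converted to a factor: each cylinder count gets its own
# distinct color and legend entry.
factor_plot <- mtcars %>%
  mutate(cyl = as.factor(cyl)) %>%
  ggplot(aes(x = wt, y = mpg, color = cyl)) +
  geom_point()

# The factor levels become the legend entries.
cyl_levels <- levels(as.factor(mtcars$cyl))
cyl_levels
```

Print numeric_plot and factor_plot one after the other to compare the two legends.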
As you can probably tell, the style is very different from base R plots. ggplot2 plots are very customizable, and there are many pre-made themes that you can use to change the style of the plot. You can also create your own custom theme and specify anything from the line size to the font style and size of the labels, the gridlines, etc. I prefer to use the classic theme most of the time because it is the simplest and most minimal. All of the themes are called by adding a theme_ function, such as theme_classic(), to the plot.
mtcars %>%
  mutate(cyl = as.factor(cyl)) %>%
  ggplot(aes(x = wt, y = mpg, color = cyl)) +
  geom_point() +
  theme_classic()
Black and white
mtcars %>%
  mutate(cyl = as.factor(cyl)) %>%
  ggplot(aes(x = wt, y = mpg, color = cyl)) +
  geom_point() +
  theme_bw()
mtcars %>%
  mutate(cyl = as.factor(cyl)) %>%
  ggplot(aes(x = wt, y = mpg, color = cyl)) +
  geom_point() +
  theme_dark()
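For the custom theme idea, here is a minimal sketch of one way to do it: start from a pre-made theme and override a few pieces with theme(). The name my_theme and the specific sizes and colors are arbitrary picks for illustration, not recommended settings.

```r
library(ggplot2)
library(dplyr)

# Start from theme_classic() and override a few elements.
# All values below are arbitrary illustration choices.
my_theme <- theme_classic() +
  theme(
    axis.title = element_text(size = 14, face = "bold"),  # axis label font
    axis.text = element_text(size = 11),                   # tick label font
    panel.grid.major = element_line(color = "grey90"),     # faint gridlines
    legend.position = "bottom"                             # move the legend
  )

# The custom theme is added to a plot like any pre-made theme.
themed_plot <- mtcars %>%
  mutate(cyl = as.factor(cyl)) %>%
  ggplot(aes(x = wt, y = mpg, color = cyl)) +
  geom_point() +
  my_theme

themed_plot
```

Keeping the theme in its own object means you can reuse it across every plot in a script.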
Now I’ll show you how to create a plot with two different datasets. I’m going to first subset the rows of data for cars with 6 cylinders and name that cars.sub.
cars.sub <- mtcars %>% filter(cyl == 6)
Then I’m going to make a scatterplot of weight and drat for all cars using geom_point(). Then I want to add a line for the data from our cars.sub dataframe, so I’ll add a geom_line() layer with the same x and y names, but with the data argument set to the name of the new dataframe. I also want to change the color of the line, so outside of the aes() function, I’ll add color = "orange".
mtcars %>%
  ggplot(aes(x = wt, y = drat)) +
  geom_point() +
  geom_line(aes(x = wt, y = drat), data = cars.sub, color = "orange")
Now we have a scatterplot and a line graph with data from 2 different datasets. If you wanted to use data from the same dataframe for the line part, you wouldn’t need to specify the data argument. Also, notice that the axis names default to the column names of the data you’re plotting. You can easily change these with the labs() function: just specify which axis (x or y) and the name you want. The same function can add a title.
mtcars %>%
  ggplot(aes(x = wt, y = drat)) +
  geom_point() +
  geom_line(aes(x = wt, y = drat), data = cars.sub, color = "orange") +
  labs(x = "Weight", title = "2 Datasets Plot")
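For comparison, here is a sketch of the single-dataframe case mentioned above, where both layers inherit the data and aesthetics from ggplot(). The name shared_plot and the title are made up for this example. The y label uses the description of drat from the mtcars help page.

```r
library(ggplot2)

# Both layers inherit data and aes() from ggplot(), so neither
# geom needs its own data or aes() arguments.
shared_plot <- ggplot(mtcars, aes(x = wt, y = drat)) +
  geom_point() +
  geom_line(color = "orange") +  # fixed color, set outside aes()
  labs(
    x = "Weight",
    y = "Rear axle ratio",       # drat, per ?mtcars
    title = "1 Dataset Plot"     # illustrative title
  )

shared_plot
```

Here the line runs through all 32 cars rather than just the 6 cylinder subset, which is the practical difference from the two-dataset version.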
Lastly, I’m going to show you how to create side-by-side plots by groups in your data. I want to look at how many cars get similar mpg, broken out by how many cylinders the car has. So I want mpg on the x-axis, and I’m going to use geom_histogram() because a histogram counts how many observations (here, cars) fall into each range of mpg values. Then I am going to add a facet_wrap() and use the "~" symbol followed by the column I want it to group the data by.
mtcars %>%
  ggplot(aes(x = mpg)) +
  geom_histogram() +
  facet_wrap(~cyl, scales = "free")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
And now we have 3 different plots, with the number of cylinders at the top of each. You can see that the range of mpg is pretty different between the three groups. Adding the scales = "free" argument to the facet_wrap() function gives each plot its own x and y scales.
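The stat_bin() message is ggplot’s nudge to pick a bin size yourself. Here is a sketch of one way to do that. The binwidth of 2 is an arbitrary choice of mine, and the names hist_plot and cars_per_panel are just for this example. I also use scales = "free_x" here, which frees only the x-axis so that bar heights stay comparable across panels.

```r
library(ggplot2)

# binwidth = 2 is an arbitrary illustration: each bar spans 2 mpg.
# Setting it explicitly also silences the stat_bin() message.
hist_plot <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(~cyl, scales = "free_x")  # free x-axis, shared y-axis

# How many cars end up in each panel.
cars_per_panel <- table(mtcars$cyl)
cars_per_panel

hist_plot
```

Other accepted values for scales are "fixed" (the default), "free_y", and "free".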
This is just the beginning of what you can do with ggplot, and I’ll soon be posting more tutorials where I go more in-depth on some features!