7  Adding Labels to a Plot

This chapter focuses on how we can add labels to a plot. Labels include not only the text attached to a plot’s axes, but also the plot’s title, subtitle, and caption. Labels help us convey the context of the data to people who are engaging with our plot. They also help those readers interpret the information presented in the visualization more accurately.

Figure 7.1 shows a visualization that examines the cost of college tuition from 1980 to 2020 and compares that increase to the amount of inflation during that same time period. This visualization has included several different labels to help readers make sense of the visualized data.

An example data visualization from Visual Capitalist. URL: https://www.visualcapitalist.com/datastream/
Figure 7.1


7.1 Titles and Subtitles

The title of a visualization serves the very important purpose of telling the reader what the visualization is about. It acts as an informative headline and provides a concise description that explains the data that are being visualized. Since it is the first text that readers see, it sets the stage for the whole visualization. In Figure 7.1 the title “The rising cost of college in the U.S.” concisely explains the gist of the visualization.

The subtitle provides additional details about the data visualization. It often highlights overall or specific findings that you want your audience to attend to. In Figure 7.1 the subtitle highlights the main finding that the percentage increase in the cost of attending college is higher than the percentage increase in inflation. It also provides the amount of increase for both college cost and inflation to reinforce this point.

As a second example, consider Figure 7.2.

An example data visualization from fivethirtyeight. URL: https://fivethirtyeight.com/
Figure 7.2

This visualization examines the relationship between the Rotten Tomatoes rating and box office gross for 28 Adam Sandler movies. In this example the title provides the main finding from the data analysis, and the subtitle includes more specific contextual information about the analysis undertaken.


7.2 Axis Labels

Axis labels are typically included to provide important cues for interpreting data visualizations. They define the attributes that are being plotted and also often give information about the scales of those attributes. Figure 7.1 provides a label for the y-axis which informs us that what is being plotted is the change in percentage. Notice that there is no label on the x-axis in this visualization.

Despite what your math teachers in high school may have taught you, not every axis needs to be labelled—although it can’t hurt. In this visualization, the data plotted along the x-axis is year. When you look at this visualization, it is instantly obvious what the values on this axis are without a label telling you this. Moreover, after reading the title and subtitle, you could intuit that these values represent years without a label. In contrast, Figure 7.2 includes a label on both axes, since what is being plotted there needs additional text to help the reader interpret the visualization.


7.3 Caption

In data visualizations, the caption is typically used to indicate the source of the underlying data. This is important because it helps give credibility to the visualization by making the source of the data transparent. It can also be used to provide notes that give additional information about the underlying data (e.g., how values were computed). The captions in Figure 7.1 do both of these. The caption on the bottom-right of the visualization provides the source of the underlying data, while the caption on the bottom-left gives us additional information about the basis for how the CPI-based inflation was calculated. In Figure 7.2, the caption only provides the source of the data used in the visualization.


7.4 Adding Labels to a Plot in R

To illustrate how we can add labels to a plot, we are going to re-create the horizontal bar chart of the number of UMN buildings classified for each type of primary use (e.g., parking, residential). As a reminder, in the first code chunk of your QMD document, load the {tidyverse} and {ggformula} libraries. Also use the read_csv() function to import the umn-buildings.csv data and assign the data to an object called umn. You might also want to view the data to ensure that it was imported into the QMD document correctly. (Don’t forget to add code comments as well!)

# Load libraries
library(ggformula)
library(tidyverse)

# Import data
umn <- read_csv("data/umn-buildings.csv")

# View data
umn
# A tibble: 393 × 33
   bldg_num name     legal_name abbr  campus address type  TRI_BLDG_T TRI_TRITEN
   <chr>    <chr>    <chr>      <chr> <chr>  <chr>   <chr> <chr>      <chr>     
 1 332      Cattle … Cattle Fe… CFS1  St. P… 1894 B… Acad… Agricultu… Owned     
 2 333      Cattle … Cattle Fe… CFS2  St. P… 1896 B… Acad… Agricultu… Owned     
 3 302      Beef Ca… Beef Catt… BCB   St. P… 1920 B… Acad… Agricultu… Owned     
 4 343      Botany … Botany Fi… BFH   St. P… 2033 F… Acad… Agricultu… Owned     
 5 345      Agronom… Agronomy … AGRSH St. P… 1472 G… Acad… Agricultu… Owned     
 6 346      Bee Lab… Bee Lab S… BLAB… St. P… 1634 G… Acad… Agricultu… Owned     
 7 347      Bee Res… Bee Resea… BRF   St. P… 1634 G… Acad… Agricultu… Owned     
 8 351      Hog Bar… Hog Barn … HOGB3 St. P… 1845 B… Acad… Agricultu… Owned     
 9 355      Farm Cr… Farm Crop… FCFH  St. P… 1922 H… Acad… Agricultu… Owned     
10 392      Sheep R… Sheep Res… SHEE… St. P… 1794 D… Acad… Agricultu… Owned     
# ℹ 383 more rows
# ℹ 24 more variables: TRI_YEAR_O <dbl>, TRI_DATE_O <chr>, TRI_YEAR_L <dbl>,
#   mortenson_dates <dbl>, ada <chr>, nrhp_status <chr>, gross_sqft <dbl>,
#   assignable_sqft <dbl>, net_sqft <dbl>, structural_sqft <dbl>,
#   usable_sqft <dbl>, BLDG_DETAI <chr>, BLDG_PHOTO <chr>,
#   women_restrooms <dbl>, men_restrooms <dbl>, undesignated_restrooms <dbl>,
#   classrooms <dbl>, laboratories <dbl>, offices_spaces <dbl>, …

To create our initial bar chart we used the following syntax in a new code chunk.

gf_counts(type ~ ., data = umn, color = "black", fill = "maroon")
Figure 7.3: Horizontal bar chart of the number of each building type owned by UMN. This plot includes a border and fill color.


7.4.1 Adding Labels to the UMN Buildings Bar Chart

To add labels, we are going to modify our original plot by using the gf_labs() function. We will connect the original gf_counts() function and the gf_labs() function together using the |> operator. Connecting functions together is called piping and the operator is referred to as the pipe. So the bones of the code will look like this:

gf_counts() |> gf_labs()

Good coding practice suggests placing each modification to the plot on its own line of code; this makes the code easier for humans to read and understand. The pipe operator is left at the end of the line containing the original gf_counts() function. Then the gf_labs() function is placed on the next line and indented. (The indentation helps humans see that this is all part of the same plot.) Here is what that looks like:

gf_counts() |>
  gf_labs()
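
To see what the pipe does on its own, here is a minimal sketch that uses two base R functions rather than plotting functions. The object names (values, piped, nested) are hypothetical and chosen only for illustration, and the |> operator assumes R version 4.1 or later.

```r
# Hypothetical values, used only to illustrate the pipe
values <- c(4, 9, 16)

# With the pipe: the result on the left becomes the first argument
# of the function on the right
piped <- values |>
  sqrt() |>
  sum()

# The same computation written as nested function calls
nested <- sum(sqrt(values))

# Both approaches give the same result
identical(piped, nested)
```

Reading the piped version top to bottom follows the order in which the steps happen, which is the main reason we use it when building up a plot.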

There are several arguments for gf_labs() depending on which label you want to add/edit. Five arguments[1] that we will use a lot include:

  • x=: This argument adds/edits the label on the x-axis.
  • y=: This argument adds/edits the label on the y-axis.
  • title=: This argument adds/edits the title of the plot.
  • subtitle=: This argument adds/edits the subtitle of the plot.
  • caption=: This argument adds/edits the caption of the plot.

Each of these arguments takes a text string (i.e., enclosed in quotation marks) that gives the text you want to appear in that label. As an example consider the rendered plot using the following labels.

gf_counts(type ~ ., data = umn, color = "black", fill = "maroon") |>
  gf_labs(
    x = "Number of Buildings",
    y = "Building Type",
    title = "Buildings Owned by the University of Minnesota",
    subtitle = "The number of buildings for each designated type on the Twin Cities campus.",
    caption = "Source: UMN Facility Information Services"
  )
Figure 7.4: Horizontal bar chart of the number of each building type owned by UMN. Labels have been added to help readers better understand the visualization.

Writing good labels is a process, and much like essays or papers, your first draft may not be the final draft. Looking at our labels, there are several improvements that can be made: (1) The title doesn’t capture the visualization well. It also needs to stand out more from the subtitle. (2) The subtitle needs additional text to explain “designated type”. (3) Since it is clear from the title and subtitle that we are plotting building types, we may not need a label for the y-axis. To omit a label, don’t add any text between the quotation marks for that label.

gf_counts(type ~ ., data = umn, color = "black", fill = "maroon") |>
  gf_labs(
    x = "Number of Buildings",
    y = "",
    title = "PRIMARY DESIGNATION OF BUILDINGS ON THE TWIN CITIES CAMPUS",
    subtitle = "Each building owned by the University of Minnesota has a primary designated use (e.g., residential). Most of the buildings on the Twin Cities campus are designated as Academic/Administration or Residential.",
    caption = "Source: UMN Facility Information Services"
  )
Figure 7.5: Horizontal bar chart of the number of each building type owned by UMN. Labels have been added to help readers better understand the visualization.

This draft is better, but (1) our subtitle runs off the page, and (2) we might want to add vertical space between the title and subtitle, and between the x-axis label and the caption.

PROTIP

In long labels, you can add a line break with \n, which stands for newline. You can add as many of these as you need to make sure that text doesn’t run off the page. They can also be added at the beginning of a label to add vertical space between elements in the visualization.

gf_counts(type ~ ., data = umn, color = "black", fill = "maroon") |>
  gf_labs(
    x = "Number of Buildings",
    y = "",
    title = "PRIMARY DESIGNATION OF BUILDINGS ON THE TWIN CITIES CAMPUS",
    subtitle = "\nEach building owned by the University of Minnesota has a primary designated use (e.g., residential).\nMost of the buildings on the Twin Cities campus are designated as Academic/Administration or Residential.",
    caption = "\nSource: UMN Facility Information Services"
  )
Figure 7.6: Horizontal bar chart of the number of each building type owned by UMN. Labels have been added to help readers better understand the visualization.
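
Placing each \n by hand works well for a label or two. As a possible alternative, the sketch below uses the base R functions strwrap() and paste() to insert the line breaks automatically. The object names (long_subtitle, pieces, wrapped_subtitle) and the width of 60 are assumptions made for illustration; they are not part of the plot code above.

```r
# Hypothetical long label text, used only for illustration
long_subtitle <- "Each building owned by the University of Minnesota has a primary designated use (e.g., residential). Most of the buildings on the Twin Cities campus are designated as Academic/Administration or Residential."

# strwrap() splits the text at spaces into pieces that are each
# shorter than the requested width
pieces <- strwrap(long_subtitle, width = 60)

# Join the pieces with \n so the label contains explicit line breaks
wrapped_subtitle <- paste(pieces, collapse = "\n")

# cat() prints the text with the line breaks rendered
cat(wrapped_subtitle)
```

The resulting wrapped_subtitle is an ordinary text string, so it could be supplied to the subtitle= argument in place of the quoted text. Whether 60 is a good width depends on the size of the rendered plot.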


Exercises: Your Turn

Consider the following data visualization that shows the amount of playing time for each of the Minnesota Lynx players for the 2025 regular season.

A heatmap showing the amount of playing time for each of the Minnesota Lynx players for the 2025 regular season.
Figure 7.7

  1. This visualization does not have a label for either the x- or y-axis. Explain why this is probably an okay decision for this visualization.
  2. Write the syntax you would add into the gf_labs() function if you were creating this plot using R.

Consider the following data from the umn-bike-amenities.csv file.

Imagine that this has been imported into a data object called bike_amenities. Also suppose we wanted to create a bar chart of the campus attribute, which provides the campus location (East Bank, West Bank, or St. Paul) of the bike amenities in the data.

  1. Write out the syntax to create a horizontal bar chart of the campus attribute. Also color the bar borders black and fill the bars with yellow. Lastly, add labels to help readers better interpret the plot.



  1. There are other arguments that can also be added to gf_labs() that you will learn about later in the textbook. These help us change the labels for our plot legends, etc.