Pretty-Printing Tables in Markdown

Often it is useful to format table output to make it look good or to adhere to a particular style (e.g., APA). Several packages help with this when working in an R Markdown document. The primary tools used below are:

  • The kable() function from the knitr package; and
  • Functions from the kableExtra package.

Other packages for formatting tables include the gt package, the huxtable package, and the expss package, among others. For complete APA formatting, check out the papaja package.

The primary input to the kable() function is a data frame. This data frame will include the primary contents for your table. A data frame can either be created as the output of a function or directly using the data.frame() function. Below I will create and format a few common tables produced for statistical reports. To do so, I will use data from the ed-schools-2018.csv file (see the data codebook here). These data include institutional-level attributes for several graduate education schools/programs rated by U.S. News and World Report in 2018.

# Load libraries
library(AICcmodavg)
library(broom)
library(corrr)
library(dplyr)
library(knitr)
library(kableExtra)
library(readr)
library(tidyr)

# Read in data
ed = read_csv(file = "~/Documents/github/epsy-8252/data/ed-schools-2018.csv")

# Drop rows with missing data
educ = ed %>%
  drop_na()

# Create log-transformed variables
educ = educ %>%
  mutate(
    Lpeer = log(peer),
    Ldoc_accept = log(doc_accept),
    Lenroll = log(enroll)
    )

Summary Statistics Table

Say we wanted to produce a table of means and standard deviations for the variables: peer, doc_accept, and enroll. Furthermore, we want these for both the raw (untransformed) and log-transformed versions of these variables. Here is a sketch of what the table should look like:

[INSERT SKETCH]

To begin we will set up a data frame that includes the information from the table. We do this manually to illustrate use of the data.frame() function to set up a data frame.

tab_01 = data.frame(
  Measure = c("Peer rating", "Ph.D. acceptance rate", "Enrollment"),
  M_1  = c(mean(educ$peer), mean(educ$doc_accept), mean(educ$enroll)),
  SD_1 = c(sd(educ$peer), sd(educ$doc_accept), sd(educ$enroll)),
  M_2  = c(mean(educ$Lpeer), mean(educ$Ldoc_accept), mean(educ$Lenroll)),
  SD_2 = c(sd(educ$Lpeer), sd(educ$Ldoc_accept), sd(educ$Lenroll))
)

tab_01
                Measure        M_1        SD_1      M_2      SD_2
1           Peer rating   3.312295   0.4893203 1.187198 0.1439847
2 Ph.D. acceptance rate  40.113115  20.2276300 3.525419 0.6461164
3            Enrollment 969.762295 664.9454219 6.657939 0.7228670
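As an aside, the same kind of table could also be assembled programmatically rather than by typing each mean() and sd() call. The sketch below is only one possible approach. It assumes a small made-up data frame (toy, with invented values) in place of the educ data, and uses pivot_longer() and summarize() from the tidyr and dplyr packages loaded above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (invented values, not the ed-schools data)
toy = data.frame(
  peer   = c(3.1, 3.5, 4.2, 2.9),
  enroll = c(500, 1200, 800, 650)
)

# Stack the variables into one column, then compute the mean and SD
# of each variable on both the raw and the natural-log scale
tab_toy = toy %>%
  pivot_longer(cols = everything(), names_to = "Measure", values_to = "value") %>%
  group_by(Measure) %>%
  summarize(
    M_1  = mean(value),
    SD_1 = sd(value),
    M_2  = mean(log(value)),
    SD_2 = sd(log(value))
  )

tab_toy
```

Note that summarize() returns the rows ordered alphabetically by Measure, so the rows would still need to be reordered and relabeled before being passed to kable().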

We can now use the kable() function to rename the columns, round the numeric values, and set a caption.

kable(
  tab_01,
  col.names = c("Measure", "*M*", "*SD*", "*M*", "*SD*"),
  digits = 2,
  caption = "Means and Standard Deviations of Three Measures of Graduate Programs of Education ($n=122$)"
  )
Table 1.1: Means and Standard Deviations of Three Measures of Graduate Programs of Education (n = 122)
Measure                       M      SD       M      SD
Peer rating                3.31    0.49    1.19    0.14
Ph.D. acceptance rate     40.11   20.23    3.53    0.65
Enrollment               969.76  664.95    6.66    0.72

Finally, we use functions from the kableExtra package to add our top header row.

kable(
  tab_01,
  col.names = c("Measure", "*M*", "*SD*", "*M*", "*SD*"),
  align = c("l", "c", "c", "c", "c"),
  digits = 2,
  caption = "Means and Standard Deviations of Three Measures of Graduate Programs of Education ($n=122$)"
  ) %>%
  add_header_above(
    header = c(" " = 1, "Untransformed" = 2, "Log-transformed" = 2)
    ) %>%
  footnote(
    general = "Variables were log-transformed using the natural logarithm.",
    general_title = "Note.",
    footnote_as_chunk = TRUE
    )
Table 1.2: Means and Standard Deviations of Three Measures of Graduate Programs of Education (n = 122)
                          Untransformed Log-transformed
Measure                       M      SD       M      SD
Peer rating                3.31    0.49    1.19    0.14
Ph.D. acceptance rate     40.11   20.23    3.53    0.65
Enrollment               969.76  664.95    6.66    0.72
Note. Variables were log-transformed using the natural logarithm.
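The header= argument in add_header_above() takes a named vector: each name is the label to print, and each value is the number of columns that label spans. The spans should add up to the number of columns in the table. The sketch below illustrates this with a small stand-in data frame (toy_tab and header_spans are names made up for this illustration).

```r
library(knitr)
library(kableExtra)

# Stand-in table with five columns (illustrative values only)
toy_tab = data.frame(
  Measure = c("Measure A", "Measure B"),
  M_1  = c(3.31, 40.11),
  SD_1 = c(0.49, 20.23),
  M_2  = c(1.19, 3.53),
  SD_2 = c(0.14, 0.65)
)

# Names are the header labels; values are the number of columns each spans
header_spans = c(" " = 1, "Untransformed" = 2, "Log-transformed" = 2)

# The spans should account for every column in the table
stopifnot(sum(header_spans) == ncol(toy_tab))

kable(toy_tab, format = "html") %>%
  add_header_above(header = header_spans)
```

The blank label (" ") leaves the space above the first column empty so that the two spanning labels sit over the correct pairs of columns.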

Correlation Table

For our second example, say we wanted to produce a table of pairwise correlations for the variables: peer, doc_accept, and enroll. To begin we will again set up a data frame, but this time we will generate it using functions from the corrr package.

tab_02 = educ %>%
  select(peer, doc_accept, enroll) %>%
  correlate() %>%
  shave(upper = TRUE) %>%
  fashion(decimals = 2, na_print = "—") 

tab_02
     rowname peer doc_accept enroll
1       peer    —          —      —
2 doc_accept -.54          —      —
3     enroll  .10       -.03      —

Now we change the values in the rowname column by mutating in new values, and pipe this into the kable() function, where we will change the column names and add a caption. Keeping the default alignment will align the decimal points within columns.

tab_02 %>%
  mutate(
    rowname = c("1. Peer rating", "2. Ph.D. acceptance rate", "3. Enrollment")
  ) %>%
  kable(
    caption = "Correlations between Three Measures of Graduate Programs of Education",
    col.names = c("Measure", "1", "2", "3")
  )
Table 1.3: Correlations between Three Measures of Graduate Programs of Education
Measure                   1      2      3
1. Peer rating
2. Ph.D. acceptance rate  -.54
3. Enrollment             .10    -.03
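One caveat, stated as an assumption about package versions rather than something shown in the output above: I believe newer releases of corrr name the first column term rather than rowname. If so, mutating a column called rowname would add a new column instead of replacing the existing one. The sketch below sidesteps the column name by renaming the first column by position. It uses the built-in mtcars data as a stand-in, and the labels are made up for the illustration.

```r
library(corrr)
library(dplyr)
library(knitr)

# Build the lower-triangle correlation table from stand-in data
tab_cor = mtcars %>%
  select(mpg, hp, wt) %>%
  correlate() %>%
  shave(upper = TRUE) %>%
  fashion(decimals = 2, na_print = "")

# Rename the first column by position, whatever corrr happened to call it,
# and then overwrite its values with the row labels
names(tab_cor)[1] = "Measure"
tab_cor$Measure = c("1. Miles per gallon", "2. Horsepower", "3. Weight")

kable(tab_cor, col.names = c("Measure", "1", "2", "3"))
```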

Regression Table: Single Model

It is common to report the coefficient-level information from a fitted regression model in a table. The nice thing about using the tidy() function to obtain coefficient-level information from a fitted model is that the output is formatted as a data frame. Thus, we can use the output from tidy() directly in the kable() function. Below I fit a regression model and then use piping to obtain the coefficient-level information and create the table.

lm(peer ~ 1 + doc_accept + gre_verbal, data = educ) %>%
  tidy() %>%
  kable()
term            estimate  std.error  statistic    p.value
(Intercept)      -1.5268     1.6668     -0.916     0.3615
doc_accept       -0.0107     0.0019     -5.519     0.0000
gre_verbal        0.0340     0.0106      3.221     0.0016

To format this, we might want to change the column names and round the numerical information to a sensible number of digits; typically, p-values are rounded to three decimal places, and coefficients, standard errors, and t-values are rounded to two decimal places.

lm(peer ~ 1 + doc_accept + gre_verbal, data = educ) %>%
  tidy() %>%
  kable(
    caption = "Coefficient-Level Estimates for a Model Fitted to Estimate Variation in Peer Ratings.",
    col.names = c("Predictor", "B", "SE", "t", "p"),
    digits = c(0, 2, 3, 2, 3)
  )
Table 1.4: Coefficient-Level Estimates for a Model Fitted to Estimate Variation in Peer Ratings.
Predictor           B      SE       t       p
(Intercept)     -1.53   1.667   -0.92   0.362
doc_accept      -0.01   0.002   -5.52   0.000
gre_verbal       0.03   0.011    3.22   0.002

The last things to fix are the predictor names and the p-values. Rounding the p-values has rendered one of them as zero (0.000). We can use the pvalue() function from the scales package to better format the column of p-values. This is carried out prior to piping the output into the kable() function by changing the values in the p.value column. (Note that rather than load a package for a single function, we can specify the package directly prior to the function name using two colons: scales::pvalue().) Similarly, we can change the names in the term column at the same time. Lastly, note that the SE column is displayed to three decimal places rather than two; rounding to two would truncate the small standard errors (e.g., 0.0019 would display as 0.00).

lm(peer ~ 1 + doc_accept + gre_verbal, data = educ) %>%
  tidy() %>%
  mutate(
    p.value = scales::pvalue(p.value),
    term = c("Intercept", "Ph.D. acceptance rate", "Verbal GRE score")
  ) %>%
  kable(
    caption = "Coefficient-Level Estimates for a Model Fitted to Estimate Variation in Peer Ratings.",
    col.names = c("Predictor", "B", "SE", "t", "p"),
    digits = c(0, 2, 3, 2, 3)
  )
Table 1.5: Coefficient-Level Estimates for a Model Fitted to Estimate Variation in Peer Ratings.
Predictor                     B      SE       t   p
Intercept                 -1.53   1.667   -0.92   0.362
Ph.D. acceptance rate     -0.01   0.002   -5.52   <0.001
Verbal GRE score           0.03   0.011    3.22   0.002
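To see what scales::pvalue() does on its own, here is a small sketch applied to a made-up vector of p-values (the values are invented for illustration). Values smaller than the accuracy= argument, which defaults to 0.001, are reported as being below that threshold rather than being rounded to zero.

```r
# Hypothetical p-values (invented for illustration)
p = c(0.00000004, 0.0016, 0.3612)

# Plain rounding turns the smallest value into 0
round(p, 3)

# pvalue() returns character strings and flags very small values
p_fmt = scales::pvalue(p)
p_fmt
# "<0.001" "0.002" "0.361"

# The threshold can be changed via the accuracy= argument
scales::pvalue(p, accuracy = 0.01)
```

Because the result is a character vector, the column is no longer numeric once it has been formatted, which affects how kable() aligns it by default.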

One last tweak: in the column of p-values, the decimal points no longer line up, because the formatted p-values are now text and the default alignment for text is left-aligned. We can fix this by right-aligning the numeric columns, so that the decimal points within each column line up.

lm(peer ~ 1 + doc_accept + gre_verbal, data = educ) %>%
  tidy() %>%
  mutate(
    p.value = scales::pvalue(p.value),
    term = c("Intercept", "Ph.D. acceptance rate", "Verbal GRE score")
  ) %>%
  kable(
    caption = "Coefficient-Level Estimates for a Model Fitted to Estimate Variation in Peer Ratings.",
    col.names = c("Predictor", "B", "SE", "t", "p"),
    digits = c(0, 2, 3, 2, 3),
    align = c("l", "r", "r", "r", "r")
  )
Table 1.6: Coefficient-Level Estimates for a Model Fitted to Estimate Variation in Peer Ratings.
Predictor                     B      SE       t        p
Intercept                 -1.53   1.667   -0.92    0.362
Ph.D. acceptance rate     -0.01   0.002   -5.52   <0.001
Verbal GRE score           0.03   0.011    3.22    0.002

Regression Table: Multiple Models

There are several packages designed specifically to help us create tables of regression results. The stargazer package, the texreg package, and the finalfit package are but a few of these.

I tend to use both the texreg package (more customizable) and the stargazer package (easier). Below I illustrate how to create a table of regression results using the stargazer package. First we fit a few models.

# Fit candidate models
lm.1 = lm(peer ~ 1 + doc_accept, data = educ)
lm.2 = lm(peer ~ 1 + enroll, data = educ)
lm.3 = lm(peer ~ 1 + doc_accept + enroll, data = educ)

After loading the stargazer package, the stargazer() function can be used to create a basic table of regression results. The type= argument defaults to "latex", so if you are rendering to an HTML document, you need to change this to type="html".

library(stargazer)

stargazer(lm.1, lm.2, lm.3, type = "html")

<table style="text-align:center"><tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="3"><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="3" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td colspan="3">peer</td></tr>
<tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td><td>(3)</td></tr>
<tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">doc_accept</td><td>-0.013<sup>***</sup></td><td></td><td>-0.013<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.002)</td><td></td><td>(0.002)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">enroll</td><td></td><td>0.0001</td><td>0.0001</td></tr>
<tr><td style="text-align:left"></td><td></td><td>(0.0001)</td><td>(0.0001)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">Constant</td><td>3.836<sup>***</sup></td><td>3.238<sup>***</sup></td><td>3.769<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.083)</td><td>(0.078)</td><td>(0.101)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td></tr>
<tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>122</td><td>122</td><td>122</td></tr>
<tr><td style="text-align:left">R<sup>2</sup></td><td>0.292</td><td>0.011</td><td>0.300</td></tr>
<tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.286</td><td>0.003</td><td>0.288</td></tr>
<tr><td style="text-align:left">Residual Std. Error</td><td>0.414 (df = 120)</td><td>0.489 (df = 120)</td><td>0.413 (df = 119)</td></tr>
<tr><td style="text-align:left">F Statistic</td><td>49.400<sup>***</sup> (df = 1; 120)</td><td>1.328 (df = 1; 120)</td><td>25.490<sup>***</sup> (df = 2; 119)</td></tr>
<tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="3" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>

The function outputs raw HTML (or LaTeX), so to have it render as a table you need to include results='asis' in the options of your R Markdown chunk.

```{r message=FALSE, results='asis'} 
library(stargazer)

stargazer(lm.1, lm.2, lm.3, type = "html")
```
                      Dependent variable:
                      peer
                      (1)                       (2)                   (3)
doc_accept            -0.013***                                       -0.013***
                      (0.002)                                         (0.002)
enroll                                          0.0001                0.0001
                                                (0.0001)              (0.0001)
Constant              3.836***                  3.238***              3.769***
                      (0.083)                   (0.078)               (0.101)
Observations          122                       122                   122
R2                    0.292                     0.011                 0.300
Adjusted R2           0.286                     0.003                 0.288
Residual Std. Error   0.414 (df = 120)          0.489 (df = 120)      0.413 (df = 119)
F Statistic           49.400*** (df = 1; 120)   1.328 (df = 1; 120)   25.490*** (df = 2; 119)
Note:                 p<0.1; p<0.05; p<0.01

There are several arguments in the stargazer() function to customize the table.

stargazer(
  lm.1, lm.2, lm.3,
  type = "html",
  title = "Three Regression Models Predicting Variation in Peer Ratings",
  column.labels = c("Model A", "Model B", "Model C"),
  colnames = FALSE,
  model.numbers = FALSE,
  dep.var.caption = " ",
  dep.var.labels = "Peer rating (1-5 scale)",
  covariate.labels = c("Ph.D. acceptance rate", "Enrollment"),
  keep.stat = c("rsq", "f"),
  notes.align = "l",
  add.lines = list(c("Corrected AIC", round(AICc(lm.1), 1), round(AICc(lm.2), 1), round(AICc(lm.3), 1))),
  out = "images/table1.html"
  )
Three Regression Models Predicting Variation in Peer Ratings
                        Peer rating (1-5 scale)
                        Model A                   Model B               Model C
Ph.D. acceptance rate   -0.013***                                       -0.013***
                        (0.002)                                         (0.002)
Enrollment                                        0.0001                0.0001
                                                  (0.0001)              (0.0001)
Constant                3.836***                  3.238***              3.769***
                        (0.083)                   (0.078)               (0.101)
Corrected AIC           135                       175.7                 135.7
R2                      0.292                     0.011                 0.300
F Statistic             49.400*** (df = 1; 120)   1.328 (df = 1; 120)   25.490*** (df = 2; 119)
Note:                   *p<0.1; **p<0.05; ***p<0.01

There is a known bug in which stargazer does not print the asterisks next to the significance levels in the notes section of the table when outputting to HTML. The solution, as documented here, is to write the HTML code to an external file using the out= argument in stargazer() and then insert that HTML code in a new code chunk via the includeHTML() function from the htmltools package.
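A minimal sketch of that two-step workaround is given below. It is an illustration under stated assumptions rather than the documented fix itself: the model (lm.a) is a stand-in fitted to the built-in mtcars data, the file path (table_file) is a temporary file chosen for the example, and wrapping the call in capture.output() is my own addition to keep the raw HTML from also being printed by the first chunk.

```r
library(stargazer)

# Stand-in model fitted to the built-in mtcars data
lm.a = lm(mpg ~ 1 + wt, data = mtcars)

# Hypothetical output path; a temporary file is used for this example
table_file = tempfile(fileext = ".html")

# Step 1 (first chunk): write the table's HTML to the external file.
# capture.output() suppresses the copy that stargazer() prints to the console.
invisible(capture.output(
  stargazer(lm.a, type = "html", out = table_file)
))

# Step 2 (a new chunk): insert the saved HTML into the rendered document
htmltools::includeHTML(table_file)
```

In a real document the path would point somewhere inside the project (as with the out= value used above) so that the file persists between chunks and renders.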

The add.lines= argument adds a line to the bottom of the output. This argument takes a list in which each element is a vector containing the label you want to display in the regression table, followed by the value to display for each of the models. Here we computed the corrected AIC value for each of the models using the AICc() function from the AICcmodavg package. (Note: We will learn about this in the More Information Criteria for Model Selection unit.)
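The vector passed inside add.lines= can also be built in a separate step, which makes its structure easier to see. The sketch below uses stand-in models fitted to the built-in mtcars data; models, aicc_values, aicc_line, and extra_lines are names made up for this illustration. It formats the values with sprintf() instead of round(), because round() followed by conversion to text drops a trailing zero (which is why 135 rather than 135.0 appears in the table above).

```r
library(AICcmodavg)

# Stand-in candidate models fitted to the built-in mtcars data
models = list(
  lm(mpg ~ 1 + wt, data = mtcars),
  lm(mpg ~ 1 + hp, data = mtcars),
  lm(mpg ~ 1 + wt + hp, data = mtcars)
)

# Corrected AIC for each model
aicc_values = sapply(models, AICc)

# First element is the row label; the rest are one value per model.
# sprintf("%.1f", ...) always keeps one decimal place, e.g., "135.0".
aicc_line = c("Corrected AIC", sprintf("%.1f", aicc_values))

# add.lines= expects a list with one vector per extra row
extra_lines = list(aicc_line)
extra_lines
```

The resulting list could then be supplied as add.lines = extra_lines in the stargazer() call; further rows can be added by appending more vectors to the list.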