Unit 7: Model Evidence

In this set of notes, you will learn more about using information criteria to select a model from a set of candidate models.


Preparation

Before class you will need to read the following:



7.1 Dataset and Research Question

In this set of notes, we will use the data in the ed-schools-2018.csv file (see the data codebook here). These data include institutional-level attributes for several graduate education schools/programs rated by U.S. News and World Report in 2018.

# Load libraries
library(AICcmodavg)
library(broom)
library(corrr)
library(dplyr)
library(ggplot2)
library(readr)
library(sm)
library(tidyr)

# Read in data
ed = read_csv(file = "~/Documents/github/epsy-8252/data/ed-schools-2018.csv")

# Drop rows with missing data
educ = ed %>%
  drop_na()

head(educ)
# A tibble: 6 x 13
   rank school score  peer expert_score gre_verbal gre_quant doc_accept
  <dbl> <chr>  <dbl> <dbl>        <dbl>      <dbl>     <dbl>      <dbl>
1     1 Harva…   100   4.4          4.6        163       159        4.5
2     2 Stanf…    99   4.6          4.8        162       160        6.1
3     3 Unive…    96   4.2          4.3        156       152       29.1
4     3 Unive…    96   4.1          4.5        163       157        5  
5     3 Unive…    96   4.3          4.5        155       153       26.1
6     6 Johns…    95   4.1          4.1        164       162       27.4
# … with 5 more variables: phd_student_faculty_ratio <dbl>,
#   phd_granted_per_faculty <dbl>, funded_research <dbl>,
#   funded_research_per_faculty <dbl>, enroll <dbl>

In the last set of notes we used these data to examine the factors our academic peers use to rate graduate programs. Based on the substantive literature we had three scientific working hypotheses about how programs are rated:

  • H1: Student-related factors drive the perceived academic quality of graduate programs in education.
  • H2: Faculty-related factors drive the perceived academic quality of graduate programs in education.
  • H3: Institution-related factors drive the perceived academic quality of graduate programs in education.

After doing some initial exploration, we translated these working hypotheses into statistical models that we then fitted to the data. The four candidate models were:

\[ \begin{split} \mathbf{Model~1:~}& \mathrm{Peer~Rating}_i = \beta_0 + \beta_1(\mathrm{GREQ}_i) + \beta_2(\mathrm{GREQ}^2_i) + \beta_3(\mathrm{GREQ}^3_i) + \epsilon_i \\ \mathbf{Model~2:~}& \mathrm{Peer~Rating}_i = \beta_0 + \beta_1(\mathrm{Funded~research}_i) + \beta_2(\mathrm{Funded~research}^2_i) + \beta_3(\mathrm{PhDs~granted}_i) + \beta_4(\mathrm{PhDs~granted}^2_i) + \epsilon_i \\ \mathbf{Model~3:~}& \mathrm{Peer~Rating}_i = \beta_0 + \beta_1(\mathrm{PhD~acceptance~rate}_i) + \beta_2(\mathrm{PhD~student\mbox{-}to\mbox{-}faculty~ratio}_i) + \beta_3(\mathrm{Enrollment}_i) + \epsilon_i \\ \mathbf{Model~4:~}& \mathrm{Peer~Rating}_i = \beta_0 + \beta_1(\mathrm{PhD~acceptance~rate}_i) + \beta_2(\mathrm{PhD~student\mbox{-}to\mbox{-}faculty~ratio}_i) + \epsilon_i \end{split} \]

where peer rating, funded research, Ph.D.s granted, Ph.D. acceptance rate, enrollment, and Ph.D. student-to-faculty ratio have all been log-transformed. The fourth model is a reduced version of the institution-related factors model (Model 3) that omits enrollment.

# Create log-transformed variables
educ = educ %>%
  mutate(
    Lpeer = log(peer),
    Lfunded_research_per_faculty = log(funded_research_per_faculty),
    Lphd_granted_per_faculty = log(phd_granted_per_faculty + 1),
    Ldoc_accept = log(doc_accept),
    Lphd_student_faculty_ratio = log(phd_student_faculty_ratio + 1),
    Lenroll = log(enroll)
    )

# Fit candidate models
lm.1 = lm(Lpeer ~ 1 + gre_quant + I(gre_quant^2) + I(gre_quant^3), data = educ)
lm.2 = lm(Lpeer ~ 1 + Lfunded_research_per_faculty + I(Lfunded_research_per_faculty^2) + Lphd_granted_per_faculty  + I(Lphd_granted_per_faculty^2), data = educ)
lm.3 = lm(Lpeer ~ 1 + Ldoc_accept + Lenroll + Lphd_student_faculty_ratio, data = educ)
lm.4 = lm(Lpeer ~ 1 + Ldoc_accept + Lphd_student_faculty_ratio, data = educ)

Based on the AIC values for the four candidate models, we ranked the hypotheses by the amount of empirical support:

Table 7.1: Working Hypotheses Rank Ordered by the Amount of Empirical Support as Measured by the AIC
Hypothesis AIC
Institution-related factors (Reduced) -193.7
Institution-related factors (Full) -192.8
Faculty-related factors -184.5
Student-related factors -181.8

7.2 Corrected AIC (AICc): Adjusting for Model Complexity and Sample Size

Although AIC has a penalty correction that should account for model complexity, it turns out that when the number of parameters is large relative to the sample size, AIC is still biased in favor of models that have more parameters. This led Hurvich & Tsai (1989) to propose a second-order bias corrected AIC measure (AICc) computed as

\[ \mathrm{AIC_c} = \mathrm{Deviance} + 2(k)\left( \frac{n}{n - k - 1} \right) \]

where \(k\) is, again, the number of estimated parameters, and \(n\) is the sample size used to fit the model. Note that when \(n\) is very large relative to \(k\), the ratio \(\frac{n}{n-k-1}\) is essentially 1, and the AICc value essentially reduces to the AIC value. When \(n\) is small relative to \(k\), this ratio is larger than 1 and adds a greater penalty to the deviance. The general recommendation is to use AICc rather than AIC when selecting models.
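
To see how this penalty behaves, we can compare the AIC penalty of \(2k\) to the AICc penalty of \(2(k)\left(\frac{n}{n-k-1}\right)\) for a small and a large sample. (This is a quick illustration using \(k=5\), the number of parameters in Model 1; the value of \(n = 10000\) is simply an arbitrary "large" sample size.)

# AIC penalty for k = 5 parameters
2 * 5
[1] 10

# AICc penalty with n = 122 (our sample size)
2 * 5 * 122 / (122 - 5 - 1)
[1] 10.52

# AICc penalty with n = 10000; essentially the same as the AIC penalty
2 * 5 * 10000 / (10000 - 5 - 1)
[1] 10.01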

Below, we will compute the AICc for the first candidate model. (Note that we use \(n=122\) cases in the computation for all of the models fitted to these data.)

n = 122 # Sample size
k = 5   # Number of estimated parameters in Model 1 (4 coefficients + residual variance)

# Compute AICc for Model 1
-2 * logLik(lm.1)[[1]] + 2 * k * n / (n - k - 1) #Model 1
[1] -181.2

In practice, we will use the AICc() function from the AICcmodavg package to compute the AICc value directly.

AICc(lm.1)
[1] -181.2
AICc(lm.2)
[1] -183.8
AICc(lm.3)
[1] -192.3
AICc(lm.4)
[1] -193.3

Based on the \(\mathrm{AIC_c}\) values, the model with the most empirical support given the data and four candidate models is Model 4. Again, because the models are proxies for the scientific hypotheses, we can rank order the scientific hypotheses based on the empirical support for each.

Table 7.2: Working Hypotheses Rank Ordered by the Amount of Empirical Support
Hypothesis AICc
Institution-related factors (Reduced) -193.3
Institution-related factors (Full) -192.3
Faculty-related factors -183.8
Student-related factors -181.2


Moving forward, we will use the reduced institution-related factors model (Model 4) to represent the institution-related factors hypothesis.

7.3 Model-Selection Uncertainty

When we adopt one model over another, we are introducing some degree of selection uncertainty into the scientific process. It would be nice if we could quantify and report this uncertainty, and this is a real advantage of using information criteria for model selection: they allow us to quantify the uncertainty associated with selecting any particular candidate model.

The amount of model selection uncertainty we have depends on the amount of empirical support each of the candidate models has. For example, if one particular candidate model has a lot of empirical support and the rest have very little empirical support we would have less model uncertainty than if all of the candidate models had about the same amount of empirical support.

Since we measure the empirical support for each hypothesis by computing the AICc for the associated candidate model, we can quantify how much more empirical support the most supported hypothesis has relative to each of the other working hypotheses. To do this, we compute the difference in AICc values between the best-fitting model and each of the other candidate models. This measure is referred to as \(\Delta\)AICc.

In our example, the hypothesis with the most empirical support was the institution-related factors hypothesis, represented by Model 4.

# Compute delta values
AICc(lm.1) - AICc(lm.4) #Student-related factors
[1] 12.09
AICc(lm.2) - AICc(lm.4) #Faculty-related factors
[1] 9.565
AICc(lm.4) - AICc(lm.4) #Institution-related factors
[1] 0

Table 7.3: Working Hypotheses Rank Ordered by the Amount of Empirical Support
Hypothesis AICc \(\Delta\)AICc
Institution-related factors (Reduced) -193.3 0.0
Faculty-related factors -183.8 9.6
Student-related factors -181.2 12.1


Burnham, Anderson, & Huyvaert (2011, p. 25) give rough guidelines for interpreting \(\Delta\)AICc values. They suggest that hypotheses with \(\Delta\)AICc values less than 2 are plausible, those in the range of 4–7 have some empirical support, those in the range of 9–11 have relatively little support, and those greater than 13 have essentially no empirical support. Using these criteria:

  • The institution-related factors hypothesis (Model 4) has a lot of empirical support.
  • The student-related factors hypothesis (Model 1) and the faculty-related factors hypothesis (Model 2) both have relatively little empirical support compared to the institution-related factors hypothesis.

7.4 Relative Likelihood and Evidence Ratios

One way we can mathematically formalize the strength of evidence for each model is to compute the relative likelihood. The relative likelihood provides the likelihood of each of the candidate models, given the set of candidate models and the data. It is computed as

\[ \mathrm{Relative~Likelihood} = e ^ {-\frac{1}{2} (\Delta \mathrm{AICc})} \]

# Compute the relative likelihood for each hypothesis
exp(-1/2 * 12.09) #Student-related factors
[1] 0.00237
exp(-1/2 *  9.56) #Faculty-related factors
[1] 0.008396
exp(-1/2 *  0.00) #Institution-related factors
[1] 1
Table 7.4: Working Hypotheses Rank Ordered by the Amount of Empirical Support
Hypothesis AICc \(\Delta\)AICc Rel. Lik.
Institution-related factors (Reduced) -193.3 0.0 1.000
Faculty-related factors -183.8 9.6 0.008
Student-related factors -181.2 12.1 0.002
Note. Rel. Lik. = Relative Likelihood


These quantities allow us to make evidentiary statements comparing any two scientific hypotheses. For example,

  • The empirical support for the faculty-related factors hypothesis is 0.008 times that of the empirical support for the institution-related factors hypothesis.

To obtain these evidence ratios, we divide the relative likelihood of one hypothesis by the relative likelihood of another. This quantifies how much more likely one hypothesis is than the other, given the data. As another example,

  • The empirical support for the institution-related factors hypothesis is roughly 500 times that of the empirical support for the student-related factors hypothesis. (To obtain this we computed \(1/.002=500\). Because the relative likelihood of 0.002 is a rounded value, this ratio is approximate; using less rounded values gives a ratio of about 422, as shown below.)
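
As a quick check, we can recompute this evidence ratio directly from the \(\Delta\)AICc value for the student-related factors hypothesis. (This simply re-uses the relative likelihood computation from above; the slight difference from the value of 422.6 reported later comes from rounding the \(\Delta\)AICc value to 12.09.)

# Evidence ratio: institution-related vs. student-related factors
exp(-1/2 * 0.00) / exp(-1/2 * 12.09)
[1] 422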

7.5 Model Probabilities

Also referred to as an Akaike Weight (\(w_i\)), a model probability provides a numerical measure of the probability of each model given the data and the candidate set of models. It can be computed as:

\[ w_i = \frac{\mathrm{Relative~Likelihood~for~Model~}i}{\sum_j \mathrm{Relative~Likelihood~for~Model~}j} \]

# Compute sum of relative likelihoods
sum_rel = 1.000000000 + 0.008376636 + 0.002366380

0.002366380 / sum_rel #Student-related factors
[1] 0.002341
0.008376636 / sum_rel #Faculty-related factors
[1] 0.008288
1.000000000 / sum_rel #Institution-related factors
[1] 0.9894

Since the models are proxies for the working hypotheses, the model probabilities can be used to provide probabilities of each working hypothesis as a function of the empirical support. Given the data and the candidate set of working hypotheses:

  • The probability of the student-related factors hypothesis is 0.002 (very unlikely).
  • The probability of the faculty-related factors hypothesis is 0.008 (very unlikely).
  • The probability of the institution-related factors hypothesis is 0.990 (very likely).
Table 7.5: Working Hypotheses Rank Ordered by the Amount of Empirical Support
Hypothesis AICc \(\Delta\)AICc Rel. Lik. AICc Weight
Institution-related factors (Reduced) -193.3 0.0 1.000 0.989
Faculty-related factors -183.8 9.6 0.008 0.008
Student-related factors -181.2 12.1 0.002 0.002
Note. Rel. Lik. = Relative Likelihood

7.6 Tables of Model Evidence

We will use the aictab() function from the AICcmodavg package to compute and create a table of model evidence values directly from the lm() fitted models. This function takes a list of models in the candidate set (it actually has to be an R list). The optional argument modnames= is a vector of model names associated with the models in the candidate set.

myAIC = aictab(
  cand.set = list(lm.1, lm.2, lm.4),
  modnames = c("Student-related factors", "Faculty-related factors", "Institution-related factors (Reduced)")
  )

# View table
myAIC
Model selection based on AICc:

                                      K    AICc Delta_AICc AICcWt Cum.Wt     LL
Institution-related factors (Reduced) 4 -193.33       0.00   0.99   0.99 100.84
Faculty-related factors               6 -183.76       9.56   0.01   1.00  98.25
Student-related factors               5 -181.24      12.09   0.00   1.00  95.88

Note that the output includes the number of parameters (K) and the AICc value (AICc) for each candidate model, and the models are printed in order from the most to the least empirical evidence based on the AICc. The output also includes the \(\Delta\)AICc values, the model probabilities (AICcWt), and the log-likelihood (LL) values. The Cum.Wt column gives the cumulative model probabilities. (For example, the probability of the first two hypotheses is \(0.99 + 0.01 = 1.00\).)
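
We can verify the cumulative weights by taking the running sum of the model probabilities ourselves, coercing the aictab() output to a data frame (something we will also do in the pretty-printing section below).

# Cumulative model probabilities (running sum of the AICc weights)
cumsum(data.frame(myAIC)$AICcWt)
[1] 0.9894 0.9977 1.0000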

The evidence ratios need to be computed separately. We do this using the evidence() function from the AICcmodavg package. This function takes the output from the aictab() function, as well as the names from that table (given in the modnames= argument) for the two models you want to compute the evidence ratio for.

# Evidence Ratio 1
evidence(
  myAIC,
  model.high = "Institution-related factors (Reduced)",
  model.low = "Faculty-related factors"
  )

Evidence ratio between models 'Institution-related factors (Reduced)' and 'Faculty-related factors':
119.4 
# Evidence Ratio 2
evidence(
  myAIC,
  model.high = "Institution-related factors (Reduced)",
  model.low = "Student-related factors"
  )

Evidence ratio between models 'Institution-related factors (Reduced)' and 'Student-related factors':
422.6 

7.7 Some Final Thoughts

Based on the model evidence given the data for this candidate set of models:

  • The institution-related factors hypothesis has the most empirical support.
  • There is very little empirical support for the other two hypotheses relative to the institution-related factors hypothesis.

We can get a summary of the model rankings along with qualitative descriptors of the empirical support (weight) using the confset() function. The method="ordinal" argument rank orders the models for us.

confset(
  cand.set = list(lm.1, lm.2, lm.4),
  modnames = c("Student-related factors", "Faculty-related factors", "Institution-related factors (Reduced)"),
  method = "ordinal"
  )

Confidence set for the best model

Method:  ordinal ranking based on delta AIC

Models with substantial weight:
                                      K   AICc Delta_AICc AICcWt
Institution-related factors (Reduced) 4 -193.3          0   0.99


Models with some weight:
     K AICc Delta_AICc AICcWt


Models with little weight:
                        K   AICc Delta_AICc AICcWt
Faculty-related factors 6 -183.8       9.56   0.01


Models with no weight:
                        K   AICc Delta_AICc AICcWt
Student-related factors 5 -181.2      12.09      0

It is important to note that it is ultimately the set of scientific working hypotheses that we are evaluating, using the fit of the associated statistical models to a set of empirical data. If we had a different set of data, we might end up with a different ranking of the models or a different interpretation of the empirical support. The empirical support is linked to the data.

The amount of empirical evidence is also very much relative to the candidate set of models; a different candidate set of models may result in a different rank ordering or interpretation of empirical support. For example, consider what would happen if we had not done any exploration of the models' functional forms, but instead had just included linear main effects in each model.

# Fit models
lm.1 = lm(peer ~ 1 + gre_quant + gre_verbal, data = educ)
lm.2 = lm(peer ~ 1 + funded_research_per_faculty + phd_granted_per_faculty, data = educ)
lm.3 = lm(peer ~ 1 + doc_accept + enroll + phd_student_faculty_ratio, data = educ)

confset(
  cand.set = list(lm.1, lm.2, lm.3),
  modnames = c("Student-related factors", "Faculty-related factors", "Institution-related factors (Full)"),
  method = "ordinal"
  )

Confidence set for the best model

Method:  ordinal ranking based on delta AIC

Models with substantial weight:
                                   K  AICc Delta_AICc AICcWt
Institution-related factors (Full) 5 121.5        0.0   0.55
Faculty-related factors            4 121.9        0.4   0.45


Models with some weight:
     K AICc Delta_AICc AICcWt


Models with little weight:
     K AICc Delta_AICc AICcWt


Models with no weight:
                        K  AICc Delta_AICc AICcWt
Student-related factors 4 145.1      23.62      0

Based on the model evidence given the data for this candidate set of models:

  • The institution-related factors hypothesis still has the most empirical support.
  • But the faculty-related factors hypothesis now has substantial empirical support as well (\(\Delta\mathrm{AICc}=0.4\)).
  • There is almost no empirical support for the student-related factors hypothesis relative to the other two hypotheses.

7.8 Pretty Printing Tables of Model Evidence

We can use the data.frame() function to coerce the output from the aictab() function into a data frame. Then we can use dplyr functions to select the columns in the order we want them, add the column of evidence ratios, and re-name any column we want. Lastly, we can use the kable() function from the knitr package and other functions from the kableExtra package to format the table for pretty-printing in Markdown.

# Create data frame to format into table
tab_07 = data.frame(myAIC) %>%
  select(
    Modnames, LL, K, AICc, Delta_AICc, AICcWt
  ) %>%
  mutate(
    ER = max(AICcWt) / AICcWt
  ) %>%
  rename(
    Hypothesis = Modnames,
    # We can include LaTeX math notation in column names
    # Because \ is a special character we need two \\
    '$\\Delta$AICc' = Delta_AICc,
    'AIC Wt.' = AICcWt
  )

# Load libraries for formatting
library(knitr)
library(kableExtra)

# Format the table output
kable(tab_07,
      caption = "Table of Model Evidence for Three Working Hypotheses",
      digits = 2
      ) %>%
  footnote(
    general = "LL = Log-Likelihood; K = Model df; AIC Wt. = Model Probability; ER = Evidence Ratio",
    general_title = "Note.",
    footnote_as_chunk = TRUE
  )
Table 7.6: Table of Model Evidence for Three Working Hypotheses
Hypothesis LL K AICc \(\Delta\)AICc AIC Wt. ER
Institution-related factors (Reduced) 100.84 4 -193.3 0.00 0.99 1.0
Faculty-related factors 98.25 6 -183.8 9.56 0.01 119.4
Student-related factors 95.88 5 -181.2 12.09 0.00 422.6
Note. LL = Log-Likelihood; K = Model df; AIC Wt. = Model Probability; ER = Evidence Ratio

Other Resources

In addition to the notes and what we cover in class, there are many other resources for learning about using information criteria for model selection. Here are some resources that may be helpful in that endeavor:


For table formatting using R Markdown, check out:

References

Burnham, K. P., Anderson, D. R., & Huyvaert, K. P. (2011). AIC model selection and multimodel inference in behavioral ecology: Some background, observations, and comparisons. Behavioral Ecology and Sociobiology, 65(1), 23–35.

Hurvich, C. M., & Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 76, 297–307.