Information Criteria and Model Selection

Published

July 22, 2024

Preparation

In this set of notes, you will use information theoretic approaches (e.g., information criteria) to select one (or more) empirically supported model from a set of candidate models. the usnwr-2024.csv dataset (see the data codebook) to fit a set of models that explain variation in peer ratings of graduate schools of education.

# Load libraries
library(AICcmodavg)
library(broom)
library(educate)
library(patchwork)
library(tidyverse)
library(tidyr)

# Import data
usnwr = read_csv(file = "https://raw.githubusercontent.com/zief0002/benevolent-anteater/main/data/usnwr-2024.csv") |>
  drop_na()

# View data
usnwr

Working Hypotheses and Candidate Models

In a prior sets of notes, we have considered several models that explain variation in institutional prestige:

\[ \begin{split} \mathbf{Model~0:~} \quad \mathrm{Peer~Rating}_i = &\beta_0 + \epsilon_i \\[2ex] \mathbf{Model~1:~} \quad \mathrm{Peer~Rating}_i = &\beta_0 + \beta_1(\mathrm{Enroll}_i) + \beta_2(\mathrm{FT~Students}_i) + \beta_3(\mathrm{FT~Faculty}_i) + \beta_4(\mathrm{Tuition}_i) + \epsilon_i \\[2ex] \mathbf{Model~2:~} \quad \mathrm{Peer~Rating}_i = &\beta_0 + \beta_1(\mathrm{Enroll}_i) + \beta_2(\mathrm{FT~Students}_i) + \beta_3(\mathrm{FT~Faculty}_i) + \beta_4(\mathrm{Tuition}_i) + \\ &\beta_5(\mathrm{SF~Ratio}_i) + \beta_6(\mathrm{Percent~Doctoral~Students}_i) + \beta_7(\mathrm{UG~GPA}_i) +\\ &\beta_8(\mathrm{GRE}_i) + \beta_{9}(\mathrm{PhD~Acceptance~Rate}_i) +\epsilon_i \\[2ex] \mathbf{Model~3:~} \quad \mathrm{Peer~Rating}_i = &\beta_0 + \beta_1(\mathrm{Enroll}_i) + \beta_2(\mathrm{FT~Students}_i) + \beta_3(\mathrm{FT~Faculty}_i) + \beta_4(\mathrm{Tuition}_i) + \\ &\beta_5(\mathrm{SF~Ratio}_i) + \beta_6(\mathrm{Percent~Doctoral~Students}_i) + \beta_7(\mathrm{UG~GPA}_i) +\\ &\beta_8(\mathrm{GRE}_i) + \beta_9(\mathrm{PhD~Acceptance~Rate}_i) + \\ &\beta_{10}(\mathrm{Total~Publications}_i) + \beta_{11}(\mathrm{Total~Research}_i) + \epsilon_i \end{split} \]

where all four models have \(\epsilon_i\overset{i.i.d.}{\sim}\mathcal{N}(0,\sigma^2_{\epsilon})\).

# Fit candidate models
lm.0 = lm(peer_rating ~ 1, data = usnwr)
lm.1 = lm(peer_rating ~ 1 + enroll + ft_students + ft_fac + nonres_tuition, data = usnwr)
lm.2 = lm(peer_rating ~ 1 + enroll + ft_students + ft_fac + nonres_tuition + 
            sf_ratio + pct_doc + ug_gpa + gre + doc_acc, data = usnwr)
lm.3 = lm(peer_rating ~ 1 + enroll + ft_students + ft_fac + nonres_tuition + 
            sf_ratio + pct_doc + ug_gpa + gre + doc_acc + 
            tot_pubs + tot_res, data = usnwr)

Each of these models correspond to a different scientific working hypothesis about what explains institutional prestige:

H0: Institutional prestige is not a function of anything other than the individual institution.
H1: Institutional prestige is a function of the institution’s characteristics.
H2: Institutional prestige is a function of the institution’s characteristics, and student- and faculty-resources.
H3: Institutional prestige is a function of the institution’s characteristics, student- and faculty-resources, and student- and faculty-outcomes.

These working hypotheses are typically created from the theory and previous empirical work in a substantive area. They need to be translated into a statistical models, which can be quite difficult, especially if there is not a lot of theory to guide this translation.

One method for choosing among a set of candidate models is to pick based on the residuals. In this method, we adopt the model for which the data best meet the assumptions, and employing the principle of parsimony—adopting the more parsimonious model when there are multiple models that produce equally “good” residuals. Additionally, we can use the likelihood ratio test (LRT) to help make this decision. This method provides additional empirical evidence about which model is supported when the candidate models are nested.

Unfortunately, we cannot always use the LRT to select models, since there are many times when the set of candidate models are not all nested. Information criteria give us metrics to compare the empirical support for models whether they are nested or not. In the remainder of these notes, we will examine several of the more popular information criteria metrics used for model selection.

Akiake’s Information Criteria (AIC)

The AIC metric is calculated by adding a penalty term to the model’s deviance (\(-2\ln(\mathrm{Likelihood})\)).

\[ \begin{split} AIC &= \mathrm{Deviance} + 2k \\[1ex] &= -2\ln(\mathcal{L}) + 2k \end{split} \]

where k is the number of parameters (including the residual standard error) being estimated in the model. (Recall that the value for k is given as df in the logLik() output.)

Remember that deviance is similar to error (it measures model-data misfit), and so models that have lower values of deviance are more empirically supported. The problem with deviance, however, is that more complex models tend to have smaller deviance values and are, therefore, more supported than their simpler counterparts. Unfortunately, these complex models tend not to generalize as well as simpler models (this is called overfitting). The AIC metric’s penalty term penalizes the deviance more when a more complex model is fitted; it is offsetting the lower degree of model-data misfit in complex models by increasing the ‘misfit’ based on the number of parameters.

This penalty-adjusted measure of ‘misfit’ is called Akiake’s Information Criteria (AIC). We compute the AIC for each of the candidate models and the model with the lowest AIC is selected. This model, we say, is the candidate model with the most empirical support. Below we compute the AIC for the first candidate model.

# Compute AIC for Model 1
# logLik(lm.1)
-2*-45.44917 + 2*6

[1] 102.8983

We could also use the AIC() function to compute the AIC value directly. Here we compute the AIC for each of the candidate models.

# Model 0
AIC(lm.0)

[1] 120.3347

# Model 1
AIC(lm.1)

[1] 102.8983

# Model 2
AIC(lm.2)

[1] 67.80275

# Model 3
AIC(lm.3)

[1] 0.6810194

Based on the AIC values, the candidate model with the most empirical evidence is Model 3; it has the lowest AIC.

Lastly, we note that the AIC value is produced as a column in the model-level output from the glance() function. (Note that the df column from glance() does NOT give the number of model parameters.)

# Model-level output for H3
glance(lm.3)

Empirical Support is for the Working Hypotheses

Because the models are proxies for the scientific working hypotheses, the AIC ends up being a measure of empirical support for any particular working hypothesis. Using the AIC, we can rank order the models (and subsequently the working hypotheses) based on their level of empirical support. Ranked in order of empirical support, the four scientific working hypotheses are:

H3: Institutional prestige is a function of the institution’s characteristics, student- and faculty-resources, and student- and faculty-outcomes.
H2: Institutional prestige is a function of the institution’s characteristics, and student- and faculty-resources.
H1: Institutional prestige is a function of the institution’s characteristics.
H0: Institutional prestige is not a function of anything other than the individual institution.

It is important to remember that the phrase given the data and other candidate models is highly important. The ranking of models/working hypotheses is a relative ranking of the models’ level of empirical support contingent on the candidate models we included in the comparison and the data we used to compute the AIC.

As such, this method is not able to rank any hypotheses that you did not consider as part of the candidate set of scientific working hypotheses. Moreover, the AIC is a direct function of the likelihood which is based on the actual model fitted as a proxy for the scientific working hypothesis. If the predictors used in any of the models had been different, it would lead to different likelihood and AIC values, and potentially a different rank ordering of the hypotheses.

The ranking of models is also based on the data we have. If we had different ranges of the data or a different set of variables, the evidence might support a different model. This is very important. Model 3 is the most empirically supported candidate model GIVEN the four candidate models we compared and the data we used to compute the AIC metric.

The model selection and ranking is contingent on both the set of candidate models you are evaluating, and the data you have.

Based on the AIC values for the four candidate models we ranked the hypotheses based on the amount of empirical support:

Code

cand_models = data.frame(
  Model = c(
    "Model 3",
    "Model 2",
    "Model 1",
    "Model 0"
    ),
  k = c(13, 11, 6, 2),
  AIC = c(AIC(lm.3), AIC(lm.2), AIC(lm.1), AIC(lm.0))
) 

cand_models |>
  gt() |>
  cols_align(
    columns = c(Model),
    align = "left"
  ) |>
  cols_align(
    columns = c(k, AIC),
    align = "center"
  ) |>
  cols_label(
    Model = md("*Model*"),
    k = md("*k*"),
    AIC = md("*AIC*")
  ) |>
  tab_options(
    table.width = pct(40)
  )

Table 1: Models rank-ordered by the amount of empirical support as measured by the AIC.

Model	k	AIC
Model 3	13	0.6810194
Model 2	11	67.8027454
Model 1	6	102.8983425
Model 0	2	120.3346639

Corrected AIC (AICc): Adjusting for Model Complexity and Sample Size

Although AIC has a penalty correction that should account for model complexity, it turns out that when the number of parameters is large relative to the sample size, AIC is still biased in favor of models that have more parameters. This led Hurvich & Tsai (1989) to propose a second-order bias corrected AIC measure (AICc) computed as:

\[ \mathrm{AIC_c} = \mathrm{Deviance} + 2k\left( \frac{n}{n - k - 1} \right) \]

where k is, again, the number of estimated parameters (including residual standard error), and n is the sample size used to fit the model. Note that when n is very large (especially relative to k) that the last term is essentially 1 and the AICc value would basically reduce to the AIC value. When n is small relative to k this will add more of a penalty to the deviance.

The statistical recommendation is to pretty much always use AICc rather than AIC when selecting models.

Below, we will compute the AICc for the Model 3. (Note that we use \(n=83\) cases for the computation for all the models in this data.)

n = 83
k = 13

# Compute AICc for linear hypothesis
-2 * logLik(lm.3)[[1]] + 2 * k * n / (n - k - 1)

[1] 5.956382

In practice, we will use the AICc() function from the {AICcmodavg} package to compute the AICc value directly. Here we compute the AICc for the four candidate models.

# Model 0
AICc(lm.0)

[1] 120.4847

# Model 1
AICc(lm.1)

[1] 104.0036

# Model 2
AICc(lm.2)

[1] 71.52106

# Model 3
AICc(lm.3)

[1] 5.956382

Based on the AICc values, the model with the most empirical support given the data and four candidate models is, again, Model 3. Using these results, we can, again, rank order the models (which correspond to working hypotheses) based on the empirical support for each.

Code

cand_models = data.frame(
  Model = c(
    "Model 3",
    "Model 2",
    "Model 1",
    "Model 0"
    ),
  k = c(13, 11, 6, 2),
  AIC = c(AIC(lm.3), AIC(lm.2), AIC(lm.1), AIC(lm.0)),
  AICc = c(AICc(lm.3), AICc(lm.2), AICc(lm.1), AICc(lm.0))
) 


cand_models |>
  gt() |>
  cols_align(
    columns = c(Model),
    align = "left"
  ) |>
  cols_align(
    columns = c(k, AICc),
    align = "center"
  ) |>
  cols_label(
    Model = md("*Model*"),
    k = md("*k*"),
    AICc = md("*AICc*")
  ) |>
  tab_options(
    table.width = pct(40)
  )

Table 2: Models rank-ordered by the amount of empirical support as measured by the AICc.

Model	k	AIC	AICc
Model 3	13	0.6810194	5.956382
Model 2	11	67.8027454	71.521055
Model 1	6	102.8983425	104.003606
Model 0	2	120.3346639	120.484664

Quantifying Model-Selection Uncertainty

When we adopt one model over another, we are introducing some degree of uncertainty into the scientific process, in this case model selection uncertainty. It would be nice if we can quantify and report this uncertainty, and this is the real advantage of using information criteria for model selection; it allows us to quantify the uncertainty we have when we select any particular candidate model.

The amount of model selection uncertainty we have depends on the amount of empirical support each of the candidate models has. For example, if one particular candidate model has a lot of empirical support and the rest have very little empirical support we would have less model selection uncertainty than if all of the candidate models had about the same amount of empirical support.

Since we quantify the empirical support for each model/working hypothesis by computing the AICc, we can also quantify how much more empirical support the most supported hypothesis has relative to each of the other working hypotheses by computing the difference in AICc values between the model with the most empirical support and each of the other candidate models. This measure is referred to as \(\Delta\)AICc.

In our example, the hypothesis with the most empirical support was Model 2. The \(\Delta\)AICc values can then be computed for each candidate model by subtracting the AICc for Model 2 from the AICc for that candidate model.

# Compute delta AICc value for Model 0
AICc(lm.0) - AICc(lm.3)

[1] 114.5283

# Compute delta AICc value for Model 1
AICc(lm.1) - AICc(lm.3)

[1] 98.04722

# Compute delta AICc value for Model 2
AICc(lm.2) - AICc(lm.3)

[1] 65.56467

# Compute delta AICc value for Model 3
AICc(lm.3) - AICc(lm.3)

[1] 0

Code

cand_models = cand_models |>
  mutate(
    delta_a = AICc - AICc(lm.3)
  ) 

cand_models |>
  gt() |>
  cols_align(
    columns = c(Model),
    align = "left"
  ) |>
  cols_align(
    columns = c(k, AICc, delta_a),
    align = "center"
  ) |>
  cols_label(
    Model = md("*Model*"),
    k = md("*k*"),
    AICc = md("*AICc*"),
    delta_a = html("&#916;AICc")
  ) |>
  tab_options(
    table.width = pct(50)
  )

Table 3: Models rank-ordered by the amount of empirical support as measured by the AICc. Other evidence includes the ΔAICc for each model.

Model	k	AIC	AICc	ΔAICc
Model 3	13	0.6810194	5.956382	0.00000
Model 2	11	67.8027454	71.521055	65.56467
Model 1	6	102.8983425	104.003606	98.04722
Model 0	2	120.3346639	120.484664	114.52828

Burnham, Anderson, & Huyvaert (2011, p. 25) give rough guidelines for interpreting \(\Delta\)AICc values. They suggest that hypotheses with \(\Delta\)AICc values less than 2 are plausible, those in the range of 4–7 have some empirical support, those in the range of 9–11 have relatively little support, and those greater than 13 have essentially no empirical support. Using these criteria:

Model 3 is plausible.
Model 0, 1, and 2 have virtually no empirical support.

This implies that we would have a fair amount of certainty actually selecting Model 3 over the other three models. If multiple models are supported by the empirical evidence it suggests that more than one model would need to be adopted since the empirical evidence wouldn’t really differentiate which model is “better”.

Relative Likelihood and Evidence Ratios

One way we can mathematically formalize the strength of evidence for each model is to compute the relative likelihood. The relative likelihood provides the likelihood of each of the candidate models, given the set of candidate models and the data. To compute the relative likelihood, we use:

\[ \mathrm{Relative~Likelihood} = e ^ {−\frac{1}{2} (\Delta AICc)} \]

# Relative likelihood for Model 0
exp(-1/2 * 114.5283)

[1] 1.350503e-25

# Relative likelihood for Model 1
exp(-1/2 * 98.04722)

[1] 5.120551e-22

# Relative likelihood for Model 2
exp(-1/2 * 65.56467)

[1] 5.79179e-15

# Relative likelihood for Model 3
exp(-1/2 * 0)

[1] 1

Code

cand_models = cand_models |>
  mutate(
    rel_lik = exp(-1/2 * delta_a)
  ) 

cand_models |>
  gt() |>
  cols_align(
    columns = c(Model),
    align = "left"
  ) |>
  cols_align(
    columns = c(k, AICc, delta_a, rel_lik),
    align = "center"
  ) |>
  cols_label(
    Model = md("*Model*"),
    k = md("*k*"),
    AICc = md("*AICc*"),
    delta_a = html("&#916;AICc"),
    rel_lik = html("Rel(&#8466;)")
  ) |>
  tab_options(
    table.width = pct(50)
  ) |>
  tab_footnote(
    footnote = html("Rel(&#8466;) = Relative likelihood"),
    locations = cells_column_labels(columns = rel_lik)
  )

Table 4: Models rank-ordered by the amount of empirical support as measured by the AICc. Other evidence includes the ΔAICc, and relative likelihood for each model.

Model	k	AIC	AICc	ΔAICc	Rel(ℒ)¹
Model 3	13	0.6810194	5.956382	0.00000	1.000000e+00
Model 2	11	67.8027454	71.521055	65.56467	5.791780e-15
Model 1	6	102.8983425	104.003606	98.04722	5.120541e-22
Model 0	2	120.3346639	120.484664	114.52828	1.350515e-25
¹ Rel(ℒ) = Relative likelihood

The relative likelihood values allow us to compute evidence ratios, which are evidentiary statements for comparing any two model or scientific hypotheses. Evidence ratios quantify how much more empirical support one model/hypothesis has versus another. To obtain an evidence ratio, we divide the relative likelihood for any two models/hypotheses. As an example, the evidence ratio comparing Model 3 to Model 0 is computed as:

# ER comparing Model 3 to Model 0
1 / 1.350503e-25

[1] 7.404648e+24

That is,

The empirical support for Model 3 is \(7.40 \times 10^{24}\) times that of the empirical support for Model 0.

We can similarly compute the evidence ratio to make an evidentiary statement comparing the empirical support for any two models. For example to compare the empirical support for Model 2 versus Model 1:

# ER comparing Model 2 to Model 1
5.79179e-15 / 5.120551e-22

[1] 11310873

The empirical support for Model 2 is 11,310,873 times that of the empirical support for Model 1.

Model Probabilities

Also referred to as an Akaike Weight (\(w_i\)), a model probability provides a numerical measure of the probability of each model given the data and the candidate set of models. It can be computed as:

\[ w_i = \frac{\mathrm{Relative~Likelihood~for~Model~J}}{\sum_j \mathrm{All~Relative~Likelihoods}} \]

Here we compute the model probability for all four candidate models.

# Compute sum of relative likelihoods
sum_rel = 1.350503e-25 + 5.120551e-22 + 5.79179e-15 + 1

# Model probability for Model 0
1.350503e-25 / sum_rel

[1] 1.350503e-25

# Model probability for Model 1
5.120551e-22 / sum_rel

[1] 5.120551e-22

# Model probability for Model 2
5.79179e-15 / sum_rel

[1] 5.79179e-15

# Model probability for Model 3
1 / sum_rel

[1] 1

We also add these to our table of model evidence.

Code

cand_models = cand_models |>
  mutate(
    model_prob = rel_lik / sum(rel_lik)
  ) 

cand_models |>
  gt() |>
  cols_align(
    columns = c(Model),
    align = "left"
  ) |>
  cols_align(
    columns = c(k, AICc, delta_a, rel_lik, model_prob),
    align = "center"
  ) |>
  cols_label(
    Model = md("*Model*"),
    k = md("*k*"),
    AICc = md("*AICc*"),
    delta_a = html("&#916;AICc"),
    rel_lik = html("Rel(&#8466;)"),
    model_prob = md("*AICc Wt.*")
  ) |>
  tab_options(
    table.width = pct(50)
  ) |>
  tab_footnote(
    footnote = html("Rel(&#8466;) = Relative likelihood"),
    locations = cells_column_labels(columns = rel_lik)
  )

Table 5: Models rank-ordered by the amount of empirical support as measured by the AICc. Other evidence includes the ΔAICc, relative likelihood, and model probability (AICc weight) for each model.

Model	k	AIC	AICc	ΔAICc	Rel(ℒ)¹	AICc Wt.
Model 3	13	0.6810194	5.956382	0.00000	1.000000e+00	1.000000e+00
Model 2	11	67.8027454	71.521055	65.56467	5.791780e-15	5.791780e-15
Model 1	6	102.8983425	104.003606	98.04722	5.120541e-22	5.120541e-22
Model 0	2	120.3346639	120.484664	114.52828	1.350515e-25	1.350515e-25
¹ Rel(ℒ) = Relative likelihood

Since the models are proxies for the working hypotheses, the model probabilities can be used to provide probabilities of each working hypothesis as a function of the empirical support. Given the data and the candidate set of working hypotheses:

The probability of the zeroth working scientific hypothesis is 0.00000000000000000000000013.
The probability of the first working scientific hypothesis is 0.00000000000000000000051.
The probability of the second working scientific hypothesis is 0.0000000000000058.
The probability of the third working scientific hypothesis is essential 1.

This suggests that the third working scientific hypothesis is the most probable. There is essentially no support for any of the other scientific hypotheses; it has some non-negligible probability.

Some Final Thoughts

Based on the model evidence given the data for this candidate set of models:

The third working scientific hypothesis has the most empirical support.
The other working scintific hypotheses have virtually no empirical support.

Remember, we are computing all of this evidence to select from among the candidate models. Here the empirical evidence is clearly supporting the hypothesis associated with Model 3.

Recall that the information criteria are a function of the log-likelihood. Log-likelihood values, and thus information criteria, from different models can be compared, so long as:

The exact same data is used to fit the models;
The exact same outcome is used to fit the models; and
The assumptions underlying the likelihood (independence, distributional assumptions) are met.

In all four models we are using the same data set and outcome. However, the assumptions only seem reasonably tenable for Models 0 and 2¹. That means that we should not really include Model 1 nor 3 in our candidate set of models/working hypotheses. Changing the set of candidate models/working hypotheses we produce the following table of model evidence.

Code

cand_models_2 = data.frame(
  Model = c("Model 0", "Model 2"),
  k = c(5, 3),
  AICc = c(AICc(lm.0), AICc(lm.2))
  ) |>
  mutate(
    delta_a = AICc - AICc(lm.2),
    rel_lik = exp(-1/2 * delta_a),
    model_prob = rel_lik / sum(rel_lik)
  ) 
  
  
cand_models_2 |>
  gt() |>
  cols_align(
    columns = c(Model),
    align = "left"
  ) |>
  cols_align(
    columns = c(k, AICc, delta_a, rel_lik, model_prob),
    align = "center"
  ) |>
  cols_label(
    Model = md("*Model*"),
    k = md("*k*"),
    AICc = md("*AICc*"),
    delta_a = html("&#916;AICc"),
    rel_lik = html("Rel(&#8466;)"),
    model_prob = md("*AICc Wt.*")
  ) |>
  tab_options(
    table.width = pct(50)
  ) |>
  tab_footnote(
    footnote = html("Rel(&#8466;) = Relative likelihood"),
    locations = cells_column_labels(columns = rel_lik)
  )

Table 6: Models rank-ordered by the amount of empirical support as measured by the AICc after removing Model 0 and Model 3 from the candidate set. Other evidence includes the ΔAICc, relative likelihood, and model probability (AICc weight) for each model.

Model	k	AICc	ΔAICc	Rel(ℒ)¹	AICc Wt.
Model 0	5	120.48466	48.96361	2.33178e-11	2.33178e-11
Model 2	3	71.52106	0.00000	1.00000e+00	1.00000e+00
¹ Rel(ℒ) = Relative likelihood

After removing models from the candidate set (especially the formerly “best” model), many of the metrics changed. Changing the predictors in the candidate models, or adding other models to the candidate set can impact the amount of empirical support and the rank-ordering of hypotheses. If we had a different set of data, we may also have a whole new ranking of models or interpretation of empirical support. The empirical support is linked to the data. The empirical support is very much relative to the candidate set of models and the data we have.

Lastly, it is important to note that although information criteria can tell you about the empirical support among a candidate set of models, it cannot say whether that is actually a “good” model. For that you need to look at the assumptions and other measures (e.g., \(R^2\)). You still need to do all of the work associated with model-building (e.g., selecting predictors from the substantive literature, exploring functional forms to meet the assumptions).

Statistical Inference and Information Criteria

Finally, it is important to mention that philosophically, information-criteria and statistical inference are two very different ways of evaluating statistical evidence. When we use statistical inference for variable selection, the evidence, the p-values, is a measure of how rare an observed statistic (e.g., \(\hat\beta_k\), \(t\)-value) is under the null hypothesis. The AIC, on the other hand, is a measure of the model-data compatibility accounting for the complexity of the model.

In general, the use of p-values is not compatible with the use of information criteria-based model selection methods; see Anderson (2008) for more detail. Because of this, it is typical to not even report p-values when using information criteria for model selection. We should, however, reprt the standard errors or confidence intervals for the coefficient estimates, especially for any “best” model(s). This gives information about the statistical uncertainty that arises because of sampling error.

It is important that you decide how you will be evaluating evidence and making decisions about variable and model selection prior to actually examining the data. Mixing and matching is not cool!

Using R to Create a Table of Model Evidence

We will use the aictab() function from the {AICcmodavg} package to compute and create a table of model evidence values directly from the lm() fitted models. This function takes a list of models in the candidate set (it actually has to be an R list). The argument modnames= is an optional argument to provide model names that will be used in the output.

#Create table of model evidence
model_evidence = aictab(
  cand.set = list(lm.0, lm.1, lm.2, lm.3),
  modnames = c("Model 0", "Model 1", "Model 2", "Model 3")
  )

# View output
model_evidence

The model evidence provided for each model includes the number of parameters (K), AICc value (AICc), \(\Delta\)AICc value (Delta_AICc), the Akiake weight/model probability (AICcWt), cumulative weight (Cum.Wt), and log-likelihood (LL). The output also rank orders the models based on the AICc criterion. The models are printed in order from the model with the most empirical evidence (Model 3) to the working hypothesis with the least amount of empirical evidence (Model 0) based on the AICc.

Pretty Printing Tables of Model Evidence for Quarto Documents

We can format the output from aictab() to be used in the gt() function. Because there are multiple classes associated with the output from the aictab() function, we first pipe model_evidence object into the data.frame() function. Viewing this, we see that the data frame, also includes an additional column that gives the relative likelihoods (ModelLik).

# Create data frame to format into table
tab_01 = model_evidence |>
  data.frame()

# View table
tab_01

Then we can use the select() function to drop the LL and Cum.Wt columns from the data frame. The log-likelihood is redundant to the information in the AICc column, since AICc is a function of log-likelihood and the other information in the table. The cumulative weight can also easily be computed from the information in the AICcWt column.

# Drop columns
tab_01 = tab_01 |>
  select(-LL, -Cum.Wt)

# View table
tab_01

We can then pipe the tab_01 data frame into the gt() function to format the table for pretty-printing in Quarto. For some column labels, I use the html() function in order to use HTML symbols to create the Greek letter Delta and the scripted “L”.

# Create knitted table
tab_01 |>
  gt() |>
  cols_align(
    columns = c(Modnames),
    align = "left"
  ) |>
  cols_align(
    columns = c(K, AICc, Delta_AICc, ModelLik, AICcWt),
    align = "center"
  ) |>
  cols_label(
    Modnames = md("*Model*"),
    K = md("*k*"),
    AICc = md("*AICc*"),
    Delta_AICc = html("&#916;AICc"),
    ModelLik = html("Rel(&#8466;)"),
    AICcWt = md("*AICc Wt.*")
  ) |>
  tab_options(
    table.width = pct(50)
  ) |>
  tab_footnote(
    footnote = html("Rel(&#8466;) = Relative likelihood"),
    locations = cells_column_labels(columns = ModelLik)
  )

Table 7: Models rank-ordered by the amount of empirical support as measured by the AICc after removing Model 1 from the candidate set. Other evidence includes the ΔAICc, relative likelihood, and model probability (AICc weight) for each model.

Model	k	AICc	ΔAICc	Rel(ℒ)¹	AICc Wt.
Model 3	13	5.956382	0.00000	1.000000e+00	1.000000e+00
Model 2	11	71.521055	65.56467	5.791780e-15	5.791780e-15
Model 1	6	104.003606	98.04722	5.120541e-22	5.120541e-22
Model 0	2	120.484664	114.52828	1.350515e-25	1.350515e-25
¹ Rel(ℒ) = Relative likelihood

References

Anderson, D. R. (2008). Model based inference in the life sciences: A primer on evidence. New York: Springer.

Burnham, K. P., Anderson, D. R., & Huyvaert, K. P. (2011). AIC model selection and multimodel inference in behavioral ecology: Some background, observations, and comparisons. Behavioral Ecology and Sociobiology, 65(1), 23–35.

Hurvich, C. M., & Tsai, C.-L. (1989). Regression and time series model selection in small samples,. Biometrika, 76, 297–307.

Footnotes

Analyses not shown.↩︎