Assignment 04

INTRODUCTION TO MULTIPLE REGRESSION

Published

August 7, 2023

Human overpopulation is a growing concern and has been associated with depletion of Earth’s natural resources (water is a big one) and degredation of the environment. This, in turn, has social and economic consequences such as global tension over resources such as water and food, higher cost of living and higher unemployment rates.

Fertility rate is a measure directly linked to population and has been useful to explore hypotheses about factors related to combating overpopulation. For this assignment, you will use the file fertility.csv to examine the effects of the rate of contraception on fertility rates.

Instructions

Submit your responses to each of the questions below in a PDF document. All graphics should be resized so that they do not take up more room than necessary and also should have an appropriate caption. This assignment is worth 18 points. (Each question is worth 1 point unless otherwise noted.)


Part I: Correlations

You will compute the intercorrelations between fertility rate, contraception rate, infant mortality rate, and female education level to evaluate the following three hypotheses:

  • Hypothesis 1: Contraception rates are thought to be negatively associated with fertility rates.
  • Hypothesis 2: Contraception rate is a function of a country’s infant mortality rate. That is, countries with higher infant mortality rates will have lower contraception rates, which in turn will lead to higher fertility rates.
  • Hypothesis 3: Contraception rate is a function of educating women. That is, countries with higher levels of female education will have higher contraception rates which in turn is associated with lower fertility rates.
  1. Examine the structure and formatting of Table 4 at https://zief0002.github.io/musings/creating-tables-to-present-statistical-results.html. Mimic the format and structure of this table to create a table to present the intercorrelations between the outcome and each of the predictors, and between each set of predictors. Make sure the table you create also has an appropriate caption. (2pts.)

  2. Use the results from the intercorrelations to evaluate each of the following hypotheses. Explain by referencing your results. (3pts.)


Part II: Regression Analysis

One factor that can be manipulated through targeted intervention is contraception use. You will fit three regression models to study the effect of contraceptive rates on fertility rate. These models, which stem from the earlier hypotheses about how contraception rate impacts fertility rate are:

  • Model 1: Contraceptive rate as the sole predictor of variation in fertility rate.
  • Model 2: Contraceptive rate and infant mortality rate as predictors of variation in fertility rate.
  • Model 3: Contraceptive rate, infant mortality rate, and female education level as predictors of variation in fertility rate.
  1. Examine the structure and formatting of Table 9 at https://zief0002.github.io/musings/creating-tables-to-present-statistical-results.html. Mimic the format and structure of this table to create a table to present the numerical information from the three models you fitted. Make sure the table you create also has an appropriate caption. If the table is too wide, change the page orientation in your word processing program to “Landscape”, rather than changing the size of the font. (2pts.)

  2. Based on the results of fitting the three models, describe the size and direction of the effect of contraceptive rate on fertility rate and how it changes (or doesn’t) across the three models. Do this by referring to the estimated coefficient and the uncertainty in the parameter estimates. Note: Do not base your response on a p-value. (2pts.)


Model 3: Interpreting Results from the “Final” Model

The research literature suggests that all three predictors are substantively important in predicting variation in fertility rates. Moreover, from the CIs produced in Model 3, we see that all three effects included in the model are statistically relevant (none of the CIs span 0). Because of this, we will adopt Model 3 as our “final” model. Use the results from Model 3 to answer the remainder of the questions on this assignment.

  1. Report the regression equation from fitting Model 3. Use Equation Editor (or some other program that correctly types mathematical expressions) to typeset the equation correctly.

  2. Using output from the ANOVA table, compute and report the value for the model \(R^2\). Show your work for full credit.

  3. Interpret the value of the model \(R^2\) using the context of the data.

  4. Using symbols, write the omnibus null hypothesis that is tested by the model-level F-statistic in this analysis in two different manners: (1) using the coefficient parameters used in the regression model, and (2) using the variance accounted for parameter.

  5. Based on the results of the model-level F-test, does the model seem to explain variation in fertility rates? Explain. Note: Here you can use the p-value as evidence, but do not compare this to 0.05.

  6. Interpret the estimated coefficient value associated with the partial effect of contraception.

  7. Based on the 95% confidence interval for the partial effect of contraception on fertility rate, which parameter values are reasonably compatible with the empirical data? Explain what this implies about the magnitude of the partial effect.

  8. Create a publication quality plot that displays the results from Model 3. For this plot, put the contraception predictor on the x-axis. Control out the effect of education level by setting this to the mean level of education. Display two separate lines to show the effect of infant mortality rate; a small and large rate based on the data. The two lines should be displayed using different linetypes or colors (or both) so that they can be easily differentiated in the plot. Be sure that the figure is appropriately captioned. (2pts.)