Assignment 06

SIMPLE LINEAR REGRESSION

This assignment is worth 20 points. Each question is worth 1 point unless otherwise noted.

Copyright EPSY 5261, 2023
Published

July 24, 2024

Instructions

You will use the nhanes.csv data set to answer questions in this lab. The data codebook includes a description of how the data were collected and of each of the attributes included in the data.

Use the code chunks in the assignment06.qmd file to load packages, import the data, and carry out any R computations. Submit a PDF document of your responses to the questions in this assignment.

A social scientist is interested in answering the following two research questions:

Research Question 1: Among adults in the U.S. in 2017, is the number of hours slept on a weeknight a predictor of the number of hours a person sleeps on the weekend?

You will use the sleep_weeknight and sleep_weekend attributes from the nhanes.csv data to fit a regression model that you will use to answer these research questions.

Part 1: Describing the Sample Relationship

  1. (2pts.) Create a scatterplot of the number of weeknight hours slept versus the number of weekend hours slept for the sample participants. Be sure the predictor and response attributes (based on the research question) are on the appropriate axes. Include appropriate axis labels.

  2. Based on the scatterplot, describe the relationship between these variables. Be sure to refer to all four characteristics in your description.

  3. Based on your description of the relationship, in particular, the functional form, is the correlation coefficient an appropriate numerical summary measure to quantify this relationship? Explain.

  4. Regardless of your answer to the previous question, compute and report the correlation coefficient. Note that there is some missing data in these variables, so you will need to add the argument use="complete.obs" to the cor() function.


Part 2: Simple Linear Regression Model

  1. Fit a simple linear regression model to the data that will help you answer the research question. Report the fitted equation based on the regression output in context. (There is no need to include your output here.)

  2. Interpret the value of the intercept from your fitted equation using the context of the research question.

  3. Interpret the value of the slope from your fitted equation using the context of the research question.

  4. What is the predicted number of hours slept on the weekend for the first person included in the NHANES data set? Be sure to show your work.

  5. What is the residual value for the first person included in the NHANES data set? Be sure to show your work.

  6. Report and interpret the R2 value in context.


Part 3: More Summary Values for our Model

  1. What are the null and alternative hypotheses for testing the slope coefficient?

  2. (2pts.) Conduct a hypothesis test on the slope. Based on the p-value for this test, answer the research question in context.

  3. Create and include a histogram of the response variable.

  4. Also create and include a scatterplot of the residual values versus the predicted values.

  5. (4pts.) Evaluate all four of the assumptions underlying the use of the simple regression model. For each assumption: clearly state whether the assumption is tenable or not and explain why you believe the assumption is tenable or not by referring to the evidence you computed.


How do I submit the assignment?

Create a PDF of your responses and submit the PDF via email to both the instructor and TA. Also cc any group members. Before you submit the assignment check that: