Statistical Modeling and Computation for Educational Scientists


Andrew Zieffler



Front Matter

The content in this “book”, as the title suggests, is related to statistical modeling and computation. More specifically, the content focuses on using the General Linear Model (GLM) to provide statistical evidence that can help answer substantive questions in the educational and social sciences. It is a book intended for applied practitioners in those fields. I hope the statistical content is presented in a manner that these domain scientists will find useful, including practical suggestions for analysis and for presenting results so that researchers can clearly communicate the findings of a data analysis.

While the content is not overly mathematical in nature, the reader will need a solid understanding of the principles of algebra for maximum benefit. The burden of calculation that typically accompanied statistical work in previous generations is now primarily carried by a scientific computing environment. As Thisted & Velleman (1992) point out, “computational advances have changed the face of statistical practice by transforming what we do and by challenging how we think about scientific problems.” To support and facilitate the use of scientific computing, examples in the R programming language appear throughout this work.

The organization of content follows the sequence in which it is taught in EPsy 8251 and EPsy 8252, two applied statistics courses that form the foundational sequence for many graduate students in the educational and social sciences at the University of Minnesota. These courses require that students have taken a previous statistics course at either the undergraduate or graduate level. Because of that, familiarity with many introductory ideas is assumed.

Content for EPsy 8251

The content for EPsy 8251 focuses on introducing ideas of statistical computation and the foundations of building, interpreting, and evaluating GLMs. In particular, this content includes ordinary least squares (OLS) estimation, coefficient- and model-based inference, dummy-coded variables, and statistical interactions.

Content for EPsy 8252

The content for EPsy 8252 focuses on extending ideas of scientific computation and statistical modeling. In particular, this content addresses some of the issues that crop up in practical work, including modeling non-linearity using statistical transformations and modeling non-independent data using mixed-effects models. Additional tools for model evaluation (e.g., information criteria) are also presented.


This book refers to and uses several data sets throughout the text. Each of these data sets and its codebook are available online at the book’s GitHub repository.


Many thanks to all the students in my courses who have been through previous iterations of this material. Your feedback has been invaluable, and you are the world’s greatest copyeditors. In particular, I would like to thank the following students who have gone above and beyond in the feedback they have provided: Jonathan Brown, Pablo Vivas Corrales, Amaniel Mrutu, Corissa Rohloff, and Mireya Smith.


Artwork by @allison_horst

Icon and note ideas and prototypes by Desirée De Leon.

The book is typeset using Crimson Text for the body font, Raleway for the headings, and Sue Ellen Francisco for the title. The color palette was generated using



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Note: If you want to contribute to this, create a Pull Request or send me an email. Also, feel free to offer criticism, suggestions, and feedback. You can either open an issue on the book’s GitHub page or send me an email directly.

Thisted, R. A., & Velleman, P. F. (1992). Computers and modern statistics. In D. C. Hoaglin & D. S. Moore (Eds.), Perspectives on contemporary statistics, MAA notes no. 21 (pp. 41–53). Mathematical Association of America.