Data Codebooks
The data codebooks provide information about the attributes and source of each of the datasets used in the notes.
ed-schools-2018.csv
The data in ed-schools-2018.csv come from U.S. News and World Report (2018) and contain 13 attributes collected from the \(n=129\) graduate schools of education ranked in the 2018 Best Graduate Schools. The attributes include:
rank
: Rank in USNWRschool
: Graduate program of Educationscore
: Overall score given by USNWRpeer
: Peer assessment score (5.0 = highest)expert_score
: Administrator/expert assessment score (5.0 = highest)gre_verbal
: Mean GRE verbal score in 2016gre_quant
: Mean GRE quantitative score in 2016doc_accept
: Acceptance rate for doctoral students in 2016student_faculty_ratio
: Ratio of doctoral students to faculty members in 2016phd_granted_per_faculty
: Doctorates granted per faculty member in 2015–16funded_research
: Funded research (in millions of dollars)funded_research_per_faculty
: Funded research per faculty member (in thousands of dollars)enroll
: Total graduate education enrollment in 2016
evaluations.csv
This file contains data collected from student evaluations of instructors’ beauty and teaching quality for several courses at the University of Texas. The teaching evaluations were conducted at the end of the semester, and the beauty judgments were made later, by six students who had not attended the classes and were not aware of the course evaluations. The variables are:
prof_id
: Professor ID numberavg_eval
: Average course ratingnum_courses
: Number of courses for which the professor has evaluationsnum_students
: Number of students enrolled in the professor’s coursesperc_evaluating
: Average percentage of enrolled students who completed an evaluationbeauty
: Measure of the professor’s beauty composed of the average score on six standardized beauty ratingstenured
: Is the professor tenured? (0 = non-tenured; 1 = tenured)native_english
: Is the professor a native English speaker? (0 = non-native English speaker; 1 = native English speaker)age
: Professor’s age (in years)female
: Is the professor female? (0 = male; 1 = female)
These source of these data is: Hamermesh, D. S. & Parker, A. M. (2005). Beauty in the classroom: Instructors’ pulchritude and putative pedagogical productivity. Economics of Education Review, 24, 369–376. The data were made available by: Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.
fci-2015.csv
Each season, Team Marketing Report (TMR) computes the cost of taking a family of four to a professional sports contest for each of the major sporting leagues. Costs are determined by telephone calls with representatives of the teams, venues and concessionaires. Identical questions were asked in all interviews. Prices for Canadian teams were converted to US dollars and comparison prices were converted using a recent exchange rate. Salary data were collected by Sporting Intelligence as part of their Global Sports Salaries Survey 2015.
The data in fci-2015.csv include five attributes collected from from the 2014/2015 season for \(n=122\) professional sports teams across the United States. The attributes include:
team
: Name of professional sports teamleague
: Major sporting league the team plays in (MLB = Major Lague Baseball; NBA = National Basketball Association; NFL = National Football League; NHL = National Hockey League)fci
: Fan Cost Index (FCI). The FCI is a summary of what it costs to take a family of four to a game. It comprises the prices of four (4) adult average-price tickets, two (2) small draft beers, four (4) small soft drinks, four (4) regular-size hot dogs, parking for one (1) car, two (2) game programs and two (2) least expensive, adult-size adjustable caps.salary
: Average yearly salary for players on the active roster
graduation.csv
The data in graduation.csv include student-level attributes for \(n=2344\) randomly sampled students who were first-year, full-time students from the 2002 cohort at a large, midwestern research university. Any students who transferred to another institution were removed from the data. The source of these data is: Jones-White, Radcliffe, Lorenz, & Soria (2014). We will use these data to explore predictors of college graduation.
degree
: Did the student graduate from the institution? (0 = No; 1 = Yes)act
: Student’s ACT score (If the student reported a SAT score, a concordance table was used to transform the score to a comparable ACT score.)scholarship
: Amount of scholarship offered to student (in thousands of dollars)ap
: Number of Advanced Placement credits at time of enrollmentfirstgen
: Is the student a first generation college student? (0 = No; 1 = Yes)nontrad
: Is the student a non-traditional student (older than 19 years old at the time of freshman enrollment)? (0 = No; 1 = Yes)
mn-schools.csv
The data in mnSchools.csv were collected from http://www.collegeresults.org and contain 2011 institutional data for \(n=33\) Minnesota colleges and universities. The attributes include:
name
: College/university namegrad
: Six-year graduation rate, as a percentagepublic
: Sector (1 = public college/university, 0 = private college/university)sat
: Estimated median composite SAT score (in hundreds)tuition
: Amount of tuition and required fees covering a full academic year for a typical student, in thousands of U.S. dollars
movies.csv
The data in movies.csv includes attributes for \(n=1,806\) movies. These data are a subset of data from the movies
data object included in the ggplot2movies package. The original data contains information on 24 variables collected from 28,819 movies. The attributes include:
title
: Movie’s titlebudget
: Movie’s budget (in millions of U.S. dollars)age
: Age of the movie; Computed by subtracting the movie’s release date from 2019mpaa
: MPAA rating (PG, PG-13, R)
nba-player-data.csv and nba-team-data.csv
The data in nba-player-data.csv and nba-team-data.csv, inspired by Woltman, Feldstein, MacKay, & Rocchi (2012), include player-level attributes for \(n=300\) NBA players, and team-level attributes for \(N=30\) different teams, respectively. The player-level attributes in nba-player-data.csv include:
player
: Name of the NBA playerteam
: Name of the NBA team for each playersuccess
: A proxy for player quality/success. This is the quantile for the player based on the player’s free-throw percentage relative to the other players in the league. Higher values indicate a more succesful player (e.g., 0 = lowest 20%; 4 = highest 20%).life_satisfaction
: Score on a survey of life satisfaction. Scores range from 5 to 25, with higher scores indicating more life satisfaction.
The team-level attributes in nba-team-data.csv include:
team
:Name of the NBA teamcoach
:Name of the team’s current coachcoach_experience
: This is the tercile for the coach based on years of coaching experience in the NBA (e.g., 0 = lowest third; 2 = highest third).
netherlands-students.csv and netherlands-schools.csv
The data in netherlands-students.csv and netherlands-schools.csv include student- and school-level attributes, respectively, for \(n_i=2287\) 8th-grade students in the Netherlands provided by Snijders & Bosker (2012).
The student-level attributes in netherlands-students.csv include:
school_id
: The school ID number for each studentlanguage_pre
: Language pre-test scorelanguage_post
: Language post-test scoreses
: Measure of the socio-economic statusverbal_iq
: Student’s score on a verbal IQ test. The variable is centered to have a mean of 0.female
: Student’s sex (0 = male; 1 = female)minority
: Student’s minority status (0 = white; 1 = minority)
The school-level attributes in netherlands-schools.csv include:
school_id
: The school ID numberschool_type
: Indicates whether the school is a public school, a Protestant private school, a Catholic private school, or a non-denominational private schoolpublic
: Indicates whether the school is a public school (1) or a private school (0)school_ses
: School’s average socio-economic statusschool_verbal_iq
: School’s average verbal IQ scoreschool_minority
: Percentage of students at the school who are minority students
nhl.csv
The data in nhl.csv includes data on the cost of attending an NHL game over 9 seasons for the current 31 NHL teams. The attributes include:
team
: NHL team namefci
: Fan cost index (FCI) for each season. There are no data for 2012, since that year the NHL was locked out. The FCI comprises the prices of four (4) average-price tickets, two (2) small draft beers, four (4) small soft drinks, four (4) regular-size hot dogs, parking for one (1) car, two (2) game programs and two (2) least-expensive, adult-size adjustable caps. Costs were determined by telephone calls with representatives of the teams, venues and concessionaires. Identical questions were asked in all interviews.year
: NHL season (e.g., 2002 indicates the 2002–2003 NHL season)hs_hockey
: An dummy coded variable that indicates whether there is state organized high school hockey in the team’s location (0 = no; 1 = yes). This is a proxy for whether there is a hockey tradition in the team’s location.
popular-classroom.csv and popular-student.csv
The data in popular-classroom.csv and popular-student.csv includes data on 2000 different students from 100 different classrooms. The data, provided by Hox (2002), were simulated from data collected as part of a sociological study of student popularity. Student popularity, a rating on a scale of 1–10, was derived by a sociometric procedure in which all students in a class rate all the other students. Each students’ popularity is the average received popularity rating.
The attributes in popular-classroom.csv include:
class
: Classroom IDteacherExp
: Experience level of teacher, in years
The attributes in popular-student.csv include:
student
: Student ID (within a school)class
: Classroom IDpopularity
: Popularity rating of the student (0-10 scale; where 0 = very unpopular and 10 = very popular) based on having all other students in the student’s class rate him/her and computing the average of those ratings.teachPop
: Student’s popularity as rated by the teacher. Higher values are indicative of higher popularityextra
: Extraversion of the student (measured on a 10-point scale)female
: Sex of the student (Male = 0; Female = 1)
riverview.csv
The data in riverview.csv come from Lewis-Beck & Lewis-Beck (2016) and contain five attributes collected from a random sample of \(n=32\) employees working for the city of Riverview, a hyopothetical midwestern city. The attributes include:
education
: Years of formal educationincome
: Annual income (in thousands of U.S. dollars)seniority
: Years of senioritygender
: Employee’s gendermale
: Dummy coded gender variable (0 = Female, 1 = Male)party
: Political party affiliation
same-sex-marriage.csv
This file contains data collected from the 2008 American National Election Study conducted jointly by the University of Michigan and Stanford University. These particular data consist of \(n=1,746\) American’s responses. The attributes in the dataset include:
support
: Does the respondent support gay marriage? (1=Yes; 0=No)attendance
: How often does the respondent attend religious services? (0=Never; 1=Few times a year; 2=Once or twice a month; 3=Almost every week; 4=Every week)denomination
: What is the respondent’s religious denomination?friends
: Does the respondent have family or friends that are LGBT? (1=Yes; 0=No)age
: Respondent’s age, in yearsfemale
: Is the respondent female? (1=Yes; 0=No)
vocabulary.csv
The data, adapted from data provided by Bock (1975), come from the Laboratory School of the University of Chicago and include scaled test scores across four grades from the vocabulary section of the Cooperative Reading Test for \(n=64\) students. The attributes in the dataset include:
id
: The subject ID number for each malevocab_08
: The scaled vocabulary test score in 8th gradevocab_09
: The scaled vocabulary test score in 9th gradevocab_10
: The scaled vocabulary test score in 10th gradevocab_11
: The scaled vocabulary test score in 11th gradefemale
: Dummy coded sex variable (0 = Male, 1 = Female)
wine.csv
The data in wine.csv includes data on 200 different wines. These data are a subset of a larger database (\(n = 6,613\)) from wine.com, one of the biggest e-commerce wine retailers in the U.S. It allows customers to buy wine according to any price range, grape variety, country of origin, etc. The data were made available at http://insightmine.com/. The attributes include:
wine
: Wine namevintage
: Year the wine was produced (centered so that 0 = 2008, 1 = 2009, etc.)region
: Region of the world where the wine was producedvarietal
: Grape varietal (e.g., Cabernet Sauvignon)rating
: Wine rating on a 100 pt. scale (these are from sources such as Wine Spectator, the Wine Advocate, and the Wine Enthusiast)price
: Price in U.S. dollars
References
U.S. News and World Report. (2018). Schools of education. Best Graduate Schools.
Jones-White, D. R., Radcliffe, P. M., Lorenz, L. M., & Soria, K. M. (2014). Priced out?: The influence of financial aid on the educational trajectories of first-year students starting college at a large research university. Research in Higher Education, 55(4), 329–350.
Woltman, H., Feldstein, A., MacKay, J. C., & Rocchi, M. (2012). An introduction to hierarchical linear modeling. Tutorials in Quantitative Methods for Psychology, 8(1), 52–69.
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). Thousand Oaks, CA: Sage.
Hox, J. J. (2002). Multilevel analysis: Techniques and applications. Mahwah, NJ: Lawrence Erlbaum Associates.
Lewis-Beck, C., & Lewis-Beck, M. (2016). Applied regression: An introduction (2nd ed.). Thousand Oaks, CA: Sage.
Bock, R. (1975). Multivariate statistical methods in behavioral research. New York: McGraw-Hill.