Data Codebooks

The data codebooks provide information about the attributes and source of each of the datasets used in the notes.

ed-schools-2018.csv

The data in ed-schools-2018.csv come from U.S. News and World Report (2018) and contain 13 attributes collected from the \(n=129\) graduate schools of education ranked in the 2018 Best Graduate Schools. The attributes include:

rank: Rank in USNWR
school: Graduate program of Education
score: Overall score given by USNWR
peer: Peer assessment score (5.0 = highest)
expert_score: Administrator/expert assessment score (5.0 = highest)
gre_verbal: Mean GRE verbal score in 2016
gre_quant: Mean GRE quantitative score in 2016
doc_accept: Acceptance rate for doctoral students in 2016
student_faculty_ratio: Ratio of doctoral students to faculty members in 2016
phd_granted_per_faculty: Doctorates granted per faculty member in 2015–16
funded_research: Funded research (in millions of dollars)
funded_research_per_faculty: Funded research per faculty member (in thousands of dollars)
enroll: Total graduate education enrollment in 2016

evaluations.csv

This file contains data collected from student evaluations of instructors’ beauty and teaching quality for several courses at the University of Texas. The teaching evaluations were conducted at the end of the semester, and the beauty judgments were made later, by six students who had not attended the classes and were not aware of the course evaluations. The variables are:

prof_id: Professor ID number
avg_eval: Average course rating
num_courses: Number of courses for which the professor has evaluations
num_students: Number of students enrolled in the professor’s courses
perc_evaluating: Average percentage of enrolled students who completed an evaluation
beauty: Measure of the professor’s beauty, computed as the average of six standardized beauty ratings
tenured: Is the professor tenured? (0 = non-tenured; 1 = tenured)
native_english: Is the professor a native English speaker? (0 = non-native English speaker; 1 = native English speaker)
age: Professor’s age (in years)
female: Is the professor female? (0 = male; 1 = female)

The source of these data is: Hamermesh, D. S. & Parker, A. M. (2005). Beauty in the classroom: Instructors’ pulchritude and putative pedagogical productivity. Economics of Education Review, 24, 369–376. The data were made available by: Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.

fci-2015.csv

Each season, Team Marketing Report (TMR) computes the cost of taking a family of four to a professional sports contest for each of the major sporting leagues. Costs are determined by telephone calls with representatives of the teams, venues and concessionaires. Identical questions were asked in all interviews. Prices for Canadian teams were converted to US dollars and comparison prices were converted using a recent exchange rate. Salary data were collected by Sporting Intelligence as part of their Global Sports Salaries Survey 2015.

The data in fci-2015.csv include five attributes collected from the 2014/2015 season for \(n=122\) professional sports teams across the United States. The attributes include:

team: Name of professional sports team
league: Major sporting league the team plays in (MLB = Major League Baseball; NBA = National Basketball Association; NFL = National Football League; NHL = National Hockey League)
fci: Fan Cost Index (FCI). The FCI is a summary of what it costs to take a family of four to a game. It comprises the prices of four (4) adult average-price tickets, two (2) small draft beers, four (4) small soft drinks, four (4) regular-size hot dogs, parking for one (1) car, two (2) game programs and two (2) least expensive, adult-size adjustable caps.
salary: Average yearly salary for players on the active roster
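
The FCI definition above amounts to a weighted sum of item prices. A minimal Python sketch of that arithmetic follows; the `fan_cost_index` helper and all prices are invented for illustration and are not taken from the TMR data.

```python
# Quantities of each item in the Fan Cost Index, as listed in the codebook.
FCI_QUANTITIES = {
    "ticket": 4,      # adult average-price tickets
    "beer": 2,        # small draft beers
    "soft_drink": 4,  # small soft drinks
    "hot_dog": 4,     # regular-size hot dogs
    "parking": 1,     # parking for one car
    "program": 2,     # game programs
    "cap": 2,         # least expensive adult-size adjustable caps
}


def fan_cost_index(prices):
    """Sum price times quantity over the FCI items (hypothetical helper)."""
    return sum(FCI_QUANTITIES[item] * prices[item] for item in FCI_QUANTITIES)


# Illustrative prices in U.S. dollars; these are NOT real team prices.
example_prices = {
    "ticket": 50.0,
    "beer": 8.0,
    "soft_drink": 5.0,
    "hot_dog": 6.0,
    "parking": 20.0,
    "program": 5.0,
    "cap": 20.0,
}

example_fci = fan_cost_index(example_prices)
print(example_fci)  # 330.0
```

With these made-up prices, the tickets alone account for 200 of the 330 dollars, which shows how heavily ticket prices weigh on the index.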

graduation.csv

The data in graduation.csv include student-level attributes for \(n=2344\) randomly sampled students who were first-year, full-time students from the 2002 cohort at a large, midwestern research university. Any students who transferred to another institution were removed from the data. The source of these data is: Jones-White, Radcliffe, Lorenz, & Soria (2014). We will use these data to explore predictors of college graduation.

degree: Did the student graduate from the institution? (0 = No; 1 = Yes)
act: Student’s ACT score (If the student reported an SAT score, a concordance table was used to transform the score to a comparable ACT score.)
scholarship: Amount of scholarship offered to student (in thousands of dollars)
ap: Number of Advanced Placement credits at time of enrollment
firstgen: Is the student a first-generation college student? (0 = No; 1 = Yes)
nontrad: Is the student a non-traditional student (older than 19 years old at the time of freshman enrollment)? (0 = No; 1 = Yes)

mn-schools.csv

The data in mn-schools.csv were collected from http://www.collegeresults.org and contain 2011 institutional data for \(n=33\) Minnesota colleges and universities. The attributes include:

name: College/university name
grad: Six-year graduation rate, as a percentage
public: Sector (1 = public college/university, 0 = private college/university)
sat: Estimated median composite SAT score (in hundreds)
tuition: Amount of tuition and required fees covering a full academic year for a typical student, in thousands of U.S. dollars

movies.csv

The data in movies.csv include attributes for \(n=1,806\) movies. These data are a subset of the movies data object included in the ggplot2movies package. The original data contain information on 24 variables collected from 28,819 movies. The attributes include:

title: Movie’s title
budget: Movie’s budget (in millions of U.S. dollars)
age: Age of the movie, computed by subtracting the movie’s release date from 2019
mpaa: MPAA rating (PG, PG-13, R)
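
The age attribute is a derived variable. A short Python sketch of the computation, assuming the release date is recorded as a four-digit year (the `movie_age` helper and the example years are hypothetical):

```python
REFERENCE_YEAR = 2019  # reference year given in the codebook's definition of age


def movie_age(release_year, reference_year=REFERENCE_YEAR):
    """Age of a movie in years, relative to the reference year."""
    return reference_year - release_year


# Invented release years, for illustration only.
release_years = [1999, 2005, 2019]
ages = [movie_age(year) for year in release_years]
print(ages)  # [20, 14, 0]
```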

nba-player-data.csv and nba-team-data.csv

The data in nba-player-data.csv and nba-team-data.csv, inspired by Woltman, Feldstein, MacKay, & Rocchi (2012), include player-level attributes for \(n=300\) NBA players, and team-level attributes for \(N=30\) different teams, respectively. The player-level attributes in nba-player-data.csv include:

player: Name of the NBA player
team: Name of the NBA team for each player
success: A proxy for player quality/success. This is the quintile for the player based on the player’s free-throw percentage relative to the other players in the league. Higher values indicate a more successful player (e.g., 0 = lowest 20%; 4 = highest 20%).
life_satisfaction: Score on a survey of life satisfaction. Scores range from 5 to 25, with higher scores indicating more life satisfaction.
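
The success attribute groups players into five equal-sized bins by free-throw percentage. The codebook does not say exactly how the bins were computed, so the Python sketch below is only one plausible rank-based approach; the `quantile_group` helper and the percentages are invented for illustration.

```python
def quantile_group(values, n_groups):
    """Assign each value a group from 0 (lowest) to n_groups - 1 (highest) by rank.

    Ties are broken by position in the input, which is an arbitrary choice.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(values)
    return groups


# Invented free-throw percentages for ten players.
ft_pct = [0.55, 0.91, 0.78, 0.62, 0.85, 0.70, 0.88, 0.74, 0.66, 0.81]

success = quantile_group(ft_pct, 5)
print(success)  # [0, 4, 2, 0, 3, 1, 4, 2, 1, 3]
```

Calling the same helper with `n_groups=3` would give a 0 to 2 coding of the kind used for thirds.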

The team-level attributes in nba-team-data.csv include:

team: Name of the NBA team
coach: Name of the team’s current coach
coach_experience: This is the tercile for the coach based on years of coaching experience in the NBA (e.g., 0 = lowest third; 2 = highest third).
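
Because the player-level and team-level files share the team attribute, the two can be joined on that key. The following Python sketch uses only the standard library and tiny invented stand-ins for the two files; the rows are illustrative and do not come from the actual data.

```python
import csv
import io

# Tiny stand-ins for the two files; all values are invented for illustration.
player_csv = io.StringIO(
    "player,team,success,life_satisfaction\n"
    "Player A,Team X,4,20\n"
    "Player B,Team X,1,12\n"
    "Player C,Team Y,2,15\n"
)
team_csv = io.StringIO(
    "team,coach,coach_experience\n"
    "Team X,Coach 1,2\n"
    "Team Y,Coach 2,0\n"
)

# Index the team-level rows by the shared key.
teams = {row["team"]: row for row in csv.DictReader(team_csv)}

# Attach each player's team-level attributes to the player-level row.
merged = [
    {**player, **teams[player["team"]]}
    for player in csv.DictReader(player_csv)
]

for row in merged:
    print(row["player"], row["team"], row["coach"], row["coach_experience"])
```

Each merged row keeps one player per line and repeats the team-level values for every player on the same team. Note that `csv.DictReader` returns all values as strings, so numeric columns would need converting before analysis.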

netherlands-students.csv and netherlands-schools.csv

The data in netherlands-students.csv and netherlands-schools.csv include student- and school-level attributes, respectively, for \(n_i=2287\) 8th-grade students in the Netherlands provided by Snijders & Bosker (2012).

The student-level attributes in netherlands-students.csv include:

school_id: The school ID number for each student
language_pre: Language pre-test score
language_post: Language post-test score
ses: Measure of the socio-economic status
verbal_iq: Student’s score on a verbal IQ test. The variable is centered to have a mean of 0.
female: Student’s sex (0 = male; 1 = female)
minority: Student’s minority status (0 = white; 1 = minority)

The school-level attributes in netherlands-schools.csv include:

school_id: The school ID number
school_type: Indicates whether the school is a public school, a Protestant private school, a Catholic private school, or a non-denominational private school
public: Indicates whether the school is a public school (1) or a private school (0)
school_ses: School’s average socio-economic status
school_verbal_iq: School’s average verbal IQ score
school_minority: Percentage of students at the school who are minority students

nhl.csv

The data in nhl.csv include the cost of attending an NHL game over 9 seasons for the current 31 NHL teams. The attributes include:

team: NHL team name
fci: Fan cost index (FCI) for each season. There are no data for 2012, since that year the NHL was locked out. The FCI comprises the prices of four (4) average-price tickets, two (2) small draft beers, four (4) small soft drinks, four (4) regular-size hot dogs, parking for one (1) car, two (2) game programs and two (2) least-expensive, adult-size adjustable caps. Costs were determined by telephone calls with representatives of the teams, venues and concessionaires. Identical questions were asked in all interviews.
year: NHL season (e.g., 2002 indicates the 2002–2003 NHL season)
hs_hockey: A dummy-coded variable that indicates whether there is state-organized high school hockey in the team’s location (0 = no; 1 = yes). This is a proxy for whether there is a hockey tradition in the team’s location.

popular-classroom.csv and popular-student.csv

The data in popular-classroom.csv and popular-student.csv include information on 2000 different students from 100 different classrooms. The data, provided by Hox (2002), were simulated from data collected as part of a sociological study of student popularity. Student popularity, a rating on a scale of 1–10, was derived by a sociometric procedure in which all students in a class rate all the other students. Each student’s popularity is the average received popularity rating.
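
The sociometric procedure described above can be sketched in a few lines of Python. The three-student class, the ratings, and the `received_popularity` helper are all invented for illustration; the actual data were simulated and already contain the averaged rating.

```python
from statistics import mean

# ratings[rater][ratee]: invented ratings for a three-student class.
ratings = {
    "s1": {"s2": 6, "s3": 8},
    "s2": {"s1": 4, "s3": 9},
    "s3": {"s1": 5, "s2": 7},
}


def received_popularity(ratings):
    """Average the ratings each student receives from all other classmates."""
    students = list(ratings)
    return {
        ratee: mean(
            ratings[rater][ratee] for rater in students if rater != ratee
        )
        for ratee in students
    }


popularity = received_popularity(ratings)
print(popularity)  # {'s1': 4.5, 's2': 6.5, 's3': 8.5}
```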

The attributes in popular-classroom.csv include:

class: Classroom ID
teacherExp: Experience level of teacher, in years

The attributes in popular-student.csv include:

student: Student ID (within a school)
class: Classroom ID
popularity: Popularity rating of the student (0–10 scale, where 0 = very unpopular and 10 = very popular) based on having all other students in the student’s class rate him/her and computing the average of those ratings.
teachPop: Student’s popularity as rated by the teacher. Higher values are indicative of higher popularity
extra: Extraversion of the student (measured on a 10-point scale)
female: Sex of the student (Male = 0; Female = 1)

riverview.csv

The data in riverview.csv come from Lewis-Beck & Lewis-Beck (2016) and contain six attributes collected from a random sample of \(n=32\) employees working for the city of Riverview, a hypothetical midwestern city. The attributes include:

education: Years of formal education
income: Annual income (in thousands of U.S. dollars)
seniority: Years of seniority
gender: Employee’s gender
male: Dummy coded gender variable (0 = Female, 1 = Male)
party: Political party affiliation

same-sex-marriage.csv

This file contains data collected from the 2008 American National Election Study conducted jointly by the University of Michigan and Stanford University. These particular data consist of responses from \(n=1,746\) Americans. The attributes in the dataset include:

support: Does the respondent support gay marriage? (1=Yes; 0=No)
attendance: How often does the respondent attend religious services? (0=Never; 1=Few times a year; 2=Once or twice a month; 3=Almost every week; 4=Every week)
denomination: What is the respondent’s religious denomination?
friends: Does the respondent have family or friends that are LGBT? (1=Yes; 0=No)
age: Respondent’s age, in years
female: Is the respondent female? (1=Yes; 0=No)

vocabulary.csv

The data, adapted from data provided by Bock (1975), come from the Laboratory School of the University of Chicago and include scaled test scores across four grades from the vocabulary section of the Cooperative Reading Test for \(n=64\) students. The attributes in the dataset include:

id: The subject ID number for each student
vocab_08: The scaled vocabulary test score in 8th grade
vocab_09: The scaled vocabulary test score in 9th grade
vocab_10: The scaled vocabulary test score in 10th grade
vocab_11: The scaled vocabulary test score in 11th grade
female: Dummy coded sex variable (0 = Male, 1 = Female)
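
The file stores one row per student with a separate column for each grade (a wide layout). Longitudinal analyses often need one row per student per grade instead. The Python sketch below shows one way to reshape the data using only the standard library; the two rows of scores are invented, and the `grade` and `vocab` column names are assumptions rather than names used in the notes.

```python
import csv
import io

# Tiny stand-in for vocabulary.csv; the scores are invented for illustration.
wide_csv = io.StringIO(
    "id,vocab_08,vocab_09,vocab_10,vocab_11,female\n"
    "1,1.0,2.0,3.0,4.0,1\n"
    "2,0.5,1.5,2.5,3.5,0\n"
)

# Map each wide-format column to the grade it represents.
GRADE_COLUMNS = {"vocab_08": 8, "vocab_09": 9, "vocab_10": 10, "vocab_11": 11}

# Build one long-format row per student per grade.
long_rows = []
for row in csv.DictReader(wide_csv):
    for column, grade in GRADE_COLUMNS.items():
        long_rows.append(
            {
                "id": row["id"],
                "female": int(row["female"]),
                "grade": grade,
                "vocab": float(row[column]),
            }
        )

for long_row in long_rows:
    print(long_row)
```

Two students with four grades each produce eight long-format rows, with the time-invariant female attribute repeated on every row for the same student.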

wine.csv

The data in wine.csv include information on 200 different wines. These data are a subset of a larger database (\(n = 6,613\)) from wine.com, one of the biggest e-commerce wine retailers in the U.S. The site allows customers to buy wine according to any price range, grape variety, country of origin, etc. The data were made available at http://insightmine.com/. The attributes include:

wine: Wine name
vintage: Year the wine was produced (centered so that 0 = 2008, 1 = 2009, etc.)
region: Region of the world where the wine was produced
varietal: Grape varietal (e.g., Cabernet Sauvignon)
rating: Wine rating on a 100 pt. scale (these are from sources such as Wine Spectator, the Wine Advocate, and the Wine Enthusiast)
price: Price in U.S. dollars

References

U.S. News and World Report. (2018). Schools of education. Best Graduate Schools.

Jones-White, D. R., Radcliffe, P. M., Lorenz, L. M., & Soria, K. M. (2014). Priced out?: The influence of financial aid on the educational trajectories of first-year students starting college at a large research university. Research in Higher Education, 55(4), 329–350.

Woltman, H., Feldstein, A., MacKay, J. C., & Rocchi, M. (2012). An introduction to hierarchical linear modeling. Tutorials in Quantitative Methods for Psychology, 8(1), 52–69.

Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). Thousand Oaks, CA: Sage.

Hox, J. J. (2002). Multilevel analysis: Techniques and applications. Mahwah, NJ: Lawrence Erlbaum Associates.

Lewis-Beck, C., & Lewis-Beck, M. (2016). Applied regression: An introduction (2nd ed.). Thousand Oaks, CA: Sage.

Bock, R. (1975). Multivariate statistical methods in behavioral research. New York: McGraw-Hill.