Data Codebooks

The data codebooks provide information about the attributes and source of each of the datasets used in the notes.

ed-schools-2018.csv

The data in ed-schools-2018.csv come from U.S. News and World Report (2018) and contain 13 attributes collected from the \(n=129\) graduate schools of education ranked in the 2018 Best Graduate Schools. The attributes include:

  • rank: Rank in USNWR
  • school: Graduate program of Education
  • score: Overall score given by USNWR
  • peer: Peer assessment score (5.0 = highest)
  • expert_score: Administrator/expert assessment score (5.0 = highest)
  • gre_verbal: Mean GRE verbal score in 2016
  • gre_quant: Mean GRE quantitative score in 2016
  • doc_accept: Acceptance rate for doctoral students in 2016
  • student_faculty_ratio: Ratio of doctoral students to faculty members in 2016
  • phd_granted_per_faculty: Doctorates granted per faculty member in 2015–16
  • funded_research: Funded research (in millions of dollars)
  • funded_research_per_faculty: Funded research per faculty member (in thousands of dollars)
  • enroll: Total graduate education enrollment in 2016

evaluations.csv

This file contains data collected from student evaluations of instructors’ beauty and teaching quality for several courses at the University of Texas. The teaching evaluations were conducted at the end of the semester, and the beauty judgments were made later, by six students who had not attended the classes and were not aware of the course evaluations. The variables are:

  • prof_id: Professor ID number
  • avg_eval: Average course rating
  • num_courses: Number of courses for which the professor has evaluations
  • num_students: Number of students enrolled in the professor’s courses
  • perc_evaluating: Average percentage of enrolled students who completed an evaluation
  • beauty: Measure of the professor’s beauty composed of the average score on six standardized beauty ratings
  • tenured: Is the professor tenured? (0 = non-tenured; 1 = tenured)
  • native_english: Is the professor a native English speaker? (0 = non-native English speaker; 1 = native English speaker)
  • age: Professor’s age (in years)
  • female: Is the professor female? (0 = male; 1 = female)

The source of these data is: Hamermesh, D. S. & Parker, A. M. (2005). Beauty in the classroom: Instructors’ pulchritude and putative pedagogical productivity. Economics of Education Review, 24, 369–376. The data were made available by: Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.

fci-2015.csv

Each season, Team Marketing Report (TMR) computes the cost of taking a family of four to a professional sports contest for each of the major sporting leagues. Costs are determined by telephone calls with representatives of the teams, venues and concessionaires. Identical questions were asked in all interviews. Prices for Canadian teams were converted to US dollars and comparison prices were converted using a recent exchange rate. Salary data were collected by Sporting Intelligence as part of their Global Sports Salaries Survey 2015.

The data in fci-2015.csv include five attributes collected from the 2014/2015 season for \(n=122\) professional sports teams across the United States. The attributes include:

  • team: Name of professional sports team
  • league: Major sporting league the team plays in (MLB = Major League Baseball; NBA = National Basketball Association; NFL = National Football League; NHL = National Hockey League)
  • fci: Fan Cost Index (FCI). The FCI is a summary of what it costs to take a family of four to a game. It comprises the prices of four (4) adult average-price tickets, two (2) small draft beers, four (4) small soft drinks, four (4) regular-size hot dogs, parking for one (1) car, two (2) game programs and two (2) least expensive, adult-size adjustable caps.
  • salary: Average yearly salary for players on the active roster
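
The FCI definition above is a weighted sum of item prices. As a minimal sketch, assuming made-up unit prices (the names `FCI_QUANTITIES`, `fan_cost_index`, and `example_prices` are hypothetical and not part of the dataset), the calculation could look like this:

```python
# Quantities of each item in the Fan Cost Index, as listed in the codebook.
FCI_QUANTITIES = {
    "ticket": 4,      # adult average-price tickets
    "beer": 2,        # small draft beers
    "soft_drink": 4,  # small soft drinks
    "hot_dog": 4,     # regular-size hot dogs
    "parking": 1,     # parking for one car
    "program": 2,     # game programs
    "cap": 2,         # least expensive, adult-size adjustable caps
}


def fan_cost_index(prices):
    """Return the FCI: each item's unit price times its quantity, summed."""
    return sum(qty * prices[item] for item, qty in FCI_QUANTITIES.items())


# Hypothetical unit prices in U.S. dollars (illustration only, not real data).
example_prices = {
    "ticket": 30.0,
    "beer": 6.0,
    "soft_drink": 4.0,
    "hot_dog": 5.0,
    "parking": 15.0,
    "program": 5.0,
    "cap": 20.0,
}

example_fci = fan_cost_index(example_prices)
print(example_fci)  # 233.0
```

The fci column in the file already holds this total for each team, so the sketch only illustrates how the index is assembled.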

graduation.csv

The data in graduation.csv include student-level attributes for \(n=2344\) randomly sampled students who were first-year, full-time students from the 2002 cohort at a large, midwestern research university. Any students who transferred to another institution were removed from the data. The source of these data is: Jones-White, Radcliffe, Lorenz, & Soria (2014). We will use these data to explore predictors of college graduation.

  • degree: Did the student graduate from the institution? (0 = No; 1 = Yes)
  • act: Student’s ACT score (If the student reported an SAT score, a concordance table was used to transform the score to a comparable ACT score.)
  • scholarship: Amount of scholarship offered to student (in thousands of dollars)
  • ap: Number of Advanced Placement credits at time of enrollment
  • firstgen: Is the student a first generation college student? (0 = No; 1 = Yes)
  • nontrad: Is the student a non-traditional student (older than 19 years old at the time of freshman enrollment)? (0 = No; 1 = Yes)

mn-schools.csv

The data in mn-schools.csv were collected from http://www.collegeresults.org and contain 2011 institutional data for \(n=33\) Minnesota colleges and universities. The attributes include:

  • name: College/university name
  • grad: Six-year graduation rate, as a percentage
  • public: Sector (1 = public college/university, 0 = private college/university)
  • sat: Estimated median composite SAT score (in hundreds)
  • tuition: Amount of tuition and required fees covering a full academic year for a typical student, in thousands of U.S. dollars

movies.csv

The data in movies.csv include attributes for \(n=1,806\) movies. These data are a subset of the movies data object included in the ggplot2movies package. The original data contain information on 24 variables collected from 28,819 movies. The attributes include:

  • title: Movie’s title
  • budget: Movie’s budget (in millions of U.S. dollars)
  • age: Age of the movie, computed by subtracting the movie’s release date from 2019
  • mpaa: MPAA rating (PG, PG-13, R)

nba-player-data.csv and nba-team-data.csv

The data in nba-player-data.csv and nba-team-data.csv, inspired by Woltman, Feldstein, MacKay, & Rocchi (2012), include player-level attributes for \(n=300\) NBA players, and team-level attributes for \(N=30\) different teams, respectively. The player-level attributes in nba-player-data.csv include:

  • player: Name of the NBA player
  • team: Name of the NBA team for each player
  • success: A proxy for player quality/success. This is the quintile for the player based on the player’s free-throw percentage relative to the other players in the league. Higher values indicate a more successful player (e.g., 0 = lowest 20%; 4 = highest 20%).
  • life_satisfaction: Score on a survey of life satisfaction. Scores range from 5 to 25, with higher scores indicating more life satisfaction.
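
The 0-to-4 coding of success (fifths of the league's free-throw percentages) can be sketched as follows. This is one rank-based way to form such codes, not necessarily how the values in the file were produced; the function name `quintile_codes` and the percentages are hypothetical, and tied values would need a tie-breaking rule that this sketch does not address.

```python
def quintile_codes(values):
    """Code each value 0-4 according to which fifth of the ranked values it falls in."""
    n = len(values)
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    codes = [0] * n
    for rank, i in enumerate(order):
        codes[i] = rank * 5 // n  # 0 = lowest 20%, 4 = highest 20%
    return codes


# Hypothetical free-throw percentages for ten players (not real data).
ft_pct = [71.0, 88.5, 64.2, 79.9, 92.1, 55.0, 83.3, 75.4, 68.8, 86.0]
codes = quintile_codes(ft_pct)
print(codes)  # [1, 4, 0, 2, 4, 0, 3, 2, 1, 3]
```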

The team-level attributes in nba-team-data.csv include:

  • team: Name of the NBA team
  • coach: Name of the team’s current coach
  • coach_experience: This is the tercile for the coach based on years of coaching experience in the NBA (e.g., 0 = lowest third; 2 = highest third).
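
Because both files share the team attribute, team-level attributes can be attached to each player's row. The following is a minimal sketch using only the Python standard library; the inline rows are made up for illustration, and with the real files the `io.StringIO(...)` objects would be replaced by opened file handles (e.g., `open("nba-player-data.csv", newline="")`).

```python
import csv
import io

# Tiny made-up stand-ins for the two files (names and values are hypothetical).
player_csv = """player,team,success,life_satisfaction
Player A,Team X,0,12
Player B,Team X,4,20
Player C,Team Y,2,15
"""
team_csv = """team,coach,coach_experience
Team X,Coach One,2
Team Y,Coach Two,0
"""

players = list(csv.DictReader(io.StringIO(player_csv)))

# Index the team rows by team name so each player can look up their team.
teams = {row["team"]: row for row in csv.DictReader(io.StringIO(team_csv))}

# Attach the team-level attributes to every player on that team.
merged = [{**p, **teams[p["team"]]} for p in players]

for row in merged:
    print(row["player"], row["team"], row["coach"], row["coach_experience"])
```

Note that `csv.DictReader` reads every value as a string, so numeric columns such as success would need to be converted before analysis.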

netherlands-students.csv and netherlands-schools.csv

The data in netherlands-students.csv and netherlands-schools.csv include student- and school-level attributes, respectively, for \(n_i=2287\) 8th-grade students in the Netherlands provided by Snijders & Bosker (2012).

The student-level attributes in netherlands-students.csv include:

  • school_id: The school ID number for each student
  • language_pre: Language pre-test score
  • language_post: Language post-test score
  • ses: Measure of the socio-economic status
  • verbal_iq: Student’s score on a verbal IQ test. The variable is centered to have a mean of 0.
  • female: Student’s sex (0 = male; 1 = female)
  • minority: Student’s minority status (0 = white; 1 = minority)

The school-level attributes in netherlands-schools.csv include:

  • school_id: The school ID number
  • school_type: Indicates whether the school is a public school, a Protestant private school, a Catholic private school, or a non-denominational private school
  • public: Indicates whether the school is a public school (1) or a private school (0)
  • school_ses: School’s average socio-economic status
  • school_verbal_iq: School’s average verbal IQ score
  • school_minority: Percentage of students at the school who are minority students

nhl.csv

The data in nhl.csv include the cost of attending an NHL game over 9 seasons for the current 31 NHL teams. The attributes include:

  • team: NHL team name
  • fci: Fan cost index (FCI) for each season. There are no data for 2012, since that year the NHL was locked out. The FCI comprises the prices of four (4) average-price tickets, two (2) small draft beers, four (4) small soft drinks, four (4) regular-size hot dogs, parking for one (1) car, two (2) game programs and two (2) least-expensive, adult-size adjustable caps. Costs were determined by telephone calls with representatives of the teams, venues and concessionaires. Identical questions were asked in all interviews.
  • year: NHL season (e.g., 2002 indicates the 2002–2003 NHL season)
  • hs_hockey: A dummy-coded variable that indicates whether there is state-organized high school hockey in the team’s location (0 = no; 1 = yes). This is a proxy for whether there is a hockey tradition in the team’s location.

riverview.csv

The data in riverview.csv come from Lewis-Beck & Lewis-Beck (2016) and contain six attributes collected from a random sample of \(n=32\) employees working for the city of Riverview, a hypothetical midwestern city. The attributes include:

  • education: Years of formal education
  • income: Annual income (in thousands of U.S. dollars)
  • seniority: Years of seniority
  • gender: Employee’s gender
  • male: Dummy coded gender variable (0 = Female, 1 = Male)
  • party: Political party affiliation

same-sex-marriage.csv

This file contains data collected from the 2008 American National Election Study conducted jointly by the University of Michigan and Stanford University. These particular data consist of responses from \(n=1,746\) Americans. The attributes in the dataset include:

  • support: Does the respondent support gay marriage? (1=Yes; 0=No)
  • attendance: How often does the respondent attend religious services? (0=Never; 1=Few times a year; 2=Once or twice a month; 3=Almost every week; 4=Every week)
  • denomination: What is the respondent’s religious denomination?
  • friends: Does the respondent have family or friends that are LGBT? (1=Yes; 0=No)
  • age: Respondent’s age, in years
  • female: Is the respondent female? (1=Yes; 0=No)

vocabulary.csv

The data, adapted from data provided by Bock (1975), come from the Laboratory School of the University of Chicago and include scaled test scores across four grades from the vocabulary section of the Cooperative Reading Test for \(n=64\) students. The attributes in the dataset include:

  • id: The subject ID number for each student
  • vocab_08: The scaled vocabulary test score in 8th grade
  • vocab_09: The scaled vocabulary test score in 9th grade
  • vocab_10: The scaled vocabulary test score in 10th grade
  • vocab_11: The scaled vocabulary test score in 11th grade
  • female: Dummy coded sex variable (0 = Male, 1 = Female)
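
The file stores one row per student with a separate column for each grade (a wide layout). Analyses of change over time often need one row per student per grade instead. As a minimal sketch, assuming two made-up rows in place of the real file, the reshaping could be done like this (the output names `grade` and `vocab` are hypothetical):

```python
import csv
import io

# Two made-up rows in the wide layout described above (values are hypothetical).
wide_csv = """id,vocab_08,vocab_09,vocab_10,vocab_11,female
1,1.5,2.5,3.0,3.5,0
2,-0.5,1.0,2.0,2.5,1
"""

long_rows = []
for row in csv.DictReader(io.StringIO(wide_csv)):
    for grade in (8, 9, 10, 11):
        long_rows.append({
            "id": int(row["id"]),
            "female": int(row["female"]),
            "grade": grade,
            # Column names are zero-padded: vocab_08, vocab_09, vocab_10, vocab_11.
            "vocab": float(row[f"vocab_{grade:02d}"]),
        })

for r in long_rows:
    print(r)
```

Each student contributes four rows to `long_rows`, one per grade, with the time-invariant attributes (id, female) repeated on each row.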

wine.csv

The data in wine.csv include attributes for 200 different wines. These data are a subset of a larger database (\(n = 6,613\)) from wine.com, one of the biggest e-commerce wine retailers in the U.S. The site allows customers to buy wine according to any price range, grape variety, country of origin, etc. The data were made available at http://insightmine.com/. The attributes include:

  • wine: Wine name
  • vintage: Year the wine was produced (centered so that 0 = 2008, 1 = 2009, etc.)
  • region: Region of the world where the wine was produced
  • varietal: Grape varietal (e.g., Cabernet Sauvignon)
  • rating: Wine rating on a 100 pt. scale (these are from sources such as Wine Spectator, the Wine Advocate, and the Wine Enthusiast)
  • price: Price in U.S. dollars

References

U.S. News and World Report. (2018). Schools of education. Best Graduate Schools.

Jones-White, D. R., Radcliffe, P. M., Lorenz, L. M., & Soria, K. M. (2014). Priced out?: The influence of financial aid on the educational trajectories of first-year students starting college at a large research university. Research in Higher Education, 55(4), 329–350.

Woltman, H., Feldstein, A., MacKay, J. C., & Rocchi, M. (2012). An introduction to hierarchical linear modeling. Tutorials in Quantitative Methods for Psychology, 8(1), 52–69.

Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). Thousand Oaks, CA: Sage.

Hox, J. J. (2002). Multilevel analysis: Techniques and applications. Mahwah, NJ: Lawrence Erlbaum Associates.

Lewis-Beck, C., & Lewis-Beck, M. (2016). Applied regression: An introduction (2nd ed.). Thousand Oaks, CA: Sage.

Bock, R. (1975). Multivariate statistical methods in behavioral research. New York: McGraw-Hill.