Summarizing and Visualizing Data

The chapters in this section introduce methods for exploring and describing data. In particular, we will focus on the visualization and numerical summarization of univariate distributions (i.e., distributions of a single attribute). In Chapter 5 you will learn methods for summarizing and visualizing categorical attributes, and Chapter 6 will introduce methods for quantitative attributes.


Goals for Summarization and Visualization

Data scientists and statisticians visualize data and compute numerical summaries to explore and understand data. In addition to visualizing distributions of data, it is common to summarize certain features of the data using numbers. (For example, the mean is one numerical summary of a distribution of quantitative data.) Together, visualizations and numerical summaries help analysts identify features of the data, such as typical or extreme observations, and describe and explore the variation in the data. Data exploration is an important first step in any statistical analysis.
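As a small illustration of numerical summaries, the sketch below condenses a handful of values of a quantitative attribute into a few numbers: the mean as a "typical" value, the minimum and maximum as the most extreme observations, and the range as a crude measure of variation. The values are hypothetical and are not drawn from any real data set; the example uses only Python's standard `statistics` module.

```python
import statistics

# Hypothetical values of a single quantitative attribute for five observations.
# These numbers are illustrative only, not taken from real data.
values = [9500, 12000, 31000, 8700, 45000]

# The mean summarizes the whole distribution with one "typical" value.
mean_value = statistics.mean(values)

# The minimum and maximum identify the most extreme observations.
lowest, highest = min(values), max(values)

# The range (max - min) is a simple summary of how spread out the values are.
value_range = highest - lowest

print(f"mean = {mean_value}, min = {lowest}, max = {highest}, range = {value_range}")
```

A single number like the mean hides much of the distribution's shape, which is why numerical summaries are usually paired with a visualization.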

College Scorecard Data

Throughout the chapters in this section, we will use the College Scorecard data to illustrate methods of data exploration. These data were collected and made available by the U.S. Department of Education (DOE). The DOE publishes data on institutions of higher education in its College Scorecard to promote transparency and provide information to interested stakeholders (e.g., parents, students, educators). A subset of these data is provided in the file college-scorecard.csv.