Effect Size

The methods introduced for comparing data to a standard and for comparing two groups presented t- and z-tests as tools a researcher can use to examine differences. But, as Kirk (2001) points out, the group-differences question often expands into three questions: (a) Is an observed effect real, or should it be attributed to chance? (b) If the effect is real, how large is it? and (c) Is the effect large enough to be useful? The inferential methods introduced thus far can answer only the first of Kirk’s three questions. To answer the other two, which are often more meaningful to applied researchers, the results from hypothesis tests need to be supplemented with additional analyses.

Numerous researchers have argued for the use of effect sizes to complement or even replace hypothesis tests (e.g., Cohen, 1990, 1994; Kirk, 1995, 1996; Thompson, 1996, 2007). Effect size is a term that describes a family of indices characterizing the extent to which sample results diverge from the expectations specified in the null hypothesis. These measures help researchers focus on how meaningful the results of hypothesis tests are by answering Kirk’s latter two questions, and they also provide a way to compare results across studies. Effect sizes have been fully embraced by the American Psychological Association. Its Publication Manual (American Psychological Association, 2019, p. 25) states, “For the reader to fully understand the importance of your findings, it is almost always necessary to include some index of effect size or strength of relationship in your Results section.” It further states that editors may count the “failure to report effect sizes” among the “defects in the design and reporting of results” (p. 5).
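
To make the idea concrete, a common effect size for the two-group setting is Cohen’s d: the difference between the group means divided by a pooled standard deviation. The sketch below is a minimal illustration, not drawn from the text; the data are simulated, and the names `cohens_d`, `treatment`, and `control` are placeholders.

```python
import numpy as np

def cohens_d(group1, group2):
    """Cohen's d: standardized mean difference using the pooled SD."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = g1.size, g2.size
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)

# Hypothetical data: two samples whose population means differ by half an SD.
rng = np.random.default_rng(42)
treatment = rng.normal(loc=52.0, scale=10.0, size=30)
control = rng.normal(loc=47.0, scale=10.0, size=30)

print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```

Unlike a p-value, the resulting d is expressed in standard-deviation units, so it speaks directly to Kirk’s second question (how large is the effect?). Cohen’s oft-cited benchmarks of roughly 0.2, 0.5, and 0.8 for small, medium, and large effects offer a rough starting point for the third question, though such labels are no substitute for substantive judgment.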


References

Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304–1312.
Cohen, J. (1994). The Earth is round (\(p<.05\)). American Psychologist, 49, 997–1003.
Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences (3rd ed.). Brooks/Cole.
Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746–759.
Kirk, R. E. (2001). Promoting good statistical practices: Some suggestions. Educational and Psychological Measurement, 61(2), 213–218.
Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher, 25(2), 26–30.
Thompson, B. (2007). Effect sizes, confidence intervals, and confidence intervals for effect sizes. Psychology in the Schools, 44(5), 423–432.