Can We Learn from Mistakes? Development of Categorical Scoring Rubrics for Diagnostic Purposes

June 5, 2017

Press contact: Jenesse Miller, (213) 810-8554

How can we learn about what a student can or cannot do from the mistakes they make? This post presents findings based on a new scoring method that provides diagnostic information on student learning.

The learning and educational sciences literature has long recognized that aspects of knowledge, skills and ability may be inferred not only by investigating successful completion of cognitive tasks (as in a school test), but also by identifying and categorizing errors. Errors may range from accidental mistakes, such as slips and guesses, to systematic ones, such as errors stemming from a lack of knowledge or from persistent wrong ideas and naïve conceptions.

Errors are a rich source of diagnostic information. On the one hand, identifying different types of errors may benefit classroom instruction, because each kind of error may imply a different instructional action. For instance, when a student gives the answer 9 for the question 4+__=13+3, it is not clear whether it resulted from a computation error or from a systematic misunderstanding of the equal sign and how to work with equations. The latter is well documented in the literature. Beginning algebra students often treat the equal sign as a “process” (as it is used in arithmetic and on calculators); that is, they see the equation above as 4+__=13 (read: 4 plus what equals 13), and ignore the numbers that appear after the “result” 13. In other words, they do not treat the equal sign as a balance between two quantities that one needs to evaluate before solving the exercise. An experienced teacher can often catch such mistakes through day-to-day observation and adjust instruction accordingly, for example, to “reengineer” the understanding of the equal sign rather than to improve computational accuracy.

On the other hand, different sources of errors could reflect non-cognitive factors. As the educational literature has documented, non-cognitive factors not only influence students’ academic achievement but also contribute to how well they succeed as adults in workplaces, public life, and so forth. For example, a student who repeatedly skips questions or provides off-task answers may lack the motivation to learn or succeed.

Among the various question types that frequently appear in educational assessment, constructed-response items, such as essays, short-answer and show-your-work questions, have attracted increasing attention. Because students are required to produce an answer or justify their work rather than select one from a list, they have more opportunities to showcase what they know (e.g., above and beyond what is required to correctly answer a particular question) and where they fall short (e.g., through the kinds of mistakes they make). Unfortunately, such valuable diagnostic information is usually thrown away, because a raw response is typically scored only by whether it matches the correct answer or demonstrates the critical skills that the item is designed to test.

In our paper in Educational Psychology, we proposed a scoring method that classifies constructed responses into categories that distinguish among sources of incorrect responses (e.g., slip, lack of understanding, or misconception), as well as among sources of correct responses (e.g., different strategies, different depths of understanding). Our scoring method thus preserves and synthesizes the diagnostic information contained in raw responses, which can be further aggregated to produce a diagnostic report for each student. In the aforementioned algebra example of 4+__=13+3, a student who correctly computes 13+3=16 but then makes a mistake when subtracting 4 is deemed to have made a calculation error, while a student who answers 9 (that is, 13-4) may have a conceptual misunderstanding of the equal sign. Unlike the conventional correctness scoring method, in which both students’ responses would receive the same score (both incorrect), our scoring method treats them separately because they represent distinct conceptual (mis)understandings.
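As a rough illustration of the idea, the sketch below assigns a category to a response to the item 4+__=13+3 and then tallies one student's categories into a simple report. It is a minimal sketch, not the rubric from the paper: the category labels, the `categorize` and `diagnostic_report` functions, and the `shown_right_side` parameter (the value the student wrote for 13+3 in their work, if any) are all hypothetical names introduced here for illustration.

```python
from collections import Counter

# Hypothetical category labels for the item 4 + __ = 13 + 3 (correct answer: 12).
CORRECT = "correct"
COMPUTATION_ERROR = "computation error"
EQUAL_SIGN_MISCONCEPTION = "possible equal-sign misconception"
NO_RESPONSE = "skipped or off-task"
OTHER = "other incorrect"


def categorize(answer, shown_right_side=None):
    """Assign one category to a response to 4 + __ = 13 + 3.

    answer: the number written in the blank, or None if the item was skipped.
    shown_right_side: the value the student wrote for 13 + 3, if work is shown.
    """
    if answer is None:
        return NO_RESPONSE
    if answer == 13 + 3 - 4:
        return CORRECT
    if shown_right_side == 13 + 3:
        # The right side was evaluated correctly, so the slip came afterwards.
        return COMPUTATION_ERROR
    if answer == 13 - 4:
        # Consistent with reading the item as 4 + __ = 13 and ignoring "+ 3".
        return EQUAL_SIGN_MISCONCEPTION
    return OTHER


def diagnostic_report(categories):
    """Aggregate one student's item-level categories into counts per category."""
    return Counter(categories)


# Example: four responses from one hypothetical student.
responses = [
    categorize(12),
    categorize(9),
    categorize(11, shown_right_side=16),
    categorize(None),
]
print(diagnostic_report(responses))
```

Under correctness scoring, the second and third responses would both simply be marked wrong; here they land in different categories, which is the distinction the method is meant to preserve.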

To illustrate the use of the new scoring method, we applied it to study 8th- and 9th-graders’ knowledge of algebra. We found that the categorical scores not only capture the level of correctness, in a way comparable to the conventional scoring method, but also extract information on students’ motivation in test taking and on their ability to provide precise answers. These two additional dimensions, neglected by correctness scoring, reflect non-cognitive skills that are critical to students’ cognitive performance. This not only gives a more complete picture of students’ learning process but also provides clues about how to close achievement gaps.

From the Elementary and Secondary Education Act to the Every Student Succeeds Act, US education laws enforce periodic measurement of students’ achievement and hold schools accountable for educational progress. In addition to knowing the areas in which our children succeed, it is just as important, if not more so, to identify achievement gaps and then develop strategies to make sure all kids meet the same high expectations. The educational assessment field today, more than ever, calls for good ways to integrate cognitive and non-cognitive skills, assessment and diagnostic information, all in the same assessment program. Our method can help shed light on the future development of such an integrative system and thus improve students’ learning.