Writing: How Effective are Rubrics and Exemplars?


A 2023 study published in the Journal of Experimental Psychology: Applied asked which type of feedback produced the greatest improvement in writing performance. It built on earlier work suggesting that students who generate self-feedback using external criteria (as opposed to receiving individual feedback from a teacher) improve more. The question then becomes: what tools best help students generate that self-feedback? This study found that rubrics produced the most improvement across the areas scored, with exemplars coming second and a combination of rubrics and exemplars coming third. That seems relatively straightforward and eminently applicable, but there are some potential confounders I think are important to consider. Before we get to them, here’s a breakdown of the task and the findings.

Task Parameters

The study included 206 9th and 10th graders, with average ages of 14 and 15 for the two grades respectively. The task was an SAT essay that required students to read a literary passage and explain how the author built an argument to persuade the audience, using evidence from the text to support their analysis. They had 50 minutes to produce a 500-750 word essay. The students were scored in three areas: Reading, Writing, and Analysis. As you might expect, reading referred to how well the students understood the passage; writing measured style, tone, conventions, focus, and the ability to use language effectively; and analysis measured how well the student understood the author’s technique and how complex their arguments were. Each area was scored from 1 (low) to 4 (high). Students in the rubric-only group were given three rubrics, one for each area*. Students in the exemplar group were given exemplars of weak, average, and strong performance. The combination group received all the rubrics and the exemplars. The control group received none of these materials. Rubrics and exemplars were developed and provided by the College Board, which owns the SAT. Students in the first three groups wrote a draft essay and then were asked to consult the provided materials to revise it. The control group was simply asked to draft and revise their essay.

The researchers threw in a small design curveball. A week after the rubric, exemplar, and combination groups revised their first essays, they wrote another essay and then participated in a class on how to use rubrics, exemplars, or both. Then they were asked to revise the draft of their second essay. The idea was to examine whether specific instruction in using the tools would produce even more improvement.


The study provides a lot of statistical data, which I am going to reduce to a couple of short exhibits. First, improvement in reading across the various drafts:

T1 is the draft of Essay 1, T2 is the revision of Essay 1, T3 is the draft of Essay 2, and T4 is the revision of Essay 2. From the data, there is a significant improvement in reading for the rubric group between the first and second versions of Essay 1 and between the first and second versions of Essay 2. What’s interesting here is that the rubric group initially scored lower than all the other groups except the control group, but after being presented with the rubrics, they scored higher than the other groups, even without instruction on how to use the rubrics. So the rubrics significantly improved their ability to read and understand the literary passage. The exemplar group narrowly outperformed the combined group without any instruction, and then significantly outperformed the combined group after instruction on using the exemplars.

Next, improvement in writing between the groups:

In writing, the rubric group again started below all the other groups but improved dramatically after being given the rubrics. Their second essay attempt was better still, and the revision with instruction on using the rubrics scored even higher than their first essay revision. The combined group outperformed the exemplar group in the first revision, but the exemplar group significantly outperformed the combined group after being instructed in how to use the exemplars. So the rubrics improved students’ ability to write with appropriate focus, conventions, and tone more than the other tools did, and the tools plus instruction in using them improved their writing even more.

Improvement in analysis between groups:

Again, the groups start tightly clustered at about the same performance level, and then the rubric group takes off. The conclusion here is that the rubrics produced significant improvement over performance without them, and that rubrics produced more improvement than the other tools. Why this should be true when the combined group was also using rubrics is mystifying to me. The researchers posited that exemplars may require more effort to process (analyze, compare, etc.), so including them just isn’t as efficient when the goal is establishing performance standards.

Caveats and Confounders

Here’s where I’m going to rain on the parade a bit. We can’t accept these results without considering the following:

  • The study had a respectable number of participants, but they all attended a private suburban school, and only 4% qualified for free/reduced lunch. Family income was likely higher than average, since most families could afford private school tuition. Income, as research has consistently demonstrated, affects a whole range of student performance measures: the more money a student’s family has, the better they tend to do academically. High-income families may hold a different set of expectations for their children and definitely have access to tutoring and external experiences to bolster learning.
  • The task is an SAT essay. There is a very high correlation between SAT performance and family income (the same is true of the ACT). More than any other demographic factor, income strongly predicts performance on these tests, with higher family income producing more success.
  • The pool of students was mostly white. The demographics were 72.3% White, 14.1% Asian, 8.7% Hispanic/Latino/Latina, and 4.9% Black. While race/ethnicity is not as strong a predictor of SAT performance as income, having a pool of students that is mostly white and high income means these conclusions may only be true (or at least, truest) for white students in private schools from higher-income families. Research is only beginning to explore the degree to which strategies work with one group and not another. This is true not just in education but in multiple areas of research, including psychology, health and physiology, and others. This particular study didn’t break down performance by demographic group, so there’s no way to know how rubrics and exemplars affected low-income or non-white students specifically.

So, is this promising? Yes. Can we apply it indiscriminately to all students? That’s less clear and demands more research.


*For the English teachers out there who might be wondering (as I did), the reading rubric was broken down into four criteria: (a) source text comprehension; (b) central idea and details; (c) errors; (d) use of textual evidence. The writing rubric contained five criteria: (a) cohesiveness and use of language; (b) central claim; (c) essay structure; (d) sentence structure and style; (e) conventions of standard English. The analysis rubric had four criteria: (a) source text analysis; (b) evaluation of evidence; (c) support for claims; (d) focus on relevant features.

