RS is the average of scores given by two human readers;
all the others are computer programs.
To anyone who’s ever written an essay for a standardized test—be it the SAT, the ACT, the GMAT, or others—it should come as no surprise that getting a high-scoring essay is a matter of following a formula. The SAT is not the time to show off your lyrical ability or demonstrate your awareness of the nuances of morality: when the prompt is “Is it better to have loved and lost than never loved at all?” it’s hard to argue “It depends” in 25 minutes. Just take a stance, come up with two supporting examples, and hammer that baby out.
Turns out, though, that standardized test essays are so formulaic that test-scoring companies can use algorithms to grade them. And before you get worried about machines giving you a bad score because they’ve never taken an English class, said algorithms give the essays nearly the same scores, on average, as human graders do, according to a large study that compared nine such programs with human readers. The team used more than 20,000 essays on eight prompts, and as you can see in the figure to the right, the mean scores found by the programs and the people were so close that they appear as one line on a chart of the results.
Now, that doesn’t mean that the programs can tell when a writer has got the facts wrong or has inserted grammatical gibberish, or even whether they’ve plagiarized someone else’s work, though other programs can detect plagiarism. Les Perelman, a writing teacher at MIT, takes pleasure in tricking the program used by Educational Testing Service, the company that produces the SAT, on all these points. But this level of trickery seems unlikely on the part of the average standardized test taker, sweating out a five-paragraph essay in less than half an hour.
And as for human essay graders, they have only a couple of minutes to come up with a score. When you’re under that kind of pressure, machine-like behavior is the best you can hope for.
Image courtesy of Shermis et al.