Computers Can Grade Essays As Well As People Can

By Veronique Greenwood | April 23, 2012 1:49 pm

RS is the average of scores given by two human readers;
all the others are computer programs.

To anyone who’s ever written an essay for a standardized test (be it the SAT, the ACT, the GMAT, or others), it should come as no surprise that writing a high-scoring essay is a matter of following a formula. The SAT is not the time to show off your lyrical ability or demonstrate your awareness of the nuances of morality: when the prompt is “Is it better to have loved and lost than never to have loved at all?” it’s hard to argue “It depends” in 25 minutes. Just take a stance, come up with two supporting examples, and hammer that baby out.

Turns out, though, that standardized test essays are so formulaic that test-scoring companies can use algorithms to grade them. And before you get worried about machines giving you a bad score because they’ve never taken an English class, said algorithms give the essays essentially the same scores as human graders do, according to a large study that compared nine such programs with human readers. The team used more than 20,000 essays on eight prompts, and as you can see in the figure, the mean scores found by the programs and the people were so close that they appear as one line on a chart of the results.

Now, that doesn’t mean that the programs can tell when a writer has got the facts wrong or has inserted grammatical gibberish, or even whether they’ve plagiarized someone else’s work, though there are programs that can do that. Les Perelman, a writing teacher at MIT, takes pleasure in tricking the program used by Educational Testing Service, the company that produces the SAT, on all these points. But this level of trickery seems unlikely on the part of the average standardized test taker, sweating out a five-paragraph essay in less than half an hour.

And as for human essay graders, they have only a couple of minutes to come up with a score. When you’re under that kind of pressure, machine-like behavior is the best you can hope for.

Image courtesy of Shermis et al.

  • prophet

    Reading this makes me so thankful that I had taken the SAT before they added the writing segment. There is nothing as appalling as the encouragement toward formulaic essay writing; we might as well discard colorful vocabulary or just tell our kids not to exercise free thought until college, provided they are fortunate enough to attend. Double-plus ungood indeed. Oh well, the pendulum swings; soon a quality education will no longer be an entitlement. Lowering the bar is not a solution to plummeting test scores, but what can you expect when a school’s funding is based upon said scores? Private schooling is only available to the privileged and inherently reinforces elitism, but is denying your children some social development skills or an accommodating outlook on the less fortunate really that big a price to pay to give them an edge up? College tuition steadily rises, and many of those who have to put themselves through it promptly find themselves inundated with a debt they can never hope to escape. People blame low interest rates or a culture of buying beyond one’s means for some of our more recent economic pitfalls, but who would tell someone to their face that if you cannot afford your education now you have no right to seek it? An education will soon be available strictly for the wealthy, and the American promise of being granted at least the opportunity to rise beyond one’s socio-economic class by one’s own sweat and sinless means will be lost. Or maybe it has already faded and I never realized.

  • floodmouse

    I want this result to be wrong. I’m not sure if that’s just an emotional response (“I’m worth more than a machine, darn it!”). I still think the consequences of a machine passing bad essays and letting (more) idiots become managers are severe enough to warrant a little human oversight.

  • John Lerch

    So prophet. Did you plagiarize your SAT essay for this paragraph?

  • Tomek

    I’ll throw it out there that you get a lot of credit from some human graders for making a nuanced point. I’m pretty certain that’s why I did well (I feel that having a real point to make and some vague coordination was enough to get an instant 6). So human graders are probably much better at granting exceptions.

  • prophet

    Har.. no, the very first thing I stated was that I took the SAT before they added the essay portion. In fact I think I was the last or next to last crop of students that took the old one. I was happy with my 1470/1600.

  • Geack

    @1. prophet,
    The SAT writing section is included in order to test whether you have the basic ability to construct a decently organized and supported written argument. It isn’t intended to judge your value as a writer or a thinker, and it makes no sense to imply that it has some limiting impact on someone’s writing ability. Political stance aside, you’re basically complaining that a spelling test doesn’t allow for poetry.

  • prophet

    I’ll concede that it is important to be able to properly organize an argument, but no one is going to win over anyone else’s opinion in such a cookie-cutter fashion. I don’t agree that anything should bear the appellation of “writing portion” on a test and not be intended to judge someone’s value as a writer. Admittedly, maybe I relish a bit of artistry or individuality, but by following that logic should the math portion therefore not judge someone’s value as a mathematician? What exactly is it intended to judge aside from base competence? Sure, that may have a value, but a lowest-common-denominator type of value. Granted, I see why things ended up the way they are now and can’t exactly propose a solution, but that doesn’t make it less sucky.

    Oh yeah, PS @ lerch: I can’t plagiarize myself.

  • hollington

    The SAT essay is such a bad idea. It is graded by people; people who are paid to read and quickly assign a grade to an essay. I happen to know a good number of kids who have recently gone through this. What you say: “choose a side, hammer it out” is exactly right. Except…
    One of the questions asked on a recent SAT was about government control of industry vs. private control of industry. On that test, a student I know wrote in favor of private control and received the lowest score possible. The same student received top scores on other SAT exams, and his comment was: “The essay I wrote was excellent, I just chose the wrong side”. So, if being graded by humans: “choose [the politically correct] side, hammer it out”.

  • tOM Trottier

    @prophet & @geack – the NYT article should put paid to the argument that the computer recognizes a “decently organized and supported” argument. The program seems to just check for some non-topic keywords and sentence & paragraph structures without any regard to other content, consistency, or truthfulness. See the example at

    Also, the programs may be discriminatory. See “researchers compared machine scores to human ones on essays written by 107 students in a developmental writing course at South Texas College, a community college near the Mexico border that is 95 percent Hispanic. They found no significant correlation.”

  • Iain

    Flood says, “I still think the consequences of a machine passing bad essays and letting (more) idiots become managers.”
    People are already doing that; machines can do it faster and cheaper. That’s all.

