Computers Exploit Human Brainpower to Decipher Faded Texts


In a neat example of Internet-enabled “crowdsourcing,” the method of distributing a large task to many contributors, researchers are using an anti-spam program to get people to decipher damaged or faded texts, one word at a time. Chances are that if you’ve solved one of those distorted-word tests to secure an account with Facebook, Craigslist, or Ticketmaster, you’ve helped The New York Times inch a little closer to digitizing its entire print newspaper archive from 1851 to 1980 [CNET].

The program, known as reCAPTCHA, is widely used to ensure that humans, rather than spam bots, are commenting on blogs (including some of DISCOVER’s) and signing up for free email accounts. “More web sites are adopting reCAPTCHAs each day, so the rate of transcription keeps growing,” said [lead researcher Luis] von Ahn. “More than 4 million words are being transcribed every day. It would take more than 1,500 people working 40 hours a week at a rate of 60 words a minute to match our weekly output” [Telegraph]. The service is available for free to any site.

Von Ahn’s lab uses two different optical character recognition (OCR) programs to scan an old book or newspaper article and convert it into a digital, searchable file. But when the programs disagree on the reading of a word, that word is added to the reCAPTCHA database and used as part of an anti-spam puzzle. According to a report published in the journal Science [subscription required], humans decipher such words with 99 percent accuracy.
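The disagreement step described above can be sketched in a few lines of Python. This is only an illustration, not the lab's actual code: the function name, the sample readings, and the assumption that both OCR outputs line up word for word are all hypothetical.

```python
def words_needing_humans(ocr_a, ocr_b):
    """Return (position, reading_a, reading_b) for each word the two OCR passes disagree on.

    Assumes both outputs contain the same number of words, aligned one to one
    (a simplification made for this sketch).
    """
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(ocr_a, ocr_b))
        if a != b
    ]


# Hypothetical readings of the same scanned line by two OCR programs
ocr_a = "This aged portion of society were".split()
ocr_b = "This aged pertion of society were".split()

# Words the programs agree on go straight into the digital file;
# the rest are queued to be shown to people as puzzles.
puzzle_queue = words_needing_humans(ocr_a, ocr_b)
print(puzzle_queue)  # [(2, 'portion', 'pertion')]
```

Real OCR output would not always align this neatly, so a production system would need a proper alignment step before comparing words.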

In 2000, von Ahn helped invent the first “CAPTCHA,” which stands for “Completely Automated Public Turing test to tell Computers and Humans Apart,” with a nod to the early computer scientist Alan Turing. The new reCAPTCHA cleverly slips a useful task into what has already become a mundane Internet activity. Says von Ahn: “We are demonstrating that we can take human effort — human processing power — that would otherwise be wasted and redirect it to accomplish tasks that computers cannot yet solve” [Wired News].

Last year DISCOVER saw how humans could act as artificial artificial intelligence at the Amazon Mechanical Turk, another fine example of crowdsourcing.

Image: Science/AAAS

August 14th, 2008
by Eliza Strickland in Technology | 21 comments

21 Responses to “Computers Exploit Human Brainpower to Decipher Faded Texts”

  1. Jeremiah Says:

    Um… shouldn’t that be “This aged portion of society was”? Haha.

  2. john powell Says:

    A Mental Blockage

    In the current is often found
    Unknown particles of sky and ground.
    Oft they appear as phantasms or as dreams
    Or oft illusions of what is or only seems.

    Nonetheless they do appear as real or imagined fear
    Or as unknowns, unnaturals, torments to eye and ear.
    Look what the fresh new breeze doth bring–
    With its mysterious voice, it doth sing.

    Soft on the air with voice or visual treat,
    It lays its bearing or bounty at your feet.
    Now it is yours, this new thought;
    By this new wind, it is brought.

    Up from the abyss or down from heaven,
    In a current, air now is given.
    It’s oft a creature of what we ingest
    That gives unto us this worst or best.

    Oh, the hazards of seeing or hearing
    That soon become our reasons for fearing!
    The things accepted without investigation
    Causes the brain its mental constipation.

    120205

  3. Sir Mildred Pierce Says:

    “Um… shouldn’t that be “This aged portion of society was”? Haha.”

    Common mistake. “Society” is a plurality, and is treated as such in the grammar. Another good example: one might say “Queen is Freddie Mercury, Brian May…” but the proper way to say it would be “Queen are Freddie Mercury, Brian May…” etc. The brain thinks otherwise because the previous word doesn’t end in “s”, but nevertheless it’s a plurality and thus treated as such.

  4. Sir Mildred Pierce Says:

    Or rather “This aged portion of society” as a whole is a plurality, not just “society”…

    I would like to see the famous “Roswell Memo” given the treatment, as it seems that previously the only ones interpreting it have been those biased toward the answer that the memo really does talk about aliens and discs.

  5. Duck Says:

    Hm, how then does the system verify if the typed-in word is correct? Wouldn’t someone have to physically write out the correct answer so the CAPTCHA would know later on if someone entered the correct word or something else? I could just write ‘poop’ and it wouldn’t catch it.

  6. Ash Says:

    I’m all for typing inane responses to articles if it means the furthering of literacy.
    Imagine if Youtube incorporated it.

  7. @MildredPierce Says:

    Actually, that depends on whether you’re speaking British English or American English. In British English, collective nouns are treated as plural, “The class were…”, “The team were…”, “U2 are…”, but in American English they are treated as singular nouns.

    Furthermore, in the example above it should be “was” no matter what side of the Atlantic you’re on. The “was” refers to “this [aged] portion”, which is clearly singular because of the “this”. If the quote were “The aged portion of society…” then it would depend on B.E. vs. A.E.

    I’m guessing the quote is an archaic formulation.

  8. @Duck Says:

    The system gives the same words to multiple people. If they agree on what the word should be, then the word is accepted as correct. If some of the writers disagree, then the word is given to more people.

  9. Grimmygrim Says:

    Portion is singular so “was” would be correct. Using “was” or “were” would depend on the context (are they talking about the portion or the society). I’m leaning towards “was”.

  10. ayeroxor Says:

    “Um… shouldn’t that be “This aged portion of society was”? Haha.”

    It can be either. Haha.

  11. Jmar Says:

    I do not understand how this would work for “new words” yet to be deciphered. Above someone suggested it sent the word to multiple people… does the first person have to wait until enough people verify? Haha. All my experience with this CAPTCHA has been an instant result, either correct or incorrect. From my understanding it’s asking me to verify, not decipher. Am I just not getting a “new word” or what?

  12. rprebel Says:

    It sounds like CAPTCHAs, for the commenter, aren’t new words at all. When I type ‘suffolk’ and ‘chiffon’ into the little box below this bigger box, I’m not helping to decipher anything. I’m placing a vote in an election that’s already been decided. They’re also annoying, but spam is more so.

  13. Ron Delta Says:

    Wow dude, those folks are pretty amazing, aren’t they? Very smart bunch.

    RD
    www.anondo.alturl.com

  14. Fabrizio Says:

    Andrei Broder, not Luis von Ahn, was the first to invent a CAPTCHA, while at AltaVista.

  15. Hank Roberts Says:

    When all else fails, read the fine manual:

    http://recaptcha.net/learnmore.html

    “how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.”

    See also: http://web.sbu.edu/history/tschaeper/Hist101/101wwwfbacon.html

  16. Jerome Says:

    Yes, that’s not clear to me either… if I’m deciphering the word, how does the program know what is correct?

  17. thomas Says:

    Here’s how they do it (From the website):

    “But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.”

    Very cool idea.

  18. komatzu Says:

    @thomas: thanks for the answer!
    I think it should have been mentioned in the article.

  19. Fat Jolly Penguin Says:

    ““Um… shouldn’t that be “This aged portion of society was”? Haha.”

    It can be either. Haha.”

    Actually, it should be “was.” The subject of the sentence is “portion.”

  20. Rich Says:

    If I’d known I was helping the NYT I would have lied!

  21. Kevin Says:

    “Um… shouldn’t that be “This aged portion of society was”? Haha.”

    In most cases, since the subject of the sentence would be “portion,” the correctly conjugated form would be “was,” as that agrees in number with the subject. However, one thing that seems to have escaped attention is the use of the subjunctive instead of the indicative. For example, when positing “If I were a grammar-nazi,” “were” is the correct form and not “was,” even though the subject (“I”) is singular. I am not saying that this is the particular case here, but that it is a possibility… another would be that the author was bereft of grammar knowledge in the first place.
