DISCOVER Magazine. Science, Technology and The Future
Cosmic Variance

reCAPTCHA

by Sean Carroll

We’ve all seen CAPTCHAs: those distorted words that function as a cut-rate Turing test, separating humans from spambots on any number of websites.

image.jpg

This weekend I was at a Kavli Frontiers of Science meeting at the National Academies of Science office in Irvine, and one of the participants was Luis von Ahn — the guy who was responsible for inventing the CAPTCHA idea. He gave a great one-minute talk, in which he traced his personal feelings about being responsible for something that is so useful, yet so annoying.

CAPTCHA, you will not be surprised to hear, is ubiquitous. Luis figured out that the little buggers are filled out about sixty million times per day by someone on the web. So, as the inventor, he first felt a certain amount of pride at having exerted such a palpable influence on modern life. But after a bit of reflection, and multiplying sixty million by the five seconds it might take to fill in the form, he became depressed at the enormous number of person-hours that were essentially wasted on this task.
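To put a number on that multiplication, here is a minimal Python sketch. The sixty million and five seconds are the rough figures quoted above; the function name is purely illustrative.

```python
# Rough figures quoted in the post; the five seconds is a guess, not a measurement.
CAPTCHAS_PER_DAY = 60_000_000
SECONDS_PER_CAPTCHA = 5


def person_hours_per_day(count: int, seconds_each: float) -> float:
    """Convert a daily count of short tasks into total person-hours per day."""
    return count * seconds_each / 3600


hours = person_hours_per_day(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} person-hours per day")
```

Under those assumptions the total comes to a bit more than eighty thousand person-hours every day.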

Being a clever guy, Luis decided to make lemonade. What we have here is a huge number of people who are recognizing words that a computer can’t make out. Luis realized that there was a separate circumstance in which you would want the computer to recognize the words, even though it wasn’t quite up to the task — optical character recognition, and in particular the problem of digitizing old texts. Apparently, before the advent of the Internet, people would store information by binding together pieces of paper with words printed on them, forming compact volumes known as “books.” In the interest of preserving the products of this outmoded technology, various efforts around the world are attempting to scan in all of those books and store the results digitally. But often the text is not so clear, and the computers don’t do such a great job at translating the images into words.

sample-ocr.gif

Thus, reCAPTCHA was born. At this point you should be able to guess what it does: it takes scanned images from actual books, with which optical character recognition software is struggling, and uses them as the source material for CAPTCHAs. The project is up and running, and can be implemented anywhere ordinary CAPTCHAs are used. Now, when you get annoyed at having to make out those squiggly words with lines slashed through them, you can take some solace in knowing that you’re making the world a better place. Or at least saving some books from the trash bin of history.


November 12th, 2007 4:29 PM
in Computing | 16 comments

16 Responses to “reCAPTCHA”

  1. Tristram Brelstaff Says:
    November 12th, 2007 at 5:10 pm

    Apparently spammers are already using a variant of this idea to automate the breaking of captchas.

  2. archgoon Says:
    November 12th, 2007 at 5:24 pm

    Ah, there seems to be a bit of missing information. How do they determine that the answer has been correctly entered? If you are using the CAPTCHA to figure out what the CAPTCHA says, how do you know that they got the right answer?

  3. archgoon Says:
    November 12th, 2007 at 5:25 pm

    Ah, that’s the reason for two. Gotcha.

  4. dibyadeep Says:
    November 12th, 2007 at 5:58 pm

    I don’t really understand how it might help decipher old writing from a book. If you use these, people might as well type anything to get through; how would the computer know in the first place what the actual word is?

  5. archgoon Says:
    November 12th, 2007 at 6:15 pm

    dibyadeep, note that in the image you’ve got two words. One word is a known, generated CAPTCHA, which ensures that the input is coming from a human; the other is an unknown word, which is the one we want to decipher. The user doesn’t know which is the generated CAPTCHA and which is the unknown word (actually, this only matters if the user is pathological), so answers both.

  6. archgoon Says:
    November 12th, 2007 at 6:17 pm

    Oh, and to guarantee that the user didn’t make a mistake, multiple people can be given the same unknown one for confirmation.
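
The two-word scheme described in the comments above can be sketched in a few lines of Python. This is only an illustration of the commenters' description, not the actual reCAPTCHA implementation: the class name, the `votes_needed` threshold, and the use of a single control word for every user are all simplifying assumptions.

```python
from collections import Counter


class TwoWordChallenge:
    """Hypothetical model: one known control word plus one unknown scanned word."""

    def __init__(self, control_answer: str, votes_needed: int = 3):
        self.control_answer = control_answer.lower()
        self.votes_needed = votes_needed  # assumed threshold, not a real setting
        self.votes = Counter()  # tallies each reading of the unknown word

    def submit(self, control_guess: str, unknown_guess: str) -> bool:
        """Return True if the user passes; if so, record their unknown-word reading."""
        if control_guess.strip().lower() != self.control_answer:
            return False  # failed the known word, so the other guess is ignored
        self.votes[unknown_guess.strip().lower()] += 1
        return True

    def consensus(self):
        """Return the agreed reading of the unknown word, or None if undecided."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None


challenge = TwoWordChallenge("morning", votes_needed=2)
challenge.submit("morning", "overlooks")  # passes, one vote
challenge.submit("mornin", "xxxx")        # fails the control word, no vote
challenge.submit("morning", "overlooks")  # passes, second vote
print(challenge.consensus())
```

Only users who get the known word right contribute a vote, and a reading is accepted only once enough independent users agree on it, which is the confirmation step mentioned in the last comment.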

  7. Lab Lemming Says:
    November 12th, 2007 at 7:56 pm

    So how do we get Google or (insert favorite mega commercial blog host here) to use this?

  8. No Football « blueollie Says:
    November 12th, 2007 at 9:20 pm

    [...] when they are typeset in a particular way, whereas humans have no trouble. Cosmic Variance talks about this here. It is a short but delightful [...]

  9. Geoff Arnold » Blog Archive » reCAPTCHA Says:
    November 13th, 2007 at 9:46 am

    [...] a really cool idea: reCAPTCHA. It’s a system which “takes scanned images from actual books, with which optical [...]

  10. B Says:
    November 13th, 2007 at 1:38 pm

    Needless to say, the actual problem is spam, which is – either way you turn it – an enormous waste of time and energy. It is a mystery to me why it is still legal, given the inconveniences it causes for servers and IT staff all around the world.

  11. Moshe Says:
    November 13th, 2007 at 2:05 pm

    B., spam is illegal but enforcement is a serious issue. Look at

    http://www.newyorker.com/reporting/2007/08/06/070806fa_fact_specter

    for an interesting take on the issue.

  12. kryptos. libertas. » Blog Archive » reCAPTCHA Says:
    November 13th, 2007 at 7:14 pm

    [...] Why not make CAPTCHAs useful? [...]

  13. B Says:
    November 14th, 2007 at 11:50 am

    Hi Moshe,

    Thanks, that’s a nice article indeed. (I had no clue where the word spam comes from!) Well, I guess I’d just take all sites that are advertised in spam mails off the name servers, until they’ve proven it was a mistake. End of problem. There’s a slight chance one or the other site might temporarily be unavailable accidentally, but this seems to me like a price I’d be willing to pay. Just the mere existence of such a procedure would make a big difference.

    Best,

    B.

  14. Only Humans Allowed To Comment | Karol Krizka Says:
    November 15th, 2007 at 11:45 am

    [...] modification of it called reCAPTCHA. Sean at the Cosmic Variance blog wrote a great description of how reCAPTCHA works, but let me reiterate it in a few less words. The creator of CAPTCHA, Luis von Ahn, realized the [...]

  15. steve Says:
    November 18th, 2007 at 9:26 pm

    The idea of the reCAPTCHA is compelling.

    Yet a problem I have with CAPTCHAs in general is their burden on users of websites. So I’m interested in ways of increasing security without burdening people with tasks such as filling in a CAPTCHA form.

    One such technique is a simple client honeypot (a spammer trap) that creates a CAPTCHA, or other form field, that is invisible to the website user. The spam bot, however, “sees” and tries to fill in the honeypot field. If the invisible field is filled in, then the website knows that it’s a spammer or other bot.
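
A minimal sketch of the server-side half of that honeypot check, in Python. The field name `website` and the function name are hypothetical; the sketch assumes the form includes that extra field and hides it from human visitors with CSS.

```python
# Hypothetical name of the extra form field that human visitors never see.
HONEYPOT_FIELD = "website"


def looks_like_bot(form_data: dict) -> bool:
    """Return True if the hidden honeypot field came back with any content."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())


human = {"comment": "Great post!", HONEYPOT_FIELD: ""}
bot = {"comment": "Buy pills", HONEYPOT_FIELD: "http://spam.example"}
print(looks_like_bot(human), looks_like_bot(bot))
```

A human leaves the invisible field empty, so the check costs them nothing. A bot that blindly fills in every field gives itself away.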

  16. Thomas D Says:
    November 19th, 2007 at 10:06 am

    Hmm… the portion of aged text is ungrammatical. It has a singular subject (portion) and a plural verb (were).

    Otherwise, excellent idea!






    Copyright © 2012, Kalmbach Publishing Co.
