The way bubbles are filled in encodes quite a bit of identifying information
What’s the News: Standardized tests aren’t as impersonal as you might think. Much as detectives analyze a note’s handwriting to pinpoint its author, scientists have developed a way to identify test-takers, voters, and so on just from the way they fill in bubbles.
How the Heck:
- The researchers (from Princeton’s Center for Information Technology Policy) used a set of 92 surveys of 20 questions each to train and test their computer program.
- After setting aside eight questions from each survey, they analyzed the remaining 12 to determine the distinctive characteristics of each individual’s bubbling style. Maybe they tend to fill bubbles with a squiggle, or a series of diagonal strokes that point to the right or left, but whatever their quirks, the program learned to identify individual test takers. Its specifications are quite detailed—it draws on 804 different features concerning color and shape of the mark.
- To test its abilities, the team then sicced the program on the eight questions it hadn’t seen during its training. Had the program simply guessed at random, it would have picked the correct test-taker only one time in 92, or about 1% of the time. Instead, it returned the right test-taker 51% of the time, and 75% of the time, the correct answer was in its top three choices.
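To make the train-and-test setup above concrete, here is a minimal sketch in Python. It is not the researchers' program. The data are synthetic, each mark is reduced to 8 made-up features rather than the 804 the article mentions, and a simple nearest-centroid matcher stands in for whatever classifier the team actually used. The function names (`make_synthetic_marks`, `evaluate`, and so on) and the `noise` parameter are hypothetical. Only the shape of the experiment comes from the article: 92 people, 20 marks each, 12 for training, 8 held out, scored by top-one and top-three accuracy.

```python
import random


def make_synthetic_marks(n_people=92, n_marks=20, n_features=8, noise=1.0, seed=0):
    """Build fake data: each person has a hidden 'style' vector, and every
    mark they make is that style plus Gaussian noise. (Assumption: real
    bubble features are far richer than this.)"""
    rng = random.Random(seed)
    data = []
    for _ in range(n_people):
        style = [rng.gauss(0, 1) for _ in range(n_features)]
        marks = [[s + rng.gauss(0, noise) for s in style] for _ in range(n_marks)]
        data.append(marks)
    return data


def centroid(vectors):
    """Average a list of equal-length feature vectors, feature by feature."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_candidates(profiles, query):
    """Return person indices ordered from closest profile to farthest."""
    return sorted(range(len(profiles)), key=lambda i: sq_dist(profiles[i], query))


def evaluate(data, n_train=12, top_k=(1, 3)):
    """Learn one profile per person from the first n_train marks, then try to
    re-identify each person from the average of their held-out marks."""
    profiles = [centroid(marks[:n_train]) for marks in data]
    hits = {k: 0 for k in top_k}
    for person, marks in enumerate(data):
        query = centroid(marks[n_train:])
        ranking = rank_candidates(profiles, query)
        for k in top_k:
            if person in ranking[:k]:
                hits[k] += 1
    return {k: hits[k] / len(data) for k in top_k}


if __name__ == "__main__":
    data = make_synthetic_marks()
    accuracy = evaluate(data)
    print("chance of a random guess being right:", 1 / len(data))
    print("top-1 accuracy on synthetic data:", accuracy[1])
    print("top-3 accuracy on synthetic data:", accuracy[3])
```

The accuracy this prints depends entirely on the invented `noise` level, so it says nothing about real bubble sheets. The point is the comparison the article draws: a matcher is judged against the 1-in-92 odds of a blind guess, and top-three accuracy can never be lower than top-one.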
What’s the Context:
- Fill-in-the-bubble forms are used in tests and in elections, among other settings. This program represents a rare way to tell whether cheating has occurred on a standardized test: “Imagine that a student takes a standardized test, performs poorly, and pays someone to repeat the test on his behalf. Comparing the bubble marks on both answer sheets could provide evidence of such cheating. A similar approach could detect third-party modification of certain answers on a single test,” says coauthor Will Clarkson in a blog post.
- Of course, when it comes to election ballots, the potential uses lean toward the dastardly. While the program could be used to detect fraudulent absentee ballots, it could also be used to violate anonymity and reveal how an individual voted, which Clarkson points out is a worry in areas where scanned images of ballots are released, like Humboldt County in California. Election officials should carefully weigh the costs and benefits of releasing such information, given that this kind of analysis is possible.
- The senior author on the paper, Ed Felten, is an influential internet security researcher and chief technologist of the US Federal Trade Commission. He’s known for breaking the watermarks used for digital rights management (aka DRM) by music companies and for pointing out various flaws (sometimes very, very large ones) in the security techniques used by companies like Sony.
The Future Holds:
- Findings like this are an important reminder that perfect anonymity is more elusive than it seems. The researchers will present their work (which you can read in its entirety here [pdf]) at the 2011 USENIX Security Symposium in San Francisco this August.
- Ah, but can you disguise your bubble identity, or can forms be changed to make identification more difficult? The researchers were hoping you’d ask that: they have a whole section in their paper dedicated to it. Using ink stamps, as some areas do, removes the possibility of making identifiable markings, and being careful not to color outside the lines makes identification harder too.
Image credit: Will Clarkson