How to Create Codes That Even the NSA Can’t Break

By Amir Aczel | July 31, 2013 1:36 pm

NSA security lock

In a previous post I described mathematicians’ ongoing search for key properties of prime numbers. That effort may seem to belong entirely within the realm of pure mathematics; but surprisingly, the importance of primes goes far beyond the abstruse obsessions of ivory-tower mathematicians. In fact, the use of prime numbers underlies some of the most dramatic events in the news these past weeks: the story behind Edward Snowden’s revelations that the National Security Agency (NSA) is snooping on the communications of both American citizens and European diplomats.

While the Europeans have protested about their internal communications being intercepted by the NSA—ironically—the tools that one can use for protection from spying by anyone are readily accessible online, in the professional literature, and in publicly-available manuals and textbooks. These methods all rely on clever uses of prime numbers.

The essentials of these techniques are far from new. The foundations of a program to create codes so powerful that they could not be broken even if an eavesdropper were to use the entire available worldwide computing power were laid more than 35 years ago. The year 1976 saw the development of the Diffie-Hellman key exchange method (named after Whitfield Diffie and Martin Hellman; the names Ralph Merkle, James Ellis, Clifford Cocks, and Malcolm Williamson are often also associated with it); and the following, 1977, witnessed the appearance of the RSA algorithm. Both methods have advanced over the past three and a half decades, but information about their extensions is also readily available to anyone.

How do these techniques work? I will explain both methods here—necessarily in a simplified way. (Those interested in learning more can read some of the articles in the links that appear throughout this post.)

Alice sends Bob a secret message

The Diffie-Hellman key exchange idea has been described in a clear and concise way using an analogy by Terence Tao, whose work on prime numbers I mentioned in my previous post. The idea is as follows. Alice wants to send Bob a secret message (cryptographers prefer to use “from Alice to Bob” instead of the mundane “from A to B”) and she wants to prevent Eve (the “eavesdropper”) from reading it. So Alice places the message in a box, puts a good lock on it, keeps the key, and sends the package to Bob. (If Alice were to separately send Bob the key, there would be a chance that Eve could intercept both the package and the key.)

Bob has no key to Alice’s lock. So what he does instead is to put his own lock on the box. And he now sends the package back to Alice, locked twice: using both her lock and his. Alice gets the package, removes her own lock using her key, and then sends the box, still safe because it bears Bob’s lock, back to Bob. Now Bob uses his key, opens the box, and gets the message! Each person here used his or her own lock and key—and yet a message was passed perfectly safely from Alice to Bob.

The digital version

This idea is implemented digitally in the Diffie-Hellman key exchange. The message to be sent from Alice to Bob is a secret number, call it n. Alice’s “key” is an exponent, a, which she chooses, and then uses it to raise to. So the “locked box with the message” that Alice sends Bob is na. Bob has his own “key,” which is a number of his own choosing, b, that he uses as an exponent. He doesn’t know n or a, but he has na, which he got from Alice, so he raises this number to the power b. He thus sends Alice the “box with the two locks”: nab. Alice’s using her own key to open her own lock means her taking the ath root of nab, which, from the simple math of exponents, we know gives her nb, which she now sends back to Bob. Using his “key,” his exponent b, Bob takes the bth root of nb, and he thus obtains the secret number n that Alice wanted to convey to him.

Creating stronger codes with primes

It is possible to send a secret number from Alice to Bob as I just described, and if the numbers are large enough, one would have a reasonable probability that the number might not be deduced by Eve. In actuality, however, modern implementations of the Diffie-Hellman key exchange use more sophisticated elements to make it more difficult to break the code. And the secret number is not sent from Alice to Bob, but rather deduced by both of them using the formula nab (which, of course, is also equal to  nba).

Alice and Bob choose a prime number, which they assume can be known to Eve, or to anyone in the world. Let’s say that this number is 11. They then do all calculations using the mathematical multiplicative group of integers modulo 11 (like a clock going around to 12 and then starting from 1, this group starts to count again after reaching 11). They also choose a base, and let’s suppose it is the number 5. Alice then chooses her secret number, say 3. Independently, Bob chooses his secret number, 4.

Alice raises the commonly-agreed-on base of 5 to the power of her secret number 3, and does the calculation modulo 11. She gets:  53 = 125, but 125 modulo 11 is 4 (it’s the remainder of dividing 125 by 11, which gives 11 and a remainder of 4—it acts like 16 hours in a clock, but this clock is based on 11 rather than 12). She sends Bob the answer, the number 4. Recall that Bob had chosen a secret number of 4, so he raises the 4 he got from Alice to the 4th power, modulo 11, and this gives him 44 = 256, but 256 modulo 11 is 3 (because 11×23 = 253, leaving the remainder 3), which is his final answer.

Alice gets from Bob the original 5 they had both agreed on, but now raised to the power of his secret number, 4, modulo 11, which is 625 modulo 11, which is 9 (as 11×56 = 616, leaving a remainder of 9). She then raises this number to the power of her secret number of 3, again doing this calculation modulo 11. She gets the same number that Bob got, 3 (because 93 = 729, but modulo 11 it is 3, since 11×66 = 726, which leaves a remainder of 3).

Using this complicated modular arithmetic based on a prime number, but essentially raising a number to hidden powers as in the previous section, Alice and Bob establish a common secret number, in this example, 3. Modular arithmetic using prime numbers helps make the algorithm much more difficult to decipher by an eavesdropper.* In reality, the prime number is large, and so are the other numbers. When Alice and Bob use secret numbers 100 digits long, the common number jointly deduced by Alice and Bob cannot be learned by Eve even if she has access to all the world’s available computing power.

Once Alice and Bob have established a common secret number, they can use it as a key to encrypt messages from one to the other and should have a high probability that their communication will not be deciphered by an outsider.

Two keys are better than one

The year after the Diffie-Hellman algorithm was published, three academics then working at MIT—Ron Rivest, Adi Shamir, and Leonard Adelman—came up with a brilliant idea for encrypting messages. What they tried to do was to avoid the stage in which Alice and Bob must create a common secret number, since this stage slows down the communication between them.

The three MIT scientists developed the notion of a pair of keys: a public key and a private key, which are then jointly used for communicating secret messages. The public key can be published and known to all. Its use saves time. The private key is a secret that Bob keeps, allowing him to decipher coded messages from Alice (or from anyone who knows his public key). Bob publishes his public key, which is a large number. This number is obtained when he multiplies together two very large prime numbers, known only to him (they constitute his private key). When Alice wants to send Bob a secret message, she encrypts it using his known public key. But in order to decrypt the message, one would need to know Bob’s private key, which is the two prime numbers he had used to create his publicly-known key. Supposedly, only Bob can do this.

Encrypting and decrypting messages using the RSA algorithm is a complicated mathematical procedure that relies on modular arithmetic and prime numbers similarly to the way they are used in the description of the Diffie-Hellman system above. But it is more sophisticated so that it can allow deciphering using only the private key. The public key alone is useless for deciphering the RSA code.

The essential element of RSA is the fact that the public key is composed of the product of two very large unknown prime numbers. It so happens that factoring a number into its prime components is very difficult when the primes are large. (35 = 7×5, a product of two primes, is easy; but 46,324,637 = 5,881 × 7,877 is harder, and primes used in RSA encryption are much larger still.) It is this fact alone that keeps Eve in the dark. She knows the product of the two prime numbers—but she can’t easily (and hopefully not at all) deduce what the two primes are!

The RSA Challenge

Right after the RSA system was invented, Martin Gardner published in Scientific American an encrypted message and a large RSA number, with 129 digits, that was the product of two primes. He challenged his readers to break the code, offering a $100 prize. It took 17 years for the number to be factored and the message deciphered. This was a relatively short period of time—many had expected that it would take an exceedingly long time, and Rivest, Shamir, and Adelman had jested that it could take several “quadrillion years.” The complex operation was achieved using distributed computing with thousands of computers around the world performing parts of the general calculation—thus demonstrating the power of such an approach.

RSA Security, founded by the academics, has since published several similar numbers, and for a time there was a cash prize offered for their factoring into pairs of primes, which the company subsequently withdrew. By now, some of these challenges have been met by mathematicians using distributed computing. Here is one problem that is still outstanding, an RSA number with 210 digits, that has never yet been factored into two primes:

RSA-210 = 245246644900278211976517663573088018467026787678332759743414451715061600830038587216952208399332071549103626827191679864079776723243005600592035631246561218465817904100131859299619933817012149335034875870551067

Obviously, the larger the number to be factored, the longer the time needed to break it into a pair of primes. Beyond a certain length (in decimal digits), the RSA code becomes impregnable and therefore any message based on it undecipherable (in a reasonably finite length of time) by an eavesdropper. The RSA algorithm is widely used today in Internet security.

NSA’s uses and abuses of encryption

In adopting standards for encryption in the United States, and for exporting encryption products, the NSA has pushed for, and succeeded in implementing, legal limits on the size of the numbers used in RSA coding, so that—with its supercomputers—it would be able to decipher any message based on it. Presumably, the Europeans are not bound by these restrictions, and their cryptanalysts should have been able to easily devise an unbreakable RSA code (by choosing primes that are large enough) for use in routine European diplomatic communications as well as protecting their computers from hacking.

And as history has shown, supercomputers are less effective than wide-ranging worldwide distributed computing for breaking advanced codes—but by its very nature, the NSA could never employ the latter. On the other hand, the most recent revelations seem to indicate that one of the purposes of NSA searches is in fact to identify people or entities that use encryption in their communications. If so, all the more reason for the European governments to use established, Western, advanced codes, so as to set themselves apart from terrorist entities, whose codes would necessarily look different. This would actually help the NSA concentrate on identifying real threats rather than wasting resources on intercepting Brussels messages such as: “Pierre, Italian or Chinese for lunch today? Yours, Hans.”

Thus we find ourselves where we do now, in an arms race of encryption and decryption, a world in which pure mathematics plays the key role in helping invent better and better codes. As the codes become more sophisticated, so do the code-breakers, and the cycle perpetuates itself. What is so amazing is that codes that were considered absolutely unbreakable a few decades ago do become breached as the technology improves—but then again, those designing new encryption methods, on all sides, use ever more complicated math to keep a step ahead of their pursuers.

*There are two good reasons for using modular arithmetic. The first is that it acts as a many-to-one function, in the sense that many numbers, when divided by a prime, will give the same remainder—thus making Eve’s life much more complicated (she can’t uniquely reconstruct Alice and Bob’s secret numbers). Using the clock example, if she should overhear that a meeting is to take place at 1 o’clock, she couldn’t tell if it’s a.m. or p.m., or which day. The second reason is that it puts a cap on the size of numbers involved when using exponentials, since (by definition!) without modular arithmetic these numbers grow “exponentially,” and could make computations intractable.

Image courtesy Maksim Kabakou / Shutterstock

CATEGORIZED UNDER: Technology, Top Posts
  • Samer Halim

    this is awesome

  • phanmo

    Given that that pretty much all of the “established, Western, advanced” algorithms used are available online, and that there are numerous ways to get around the US restrictions (which were all but eliminated in the Clinton era anyways), why would encrypted data from “terrorist entities” look any different from US, European, or, for that matter, commercial data?

    • Amir D Aczel

      My assumption is that if a large entity such as the European Union were to establish a major “unbreakable” code for its very heavy traffic of internal diplomatic communication in Brussels and elsewhere, it would have a “signature” that would look different from something created by a small ragtag entity. While it’s true that the basic information is online and elsewhere, for one thing, it’s hard to see the latter using supercomputers or a big distributed-computing network to find very large primes and then do the very complicated coding. They may not have the technological resources of the EU.

      • phanmo

        True, but I would imagine that, rather than creating their own system, such a group would use something like GnuPG, which is relatively widely used, free, open-source, and being continuously tested. As there is already a large user base, both commercial and private, the emails in question wouldn’t look any different from those already being sent. Assuming no major break-through in quantum computing and consistently good user-level security (probably the weakest link), secure communications wouldn’t be too difficult to manage.

        • Amir D Aczel

          Yes, quantum computing would change much in this area. I’m trying to find out more about the state of the art there. Thanks

          • Amir D. Aczel

            Your comment made me think more about my policy suggestion. Here is how it can be done: The Europeans tell the U.S. (and vice versa): We don’t want you spying on us–there’s no security-related need for this. So we are going to use the toughest codes we can invent. BUT: So that you know these are our messages (and not a terrorist group or hostile nation’s messages), we will use a signature (also encrypted), to which we give you the key (it can be RSA or something else, and changed regularly as needed). This way, you know it’s us and don’t even try to break our code. Save your efforts for hostile entities. [Something similar was used in the Law of the Sea decades ago: there was a code that meant: “Friendly ship” when passing a naval group at night.]

    • Lisa Raffensperger

      Amir Aczel responds:

      My assumption is that if a large entity such as the European Union were to establish a major “unbreakable” code for its very heavy traffic of internal diplomatic communication in Brussels and elsewhere, it would have a “signature” that would look different from something created on the run by a small ragtag entity. While the basic information is online and elsewhere, for one thing, it’s hard to see the latter using supercomputers or a big distributed-computing network to find very large primes and then do the complicated coding. They may not have the resources of the EU.

  • Glen

    So if someone just happens to know the product of two large primes, just by dumb luck, they’ve broken that key? It goes into a database and is never a secure key again?

    Am I missing something ?

    • Amir D. Aczel

      Well, it’s actually close to impossible to do by “dumb luck” as you say. I should compute the probability (it’s probably smaller than being hit by lightning or winning a big lottery; but I haven’t done that computation). You see, a 120-digit number is immensely large. That was really the point of the “RSA Challenge” I wrote about. Sheer luck is of course less effective that “brute force” by computer (actually, a whole network of computers!), and it took 17 years to break the prime-code for a 129-digit number. “Trial and error” would be hitting only a small subset of the immense number of possible primes–and beyond the first 10,000 or so primes, even a list is not available. So this is really as “safe” an encryption key as technology can give us today (without quantum computing, which would be much faster, and without a resolution of the celebrated P vs. NP problem, the biggest unsolved problem in computer science, and one with a $1 million prize on it–which could change our entire understanding of what is an “unbreakable” code.) Hope this answers your question.

      • Amir D Aczel

        More: To compute the probability of getting it by “sheer luck” you’d need to use the Prime Number Theorem, which says that the number of primes up to a number, n, is roughly n/ln(n), where “ln” is natural log. So for a 120-digit n, it is the number, n, divided by 2.303 x 120, which is the 120-digit-number divided by 276, or roughly a 117-digit-number! What is one divided by this number? Something extremely, extremely close to zero; and this is the chance of hitting it in one random shot.

        • Glen

          Okay, but then what is to stop an agency like the NSA from just running supercomputers all day 24/7, 365 just randomly (or sequentially) computing the product of two primes, finding the answers (whatever creates 120 digit number), and storing all of those answers in a database.

          Eventually they’d have massive databases of all these ‘answers’, which they could sort sequentially for a quick search of their database whenever they encounter a key and want to decode it into its two primes.

          Wouldn’t they likely be building this database as we speak which could have trillions of answers in it by now. And if they are lucky, the answer to a key they run across will be right there waiting for them in their ever rapidly growing database of answers.

          For all anyone knows, their secure key answer is already in a database somewhere and thus no longer secure to whatever entity has it.

          I still feel like I’m missing something regarding why this is truly as secure as its claimed to be when its based on something for which the answers can be calculated well in advance of ever needing it.

          • Glen

            So I guess what I’m thinking is, while the public key is the product of two large primes, and it alone is useless for decoding the message, shouldn’t the private key be MORE than just the answer (the two primes)?

            Does the private key and the decryption algorithm use the two primes and is further factored with the users secret passkey and perhaps another large randomly generated number?

            I feel like you should need more than just the answer (which two primes is this public key comprised of) in order to decode the encrypted message, if it’s truly going to be secure.

          • Amir D Aczel

            The math of why it works (public key + private key) is very complicated. Simon Singh, who wrote a book on coding (“The Code Book”) has a technical appendix on it, but at the end of that appendix he still doesn’t get to the heart of how the interplay between the two keys really works. One of the links above does that. Basically, there are some deep properties in abstract algebra that make it work.
            As for your idea about the NSA: Yes, we don’t know what they actually know or don’t, but it’s a matter of resources: Are they going to keep their supercomputers running 24/7, as you say, just to break a code of an ally (the EU)? Or would the same resources be better spent tracking communications by “rogue nations” or terror groups? So, by the end of the day, it’s really a resource-allocation problem.

  • Glen

    Seriously, how hard is it just for the NSA to get your private key via typical security flaws that permit hacking and then simultaneously get your password via a key logger?

    Seems these keys become little more than a false sense of security.

  • Valdis Kletnieks

    Glen: The problem with “does the NSA leave the supercomputers going 24/7” is that if you use a sufficiently long key (several hundred digits long), you’re looking at “all the computers on the planet running 24/7 for anywere from several centuries to several million years”.

    And that’s *per key*, You want to break a second key, you get to start crunching for another several centuries…

    Similarly, you can’t make a list of all the key pairs and look it up – there’s enough pairs of primes with 500 digits or so that you’re looking at needing somewhere around “all the atoms in the solar system” to store the resulting list…

  • Richard Economy


  • Roger Williams, III

    “both of them using the formula nab (which, of course, is also equal to nba).’

    What about the non-commutative case where abba?

  • Jim Donaught

    According to the prime number theorem the number of primes less than n is approximately n/ln(n)

    The largest known prime, as of mid-2013, is (2^57,885,161) −1; the number of primes less than this is correspondingly enormous . . . well over 2^57,000,000 . Now square that number, to get the number of possible products of two primes: 2^114,000,000. And that’s based only on primes up to the largest prime known so far.

    Nobody is going to be storing that list, let alone sorting or searching it.

  • David Gregory Kerr

    Here is a crypto message


    see if you can crack it.

    • Trump 2011

      It took me 3 years but I’ve finally cracked it. It says “I am a gay homotexual”.

  • bobirjon

    thisis awesome


Discover's Newsletter

Sign up to get the latest science news delivered weekly right to your inbox!

About Amir Aczel

Amir D. Aczel studied mathematics and physics at the University of California at Berkeley, where he was fortunate to meet quantum pioneer Werner Heisenberg. He also holds a Ph.D. in mathematical statistics. Aczel is a Guggenheim Fellow, a Sloan Foundation Fellow, and was a visiting scholar at Harvard in 2005-2007. He is the author of 18 critically acclaimed books on mathematics and science, several of which have been international bestsellers, including Fermat's Last Theorem, which was nominated for a Los Angeles Times Book Award in 1996 and translated into 31 languages. In his latest book, "Why Science Does Not Disprove God," Aczel takes issue with cosmologist Lawrence M. Krauss's theory that the universe emerged out of sheer "nothingness," countering the arguments using results from physics, cosmology, and the abstract mathematics of set theory.


See More

Collapse bottom bar