DNA Data Storage Moves Beyond Moore’s Law

By Nathaniel Scharping | April 8, 2016 5:19 pm

The drop of pink solution in this pipette, which contains millions of DNA molecules, could store 10,000 gigabytes of data. (Credit: Tara Brown Photography/ University of Washington)

Over the past few decades, it has become apparent that Moore’s Law is starting to break down. The 1965 observation, named after Gordon E. Moore, held that the number of components on a chip doubled every year (Moore later revised the pace to every two years), but we are now approaching the physical limits of silicon.

To keep pushing the boundaries of computing technology, we’ll need to rethink the basic components of computers themselves. And the field of DNA storage could offer a solution to a problem growing ever more apparent in our digital world: Where do we store the billions of gigabytes of data that make up the Internet?

“A large part of building better computers is about finding better materials to build computers with,” says Luis Ceze, an associate professor in the Computer Science Department at the University of Washington. “So, silicon happens to be a fantastic material, but it’s reaching a point where it’s unclear that we can continue pushing forward with silicon. So I find it fascinating that biology has evolved many molecules that are useful for building better computers in the future.”

Beyond Silicon

Current archival facilities, such as the data storage center Facebook recently built in Oregon, occupy entire warehouses and can store about an exabyte — 1 billion gigabytes of data — at a maximum. That’s just a fraction of the entire internet, which is forecast to reach 16 zettabytes, or 16,000 exabytes, by 2017.

By encoding information using DNA, the blueprint for life on Earth, researchers say that they could take all of that information and fit it in your living room. By taking bits of information and translating them from the 1s and 0s on a computer chip into the four letters of DNA, scientists can create strands of DNA that encode anything you like, from a Taylor Swift song to the Library of Congress.

To accomplish this, researchers build an index that links the four nucleotides that make up DNA (A, T, C and G) to the strings of 1s and 0s we already use on our computers. A DNA synthesizer then creates short strands of DNA that each hold part of a file’s code. Once all of the information has been converted to DNA, the strands can be stored, and the data retrieved later with a DNA sequencer that reads the order of the nucleotides.
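As a rough illustration of such an index, the Python sketch below maps every two bits to one nucleotide and cuts the result into short strands. The particular mapping, the function names and the 100-nucleotide strand length are assumptions made for this example; the article does not describe the exact index the researchers use.

```python
# Hypothetical index: two bits per nucleotide. Real schemes may differ.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_bytes(data: bytes) -> str:
    """Translate bytes into a string of nucleotides, two bits at a time."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_sequence(sequence: str) -> bytes:
    """Translate a string of nucleotides back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def split_into_strands(sequence: str, strand_length: int = 100) -> list[str]:
    """Cut a long sequence into short pieces, as a synthesizer would need."""
    return [sequence[i:i + strand_length]
            for i in range(0, len(sequence), strand_length)]


if __name__ == "__main__":
    sequence = encode_bytes(b"Hi")
    print(sequence)                   # CAGACGGC
    print(decode_sequence(sequence))  # b'Hi'
```

Under this mapping, each byte becomes four nucleotides, so the two-byte message "Hi" turns into the eight-letter sequence CAGACGGC.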

A Better Way to Encode DNA

Ceze is part of a team of researchers at the University of Washington that has developed a new method of encoding and reading information stored in synthetic DNA. They looked to a widely used data compression technique called Huffman coding, which expresses strings of binary code more compactly by giving the most common symbols the shortest codes.
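For readers unfamiliar with Huffman coding, here is a minimal Python sketch of the textbook binary version. It is meant only to show the general idea of giving frequent symbols shorter codes; the team’s actual encoding may differ, and the function names here are made up for the example.

```python
import heapq
from collections import Counter


def huffman_code(data: bytes) -> dict[int, str]:
    """Build a prefix-free binary code; frequent byte values get shorter codes."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tiebreaker, {symbol: code so far}).
    # The unique tiebreaker keeps Python from ever comparing the dicts.
    heap = [(weight, i, {symbol: ""})
            for i, (symbol, weight) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tiebreaker = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, codes1 = heapq.heappop(heap)
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreaker, merged))
        tiebreaker += 1
    return heap[0][2]


def huffman_encode(data: bytes, code: dict[int, str]) -> str:
    """Concatenate the code word of every byte in the input."""
    return "".join(code[byte] for byte in data)


if __name__ == "__main__":
    message = b"aaaabbc"
    code = huffman_code(message)
    encoded = huffman_encode(message, code)
    print(len(message) * 8, "bits before,", len(encoded), "bits after")
```

For the seven-byte message above, the most common letter gets a one-bit code and the other two get two-bit codes, so 56 bits shrink to 10.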

He says that their method allows for even greater storage capacity by reducing redundancy (the practice of making multiple identical strands to guard against errors) and allows individual pieces of the data to be read without sequencing all of the DNA stored, something that had not previously been done. The method includes unique “primers” in individual strands of DNA that can be targeted during the sequencing process to pick out a particular strand, which the researchers say eliminates the need to sequence the entire database just to read a single file.
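The primer idea can be pictured as a kind of key lookup. The Python sketch below is a software stand-in for what is, in the lab, a chemical selection step: every strand begins with the primer of the file it belongs to, and a read keeps only the strands carrying that primer. The four-nucleotide address used to put the pieces back in order, the primer sequences and the function names are all assumptions made for this example, not details from the researchers’ method.

```python
import random

BASES = "ACGT"
ADDRESS_LENGTH = 4  # hypothetical: four bases can number 4**4 = 256 chunks


def index_to_address(index: int) -> str:
    """Write a chunk number as a fixed-length base-4 string of nucleotides."""
    digits = []
    for _ in range(ADDRESS_LENGTH):
        digits.append(BASES[index % 4])
        index //= 4
    return "".join(reversed(digits))


def address_to_index(address: str) -> int:
    """Recover the chunk number from its nucleotide address."""
    value = 0
    for base in address:
        value = value * 4 + BASES.index(base)
    return value


def write_file(pool: list[str], primer: str, chunks: list[str]) -> None:
    """Add one strand per chunk: primer, then address, then payload."""
    for index, chunk in enumerate(chunks):
        pool.append(primer + index_to_address(index) + chunk)


def read_file(pool: list[str], primer: str) -> list[str]:
    """Keep only strands with the given primer and return payloads in order."""
    selected = [s[len(primer):] for s in pool if s.startswith(primer)]
    selected.sort(key=lambda s: address_to_index(s[:ADDRESS_LENGTH]))
    return [s[ADDRESS_LENGTH:] for s in selected]


if __name__ == "__main__":
    pool: list[str] = []
    write_file(pool, "ACGTACGT", ["GATTACA", "CCGG"])
    write_file(pool, "TTGGCCAA", ["AAAA"])
    random.shuffle(pool)  # strands float freely; there is no stored order
    print(read_file(pool, "ACGTACGT"))
```

Only the strands tagged with the requested primer are touched, which is the point of the design: the rest of the pool never has to be read.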

As a proof-of-concept, the team encoded the information for several image files in synthetic DNA and successfully sequenced the strands to redraw the pictures. While they only encoded several megabytes of information, Ceze says that the process could be scaled up to hold much larger databases.

“If we compare flash to DNA in terms of density, or the number of bits in a certain volume, DNA will be at least a billion times denser. You can put an exabyte in a cubic inch, which would be a few sugar cubes,” says Ceze.


The three images the researchers turned into DNA, and then back into pictures. (Credit: Bornholt et al./University of Washington)

Ceze emphasizes that synthesizing DNA to store data is not related to genetic engineering. Instead of attempting to put together the right strands of DNA to create an organism, their method is entirely synthetic.

DNA Computers

Storing data in strands of DNA has one significant drawback: it’s slow. Unlike computer chips, which communicate at nearly the speed of light using electrons, DNA data storage relies on physically moving molecules around.

For this reason, we shouldn’t expect to see DNA hard drives at your local computer store in the near future, Ceze says. Instead, he envisions using DNA data storage to preserve massive data archives, such as those used by Facebook and cloud storage services, where speed is not as crucial. The technology also remains expensive. But, even compared to five years ago, prices have dropped precipitously, according to Ceze. He’s looking forward to further reductions in the cost of synthesizing and sequencing DNA, which would heighten the feasibility of DNA data banks.

“Computers were pretty expensive a while ago, and then they got cheaper because there was a demand for them that dropped the price. So now that DNA storage is creating even more demand [for DNA synthesis and sequencing] beyond the biomedical industry, that will push the price down,” says Ceze.

  • http://phatsonic.de b_i_d

    OK, I haven’t read the paper yet, but judging from this article, the somewhat good storage capacity stems just from having 4 states per bit, combined with the not particularly new Huffman algorithm. It’s a cool experiment, but selling it as a data storage mechanism feels like grasping at straws. I’m more looking forward to light-based storage systems or even electron-based ones that work in three dimensions.

  • http://www.mazepath.com/uncleal/qz4.htm Uncle Al

    Tell us about write and read speeds, storage and retrieval speeds – and if storage is degraded by a read. Yer gonna use DNA! I’m gonna use lithium chloride. Lithium is Li-6 and Li-7, Chlorine is Cl-35 and Cl-37. An exabyte is 10^24 bytes.

    LiCl is mean density 2.07 g/cm^3 and 42.394 g/mol. 2 bits/formula unit. Given a 42.394 g crystal cube of edge length 2.736 centimeters, 1.25 inch^3 stores (2)(6.022×10^23) equals 1.2×10^24 bits, an exabit (the 0.2 is error correction coding), in a robust 605 °C melting point lattice.

    A paired nucleotide triphosphate is two bits (AT, TA, CG, GC each rung). One DNA unhydrated bit then has average MW = 499.5 vs two LiCl bits at formula weight 42.394. The DNA storage density stated is hugely too high. Begin with a degradation factor of (2)(499.5)/(42.394) equals 23.6, then degrade further with magnesium coordination of the phosphates, overall massive hydration (crappy packing density), and error correction coding. They lied, big time.

    Both proposals are beyond ridiculous. Mine is much cheaper, faster, more reliable, more compact, and more secure against intrusion. If my vicious idiot government comes knocking with a subpoena, turn on the fire sprinklers. 83.5 g LiCl dissolves in 100 g of water at 20 °C.

    • polzzlop

      “Lithium is Li-6 and Li-7, Chlorine is Cl-35 and Cl-37 … 2 bits/formula unit”
      Why not 4 bits?

      • http://www.mazepath.com/uncleal/qz4.htm Uncle Al

        A binary digit, bit, has two values: (1,0) (+/-) (on/off) (high,low); (6,7) (35/37). LiCl is then two independent bits to obtain (6,35) (6,37) (7,35) (7,37). However, (6,6) (7,7) (35,35) (37,37) do not obtain. LiCl is a cubic lattice. Each Li is surrounded by six Cl, each Cl is surrounded by six Li, octahedral coordination (rock salt structure, face-centered cubic lattice, space group Fm-3m, #225). Index an origin to read the isotope pattern (perhaps a sodium ion).

        If you have big brass clangers, propose adding F-centers to obtain three chlorine lattice-position states – 35, 37, vacancy plus unpaired electron. 3.14 eV is 395 nm, thus appearing violet. doi:10.1103/PhysRev.137.A1814

    • OWilson

      Even if hard wired bathroom servers became obsolete, I doubt FOE request retrieval speeds would be significantly reduced :)

    • bmatic

      how would you write data in a crystal of LiCl?

  • kvineyard

    Newsflash: Los Angeles, April 2020 – A Taylor Swift song encoded to DNA has become sentient and has begun murdering ex-boyfriends. John Mayer and Joe Jonas remain missing.

