Life’s deliberate typos

By Ed Yong | May 19, 2011 2:00 pm

Within your body, a huge amount of information is copied over and over again, reliably and predictably. Your life depends on it. Typos occur, but they are quickly corrected. Edits are made, but sparingly. Or, at least, that’s what we thought.

It starts with DNA. This famous molecule is a chain of four ‘bases’, denoted by the letters A, C, G and T. These four letters, in various combinations, contain instructions for building thousands of proteins, a workforce of molecular machines that keep you alive and well. But first, DNA has to be copied (or “transcribed”) into a related molecule called RNA. It too is made of four bases: A, C and G reprise their roles, but U stands in for T. Each triplet of letters in RNA denotes a different amino acid, the building blocks of proteins. Small factories read along the RNA like a piece of tickertape, using it to string together amino acids in the right sequence.
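For readers who think in code, the copy-then-read process can be sketched as a toy Python model. This is a simplification, not how a cell literally works: it assumes the DNA string is the coding strand, so transcription reduces to swapping T for U. The function names and the five-entry `CODON_TABLE` are illustrative choices, and the real table has 64 codons.

```python
# Toy model of DNA -> RNA -> protein.
# Assumption: the input is the coding strand, so transcription is just T -> U.
# The codon table is deliberately partial (5 of the 64 standard codons).
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "UAA": "STOP",
}


def transcribe(dna):
    """Rewrite a DNA sequence as RNA: U stands in for T."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Read the RNA three letters at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE.get(rna[i:i + 3], "???")  # "???" = not in toy table
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


rna = transcribe("ATGTTTGGCAAATAA")
protein = translate(rna)
print(rna, protein)
```

Changing a single letter in `rna` before calling `translate` is enough to swap one amino acid for another, which is why edits at the RNA stage matter.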

So DNA leads to RNA leads to proteins – this is the grandiosely-named “central dogma of life”.

People often assume that this flow of information happens with exacting precision. Every stretch of RNA should be a perfect match for the piece of DNA it is copied from. Take a piece of DNA, and you could predict the exact string of letters in its corresponding piece of RNA, and the amino acids of the resulting protein.

But that’s not always the case.

Typos creep into the transcripts. Some of these are genuine errors where the wrong letter is put in place – proofreading proteins usually fix these mistakes. Other typos are deliberate edits – for example, proteins called deaminases will often convert some As into Gs, and (more rarely) some Cs into Us.

Now, Mingyao Li and Isabel Wang from the University of Pennsylvania School of Medicine have found that these typos go far beyond the edits that we knew about.

Li and Wang studied white blood cells from 27 unrelated people, and looked at both their DNA and RNA sequences. They found over 10,000 places across the genome where the molecules didn’t match up, spread over a third of all our genes. Some of these looked like the type of RNA edits that scientists already knew about, but around half of them were clearly something new. Li and Wang called these changes ‘RDDs’, short for RNA-DNA differences.
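The basic comparison, stripped of everything that makes real sequencing hard, looks something like the sketch below. It is only an illustration of the idea and not the authors' pipeline: it assumes the DNA and RNA strings are already aligned letter for letter, and `find_rdds` is a made-up name.

```python
def find_rdds(dna, rna):
    """Return (position, DNA letter, RNA letter) wherever the RNA does not
    match what the DNA predicts. Assumes both strings are already aligned."""
    if len(dna) != len(rna):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    for position, (d, r) in enumerate(zip(dna, rna)):
        expected = "U" if d == "T" else d  # a faithful copy turns T into U
        if r != expected:
            differences.append((position, d, r))
    return differences


# Hypothetical nine-letter example: the T at position 4 shows up as G in the RNA.
rdds = find_rdds("ATGTTTGGC", "AUGUGUGGC")
print(rdds)
```

A real analysis also has to rule out sequencing errors and misaligned reads, which is where most of the effort goes.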

The duo took great care to make sure that the RDDs weren’t just the result of errors in their own sequencing methods. So they asked different laboratories to prepare and sequence the samples. They focused on parts of the genome that they had scanned several times over, and where the DNA letter is the same from person to person. And they used cells from people whose DNA had already been sequenced as part of two big genetics initiatives – the International HapMap Project and the 1000 Genomes Project. These existing sequences matched those that Li and Wang produced afresh.

The RDDs weren’t just random errors. Every one of them showed up in at least two people and 80% showed up in half of the sample. They were there in infants and adults. They were there in people outside the original group of 27. They were there in other types of cells – neurons, skin cells, embryonic stem cells, cancer cells. And they were always the same at any given site, even in different people – if a T in DNA becomes a G in RNA, then it always becomes a G rather than an A or C. There must be some sort of guide that determines which DNA letters are edited and what they’re edited to.

These typo-ridden molecules co-exist with those that more accurately reflect the DNA they were copied from. At any given RDD, around 20% of the RNA sequences differ from their corresponding DNA, while the rest are accurate matches. But that’s an average figure – at some sites, Li and Wang found RDDs in nearly every RNA sequence they examined.
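That 20% figure is a proportion: of all the RNA sequences covering a site, how many carry a letter the DNA doesn't predict? A minimal sketch of the arithmetic, using a hypothetical `rdd_level` helper and ten invented reads:

```python
from collections import Counter


def rdd_level(dna_letter, rna_letters_at_site):
    """Fraction of RNA reads at one site that differ from the DNA letter."""
    expected = "U" if dna_letter == "T" else dna_letter
    counts = Counter(rna_letters_at_site)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no RNA reads cover this site")
    return (total - counts[expected]) / total


# Invented data: ten reads over a site where the DNA letter is T.
# Eight are faithful copies (U) and two carry a G instead.
level = rdd_level("T", "UUUUGUUUUG")
print(level)
```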

These typos carry over into proteins. Li and Wang found several proteins whose amino acids correspond to the altered RNA sequence rather than the underlying DNA one. Around a third of the RDDs lead to a different amino acid, and about one in a hundred change the size of the protein altogether. For example, one RDD in the gene RPL28 lengthens the resulting protein by 55 amino acids.

For now, Li and Wang don’t know how the RDDs are produced. Are the different letters slipped in as the RNA strand is assembled, or is the strand edited afterwards? What determines which letter is substituted at a given site? And perhaps most importantly, what do they do? Do they affect our behaviour, our development, our physical features, or our risk of disease?

To answer these questions, Li and Wang argue that as well as studying the genome – the sum of our DNA – we need to pay equal attention to the transcriptome – the sum of our RNA. So far, DNA has hogged the limelight; for example, we have poured millions of dollars into scouring our genome for DNA variants that affect our risk of disease. But DNA is the tip of the iceberg. Identical pieces of DNA can be transcribed and edited into subtly different strands of RNA, which can produce very different proteins. These other layers of diversity are now being uncovered.

The wave of next-generation sequencing technology has certainly helped, according to George Church, a pioneer of genomic sequencing. As our tools have become more powerful, our knowledge has grown deeper. “We are seeing a huge uptick in observations of modified bases,” says Church. “These are very exciting times to be studying -omes.”

Reference: Li, Wang, Li, Bruzel, Richards, Toung & Cheung. 2011. Widespread RNA and DNA Sequence Differences in the Human Transcriptome. Science.

UPDATE: Interesting reactions to this paper are appearing on Twitter and around the web. I’m collecting them on this Storify:
[View the story “RNA-DNA differences” on Storify]

CATEGORIZED UNDER: Genetics, Genomics, Molecular biology

Comments (9)

  1. Oh heavens, this is just so cool! (I say, looking up from a heavy textbook on molecular biology in preparation for tomorrow’s exam on the same topic.) Just shows there will be work left for me, by the time I graduate! 😀

  2. Fascinating stuff – extremely well written piece again Ed.

  3. I just finished reading The Selfish Gene earlier today – seeing this reminds me of how much science has progressed since 1976!

  4. Robert S-R

    Neat stuff! I knew DNA transcription made errors (rarely) but never thought it could be so common, or have a harmless outcome.

    Do the proteins made by RDDs have actual uses in the body or cell? Are the changes beneficial, neutral, or harmful? Or, in other words, are they necessary or unnecessary?

    (If it sounds like I’ve asked the same question three times, I think it’s because I’m not sure how to ask it.)

  5. I was a little disappointed in the BBC article about this. It’s really interesting if it’s borne out, but the general point about the transcriptome being of major importance is a good one. It is still an RNA world! The “Central Dogma” leads many people into thinking along a DNA-first line, but in reality all the interesting stuff in the cell (well, a lot of it) is taking place at the insanely dynamic and poorly-understood level of RNA. This is the big challenge facing biology, and (I think) is likely to be the answer to how life evolved in the first place.

  6. amphiox

    re #5;

    DNA is actually kind of boring if you think about it. It’s just the information repository, the physical substance of the books of a library. Nothing happens until the books are read. And the really interesting stuff is all wrapped up in the how, when, where, and who of the reading.

  7. jdmimic

    So what will all this mean for all the work using reverse transcriptase? Is it possible then that the DNA created from RNA in this way could actually be transcribed into a different RNA strand? If so, how will this affect the accuracy of our sequence knowledge?

  8. First @jdmimic – GREAT comment!

    Second – When I read articles like this I can’t help but think that it’s going to be a really, really long time before we figure out how our genome works. It’s like every time we figure something out, two or three more surprises turn up that make us have to reevaluate some major genetic paradigm that came before it. I know we’ll figure it all out eventually, but man, what a puzzle!

  9. CK

    Well written but check the accuracy of this: –

    “This famous molecule is a chain of four ‘bases’, denoted by the letters A, C, G and T. ”

    Actually DNA (and RNA) are composed of nucleotides; the nitrogenous base is just one group of a nucleotide molecule, there is also a five-carbon sugar group and a phosphate group(s). Definitely not a chain of bases.

    With regard to the study, it seems to have generated a lot of excitement though I’m not sure it’s wholly warranted. Seems to me that this paper is ultimately about RNA polymerase fidelity. That is to say specific genomic loci seem to have properties (presumably conformational) that increase the probability of a particular type of mis-transcription occurring (e.g. AA to AC). The paper presents zero evidence that there is anything deliberate about this. We already knew that polymerase enzymes do not operate with perfect fidelity and this paper seems to tell us that this is not wholly random (not surprising really).

