ENCODE: the rough guide to the human genome

By Ed Yong | September 5, 2012 1:00 pm

Back in 2001, the Human Genome Project gave us a nigh-complete readout of our DNA. Somehow, those As, Gs, Cs, and Ts contained the full instructions for making one of us, but they were hardly a simple blueprint or recipe book. The genome was there, but we had little idea about how it was used, controlled or organised, much less how it led to a living, breathing human.

That gap has just got a little smaller. A massive international project called ENCODE – the Encyclopedia Of DNA Elements – has moved us from “Here’s the genome” towards “Here’s what the genome does”. Over the last 10 years, an international team of 442 scientists have assailed 147 different types of cells with 24 types of experiments. Their goal: catalogue every letter (nucleotide) within the genome that does something. The results are published today in 30 papers across three different journals, and more.

For years, we’ve known that only 1.5 percent of the genome actually contains instructions for making proteins, the molecular workhorses of our cells. But ENCODE has shown that the rest of the genome – the non-coding majority – is still rife with “functional elements”. That is, it’s doing something.

It contains docking sites where proteins can stick and switch genes on or off. Or it is read and ‘transcribed’ into molecules of RNA. Or it controls whether nearby genes are transcribed (promoters; more than 70,000 of these). Or it influences the activity of other genes, sometimes across great distances (enhancers; more than 400,000 of these). Or it affects how DNA is folded and packaged. Something.

According to ENCODE’s analysis, 80 percent of the genome has a “biochemical function”. More on exactly what this means later, but the key point is: It’s not “junk”. Scientists have long recognised that some non-coding DNA has a function, and more and more solid examples have come to light [edited for clarity – Ed]. But, many maintained that much of these sequences were, indeed, junk. ENCODE says otherwise. “Almost every nucleotide is associated with a function of some sort or another, and we now know where they are, what binds to them, what their associations are, and more,” says Tom Gingeras, one of the study’s many senior scientists.

And what’s in the remaining 20 percent? Possibly not junk either, according to Ewan Birney, the project’s Lead Analysis Coordinator and self-described “cat-herder-in-chief”. He explains that ENCODE only (!) looked at 147 types of cells, and the human body has a few thousand. A given part of the genome might control a gene in one cell type, but not others. If every cell is included, functions may emerge for the phantom proportion. “It’s likely that 80 percent will go to 100 percent,” says Birney. “We don’t really have any large chunks of redundant DNA. This metaphor of junk isn’t that useful.”

That the genome is complex will come as no surprise to scientists, but ENCODE does two fresh things: it catalogues the DNA elements for scientists to pore over; and it reveals just how many there are. “The genome is no longer an empty vastness – it is densely packed with peaks and wiggles of biochemical activity,” says Shyam Prabhakar from the Genome Institute of Singapore. “There are nuggets for everyone here. No matter which piece of the genome we happen to be studying in any particular project, we will benefit from looking up the corresponding ENCODE tracks.”

There are many implications, from redefining what a “gene” is, to providing new clues about diseases, to piecing together how the genome works in three dimensions. “It has fundamentally changed my view of our genome. It’s like a jungle in there. It’s full of things doing stuff,” says Birney. “You look at it and go: ‘What is going on? Does one really need to make all these pieces of RNA?’ It feels verdant with activity but one struggles to find the logic for it.”

Think of the human genome as a city. The basic layout, tallest buildings and most famous sights are visible from a distance. That’s where we got to in 2001. Now, we’ve zoomed in. We can see the players that make the city tick: the cleaners and security guards who maintain the buildings, the sewers and power lines connecting distant parts, the police and politicians who oversee the rest. That’s where we are now: a comprehensive 3-D portrait of a dynamic, changing entity, rather than a static, 2-D map.

And just as London is not New York, different types of cells rely on different DNA elements. For example, of the roughly 3 million locations where proteins stick to DNA, just 3,700 are commonly used in every cell examined. Liver cells, skin cells, neurons, embryonic stem cells… all of them use different suites of switches to control their lives. Again, we knew this would be so. Again, it’s the scale and the comprehensiveness that matter.

“This is an important milestone,” says George Church, a geneticist at the Harvard Medical School. His only gripe is that ENCODE’s cell lines came from different people, so it’s hard to say if differences between cells are consistent differences, or simply reflect the genetics of their owners. Birney explains that in other studies, the differences between cells were greater than the differences between people, but Church still wants to see ENCODE’s analyses repeated with several types of cell from a small group of people, healthy and diseased. That should be possible since “the cost of some of these [tests] has dropped a million-fold,” he says.

The next phase is to find out how these players interact with one another. What does the 80 percent do (if, genuinely, anything)? If it does something, does it do something important? Does it change something tangible, like a part of our body, or our risk of disease? If it changes, does evolution care?

[Update 07/09 23:00 Indeed, to many scientists, these are the questions that matter, and ones that ENCODE has dodged through a liberal definition of “functional”. That, say the critics, critically weakens its claims of having found a genome rife with activity. Most of ENCODE’s “functional elements” are little more than sequences being transcribed to RNA, with little heed to their physiological or evolutionary importance. These include repetitive remains of genetic parasites that have copied themselves ad infinitum, the corpses of dead and once-useful genes, and more.

To include all such sequences within the bracket of “functional” sets a very low bar. Michael Eisen from the Howard Hughes Medical Institute described ENCODE’s definition as a “meaningless measure of functional significance” and Leonid Kruglyak from Princeton University noted that it’s “barely more interesting” than saying that a sequence gets copied (which all of them are). To put it more simply: our genomic city’s got lots of new players in it, but they may largely be bums.

This debate is unlikely to quieten any time soon, although some of the heaviest critics of ENCODE’s “junk” DNA conclusions have still praised its nature as a genomic parts list. For example, T. Ryan Gregory from the University of Guelph contrasts their discussions on junk DNA to a classic paper from 1972, and concludes that they are “far less sophisticated than what was found in the literature decades ago.” But he also says that ENCODE provides “the most detailed overview of genome elements we’ve ever seen and will surely lead to a flood of interesting research for many years to come.” And Michael White from Washington University in St. Louis said that the project had achieved “an impressive level of consistency and quality for such a large consortium.” He added, “Whatever else you might want to say about the idea of ENCODE, you cannot say that ENCODE was poorly executed.”]

Where will it lead us? It’s easy to get carried away, and ENCODE’s scientists seem wary of the hype-and-backlash cycle that befell the Human Genome Project. Much was promised at its unveiling, by both the media and the scientists involved, including medical breakthroughs and a clearer understanding of our humanity. The ENCODE team is being more cautious. “This idea that it will lead to new treatments for cancer or provide answers that were previously unknown is at least partially true,” says Gingeras, “but the degree to which it will successfully address those issues is unknown.”

“We are the most complex things we know about. It’s not surprising that the manual is huge,” says Birney. “I think it’s going to take this century to fill in all the details. That full reconciliation is going to be this century’s science.”

Find out more about ENCODE:

So… how much is “functional” again?

So, that 80 percent figure… Let’s build up to it.

We know that 1.5 percent of the genome codes for proteins. That much is clearly functional and we’ve known that for a while. ENCODE also looked for places in the genome where proteins stick to DNA – sites where, most likely, the proteins are switching a gene on or off. They found 4 million such switches, which together account for 8.5 percent of the genome.* (Birney: “You can’t move for switches.”) That’s already higher than anyone was expecting, and it sets a pretty conservative lower bound for the part of the genome that definitively does something.

In fact, because ENCODE hasn’t looked at every possible type of cell or every possible protein that sticks to DNA, this figure is almost certainly too low. Birney’s estimate is that it’s out by half. This means that the total proportion of the genome that either creates a protein or sticks to one, is around 20 percent.
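
The arithmetic behind that 20 percent can be spelled out in a few lines. This is only a sketch of the reasoning above: the variable names are invented for illustration, and “out by half” is read here as meaning that the measured figure is about half of the true one.

```python
# Figures quoted in the article, as percentages of the genome.
protein_coding = 1.5    # sequence that codes for proteins
protein_binding = 8.5   # sequence covered by protein-binding "switches"

# Conservative lower bound: DNA that either makes a protein or sticks to one.
measured = protein_coding + protein_binding

# Assumption: "out by half" means the measured value is half the real one.
undercount_factor = 2
estimated = measured * undercount_factor

print(f"measured lower bound: {measured:.1f} percent")
print(f"estimate after correction: {estimated:.1f} percent")
```

The remaining jump from 20 to 80 percent does not come from this kind of correction, but from widening what counts as an element.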

To get from 20 to 80 percent, we include all the other elements that ENCODE looked for – not just the sequences that have proteins latched onto them, but those that affect how DNA is packaged and those that are transcribed at all. Birney says, “[That figure] best conveys the difference between a genome made mostly of dead wood and one that is alive with activity.” [Update 5/9/12 23:00: For Birney’s own, very measured, take on this, check out his post. ]

That 80 percent covers many classes of sequence that were thought to be essentially functionless. These include introns – the parts of a gene that are cut out at the RNA stage, and don’t contribute to a protein’s manufacture. “The idea that introns are definitely deadweight isn’t true,” says Birney. The same could be said for our many repetitive sequences: small chunks of DNA that have the ability to copy themselves, and are found in large, recurring chains. These are typically viewed as parasites, which duplicate themselves at the expense of the rest of the genome. Or are they?

The youngest of these sequences – those that have copied themselves only recently in our history – still pose a problem for ENCODE. But many of the older ones, the genomic veterans, fall within the “functional” category. Some contain sequences where proteins can bind, and influence the activity of nearby genes. Perhaps their spread across the genome represents not the invasion of a parasite, but a way of spreading control. “These parasites can be subverted sometimes,” says Birney.

He expects that many skeptics will argue about the 80 percent figure, and the definition of “functional”. But he says, “No matter how you cut it, we’ve got to get used to the fact that there’s a lot more going on with the genome than we knew.”

[Update 07/09 23:00 Birney was right about the scepticism. Gregory says, “80 percent is the figure only if your definition is so loose as to be all but meaningless.” Larry Moran from the University of Toronto adds, “‘Functional’ simply means a little bit of DNA that’s been identified in an assay of some sort or another. That’s a remarkably silly definition of function and if you’re using it to discount junk DNA it’s downright disingenuous.”

This is the main criticism of ENCODE thus far, repeated across many blogs and touched on in the opening section of this post. There are other concerns. For example, White notes that many DNA-binding proteins recognise short sequences that crop up all over the genome just by chance. The upshot is that you’d expect many of the elements that ENCODE identified if you just wrote out a random string of As, Gs, Cs, and Ts. “I’ve spent the summer testing a lot of random DNA,” he tweeted. “It’s not hard to make it do something biochemically interesting.”

Gregory asks why, if ENCODE is right and our genome is full of functional elements, does an onion have around five times as much non-coding DNA as we do? Or why can pufferfishes get by with just a tenth as much? Birney says the onion test is silly. While many genomes have a tight grip upon their repetitive jumping DNA, many plants seem to have relaxed that control. Consequently, their genomes have bloated in size (bolstered by the occasional mass doubling). “It’s almost as if the genome throws in the towel and goes: Oh sod it, just replicate everywhere.” Conversely, the pufferfish has maintained an incredibly tight rein upon its jumping sequences. “Its genome management is pretty much perfect,” says Birney. Hence: the smaller genome.

But Gregory thinks that these answers are a dodge. “I would still like Birney to answer the question. How is it that humans ‘need’ 100% of their non-coding DNA, but a pufferfish does fine with 1/10 as much [and] a salamander has at least 4 times as much?” [I think Birney is writing a post on this, so expect more updates as they happen, and this post to balloon to onion proportions].]
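
White’s point about short binding sequences cropping up by chance can be made concrete with a rough calculation. The sketch below is a back-of-envelope model, not anything from the ENCODE papers: it assumes a random sequence in which all four letters are equally likely and independent, counts exact matches on one strand only, and uses an approximate human genome length of 3.2 billion letters. The function name and the motif lengths are chosen purely for illustration.

```python
def expected_chance_matches(genome_length: int, motif_length: int) -> float:
    """Expected number of exact matches to one specific motif in random DNA.

    Assumes each position is independently A, C, G or T with probability 1/4,
    so any given window matches the motif with probability (1/4) ** motif_length.
    """
    windows = genome_length - motif_length + 1
    return windows / 4 ** motif_length


GENOME_LENGTH = 3_200_000_000  # approximate size of the human genome in letters

for motif_length in (6, 8, 10, 12):  # illustrative motif sizes
    hits = expected_chance_matches(GENOME_LENGTH, motif_length)
    print(f"{motif_length}-letter motif: about {hits:,.0f} chance matches")
```

Under these assumptions, the shorter the motif, the more often it appears by accident, which is why a match alone says little about whether a site matters.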

[Update 07/09/12 11:00: The ENCODE reactions have come thick and fast, and Brendan Maher has written the best summary of them. I’m not going to duplicate his sterling efforts. Head over to Nature’s blog for more.]

* (A cool aside: John Stamatoyannopoulos from the University of Washington mapped these protein-DNA contacts by looking for “footprints” where the presence of a protein shields the underlying DNA from a “DNase” enzyme that would otherwise slice through it. The resolution is incredible! Stamatoyannopoulos could “see” every nucleotide that’s touched by a protein – not just a footprint, but each of its toes too. Joe Ecker from the Salk Institute thinks we should eventually be able to “dynamically footprint a cellular response”. That is, expose a cell to something – maybe a hormone or a toxin – and check its footprints over time. You can cross-reference those sites to the ENCODE database, and reconstruct what’s going on in the cell just by “watching” the shadows of proteins as they descend and lift off.)

Redefining the gene

The simplistic view of a gene is that it’s a stretch of DNA that is transcribed to make a protein. But each gene can be transcribed in different ways, and the transcripts overlap with one another. They’re like choose-your-own-adventure books: you can read them in different orders, start and finish at different points, and leave out chunks altogether.

Fair enough: We can say that the “gene” starts at the start of the first transcript, and ends at the end of the final transcript. But ENCODE’s data complicates this definition. There are a lot of transcripts, probably more than anyone had realised, and some connect two previously unconnected genes. The boundaries for those genes widen, and the gaps between them shrink or disappear.

Gingeras says that this “intergenic” space has shrunk by a factor of four. “A region that was once called Gene X is now melded to Gene Y.” Imagine discovering that every book in the library has a secret appendix, that’s also the foreword of the book next to it.
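
The “first transcript to last transcript” definition, and the way a single bridging transcript can meld two genes, can be sketched as interval merging. The coordinates below are made up for illustration and the function name is hypothetical; this is a toy model of the idea, not how ENCODE annotated genes.

```python
def merge_spans(transcripts):
    """Merge overlapping (start, end) transcripts into gene-like spans.

    Each span runs from the start of its first transcript to the end of its
    last one, following the simple definition of a gene described above.
    """
    merged = []
    for start, end in sorted(transcripts):
        if merged and start <= merged[-1][1]:
            # Overlaps the current span: extend it if this transcript ends later.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No overlap: this transcript opens a new span.
            merged.append((start, end))
    return merged


# Hypothetical coordinates for two separate genes.
gene_x = [(100, 400), (150, 500)]
gene_y = [(800, 1200), (900, 1300)]
before = merge_spans(gene_x + gene_y)

# A newly found transcript that reaches from Gene X into Gene Y.
bridging = (450, 850)
after = merge_spans(gene_x + gene_y + [bridging])

print("before:", before)
print("after: ", after)
```

In this toy case the gap between the two genes disappears as soon as the bridging transcript is included, which is the effect Gingeras describes.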

These bleeding boundaries seem familiar. Bacteria have them: Their genes are cramped together in a miracle of effective organisation, packing in as much information as possible into a tiny genome. Viruses epitomise such genetic economy even better. I suggested that comparison to Gingeras. “Exactly!” he said. “Nature never relinquished that strategy.”

Bacteria and viruses can get away with smooshing their protein-encoding genes together. But not only do we have more proteins, we also need a vast array of sequences to control when, where and how they are deployed. Those elements need space too. Ignore them, and it looks like we have a flabby genome with sequence to spare. Understand them, and our own brand of economical packaging becomes clear. (However, Birney adds, “In bacteria and viruses, it’s all elegant and efficient. At the moment, our genome just seems really, really messy. There’s this much higher density of stuff, but for me, emotionally it doesn’t have that elegance that we see in a bacterial genome.”)

Given these blurred boundaries, Gingeras thinks that it no longer makes sense to think of a gene as a specific point in the genome, or as its basic unit. Instead, that honour falls to the transcript, made of RNA rather than DNA. “The atom of the genome is the transcript,” says Gingeras. “They are the basic unit that’s affected by mutation and selection.” A “gene” then becomes a collection of transcripts, united by some common factor.

There’s something poetic about this. Our view of the genome has long been focused on DNA. It’s the thing the genome project was deciphering. It is converted into RNA, giving it a more fundamental flavour. But out of those two molecules, RNA arrived on the planet first. It was copying itself and evolving long before DNA came on the scene. “These studies are pointing us back in that direction,” says Gingeras. They recognise RNA’s role, not as simply an intermediary between DNA and proteins, but something more primary.

What about diseases?

For the last decade, geneticists have run a seemingly endless stream of “genome-wide association studies” (GWAS), attempting to understand the genetic basis of disease. They have thrown up a long list of SNPs – variants at specific DNA letters – that correlate with the risk of different conditions.

The ENCODE team have mapped all of these to their data. They found that just 12 percent of the SNPs lie within protein-coding areas. They also showed that compared to random SNPs, the disease-associated ones are 60 percent more likely to lie within functional, non-coding regions, especially in promoters and enhancers. This suggests that many of these variants are controlling the activity of different genes, and provides many fresh leads for understanding how they affect our risk of disease. “It was one of those too good to be true moments,” says Birney. “Literally, I was in the room [when they got the result] and I went: Yes!”

Imagine a massive table. Down the left side are all the diseases that people have done GWAS studies for. Across the top are all the possible cell types and transcription factors (proteins that control how genes are activated) in the ENCODE study. Are there hotspots? Are there SNPs that correspond to both? Yes. Lots, and many of them are new.

Take Crohn’s disease, a type of bowel disorder. The team found five SNPs that increase the risk of Crohn’s, and that are recognised by a group of transcription factors called GATA2. “That wasn’t something that the Crohn’s disease biologists had on their radar,” says Birney. “Suddenly we’ve made an unbiased association between a disease and a piece of basic biology.” In other words, it’s a new lead to follow up on.
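
The “massive table” described above can be pictured as a simple cross-tabulation. The sketch below uses entirely made-up toy data, with placeholder disease, SNP and factor names, and an arbitrary hotspot cut-off of two shared SNPs; it illustrates the shape of the analysis rather than the ENCODE team’s actual method.

```python
from collections import Counter

# Hypothetical toy inputs (not real results).
# GWAS side: which SNPs have been linked to which disease.
gwas_hits = [
    ("disease_1", "snp_a"),
    ("disease_1", "snp_b"),
    ("disease_2", "snp_c"),
]
# ENCODE side: which SNPs fall inside a site bound by which factor.
binding_sites = [
    ("snp_a", "factor_X"),
    ("snp_b", "factor_X"),
    ("snp_c", "factor_Y"),
    ("snp_d", "factor_Y"),
]


def build_table(gwas_hits, binding_sites):
    """Count, for each (disease, factor) cell, the SNPs that appear on both sides."""
    factors_by_snp = {}
    for snp, factor in binding_sites:
        factors_by_snp.setdefault(snp, set()).add(factor)

    table = Counter()
    for disease, snp in gwas_hits:
        for factor in factors_by_snp.get(snp, ()):
            table[(disease, factor)] += 1
    return table


table = build_table(gwas_hits, binding_sites)

# Assumed cut-off: call a cell a hotspot if at least two SNPs land in it.
hotspots = {cell: count for cell, count in table.items() if count >= 2}
print(hotspots)
```

A real analysis would also need to ask whether a cell holds more SNPs than chance alone would put there, which is where the comparison against random SNPs comes in.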

“We’re now working with lots of different disease biologists looking at their data sets,” says Birney. “In some sense, ENCODE is working from the genome out, while GWAS studies are working from disease in.” Where they meet, there is interest. So far, the team have identified 400 such hotspots that are worth looking into. Of these, between 50 and 100 were predictable. Some of the rest make intuitive sense. Others are head-scratchers.

The 3-D genome

Writing the genome out as a string of letters invites a common fallacy: that it’s a one-dimensional, linear entity. It’s anything but. DNA is wrapped around proteins called histones like beads on a string. These are then twisted, folded and looped in an intricate three-dimensional way. The upshot is that parts of the genome that look distant when you write the sequences out can actually be physical neighbours. And this means that some switches can affect the activity of far away genes.

Job Dekker from the University of Massachusetts Medical School has now used ENCODE data to map these long-range interactions across just 1 percent of the genome in three different types of cell. He discovered more than 1,000 of them, where switches in one part of the genome were physically reaching over and controlling the activity of a distant gene. “I like to say that nothing in the genome makes sense, except in 3D,” says Dekker. “It’s really a teaser for the future of genome science.”

Gingeras agrees. He thinks that understanding these 3-D interactions will add another layer of complexity to modern genetics, and extending this work to the rest of the genome, and other cell types, is a “next clear logical step”.

How will scientists actually make sense of all of this?

ENCODE is vast. The results of this second phase have been published in 30 central papers in Nature, Genome Biology and Genome Research, along with a slew of secondary articles in Science, Cell and others. And all of it is freely available to the public.

The pages of printed journals are a poor repository for such a vast trove of data, so the ENCODE team have devised a new publishing model. In the ENCODE portal site, readers can pick one of 13 topics of interest, and follow them in special “threads” that link all the papers. Say you want to know about enhancer sequences. The enhancer thread pulls out all the relevant paragraphs from the 30 papers across the three journals. “Rather than people having to skim read all 30 papers, and working out which ones they want to read, we pull out that thread for you,” says Birney.

And yes, there’s an app for that.

Transparency is a big issue too. “With these really intensive science projects, there has to be a huge amount of trust that data analysts have done things correctly,” says Birney. But you don’t have to trust. At least half the ENCODE figures are interactive, and the data behind them can be downloaded. The team have also built a “Virtual Machine” – a downloadable package of the almost-raw data and all the code in the ENCODE analyses. Think of it as the most complete Methods section ever. With the virtual machine, “you can absolutely replay step by step what we did to get to the figure,” says Birney. “I think it should be the standard for the future.”

CATEGORIZED UNDER: Genetics, Genomics

Comments (48)

  1. @Caseiokey

    I appreciate this Grand overview very much, thank you Ed!

  2. Sietse

    A masterpiece, Ed — you have surpassed yourself. Every paragraph is bursting with sense of wonder, I am grinning with joy and amazement!

  3. JimV

    Larry Moran of “Sandwalk” says this:

    The creationists are going to love this.

    You blew it Ed Yong. Why didn’t you ask him about the 50% of our genome containing DEFECTIVE transposons and the 2% that’s pseudogenes, just for starters? Then you could ask him why he believes that all intron sequences (about 20% of our genome) are functional.

    “Almost every nucleotide …”? Gimme a break. Don’t these guys read the scientific literature?

    I myself have no expertise in this (Larry Moran does), but I recall reading about an experiment in which over 30% of the genes of some lab mice were removed, and the resulting mice were bred without any loss of function (including the ability to reproduce). This would seem to conflict with your 80% number. Also we know the gene for producing vitamin C is defective in primates, and the gene for tasting sugar is defective in felines. No doubt these genes still produce some chemical reaction in cells, but that does not make them useful.

    I hope you will do more digging on this and post retractions as necessary.

  4. zackoz

    This is a classic, Ed, one of your top ten, or even no 1.

    When I read something like this, I constantly recall David Attenborough a few years ago expressing wonderment at all the discoveries made in his lifetime.

    What a great time to work in this area, or even just to be an interested layman.

  5. Torbjörn Larsson, OM

    And a masterpiece was ENCODEd.


    What I find interesting is the astrobiology perspective, the new and larger opening to the evolutionary history and the RNA world this gives.

    The specifically much looser requirement for selective pressures on some RNAs, especially on long non-coding RNA (lncRNA) like the signal recognition particle RNA shared by all cells. Apparently long stretches of RNA can accommodate the functional folding despite variation.

    This means there can be more remains of the RNA world than earlier believed. In bacteria there are several clusters of RNA akin to them having coding gene clusters, operons, for regulatory purposes. (Reading of like amounts of related genes in a sweep.)

    And there are also thousands of small RNAs in bacteria. While ENCODE finds that the correlation between selection fixation and function isn’t all that tight outside the coding transcripts, meaning some of those can have function along lineages.

    – The ENCODE project sees the functional unit as the RNA transcript, not the gene. Apparently the non-coding RNAs outnumber the proteins 10:1, at least in some eukaryotes (us).

    Importantly, it means we are more akin to the original RNA/protein cells, as Gingeras notes. Having RNA as the primary function means it is easier to accommodate the later DNA add on machinery too.

  6. “No doubt these genes still produce some chemical reaction in cells, but that does not make them useful.”

    I didn’t say useful. You said useful. I took very great pains to not say “useful”. I used the authors’ words: “functional” and then had a chunk that describes what that means, and the uncertainty around the 80% number. I also wrote:

    “The next phase is to find out how these players interact with one another. What does the 80 percent do (if, genuinely, anything)? If it does something, does it do something important? Does it change something tangible, like a part of our body, or our risk of disease? If it changes, does evolution care?”

    Which covers, for a lay audience, the various possible definitions of “functional” from “does something biochemically” to “leads to a phenotypic change” to “is important for evolution”.

    I’m aware of Larry’s disappointment. I’ve emailed him asking for his comments.

  7. Itai Bar-Natan

    “that it’s a two-dimensional, linear entity.” I think that’s a typo.

  8. amphiox

    The creationists are going to love this.

    Well, the creationists will twist anything any way they want. The only way to write about the science of evolution without having creationists twist and misrepresent your words is not to write anything at all.

  9. Alex

    Thanks for this write-up of the major findings, helps make sense of this data explosion.

  10. Benoit Bruneau

    Here’s what I responded to Larry Moran: “Come on now. Ewan, and the rest of the ENCODE team, have thought about this for a while. BUT, forget about the exact numbers. They have defined a ridiculous number of functional elements, far more and far more exquisitely interacting, let alone the amazing complexity of the transcriptome. So why quibble about the exact precise % that should be reported (is 80% excluding transposons? are transposons not a functional part of the genome? what if a transposon is co-transcribed), and just marvel at the data and its analysis.”

    Ed, well written and researched. ENCODE is a great accomplishment that will be used by hundreds of other scientists for several years to come. Don’t believe the hype, fine, but don’t knock the science.

  11. IW

    Ed, this blog post is confusing in that you normally have URLs to related articles at the end of the article itself, whereas in this one, you have “URLs” merely to different parts of the same article interspersed with sections of the article itself. Perhaps you might consider a different color to identify such “URLs” so we’re not clicking them thinking we’re being taken to a different page?

    Other than that it’s definitely something I want to read.

  12. Dan Moran, PhD

    Great news for mankind and our scientific honesty. Junk DNA was a misnomer from the start and most of us (molecular biologists) knew it. Even transposons which currently seem to have no activity will prove to be vital not viral as we learn more about control and development. Fodder for the ID position…. Yep! So what! Truth in sequencing and freedom of interpretation are hallmarks of great science.


  13. Charlie Jones

    Awesome article, but I too was curious about the prevalence of disabled genes. Not only does our vitamin C gene not work, but we have (I forget the precise numbers) roughly 1000 genes for detecting odor, of which roughly 600 do not work.

    Would these inactivated genes count as ‘functioning’ in the ENCODE study or are they truly inactive? Since these genes are among the classically defined protein producers that amount to only 1.5% of the total genome, knowing this distinction would help us to perhaps better understand the potential significance of the ‘80%’ figure.

  14. SP

    Quite brilliant Ed. Thank you.

  15. JS

    Ed this is fantastic. I’m supposed to be doing work but couldn’t keep my eyes off this piece. If I had to choose only one science blog to be on my reader list it would be yours.

  16. KAL

    Once again you have made the semi-incomprehensible comprehensible and with context. I think I might swoon. Good job.

  17. While the ENCODE effort is a major achievement, I do wonder how much of the putative transcription factor and histone interactions with DNA sequences may actually be non-specific and truly inconsequential. Such low level interactions may simply be noise and just tolerated. However, the main reason why I have a hard time accepting that about 80% of the human genome sequence is functional and important is the data from other species with a similar number of genes, but extremely divergent amounts of DNA. For example, the fruit fly Drosophila melanogaster has 0.165 billion nucleotide base pairs, whereas the butterfly Fritillaria assyriaca has 124.9 billion nucleotide base pairs. The human genome size lies between with about 3.2 billion nucleotide base pairs. While the fruit fly has 750-times less DNA than the butterfly, both insects have somewhat comparable characteristics in terms of body structure, size, life span, diet, etc.

    There appears to be strong evolutionary pressure in multicellular organisms to retain excess baggage so as to simply make sure that the important parts are retained. There are countless cases of this ranging from the extensive remodelling of embryos during early development, to the hundreds of thousands of superfluous phosphorylation sites in the proteins encoded by the human genome. At the levels of gross anatomy down to the molecular, there are so many examples of inefficiencies in biology. As I have pointed out above, DNA sequencing studies in diverse organisms have increasingly demonstrated extreme ranges in the sizes of their genomes, whilst still having a relatively similar number of genes. It just seems highly unlikely that this is for increasing the amount of regulation of the genome in certain organisms over others.

  18. Nathan Myers

    Then we can look at the amoeba, which carries around enormously more DNA than we do, while being under enormously more pressure to pare it down.

    The backlash is absurd: the burden of proof should be on anybody insisting a sequence is junk, not on somebody suggesting it might be used somehow. Even “broken” genes transcribe to something. It’s just as hard to discover what else they might do as for any other protein. That the RNA sequences that code for understood proteins cannot help but do other things, chemically, seems lost on these critics. They need to study endocrinology for a few months and develop some humility. To assume that the detailed workings of cells must be understandable to mere humans amounts to an arrogant sort of religion.

  19. Dr Ivan Hooper

    Our Genome is a work in progress. Knowing where it has been helps understand various functional groupings.
    As someone looking at evolution and the functions of some obscure genes, and without funding, I feel a bit out on a limb.
    I think that many people don’t actually seem to know what a gene/protein does. They seem to put them in a box, whereas they seem to be multifunctional. (I don’t know if I have explained that as well as I wanted to.) Although this ENCODE work looks impressive and gets a lot of funding, there are other important areas to research. Yes, I am pushing my own barrow here.

  20. Assume for a moment that in fact 81.5% instead of 1.5% of “the genome” (the human genome?) does stuff, with 1.5% coding for genes and ca. 80% serving these other functions. What about organisms like bats and birds where the “coding” proportion of the genome is way higher (closer to 100% last time I saw a paper on this)? Do molecules “dock” on coding regions as easily as on non-coding regions, for example?

    I think Larry Moran may be producing bricks for a few days. It will be interesting to see his response.

  21. mo

    Larry Moran is an idiot. He doesn’t even believe that epigenetics is anything else than Lac-Operon-style gene regulation and thinks we young ones shouldn’t hype it as a new thing. He also doesn’t believe in the existence of those hundreds of papers (!) that demonstrate transgenerational epigenetic inheritance, though I assume he can use PubMed.

    To dismiss this awesome new resource because the interviewee doesn’t say “Junk DNA” often enough is not helpful. Those 80% are about biochemical function, not phenotypical function. We’ve known for a long time that classical Junk often has specific Histone modifications and even is transcribed: Satellite DNA at the centromeres for example is transcribed and its nucleosomes have a very specific modification pattern to form pericentric heterochromatin, even though it is more silent than the rest of the genome and contains no protein-coding genes. So it is simultaneously in the 80% and Junk DNA. Even though Junk DNA is a bit of an outdated concept.

    Damn his opinions.

  22. mo: OK, but where do all those functions get done on organisms with very, very little non-coding DNA? Or is that not a valid question?

  23. J.J.E.

    First off, I think the definition of functional used by ENCODE in the widely disseminated quotes and press releases is at best tendentious. There is no reason to require that a segment of DNA completely lack specific biochemical activity to be non-functional. On to a few comments that struck me as grossly misguided.

    Nathan Myers says:

    The backlash is absurd: the burden of proof should be on anybody insisting a sequence is junk, not on somebody suggesting it might be used somehow.

    This is at best tendentious and at worst an inversion of the burden of proof. One does not simply assert a tendentious null-hypothesis and then criticize their interlocutors on that basis. The best a “junk is rare” advocate could hope for would be to assert that we don’t know the functional status of most of the genome; in other words, our priors on functionality of any particular sequence are uninformed by our experience. Thinking in a Bayesian way, I submit that for any non-coding sequence without direct functional evidence or evidence for evolutionary constraint, the “uninformative prior” shouldn’t exceed 50%.

    mo says:

    To dismiss this awesome new resource

    Hold your horses, Mr. Straw Man. The resource hasn’t been dismissed. Just the breathless, uncritical touting of a dubious figure that even a major contributor to the collaboration is loath to fully endorse when he expounds at length.

    For example:

    Ewan Birney says:

    A conservative estimate of our expected coverage of exons + specific DNA:protein contacts gives us 18%, easily further justified (given our sampling) to 20%.

    and this:

    Ewan Birney says:

    Q. Ok, fair enough. But are you most comfortable with the 10% to 20% figure for the hard-core functional bases? Why emphasize the 80% figure in the abstract and press release?

    A. (Sigh.) Indeed. Originally I pushed for using an “80% overall” figure and a “20% conservative floor” figure, since the 20% was extrapolated from the sampling. But putting two percentage-based numbers in the same breath/paragraph is asking a lot of your listener/reader – they need to understand why there is such a big difference between the two numbers, and that takes perhaps more explaining than most people have the patience for.[cut…]

    We use the bigger number because it brings home the impact of this work to a much wider audience. But we are in fact using an accurate, well-defined figure when we say that 80% of the genome has specific biological activity.

    Do read the rest of the whole thing, especially the comments on evolutionary ideas of functionality. If, after reading that, you can’t grasp the gulf between the highly publicized (and thinly qualified) sound bites and Birney’s own nuanced take on the issue, then you’re not paying attention. It is no wonder that many academics (especially those with training in evolution) are aghast at the clumsy and sensationalist treatment that this achievement has received in both the lay and the scientific media.

    Mo says:

    Those 80% are about biochemical function, not phenotypical function.

    In what world is biochemical function not a subset of phenotypic function? And if you misspoke and intended “biochemical activity” instead of “biochemical function”, I ask a complementary question: in what world is everything other than complete inertness considered “functional”?

  24. mo

    …But I agree with Eisen’s piece.

    Greg Laden: This is a bit complex, but the part of the genome they assigned “function” to isn’t actually the part that “does something crucial”. Instead they created a map of sequences that are bound by transcription factors, are transcribed, have a recognizable modification pattern, have a weird distribution of nucleosomes or interact with other sequences in the 3D structure of the nucleus. So this says 1.5% of the genome codes for proteins, but 80% does something funny on a biochemical or structural level. Most of these sequences may not have any effect on the wider phenotype of the organism, or even on the cell outside of the nucleus. Some of these sequences will cancel each other’s functions out, many will do nothing (but are still biochemically recognizable), many will just be important to determine the structure of the Chromatin in the nucleus and nucleus size/structure, many will be silenced transposons and other selfish DNA. Structural functions may scale with genome size so that small genomes and large genomes may have the same number of protein coding genes and regulation, but larger genomes may accommodate larger nuclei and cells or more chromosomes and other things (who knows?). Some (or many) will be useful for the regulation of protein coding or other genes, some may even be important for biological phenomena we haven’t observed yet. It’s funny that much of the genome is transcribed, even though only 1.5% of the genome is translated to proteins. Why is that? Do all those RNAs do something? Is our view of gene expression completely wrong?
    It’s important to realize that many of those things which would show up in the ENCODE assays are sequences that many people would assign to non-functional DNA, like satellite/centromeric DNA, telomeres and transposons, that don’t code for proteins, are mostly repressed and mostly don’t function in gene regulation. But centromeres and telomeres have functions in genome stability (and aging) and transposons have negative fitness functions….

  25. mo


    Those 80% are about biochemical function, not phenotypical function.

    In what world is biochemical function not a subset of phenotypic function? And if you misspoke and intended “biochemical activity” instead of “biochemical function”, I ask a complementary question: in what world is everything other than complete inertness considered “functional”?

    I used the nomenclature of the corresponding author.

  26. mo

    I’m going to read all the papers in the next few days, that’s good enough for me.

  27. amphiox

    Then we can look at the amoeba, which carries around enormously more DNA than we do,

    This is true, for certain amoebas.

    while being under enormously more pressure to pare it down.

    But this is an assertion that requires some supporting evidence. The differential pressure to pare down the genome between prokaryotes and eukaryotes is clear, but between two eukaryotes? Not so much. Energy availability is not the rate-limiting step in DNA replication in either amoebas or metazoans, so neither should be expected to experience much pressure to pare down their genome sizes.

  28. Folks, given some of the critiques and commentary from across the blogosphere, I’ve updated this post with around 700 extra words. Two reasons for this. First, some have expressed the view that I didn’t give enough space to skeptical views first time round, and there is truth in this.

    Second, and more importantly, I want *this* post to continue being a useful resource about ENCODE. I could do a fresh update post, but any new reader to this one would have to click over to that as well. Which is why I’ve edited straight into this one.

    The downside is that it’s harder to track what’s new, but each new bit has a date and timestamp on it. For the record, the new bits are stamped [Update 07/09 23:00] and include three paragraphs in the first section and four paragraphs in the second section.

    I don’t know if this works or not. Somewhere out there, there’s a curmudgeon saying that this is a news piece, not Wikipedia, but I’m trying something a little different.

  29. J.J.E.

    mo says:

    I used the nomenclature of the corresponding author.

    That’s not good enough. The biggest gripe I and many others have with this result is that “functional” has been redefined in a very controversial way. There is a large existing literature regarding overzealous classification of function that the major authors are well aware of, as evidenced by Birney’s blog post (while it tends strongly towards the polemic, Gould’s and Lewontin’s Spandrels paper lays out one perspective in this area). Indeed, juxtaposing the media accounts and Birney’s actual blog post points out what appear to be clear contradictions. The dominant definition of “functional” in biology is “possessing an activity of purpose intended for a particular duty”. Evolution offers the only naturalistic escape path from this teleological briar patch via natural selection. Thus, unless we want to invoke designers (fine for fire engines, not so fine for firefighters), the criteria for functionality must be based on individuals bearing the stretches of “functional” DNA in question being more fit than their predecessors lacking such variants.

    So, unless you wanna change the damned English language or invoke intelligent designers, no, mere biochemical activity is not a sufficient definition of “functional”. It probably isn’t necessary either unless you take a very broad definition of the word “biochemical”, but that’s a discussion for another time.

  30. Nathan Myers

    It was only a very short time ago that our thymus gland was useless. Likewise our appendix. Likewise our lymph nodes. The null hypothesis must always be that a biological structure is used. Proving it is useless is very difficult, and a lot of work. Discovering what it’s used for, if anything, is a little less hard. Failing at either is expected. Assuming the more difficult conclusion is foolish.

    The principle at work here is the same as was promoted by Scott Adams: “Which is more likely?” Is it more likely that (1) an expensive structure is conserved over evolutionary time in widely divergent lineages, despite being useless baggage; or that (2) any given scientist is far more ignorant than he or she prefers to believe? I know where I’ll put my money, every time.

    If it is correct that most of bat DNA does code for proteins, then studying how bats (and the few other megabase-stingy taxa) lost the rest, and how they manage to get along without, will be enlightening. Pretending it’s the norm only stalls progress.

  31. J.J.E.

    Nathan Myers

    I’m just going to say no to SIWOTI.

    Read a bit about population genetics (esp. the neutral theory of molecular evolution), statistics and appropriate choice of prior probabilities (Bayesian statistics in particular), both forward and reverse genetics (how function is determined), the C-value paradox (esp. regarding your flip comments regarding bats, e.g.), and you might understand why your easy certainty is off-putting to specialists. Oh, and Dilbert isn’t an argument.

  32. It is clear that this sentence “80% of the genome has a specific biochemical activity” is somehow misleading.

    I know that ‘transcription’ does not equal ‘function’ but it’s very perturbing to see that much of the genome is transcribed, even though only 1.5-2% of the genome is translated to proteins. Why is that? It can’t be only illegitimate/spurious transcription?

    Do all those non coding RNAs do something?

    Are we only getting better at detecting noise with new technologies?

    Is it only a question of ‘low’ versus ‘high’ levels of transcription rather than ‘what percentage’?

    I saw a great talk recently (at the EMBL transcription meeting) by Vicente Pelechano. Don’t think it’s published yet. Nevertheless, he mapped the transcript start sites & end points genomewide in yeast by TIF (transcript isoforms)-seq. He identified 1.88 million transcripts !!!!!!, an average of 195 TIFs/gene (bicistronic, tandem overlap, UTR variation, antisense).

    These results are truly intriguing and also emphasize the importance of post-transcriptional regulation.

    The ENCODE project has delivered an incredible amount of information (predictions?) for all the scientific community. (Each of us in our labs) Let’s validate/invalidate/confirm/elaborate these incredible resources.


  33. NickMatzke

    “The principle at work here is the same as was promoted by Scott Adams: “Which is more likely?” Is it more likely that (1) an expensive structure is conserved over evolutionary time in widely divergent lineages, despite being useless baggage; or that (2) any given scientist is far more ignorant than he or she prefers to believe? I know where I’ll put my money, every time.”

    This is just unsophisticated knee-jerk functionalist thinking. Read up on the longstanding literature about the well known “everything in biology must have a function/be selected” fallacy.

    But, more importantly: it’s actually wrong to think that junk DNA is “an expensive structure”. The energetic costs of replicating DNA are trivial compared to the energetic costs of transcription, translation, etc. And the time costs are minimal too, for eukaryotes, which have many origins of DNA replication (unlike prokaryotes). There is some evidence that organisms with high metabolism and fast generation times do experience some kind of relevant cost, as their genomes tend to be reduced, but if true that’s just more evidence that humans and salamanders and onions and whatnot have a lot of junk. And there are other explanations, e.g. perhaps DNA amount is a direct physical control on nucleus size, and nucleus size controls cell volume (skeletal DNA hypothesis).

  34. RHV

    Ed, I thank you for doing a much better job than pretty much any other science journalist (that I’ve read) on covering this topic – I applaud your initial effort at explanation as well as all the follow-up response to the criticism of the ENCODE coverage.

    Alexis – although I always tell my students that we assume that most regulation occurs prior to transcription from an energy efficiency standpoint, this is certainly not the case for everything. Although it is surprising if you’re coming from a non-functional hypothesis that so much of the non-coding/non regulatory DNA is transcribed, it doesn’t necessarily mean it does something that affects phenotype – it might just cost more metabolically to prevent its transcription or the mechanisms just haven’t evolved to suppress it pre-transcript, and if it doesn’t do anything that ultimately affects phenotype other than a little bit of energy expenditure for the metabolic cost of transcription, then you’re just left with a local adaptive peak rather than a global peak of zero excess transcription. I do hope, as you suggest, that the ENCODE data prompts more analysis.

    The question I have is does it matter if the null hypothesis is that the DNA does something or that it does nothing? If you can identify function then you’ve refuted the does nothing hypothesis, but if you don’t you’ve just added more support for the does nothing hypothesis. If you start with “does something” hypothesis, it is very difficult to refute.

  35. Nathan Myers

    “unsophisticated knee-jerk functionalist thinking”

    No, Dilbert is not an argument, but “sophisticated” is always a relative term. Evolved organisms are astonishing Rube Goldberg (for brits: “Heath-Robinson”) contraptions full of silly random complications which only work with the help of other silly random complications. Yet life does find uses for things. The lenses of our eyes are constructed from what any rational biologist would identify as mis-coded hormones or something.

    This really is a matter of burden of proof. We know that RNA can have, and frequently has, direct biochemical effects. If an RNA molecule improves fitness without being (or in addition to being!) translated, a cell is happy to use it, and even to evolve extra apparatus to protect and transport it. If a better protein turns up, eons later, the cell is likely to have found some other use for the apparatus in the meantime, and maybe for the RNA fragment besides.

    That non-coding DNA would not be transcribed unless the resultant RNA were needed would be a functionalist argument. To insist that life could not find some use for some fragment of non-coding RNA would be simple arrogance. It might only be useful, at first, as a decoy so the otherwise useful bits last longer before being taken apart. Working biochemistry doesn’t have to be easy, or even possible, to understand. Sometimes we have been lucky, and part of a mechanism falls within our poor befuddled capacity.

    We can be confident that no matter how well-understood something in biochemistry is, later generations will find that understanding incomplete or even wrong.

  36. S. Pelech – Kinexus (17), I liked your example about the fruit fly and the butterfly but… _Fritillaria assyriaca_ seems to be a plant, not a butterfly :-/

  37. J.J.E.

    @Nathan Myers

    Now that you’re actually discussing it and making an argument (one that was made uncritically until the 60s, this isn’t a new perspective) you must perceive that this is an area where there is no slam dunk answer. The only slam dunk is that your earlier comments were over-simplified and did not admit our ignorance and lacked the humility of scientists that are taking their first steps into a new field. So, I accept your tacit acknowledgement that insisting a priori that, until proven otherwise, any bit of DNA is functional, was too broad and too hasty.

    You have also neglected to acknowledge my own comments regarding how to adjudicate evidence and how to frame hypotheses. Your comments seem stuck in an antiquated “H0 vs H1” paradigm when we now have far more appropriate perspectives on choosing how to modify our uncertainty regarding scientific questions (in any event, the conventions of that paradigm usually require that the “uninteresting” hypothesis or the hypothesis of no effect be H0). In particular, priors in Bayesian statistics offer a very fair and natural way to apportion the burden of proof in an entirely impartial way, unlike tendentious choice of which is the null and which is the alternative hypothesis. (The answer is to pick an uninformative prior and let the data decide.)

    Additionally, you’ve ignored the results of genetics, which have already spoken pretty strongly. After a century of examining the correspondence between broken genetic systems and the phenotypes of such broken systems, we have a pretty good upper bound for how frequently important phenotypic changes result from genetic changes outside of traditionally “functional” categories like exons, regulatory regions, etc. Sure, there are things like micro RNAs and indeed, we’re finding more and more stretches of the genome that are contributing to phenotype, though the magnitude of those phenotypic changes is in turn diminishing. Without a doubt, the lion’s share of “strongly functional” segments of the genome are associated with genes and their regulatory regions. (I hate to qualify this much, but I feel I must with you: I am not denying that there is a non-trivial minority of many other types of functional sequences.)

    Finally, you’ve completely neglected evolutionary arguments, which have consistently found that in much of the genome, especially in large genomes like the human genome, there are huge stretches of non-conserved regions as well as large regions whose polymorphism appears consistent with neutrality and not compatible with constraint. There are also genetic load arguments to be had, etc. So, given the mountain of evidence from genetics and evolution, it is a very generous concession to use an uninformative prior when quantifying our uncertainty regarding whether barely characterized or uncharacterized regions of the genome are functional. Again, this is not to say that we won’t some day find functions for those regions. But insisting on their functionality before those functions are found is profoundly unscientific and puts the cart before the horse in the worst possible way.
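
    The Bayesian framing argued for in comments 23 and 37 can be sketched in a few lines: start from a prior probability that a sequence is functional, then update it with a likelihood ratio for whatever evidence is in hand. The likelihood ratios used here are invented purely for illustration and are not estimates from ENCODE or any other dataset; `posterior_functional` is a hypothetical helper name.

```python
def posterior_functional(prior, likelihood_ratio):
    """Update the probability that a sequence is functional (Bayes' rule, odds form).

    prior            -- prior probability of function, strictly between 0 and 1
    likelihood_ratio -- P(evidence | functional) / P(evidence | not functional)
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    uninformative_prior = 0.5  # the 50% ceiling suggested in comment 23
    # Illustrative likelihood ratios only (assumed values, not measured ones).
    for label, ratio in [("no evidence either way", 1.0),
                         ("evidence favouring function", 4.0),
                         ("evidence favouring neutrality", 0.25)]:
        post = posterior_functional(uninformative_prior, ratio)
        print(f"{label}: posterior = {post:.2f}")
```

    With an uninformative 50% prior, evidence four times likelier under function lifts the posterior to 0.8, while evidence four times likelier under non-function drops it to 0.2. The prior only sets the starting point; the data do the rest.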

  38. wesy web

    Y’all keep babbling on about genomic size and relation of gene function to organism size/complexity like it matters or is supposed to be proportional. How about you look up dinoflagellate genomes and the genetic regulation of genes in the malaria parasite Plasmodium falciparum. Protists, especially parasites, have nailed genomic compartmentalisation even under the selective pressure of endosymbiosis.

    They’re some top reads. Their life cycles and physiological development are remarkable, and make humans look boring in relation. Not to mention that Plasmodium infection has shaped genomic variation in humans, and the real part, not the regulat-ome.

    The true advancement that this information can afford us should be based on development of molecular tools to deliver gene technology, for example, gene therapy or to control the action of antibiotic resistance cassettes in GMOs.

    Also can’t wait until we start naming some gene regulatory pathways after Warren G and Nate Dogg (RIP). Try to sneak it past the editors.

  39. “Junk DNA was a misnomer from the start”. The start was a 5-page Abstract reproduced at http://www.junkdna.com by the renowned scientist Ohno (1972) who (mistakenly) argued that an overwhelming amount of DNA was there “for the purpose of doing nothing”. While the very first question held the erroneous argument “suspect” (cf facsimile), a bulk of scientists embraced the mistaken axiom for (it seems) 40 years for the convenience of dismissing 98.7% of human DNA. As Kuhn presented in his classic “The Structure of Scientific Revolutions”, wrong axioms die hard (facts don’t kill theories, only a better theory can replace obsolete ones, cf Giordano Bruno), but despite a rapidly diminishing number of detractors true science marches on. A Geocentric theory had little practicality till space travel – but Recursive Genome Function is absolutely vital for too many hundreds of millions suffering and dying from “JunkDNA diseases” – most notably by the dreaded cancer. Recursive algorithms such as FractoGene are urgently demanded also by taxpayers who are sick and tired of paying through the nose for big science based on big mistakes.

  40. Nathan Myers
  41. El PaleoFreak at Comment 36. Thank you for the correction about the identity of Fritillaria assyriaca as a plant and not a butterfly. I need to check my sources more carefully. The insect that appears to have the largest amount of DNA in its genome appears to be the mountain grasshopper Podisma pedestris, with about 14 billion nucleotide base pairs. This is still over a hundred-fold larger than the fruit fly Drosophila melanogaster genome.
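
    The genome-size comparisons in comments 17 and 41 are simple ratios, and a short sketch makes them easy to check. The sizes below are the figures quoted in this thread (in billions of base pairs), not independently verified values, and the function and variable names are illustrative only. On these figures the grasshopper genome comes out at roughly 85 times the fruit fly's, a little short of a hundred-fold.

```python
# Genome sizes in billions of base pairs, exactly as quoted in comments 17 and 41.
# These are the commenters' figures, taken at face value for the arithmetic.
GENOME_SIZES_GBP = {
    "Drosophila melanogaster (fruit fly)": 0.165,
    "Homo sapiens (human)": 3.2,
    "Podisma pedestris (mountain grasshopper)": 14.0,
    "Fritillaria assyriaca (plant)": 124.9,
}


def fold_table(reference, sizes=GENOME_SIZES_GBP):
    """Return each genome's size as a multiple of the reference genome's size."""
    ref_size = sizes[reference]
    return {name: size / ref_size for name, size in sizes.items()}


if __name__ == "__main__":
    folds = fold_table("Drosophila melanogaster (fruit fly)")
    for name, fold in folds.items():
        print(f"{name}: {fold:.1f}x the fruit fly genome")
```

    The same table shows the human genome at about 19 times the fruit fly's and Fritillaria assyriaca at about 757 times, which is where the "750-times" figure in comment 17 comes from.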

  42. SP

    Ed, over the weekend I read the take on ENCODE in the science section of the Economist, which has been the best non-specialist source of science writing I know. Your piece is so far ahead in every way — clarity, content, narrative — that it is hard to credit. Kudos. For what it is worth, I think your first version was very clear and suitably sceptical on the true extent of “functional” DNA. The updates are useful and informative, but were not a necessary corrective (as you rather harshly imply).

    [Small suggestion: the extent of the updates is a little difficult to track as there is no clear marker of where they end. You might consider placing the closing bold square bracket that is currently before the text of the insert at the end of it to correct this.]

    [Thanks, SP. I like the bracket suggestion and have implemented it. – Ed]

  43. ” This is still over a hundred-fold larger than the fruit fly Drosophila melanogaster genome”


  44. JimV

    I used “useful” as a substitute for “non-junk”. If “functional” does not imply “useful”, then it seems to me it is a very poor criterion for distinguishing between junk and non-junk. So the statement {According to ENCODE’s analysis, 80 percent of the genome has a “biochemical function”. More on exactly what this means later, but the key point is: It’s not “junk”.} is in fact … junk (functional, perhaps, but not useful).

    I’m very willing to believe that the rest of the post was as magnificent as many commenters here say it was – but you lost me at “it’s not junk”. I’ll owe you an apology if and when evolutionary biologists have a consensus either that a) most of the human genome is biologically useful (to individuals, not just as raw material for mutations); or b) junk DNA does not mean stuff that is biologically useless to individuals. Right now the sources I read would not agree with either.

    Other than that, keep up the good work.

  45. Claudiu Bandea

    Five reasons why my theory on the function of ‘junk DNA’ is better than theirs

    I intend to submit the paper below for publication in a peer-reviewed journal. Before submitting it and having it reviewed by a handful (if that) of peers, I decided to post it here on the Blogosphere Preprint Server, which is rapidly becoming the front-line platform for transparent and comprehensive evaluation of scientific contributions.

    The ENCODE project has produced high quality and valuable data. There is no question about that. And, the micro-interpretation of the data has been of equal status. The problem is with the macro-interpretation of the results, which some consider to be the most important part of the scientific process. Apparently, the leaders of the ENCODE project agreed with this criterion, as they came out with one of the most startling biological paradigms since, well, since the Human Genome Project has shown that the DNA sequences coding for proteins and functional RNA, including those having well defined regulatory functions (e.g. promoters, enhancers), comprise less than 2% of the human genome.

    According to ENCODE’s ‘big science’ conclusion, at least 80% of the human genome is functional. This includes much of the DNA that has been previously classified as ‘junk DNA’ (jDNA). As metaphorically presented in both scientific and lay media, ENCODE’s results mean the death of the jDNA.

    However the eulogy of jDNA (all of it) was written more than two decades ago, when I proposed (and conceptually proven) that jDNA functions as a sink for the integration of proviruses, transposons and other inserting elements, thereby protecting functional DNA (fDNA) from inactivation or alteration of its expression (see a copy of my paper posted here: http://sandwalk.blogspot.com/2012/06/tributre-to-stephen-jay-gould.html; also, see a recent comment in Science, that I posted at Sandwalk: http://sandwalk.blogspot.com/2012/09/science-writes-eulogy-for-junk-dna.html ).

    So, how does ENCODE theory stack ‘mano-a-mano’ with my theory? Here are five reasons why mine is superior:

    #5. In order to label 80% of the human genome functional, ENCODE changed the definition of ‘functional’; apparently, 80% of the human genome is ‘biochemically’ functional, which from a biological perspective might be meaningless. My model on the function of jDNA is founded on the fact that DNA can serve not only as an information molecule, a function that is based on its sequence, but also as a ‘structural’ molecule, a function that is not (necessarily) based on its sequence, but on its bare or bulk presence in the genome.

    #4. Surprisingly, ENCODE theory is not explicitly immersed in one of the fundamental tenets of modern biology: Nothing in biology makes sense except in the light of evolution. Indeed, there is no talk about how jDNA (which contains approximately 50% transposon and viral sequences) originated and survived evolutionarily. On the contrary, my model is totally embedded and built on evolutionary principles.

    #3. One of the major objectives of the ENCODE project was to help connect the human genome with health and diseases. Labeling 80% of these sequences ‘biochemically functional’ might create the aura that these sequences contain genetic elements that have not yet been mapped out by the myriad of genome wide studies; well, that remains to be seen. In the context of my model, the protective function of jDNA, particularly in somatic cells, is vital for preventing neoplastic transformations, or cancer; therefore, a better understanding of this function might have significant biomedical applications. Interestingly, this major tenet of my model can be experimentally addressed: e.g. transgenic mice carrying DNA sequences homologous to infectious retro-viruses, such as murine leukemia viruses (MuLV), might be more resistant to cancer induced by experimental MuLV infections as compared to controls.

    #2. The ENCODE theory is a culmination of a 250 million US dollars project. Mine, zilch; well, that’s not true, my model is based on decades of remarkable scientific work by thousands and thousands of scientists who paved the road for it.

    #1. The ENCODE theory has not yet passed the famous Onion Test ( http://www.genomicron.evolverzone.com/2007/04/onion-test/), which asks: why do onions have a genome much larger than us, the humans? Do we live in an undercover onion world? The Onion Test is so formidable and inconvenient that, to my knowledge, it has yet to make it through the peer review into the conventional scientific literature or textbooks. So, does my model pass the Onion Test? I think it does, but for a while, I’m going to let you try to figure out how! And, maybe, when I’m going to submit my paper for publication, I’ll use your ideas, if the reviewers will ever ask me for an answer. Isn’t that smart?

  46. Lukasz Huminiecki

    Any thoughts on how the hypothesis of 80% functional genome fits with the occurrence of whole genome duplications, such as two rounds (2R) of whole genome duplication (WGD) that occurred at the base of vertebrates? The signature of 2R-WGD can be clearly seen in the human genome at the protein-coding gene level!


  47. Sara

    I just hope they manage to cure all diseases, especially brain disorders, particularly schizophrenia or schizoaffective disorder, in my lifetime and hopefully while I am still young. Sorry about my English, but it is not my native language.

  48. Claudiu Bandea

    In my parodic comment above, ”Five reasons why my theory on the function of ‘junk DNA’ is better than theirs”, I brought forward an old model (1) on the genome evolution and on the origin and function of the genomic sequences labeled ‘junk DNA’ (jDNA), which in some species represents up to 99% of the genome.

    Since then, I posted in Science five mini-essays outlining some of the key tenets associated with this model, which might solve the C-value and jDNA enigmas ( http://comments.sciencemag.org/content/10.1126/science.337.6099.1159).

    As discussed in the original paper (1) and these mini-essays, the so called jDNA serves as a defense mechanism against insertional mutagenesis, which in humans and many other multicellular species can lead to cancer.

    Expectedly, as an adaptive defense mechanism, the amount of protective DNA varies from one species to another based on the insertional mutagenesis activity and the evolutionary constraints on genome size.

    1. Bandea CI. A protective function for noncoding, or secondary DNA. Med. Hypoth., 31:33-4. 1990.

