Life is One, universal common ancestry supported

By Razib Khan | May 13, 2010 3:01 am

One of the notions implicit in most evolutionary models is that the tree of life has a common root. In other words all individuals of all species represent end points of lineages which ultimately coalesce back to the original common ancestor. The first Earthling, so to speak. I say implicit because common ancestry isn’t necessary for evolution to be valid; after all, we presumably accept that evolutionary process is operative in an exobiological context, if such a context exists. Therefore it is possible that modern extant lineages are derived from separate independent antecedents. A “multiple garden” model. This has seemed less and less plausible as the molecular basis of biology has been elucidated; it looks like the basic toolkit is found all across the tree of life. But with a newfound awareness of the power of processes such as horizontal gene transfer, the open & shut case is faced with a new element of ambiguity. Or perhaps not?

Here’s a post from Wired, Life on Earth Arose Just Once:

The idea that life forms share a common ancestor is “a central pillar of evolutionary theory,” says Douglas Theobald, a biochemist at Brandeis University in Waltham, Massachusetts. “But recently there has been some mumbling, especially from microbiologists, that it may not be so cut-and-dried.”

Because microorganisms of different species often swap genes, some scientists have proposed that multiple primordial life forms could have tossed their genetic material into life’s mix, creating a web, rather than a tree of life.

To determine which hypothesis is more likely correct, Theobald put various evolutionary ancestry models through rigorous statistical tests. The results, published in the May 13 Nature, come down overwhelmingly on the side of a single ancestor.

A universal common ancestor is at least 102,860 times more probable than having multiple ancestors, Theobald calculates.

The paper is now on the Nature website, A formal test of the theory of universal common ancestry. The author looked specifically at 23 very conserved proteins across 12 taxa from the three domains of life (those being eukaryotes, bacteria, and archaea). Here’s where the author explains the philosophy behind the statistical technique:

When choosing among several competing scientific models, two opposing factors must be taken into account: the goodness of fit and parsimony. The fit of a model to data can be improved arbitrarily by increasing the number of free parameters. On the other hand, simple hypotheses (those with as few ad hoc parameters as possible) are preferred. Model selection methods weigh these two factors statistically to find the hypothesis that is both the most accurate and the most precise.
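
To make the fit-versus-parsimony trade-off concrete, here is a minimal sketch using the Akaike information criterion (AIC), one common model-selection score. The quoted passage does not name a specific score, so the choice of AIC is an assumption for illustration, and the coin-flip data are made up. It is only a loose analogy to the paper's analysis: the "shared" model plays the role of a single common source, the "separate" model the role of independent sources.

```python
import math

def log_likelihood(heads, flips, p):
    """Binomial log-likelihood, dropping the constant shared by all models."""
    return heads * math.log(p) + (flips - heads) * math.log(1 - p)

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L. Lower is better."""
    return 2 * n_params - 2 * log_lik

# Made-up data: (heads, flips) for two coins.
coins = [(60, 100), (62, 100)]

# "Shared" model: one bias for both coins (1 free parameter).
total_heads = sum(h for h, _ in coins)
total_flips = sum(n for _, n in coins)
p_shared = total_heads / total_flips
ll_shared = sum(log_likelihood(h, n, p_shared) for h, n in coins)

# "Separate" model: each coin gets its own bias (2 free parameters).
ll_separate = sum(log_likelihood(h, n, h / n) for h, n in coins)

aic_shared = aic(ll_shared, 1)
aic_separate = aic(ll_separate, 2)

print("log-likelihood shared:  ", ll_shared)
print("log-likelihood separate:", ll_separate)
print("AIC shared:  ", aic_shared)
print("AIC separate:", aic_separate)
```

The separate model always fits at least as well, because it has more freedom. With these toy numbers the improvement in fit is too small to pay for the extra parameter, so the simpler shared model gets the lower AIC.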

The sorts of models compared are illustrated by figure 2. On the left you have the universal common descent model, and on the right the bacteria have an independent origin. The lines represent connections between the 23 conserved protein sequences, either through horizontal transfer or vertical transmission.


As noted in the Wired piece there’s no contest here. Universal common descent is strongly supported. I’ll let the author finish:

What property of the sequence data supports common ancestry so decisively? When two related taxa are separated into two trees, the strong correlations that exist between the sequences are no longer modelled, which results in a large decrease in the likelihood. Consequently, when comparing a common-ancestry model to a multiple-ancestry model, the large test scores are a direct measure of the increase in our ability to accurately predict the sequence of a genealogically related protein relative to an unrelated protein. The sequence correlations between a given clade of taxa and the rest of the tree would be eliminated if the columns in the sequence alignment for that clade were randomly shuffled. In such a case, these model-based selection tests should prefer the multiple-ancestry model. In fact, in actual tests with randomly shuffled data, the optimal estimate of the unified tree (for both maximum likelihood and Bayesian analyses) contains an extremely large internal branch separating the shuffled taxa from the rest. In all cases tried, with a wide variety of evolutionary models (from the simplest to the most parameter rich), the multiple-ancestry models for shuffled data sets are preferred by a large margin over common ancestry models (LLR on the order of a thousand), even with the large internal branches. Hence, the large test scores in favour of UCA models reflect the immense power of a tree structure, coupled with a gradual Markovian mechanism of residue substitution, to accurately and precisely explain the particular patterns of sequence correlations found among genealogically related biological macromolecules.
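
The shuffling control described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's procedure: the sequences are made up, the helper names (`identity`, `mean_identity`, `shuffle_columns`) are hypothetical, and simple percent identity stands in for the likelihood-based scores the paper uses. The one idea it tries to capture is that applying a single random column permutation to one clade leaves the relationships inside that clade intact while scrambling its column-by-column correspondence with everything else.

```python
import random

def identity(a, b):
    """Fraction of aligned positions at which two equal-length sequences match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mean_identity(group_a, group_b):
    """Mean identity over all pairs with one sequence taken from each group."""
    pairs = [(a, b) for a in group_a for b in group_b]
    return sum(identity(a, b) for a, b in pairs) / len(pairs)

def shuffle_columns(clade, rng):
    """Apply the same random column permutation to every sequence in a clade."""
    order = list(range(len(clade[0])))
    rng.shuffle(order)
    return ["".join(seq[i] for i in order) for seq in clade]

# Made-up aligned sequences for two clades (all the same length).
clade_a = ["MKTAYIAKQRQISFVKSHFS", "MKTAYIAKQRQISFVKAHFS"]
clade_b = ["MKSAYIAKQRNISFVKSHFT", "MKSAYLAKQRNISFVKSHFT"]

rng = random.Random(0)  # fixed seed so the run is repeatable
shuffled_b = shuffle_columns(clade_b, rng)

between_before = mean_identity(clade_a, clade_b)
between_after = mean_identity(clade_a, shuffled_b)
within_before = identity(clade_b[0], clade_b[1])
within_after = identity(shuffled_b[0], shuffled_b[1])

print("between clades, original:", between_before)
print("between clades, shuffled:", between_after)
print("within clade B, original:", within_before)
print("within clade B, shuffled:", within_after)
```

Because every sequence in clade B receives the same permutation, the within-clade identity is unchanged by construction. Only the correspondence between clade B's columns and clade A's is disturbed, which is the signal a single-tree model relies on.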

Citation: Theobald, Douglas L., A formal test of the theory of universal common ancestry, Nature, doi:10.1038/nature09014

CATEGORIZED UNDER: Genetics, Genomics

Comments (18)

  1. bioIgnoramus

    “simple hypotheses … are preferred”: by us, maybe; but by God?

  2. bioIgnoramus

    “a central pillar of evolutionary theory”: even a central pillar may be structurally redundant.

  3. kirk

    Even a single common ancestor could be the 201st or the 2001st Earthling – multiple trials of the first replicator is vastly more likely than getting it fit the first time.

    And BioIgnoramus – I’m looking at you – there is an app for ignorance. It’s called science. Give a hoot; read a book.

  4. The Wired post title is misleading: in the paper as well as in Steel & Penny’s commentary on page 168, they are clear in saying that the origin of life on Earth is not under test. The hypothesis is about one single LUCA against several LUCA, one for each group. (represented as ABE versus AE+B).

    Another mistake in the Wired post is the number: when they say “is at least 102,860 times more probable” they actually mean 10^2,860…

  5. miko

    Alan Turing had a very elegant and parsimonious model for Drosophila segmentation that could explain the available data with a few genes, diffusion, and feedback. Of course, it turned out to be a clusterfuck of 30+ genes and counting and still defies logic. Such is biology–I’m not sure parsimony is ever a good assumption in contingent, dependent systems that evolve over time. See the Windows OS. Though in this case I’m pretty sure kirk is right about how things went… many replicators that failed until there was a really good replicator that became all of us.

    Off-topic, biologists who are good at marketing their data to peers and editors tend to be those who can make the ugly morass SEEM parsimonious through selective presentation and interpretation.

  6. mike o

    Multiple, peer-reviewed studies have shown that God does indeed prefer simple hypotheses.

    Other studies have also discovered that He prefers boxers to briefs, wakes up at precisely 11:00am EST every day, and lives in a house with about 30 cats. Mrs. God left him several years ago.

  7. bioIgnoramus

    Oh, kirk, you are a proper caution.

  8. You seem to have dropped a superscript. The ratio isn’t 102,860 to 1 but 10^2,860 to 1.

  9. Torbjörn Larsson, OM

    The sequence correlations between a given clade of taxa and the rest of the tree would be eliminated if the columns in the sequence alignment for that clade were randomly shuffled. In such a case, these model-based selection tests should prefer the multiple-ancestry model.

    That is also a good argument against fundamentalist creationists.

    When they claim that evolution is all “random chemical reactions” [when bounded molecules are anything but random assemblages of atoms – give me a break] it is in fact they with their multiple creations/kinds who suggest random shuffling.

  10. jb

    I don’t know. Common ancestry certainly seems the most reasonable possibility. But the idea that we can calculate any real world probabilities and come up with 1 in 10^2860 seems dubious to me!

  11. Torbjörn Larsson, OM

    jb, first we are discussing a posteriori likelihoods for observational models explaining a set of observations, not a priori probabilities for making individual observations.

    And second, it seems you aren’t familiar with phylogenies and Theobald’s exposition of them:

    So, how well do phylogenetic trees from morphological studies match the trees made from independent molecular studies? There are over 10^38 different possible ways to arrange the 30 major taxa represented in Figure 1 into a phylogenetic tree (see Table 1.3.1; Felsenstein 1982; Li 1997, p. 102). In spite of these odds, the relationships given in Figure 1, as determined from morphological characters, are completely congruent with the relationships determined independently from cytochrome c molecular studies (for consensus phylogenies from pre-molecular studies see Carter 1954, Figure 1, p. 13; Dodson 1960, Figures 43, p. 125, and Figure 50, p. 150; Osborn 1918, Figure 42, p. 161; Haeckel 1898, p. 55; Gregory 1951, Fig. opposite title page; for phylogenies from the early cytochrome c studies see McLaughlin and Dayhoff 1973; Dickerson and Timkovich 1975, pp. 438-439). Speaking quantitatively, independent morphological and molecular measurements such as these have determined the standard phylogenetic tree, as shown in Figure 1, to better than 38 decimal places. This phenomenal corroboration of universal common descent is referred to as the “twin nested hierarchy”. This term is something of a misnomer, however, since there are in reality multiple nested hierarchies, independently determined from many sources of data.


    Nevertheless, a precision of just under 1% is still pretty good; it is not enough, at this point, to cause us to cast much doubt upon the validity and usefulness of modern theories of gravity. However, if tests of the theory of common descent performed that poorly, different phylogenetic trees, as shown in Figure 1, would have to differ by 18 of the 30 branches! In their quest for scientific perfection, some biologists are rightly rankled at the obvious discrepancies between some phylogenetic trees (Gura 2000; Patterson et al. 1993; Maley and Marshall 1998). However, as illustrated in Figure 1, the standard phylogenetic tree is known to 38 decimal places, which is a much greater precision than that of even the most well-determined physical constants. For comparison, the charge of the electron is known to only seven decimal places, the Planck constant is known to only eight decimal places, the mass of the neutron, proton, and electron are all known to only nine decimal places, and the universal gravitational constant has been determined to only three decimal places.

    You can see from the provided statistics or the Phylogenetic Trees Calculator how the number of possible tree permutations explodes by the factorial function, yet methods of determining phylogeny like the one we are discussing are able to select a subset of them.

    The use of precision here is apt: it is the precision or uncertainty of a test of the whole tree topology.

    The resolution of the tree itself becomes much poorer as one wants to observe its individual details. It’s like how one can run a collider to arrive at some precise Standard Model constants, yet have trouble resolving the statistical distributions of collisions over all energies.

    Even without the math it is IMO easy to intuit how the number of possible permutations of a nested graph such as in fig a goes from “on the order of” 10^38 to 10^[insert much huger number here], as the connections go from something closer to the ideal tree topology to the mesh topology.

  12. Torbjörn Larsson, OM

    Even a single common ancestor could be the 201st or the 2001st Earthling – multiple trials of the first replicator is vastly more likely than getting it fit the first time.

    Only if you have no selection on the protobiotic chemical population.

    And that is very unlikely, as you always have selection for robustness of (reliance on) sources, and biologists identify things like photo-selectivity as being important early on. Not surprisingly, AFAIU all amino acids and nucleic acids show signs of having been intensively UV photoselected.

    So it is likely that the chemical populations were environmentally fit even before they started to reproduce by direct inheritance processes instead of growth and separation.

    In fact, now that we know that a cool early Earth was in place (4.4 Ga zircons having been born in an environment having liquid water), and that any physically likely amount of LHB still left pockets for reproducing mesophilic bacteria (first model paper last year), it is likely that reproducing protobiotic and later biotic populations got started right away.

    But not as likely as testing to 10^-2860. 😉

    [One more realistic problem being that the environment, especially the Sun, wasn’t as stable as today. Life could have been burnt to a crisp many times over. OTOH, that LHB paper shows how hard it is for spotty mechanisms such as solar wind outbreaks to eradicate _all_ life, especially after it has passed to free living reproducing cells.]

  13. AG

    Or other forms were simply wiped out by our one common ancestor. Imagine all wild animals and plants were replaced by our domesticated animals and plants because we simply left no habitat for wild ones anymore. Future generations would only see life forms related to human activities. Then the conclusion would be that all creatures on earth were created for humans only.

  14. I wonder what hypothesis/es the “1” represents in that ratio “10^2860 to 1”. I mean, AFAIK, no one doubts that, for example, eukaryotes share a common ancestor, even if there’s been some amount of gene swapping and close-relative hybridization in the tree. So was the possibility of multiple animal ancestors put on the table? If so, the numbers would naturally get skewed. (If you want to compare the known tree to all possible trees, like ones where sharks and pine trees are descended from zebras, of course you’ll get crazy big numbers.)

    While I personally believe that UCD is most likely, I’m quite surprised that the odds would be that high, considering the various recent speculations about multiple ancestral gene pools.

  15. Torbjörn Larsson, OM

    the numbers would naturally get skewed. (If you want to compare the known tree to all possible trees,

    That is what you have to do, which means there is no skewing. (Which would be unnatural btw, non-skewing is the natural (null) hypothesis.)



Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at

