Data & theory, then, now, and forever

By Razib Khan | August 30, 2006 2:59 pm

In the 10 Questions for A.W.F. Edwards, a mathematical geneticist, he was asked:
Like Fisher you have worked in both statistics and genetics. How do you see the relationship between them, both in your own work and more generally?
Edwards responded in part:
Genetical statistics has changed fundamentally too: our problem was the paucity of data, especially for man, leading to an emphasis on elucidating correct principles of statistical inference. Modern practitioners have too much data and are engaged in a theory-free reduction of it under the neologism ‘bioinformatics’.

This elicited a strong response from ‘godless capitalist,’ a computational biologist himself:
In other words, they did a lot of math that was unconnected to reality, aka “it is a capital mistake to theorize in the absence of data”. You can see the results in the pages of the journal Genetics today, or in something like Gillespie’s book — written in 2004! — which doesn’t even mention genome sequencing.
This issue re: theorizing in the absence of data is particularly salient in population genetics, where basic phenomena like recombination (and its impact on evolution) could not be well modeled because of the sheer extent of fine-scale recombination variation — an extent which has only recently been apprehended and quantified.
This reminds me of what Richard Lewontin stated in 1974 about the evolutionary genetics which Fisher, Wright and Haldane created in the 1930s:
…rich and powerful theory with virtually no suitable facts on which to operate. It was like a complex and exquisite machine, designed to process a raw material that no one had succeeded in mining.

Finally, in my 10 questions with him, famed evolutionary geneticist James F. Crow stated:
It is true that the elegant theory of Fisher, Wright, Haldane, Kimura, and Malécot was less useful than might have been expected, because of lack of good data to which the theory was applicable. But that is no longer true. Molecular evolution has provided an abundance of data and the theory now has plenty of important applications. In particular, the neutral theory of molecular evolution has had great heuristic and predictive value, and it owes a great deal to Kimura’s earlier theoretical work, which built on the foundations of the pioneers. Lynn might change her mind if she looked at some of the striking results gotten by combining molecular measurements with population genetics theory.

Whatever the details, one thing seems clear: Fisher, Wright and Haldane, and their successors, generated a theoretical system in advance of the ability to test all their conjectures or inferences. Is this useful? All things in moderation! Science is haphazard; some might call it a memetic form of stochastic hill climbing. Of course, if theory outruns data too much then you might get stuck in a ravine with steep sides and never climb back out.

Before Origin the science of biology was one of discovery and classification. Darwin’s theory of evolution gave it a paradigmatic lens through which to comprehend the diversity of life. But Darwin himself ran ahead of data: he had no good mechanism of genetic transmission! With the rise of Mendelianism this gap was closed to some extent, finally sealed tightly by Watson and Crick’s exposition of the structure of DNA. And yet just as Mendelianism put Darwinian evolutionary theory on firmer ground, R.A. Fisher and Sewall Wright began to force the theoretical territory far ahead of what the data could arbitrate. If the data had been available I doubt that the “Wright-Fisher controversies” would have been as heated. What is the role of gene-gene interactions? Population substructure? Effective population size? After the basics of the Wright-Fisher models were elucidated the rest was rhetorical shadow boxing over fine axiomatic points which resulted in wildly variant inferences. Will Provine has argued that Wright’s central contentions in contrast with Fisher were misunderstood by his acolytes, suggesting that the controversy did lead science into ravines. But that is the nature of science: theory runs ahead of data and transforms into ideology, data smashes ideology and reshapes it into a theory, which gives rise to new systematic structures and paradigms. Schumpeter would be proud!

