In part, genes. Luke Jostins reported this from a conference last year, so not too surprising. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Let me jump to the summary:
In summary, we have provided an empirical example of widespread weak selection on standing variation. We observed genetic differences using multiple populations from across Europe, thereby showing that the adult height differences across populations of European descent are not due entirely to environmental differences but rather are, at least partly, genetic differences arising from selection. Height differences across populations of non-European ancestries may also be genetic in origin, but potential nongenetic factors, such as differences in timing of secular trends, mean that this inference would need to be directly tested with genetic data in additional populations. By aggregating evidence of directionally consistent intra-European frequency differences over many individual height-increasing alleles, none of which has a clear signal of selection on its own, we observed a combined signature of widespread weak selection. However, we were not able to determine whether this differential weak selection (either positive or negative) favored increased height in Northern Europe, decreased height in Southern Europe or both. One possibility is that sexual selection or assortative mating (sexual selection for partners in similar height percentiles) fueled the selective process. It is also possible that selection is not acting on height per se but on a phenotype closely correlated with height or a combination of phenotypes that includes height.
Two points of note. First, simulations suggested that the genetic architecture is unlikely to be due to drift alone. In other words, natural selection. Selection on quantitative traits isn’t magic; there’s a whole agricultural industry based around this phenomenon. For the purposes of understanding human evolution the key is that we are now moving beyond looking for traits which emerged due to novel mutations (e.g., lactase persistence), and now trying to understand how selection and drift may work on standing variation. For example, humans have become smaller in overall size, and also in cranial capacity, over the past 10,000 years. Second, they validated their findings using a sibling cohort. This is something I always look for when people make inter-population inferences. A number of population-wide correlations don’t pan out when you are looking within families. This matters in trying to understand causation.
That’s the question a commenter poses, albeit with skepticism. First, the background here. New England was a peculiar society for various demographic reasons. In the early 17th century there was a mass migration of Puritan Protestants from England to the colonies which later became New England because of their religious dissent from the manner in which the Stuart kings were changing the nature of the British Protestant church.* Famously, these colonies were themselves not aiming to allow for the flourishing of religious pluralism, with the exception of Rhode Island. New England maintained established state churches longer than other regions of the nation, down into the early decades of the 19th century.
Between 1630 and 1640 about 20,000 English arrived on the northeastern fringe of British settlement in North America. With the rise of co-religionists to power in the mid-17th century a minority of these emigres engaged in reverse-migration. After the mid-17th century migration by and large ceased. Unlike the Southern colonies these settlements did not have the same opportunities for frontiersmen across a broad and ecologically diverse hinterland, and their cultural mores were decidedly more constrained than those of the cosmopolitan Middle Atlantic. The growth in population in New England from the low tens of thousands to close to 1 million in the late 18th century was one of endogenous natural increase from the founding stock.
The Pith: Even traits where most of the variation you see around you is controlled by genes still exhibit a lot of variation within families. That’s why there are siblings of very different heights or intellectual aptitudes.
In a post below I played fast and loose with the term correlation and caused some confusion. Correlation is obviously a set of precise statistical terms, but it also has a colloquial connotation. Additionally, I regularly talk about heritability. Heritability is in short the proportion of phenotypic variance which can be explained by genetic variance. In other words, if heritability is ~1 almost all the variation in the trait is due to variation in genes, while if heritability is ~0 almost none of it is. Correlation and heritability of traits across generations are obviously related, but they’re not the same.
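To make the definition concrete, here is a minimal sketch in Python. It assumes the simplest possible model, in which a phenotype is just an independent genetic value plus an environmental deviation. The function name and the variance values are my own illustrative choices, not estimates for any real trait.

```python
import random
import statistics

def simulated_heritability(var_genetic, var_environment, n=50_000, seed=0):
    """Estimate heritability as Var(G) / Var(P) in a toy additive model.

    Assumption: phenotype = genetic value + environmental deviation,
    with the two components drawn independently from normal distributions.
    """
    rng = random.Random(seed)
    genetic = [rng.gauss(0, var_genetic ** 0.5) for _ in range(n)]
    environment = [rng.gauss(0, var_environment ** 0.5) for _ in range(n)]
    phenotype = [g + e for g, e in zip(genetic, environment)]
    return statistics.pvariance(genetic) / statistics.pvariance(phenotype)

# Illustrative inputs only: 80% of the variance genetic, 20% environmental.
h2 = simulated_heritability(0.8, 0.2)
print(f"estimated heritability: {h2:.2f}")
```

The ratio is a statement about variance in a population, which is why it says nothing on its own about how similar any one parent and child will be.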
This post is to clarify a few of these confusions, and sharpen some intuitions. Or perhaps more accurately, banish them.
In earlier discussions I’ve been skeptical of the idea of “designer babies” for many traits which we may find of interest in terms of selection. For example, intelligence and height. Why? Because variation on these traits seems highly polygenic and widely distributed across the genome. Unlike cystic fibrosis (Mendelian recessive) or blue eye color (quasi-Mendelian recessive) you can’t just focus on one genomic region and then make a prediction about phenotype with a high degree of certainty. Rather, you need to know thousands and thousands of genetic variants, and we just don’t know them.
But I just realized one way that genomics might make it a little easier even without this specific information.
Kobe Bryant is an exceptional professional basketball player. His father was a “journeyman”. Similarly, Barry Bonds and Ken Griffey Jr. both surpassed their fathers as baseball players. Both of Archie Manning’s sons are superior quarterbacks in relation to their father. This is not entirely surprising. Though there is a correlation between parent and offspring in their traits, that correlation is imperfect.
Note though that I put journeyman in quotes above because any success at the professional level in major league athletics indicates an extremely high level of talent and focus. Kobe Bryant’s father was among the top 500 best basketball players of his age. His son is among the top 10. This is a large realized difference in professional athletics, but across the whole distribution of people playing basketball at any given time it is not so great of a difference.
What is more curious is how this relates to the reality of regression toward the mean. This is a very general statistical concept, but for our purposes we’re curious about its application in quantitative genetics. People often misunderstand the idea from what I can tell, and treat it as if there is an orthogenetic-like tendency of generations to regress back toward some idealized value.
Going back to the basketball example: Michael Jordan, the greatest basketball player in the history of the professional game, has two sons who are modest talents at best. The probability that either will make it to a professional league seems low, a reality acknowledged by one of them. In fact, from what I recall both received special attention and consideration because they were Michael Jordan’s sons. It is still noteworthy of course that both had the talent to make it onto a roster of a Division I NCAA team. This is not typical for any young man walking off the street. But the range in realized talent here is notable. Similarly, Joe Montana’s son has been bouncing around college football teams to find a roster spot. Again, it suggests a very high level of talent to be able to plausibly join a roster of a Division I football team. But for every Kobe Bryant there are many, many, Nate Montanas. There have been enough generations of professional athletes in the United States to illustrate regression toward the mean.
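A rough simulation can show what regression toward the mean looks like in this setting. The sketch below assumes a purely additive trait, random mating, and a heritability of 0.5. That figure is an arbitrary illustrative value, not an estimate for athletic talent, and the function names are hypothetical.

```python
import random
import statistics

def simulate_families(h2, n=50_000, seed=0):
    """Return (midparent, offspring) phenotype pairs in standard deviation units.

    Assumptions (mine, for illustration): a purely additive trait, random
    mating, total phenotypic variance of 1, and Mendelian segregation adding
    half the additive variance within each family.
    """
    rng = random.Random(seed)
    sd_a = h2 ** 0.5            # SD of additive genetic values
    sd_e = (1 - h2) ** 0.5      # SD of environmental deviations
    sd_seg = (h2 / 2) ** 0.5    # SD of within-family segregation
    pairs = []
    for _ in range(n):
        a_father = rng.gauss(0, sd_a)
        a_mother = rng.gauss(0, sd_a)
        p_father = a_father + rng.gauss(0, sd_e)
        p_mother = a_mother + rng.gauss(0, sd_e)
        a_child = (a_father + a_mother) / 2 + rng.gauss(0, sd_seg)
        p_child = a_child + rng.gauss(0, sd_e)
        pairs.append(((p_father + p_mother) / 2, p_child))
    return pairs

def regression_slope(pairs):
    """Least-squares slope of offspring phenotype on midparent phenotype."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

pairs = simulate_families(h2=0.5)
slope = regression_slope(pairs)
elite = [(x, y) for x, y in pairs if x > 1.5]   # families with exceptional parents
elite_parents = statistics.fmean([x for x, _ in elite])
elite_children = statistics.fmean([y for _, y in elite])
print(f"slope ~ {slope:.2f}; elite midparent mean {elite_parents:.2f}, "
      f"their children's mean {elite_children:.2f}")
```

Under these assumptions the children of exceptional parents average well above the population mean but below their parents, and the spread around that average is wide enough to produce the occasional child who surpasses them.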
I have discussed the reality that many areas of psychology are susceptible enough to false positives that the ideological preferences of the researchers come to the fore. CBC Radio contacted me after that post, and I asked them to consider that in 1960 psychologists discussed the behavior of homosexuality as if it was a pathology. Is homosexuality no longer a pathology, or have we as a society changed our definitions? In any given discipline when confronted with the specter of false positives which happen to meet statistical significance there is the natural tendency to align the outcome so that it is socially and professionally optimized. That is, the results support your own ideological preferences, and, they reinforce your own career aspirations. Publishing preferred positive results furthers both these ends, even if at the end of the day many researchers may understand on a deep level the likelihood that a specific set of published results are not robust.
This issue is not endemic to social sciences alone. I have already admitted this issue in medical sciences, where there is a lot of money at stake. But it crops up in more theoretical biology as well. In the early 20th century Charles Davenport’s research which suggested the inferiority of hybrids between human races was in keeping with the ideological preferences of the era. In our age Armand Leroi extols the beauty of hybrids, who have masked their genetic load through heterozygosity (a nation like Britain, which once had a public norm against ‘mongrelization,’ now promotes racial intermarriage in the dominant media!). There are a priori biological rationales for both positions, hybrid breakdown and vigor (for humans from what I have heard and seen there seems to be very little evidence overall for either once you control for the deleterious consequences of inbreeding). In 1900 and in 2000 there were very different and opposing social preferences on this issue (as opposed to individual preferences). The empirical distribution of outcomes will vary in any given set of cases, so researchers are incentivized to seek the results which align well with social expectations. (Here’s an example of heightened fatality due to mixing genetic backgrounds; it seems the exception rather than the rule.)
Thinking about all this made me reread James F. Crow’s Unequal by nature: a geneticist’s perspective on human differences. Crow is arguably the most eminent living population geneticist (see my interview from 2006). Born in 1916, he has seen much come and go. For those of us who wonder how anyone could accept ideas which seem shocking or unbelievable today, I suspect Crow could give an answer. He was there. In any case, on an editorial note I think the essay should have been titled “Different by nature.” Inequality tends to connote a rank order of superiority or inferiority, though in the context of the essay the title is obviously accurate. Here is the most important section:
The Pith: When it comes to the final outcome of a largely biologically specified trait like human height it looks as if it isn’t just the genes your parents give you that matters. Rather, the relationship of their genes also counts. The more dissimilar they are genetically, the taller you are likely to be (all things equal).
Dienekes points me to an interesting new paper in the American Journal of Physical Anthropology, Isolation by distance between spouses and its effect on children’s growth in height. The results are rather straightforward: the greater the distance between the origin of one’s parents, the taller one is likely to be, especially in the case of males. These findings were robust even after controlling for confounds such as socioeconomic status. Their explanation? Heterosis, whether through heterozygote advantage or the masking of recessive deleterious alleles.
The paper is short and sweet, but first one has to keep in mind the long history of this sort of research in the murky domain of human quantitative genetics. This is not a straightforward molecular genetic paper where there’s a laser-like focus on one locus, and the mechanistic issues are clear and distinct. We are talking about a quantitative continuous trait, height, and how it varies within the population. We are also using geographical distance as a proxy for genetic distance. Finally, when it comes to the parameters affecting these quantitative traits there are a host of confounds, some of which are addressed in this paper. In other words, there’s no simple solution to the fact that nature can be quite the tangle, more so in some cases than others.
Because of the necessity for subtlety in this sort of statistical genetic work one must always be careful about taking results at face value. From what I can gather the history of topics such as heterosis in human genetics is always fraught with normative import. The founder of Cold Spring Harbor Laboratory, Charles Davenport, studied the outcomes of individuals who were a product of varied matings in relation to genetic distance in the early 1920s. This was summed up in his book Race Crossing in Jamaica:
A quantitative study of 3 groups of agricultural Jamaican adults: Blacks, Whites, and hybrids between them; also of several hundred children at all developmental stages. The studies are morphological, physiological, psychological, developmental and eugenical. The variability of each race and sex in respect to each bodily dimension and many basis vary just as morphological traits do. In some sensory tests the Blacks are superior to Whites; in some intellectual tests the reverse is found. A portion of the hybrids are mentally inferior to the Blacks. The negro child has, apparently, from birth on, different physical proportions than the white child.
The Pith: There has been a long running argument whether Pygmies in Africa are short due to “nurture” or “nature.” It turns out that non-Pygmies with more Pygmy ancestry are shorter and Pygmies with more non-Pygmy ancestry are taller. That points to nature.
In terms of how one conceptualizes the relationship of variation in genes to variation in a trait one can frame it as a spectrum with two extremes. On the one hand you have monogenic traits where the variation is controlled by differences on just one locus. Many recessively expressed diseases fit this pattern (e.g., cystic fibrosis). Because you have one gene with only a few variants of note it is easy to capture in one’s mind’s eye the pattern of Mendelian inheritance for these traits in a gestalt fashion. Monogenic traits are highly amenable to a priori logic because their atomic units are so simple and tractable. At the other extreme you have quantitative polygenic traits, where the variation of the trait is controlled by variation on many, many, genes. This may seem a simple formulation, but to try and understand how thousands of genes may act in concert to modulate variation on a trait is often a more difficult task to grok (yes, you can appeal to the central limit theorem, but that means little to most intuitively). This is probably why heritability is such a knotty issue in terms of public understanding of science, as it concerns the component of variation in quantitative continuous traits which is dispersed across the genome. The traits where there is no “gene for X.” Additionally, quantitative traits are likely to have a substantial environmental component of variation, confounding a simple genotype to phenotype mapping.
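For those who find the appeal to the central limit theorem unintuitive, a small simulation may help. The sketch below is a toy model under assumptions of my own: independent loci, an allele frequency of 1/2 everywhere, and identical small effects. It counts trait-increasing alleles per person and checks how much of the population falls within one standard deviation of the mean. A normal curve would put roughly 68% there.

```python
import random
import statistics

def polygenic_scores(n_loci=500, n_people=5_000, seed=0):
    """Count trait-increasing alleles per person across many loci.

    Assumptions (illustrative only): every locus is independent, each of the
    two allele slots at a locus carries the increasing allele with
    probability 1/2, and every allele has the same small effect.
    """
    rng = random.Random(seed)
    n_slots = 2 * n_loci
    # One random bit per allele slot; the score is the number of 1 bits.
    return [bin(rng.getrandbits(n_slots)).count("1") for _ in range(n_people)]

scores = polygenic_scores()
mean = statistics.fmean(scores)
sd = statistics.pstdev(scores)
within_one_sd = sum(abs(s - mean) <= sd for s in scores) / len(scores)
print(f"mean {mean:.1f}, sd {sd:.1f}, share within 1 sd: {within_one_sd:.2f}")
```

No single locus matters much here, yet the sum behaves like a smooth, bell-shaped quantitative trait.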
Arguably the classic quantitative trait is height. It is clear and distinct (there aren’t arguments about the validity of measurement as occurs in psychometrics), and, it is substantially heritable. In Western societies with a surfeit of nutrition height is ~80-90% heritable. What this means is that ~80-90% of the variance of the trait value within the population is due to variance of the genes within the population. Concretely, there will be a very strong correspondence between the heights of offspring and the average height of the two parents (controlled for sex, so you’re thinking standard deviation units, not absolute units). And yet height is at the heart of the question of the “missing heritability” in genetics. By this, I mean the fact that so few genes have been associated with variation in height, despite the reality that who your parents are is the predominant determinant of height in developed societies.
The Pith: In this post I examine how looking at genomic data can clarify exactly how closely related siblings really are, instead of just assuming that they’re about 50% similar. I contrast this randomness among siblings to the hard & fast deterministic nature of parent-child inheritance. Additionally, I detail how the idealized spare concepts of genetics from 100 years ago are modified by what we now know about how genes are physically organized, and, reorganized. Finally, I explain how this clarification allows us to potentially understand with greater precision the nature of inheritance of complex traits which vary within families, and across the whole population.
Humans are diploid organisms. We have two copies of each gene, one inherited from each parent (the exception here is for males, who have only one X chromosome inherited from the mother, and lack many compensatory genes on the Y chromosome inherited from the father). Our own parents have two copies of each gene, one inherited from each of their parents. Therefore, one can model a grandchild from two pairs of grandparents as a mosaic of the genes of the four ancestral grandparents. But, the relationship between grandparent and grandchild is not deterministic at any given locus. Rather, it is defined by a probability. To give a concrete example, consider an individual who has four grandparents, three of whom are Chinese, one of whom is Swedish. Imagine that the Swedish individual has blue eyes. One can reasonably assume then that on the locus which controls the blue vs. non-blue eye color difference one of the grandparents is homozygous for the “blue eye” allele, while the other grandparents are homozygous for the “brown eye” allele. What is the probability that any given grandchild will carry a “blue eye” allele, and so be a heterozygote? Each individual has two “slots” at a given locus. We know that on one of those slots the individual has only the possibility of having a brown eye allele. Their probability of variation then is operative only on the other slot, inherited from the parent whom we know is a heterozygote. That parent in their turn may contribute to their offspring a blue eye allele, or a brown eye allele. So there is a 50% probability that any given grandchild will be a heterozygote, and a 50% probability that they will be a homozygote.
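The arithmetic of the eye color example can be checked by brute-force enumeration. The sketch below is only an illustration of the single-locus logic. The allele labels ("B" for brown, "b" for blue) and the function name are my own, and real eye color inheritance is only quasi-Mendelian.

```python
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent_a, parent_b):
    """Return {genotype: probability} at one locus.

    Each parent is a tuple of two alleles and transmits either one with
    equal probability, so all four allele pairings are equally likely.
    """
    combos = list(product(parent_a, parent_b))
    outcomes = {}
    for allele_a, allele_b in combos:
        genotype = "".join(sorted((allele_a, allele_b)))
        outcomes[genotype] = outcomes.get(genotype, 0) + Fraction(1, len(combos))
    return outcomes

swedish = ("b", "b")        # homozygous for the hypothetical "blue" allele
chinese = ("B", "B")        # homozygous for the hypothetical "brown" allele

# The child of the Swedish and a Chinese grandparent is always a heterozygote.
first_cross = offspring_genotypes(swedish, chinese)

# That heterozygote has children with the child of the other two grandparents.
grandchild = offspring_genotypes(("B", "b"), ("B", "B"))
for genotype, probability in sorted(grandchild.items()):
    print(genotype, probability)
```

The same function can be reused for any pair of single-locus genotypes, which makes it easy to see how quickly the outcomes multiply once more than one locus is involved.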
The above “toy” example on one locus is to illustrate that the variation that one sees among individuals is in part due to the fact that we are not a “blend” of our ancestors, but a combination of various discrete genetic elements which are recombined and synthesized from generation to generation. Each sibling then can be conceptualized as a different “experiment” or “trial,” and their differences are a function of the fact that they are distinctive and unique combinations of their ancestors’ genetic variants. That is the most general theory, without any direct reference to proximate biophysical details of inheritance. Pure Mendelian abstraction as a formal model tells us that reproductive events are discrete sampling processes. But we live in the genomic age, and as you can see above we can measure the variation in genetic relationships among siblings today in an empirical sense. The expectation, as theory predicts, is 0.50, but there is variance around that expectation. It is not likely that all of your siblings are “created equal” in reference to their coefficient of genetic relationship to you.
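Here is a minimal sketch of why sibling relatedness scatters around 0.50. It treats the genome as a number of independently inherited blocks. The block count of 80 is an arbitrary assumption of mine rather than a measured quantity, and real chromosomes are linked in more complicated ways.

```python
import random
import statistics

def sibling_sharing(n_blocks, rng):
    """Fraction of parental transmissions on which two full siblings match.

    Assumption: the genome is n_blocks independent segments; for each segment
    and each parent, a sibling receives one of that parent's two copies with
    probability 1/2.
    """
    n_draws = 2 * n_blocks                 # one transmission per block per parent
    sib1 = rng.getrandbits(n_draws)        # bit i: which copy sibling 1 received
    sib2 = rng.getrandbits(n_draws)        # bit i: which copy sibling 2 received
    differing = bin(sib1 ^ sib2).count("1")
    return (n_draws - differing) / n_draws

def simulate_pairs(n_pairs=5_000, n_blocks=80, seed=1):
    rng = random.Random(seed)
    return [sibling_sharing(n_blocks, rng) for _ in range(n_pairs)]

sharing = simulate_pairs()
print(f"mean {statistics.fmean(sharing):.3f}, sd {statistics.pstdev(sharing):.3f}, "
      f"range {min(sharing):.2f} to {max(sharing):.2f}")
```

The fewer independent blocks one assumes, the wider the spread, which is the sense in which the physical organization of the genome modifies the spare Mendelian abstraction.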
Two of the main avenues of research which I track rather closely in this space are genome-wide association studies (GWAS), which attempt to establish a connection between a trait/disease and particular genetic markers, and inquiries into the evolutionary parameters which shape the structure of variation within the human genome. Often with specific relation to a particular trait/disease. By evolutionary parameters I mean stochastic and deterministic forces; mutation, migration, random drift, and natural selection. These two angles are obviously connected. Both focus on phenomena which are proximate in relation to the broader evolutionary principle: the ultimate raison d’être, replication. Stochastic forces such as random genetic drift reflect the error of sampling of genes from generation to generation during the process of reproduction, while adaptation through natural selection is an outcome of the variation of reproductive fitness as a function of variation of heritable traits. Both of these forces have been implicated in diseases and traits which come under the purview of GWAS (and linkage mapping).
GWAS are regularly in the news because of their relevance in identifying the causal genetic factors for specific diseases. For example, schizophrenia. But they can be useful in a non-disease context as well. Human pigmentation is a character whose genetic architecture has been well elucidated thanks to a host of recent association studies. The common disease-common variant model has yielded spectacular results for pigmentation; it does seem a few common variants are responsible for most of the variation on this trait. But this has been the exception rather than the rule.
One reason for this disjunction between the promise of GWAS and the concrete tangible outcomes is that many traits/diseases of interest may be polygenic and quantitative. This implies that variation in phenotype is controlled by variation across many genes, and, that the variation itself exhibits gradual continuity (a continuity which can be modeled as a normal distribution of values). The power of GWAS to detect correlated variation across genes and traits of small marginal effect is obviously limited. In contrast, it seems that about half a dozen genes can explain most of the between population variation in pigmentation. One SNP is able to account for 25-40% of the difference in shade between Europeans and Africans. This SNP is fixed in Europeans, nearly absent in Africans and East Asians, and segregating in both ancestral and derived variants in groups such as South Asians and African Americans. In contrast, though traits such as schizophrenia and height are substantially heritable, so that much of the variation at the population level of the trait is explainable by variation in genes, the effect size at any given locus may be small, or the variation may be accumulated through the sum of larger effect variants of low frequency. In other words, many common variants of small effect, or numerous distinctive rare variants of large effect.
In a nation of ~1 billion, even one where a large minority are positively malnourished, you’d expect some really tall people. So not that surprising: NBA Awaits Satnam From India, So Big and Athletic at 14:
In a country of 1.3 billion people, 7-foot, 250-pound Satnam Singh Bhamar has become a beacon for basketball hope.
At age 14.
That potential starts with his size, which is incredible itself. At age 14, he is expected to grow for another couple of years. For now, he wears a size-22 basketball shoe. His hands swallow the ball. His father, Balbir Singh Bhamara, is 7-2. His grandmother on his father’s side is 6-9.
Punjab is one of India’s more prosperous states. Interestingly this kid’s paternal grandmother is as tall in standard deviation units as her son or grandson. In Western developed societies height is 80-90% heritable. That means that there’s very little expected regression back to the population mean for any given child. The article doesn’t mention the mother’s height though. If she is of more normal size then Satnam is either a fluke, or, there are dominant large effect rare alleles being passed down by the father, perhaps from the paternal grandmother.
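The expectation can be put in numbers with the textbook additive prediction, in which the offspring's expected deviation from the mean is heritability times the midparent deviation. The sketch below works in sex-standardized standard deviation units. The z-scores fed into it are made up for illustration and are not measurements of this family.

```python
def expected_offspring_z(father_z, mother_z, h2):
    """Expected offspring height in SD units from the midparent value.

    Assumes the textbook additive prediction: offspring deviation equals
    heritability times the midparent deviation from the population mean.
    """
    midparent_z = (father_z + mother_z) / 2
    return h2 * midparent_z

# Hypothetical inputs: the z-scores below are invented for illustration.
both_tall = expected_offspring_z(4.0, 4.0, h2=0.85)
average_mother = expected_offspring_z(4.0, 0.0, h2=0.85)
print(f"two +4 SD parents: {both_tall:.2f} SD; "
      f"+4 SD father, average mother: {average_mother:.2f} SD")
```

With two very tall parents the expected regression is small, but with a mother of average height the expectation drops to well under half the father's deviation, which is why the mother's height matters for reading this case.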
In the early 20th century there was a rather strange (in hindsight) debate between two groups of biological scientists attempting to understand the basis of inheritance and its relationship to evolutionary processes. The two factions were the biometricians and Mendelians. As indicated by their appellation the Mendelians were partisans of the model of inheritance formulated by Gregor Mendel. Like Mendel many of these individuals were experimentalists, with a rough & ready qualitative understanding of biological processes. William Bateson was arguably the model’s most vociferous promoter. Set against the Mendelians were more mathematically minded thinkers who viewed themselves as the true inheritors of the mantle of Charles Darwin. Though the grand old patron of the biometricians was Francis Galton, the greatest expositor of the school was Karl Pearson.* Pearson, along with the zoologist W. F. R. Weldon, defended Charles Darwin’s conception of evolution by natural selection during the darkest days of what Peter J. Bowler terms “The Eclipse of Darwinism”.** One aspect of Darwin’s theory as laid out in The Origin of Species was gradual change through the operation of natural selection upon extant genetic variation. There was a major problem with the model which Darwin proposed: he could offer no plausible engine in regards to mode of inheritance. Like many of his peers Charles Darwin implicitly assumed a blending model of inheritance, so that the offspring would be an analog constructed about the mean of the parental values. But as any old school boy knows the act of blending diminishes variation! This, along with other concerns, resulted in a general tendency in the late 19th century to accept the brilliance of the idea of evolution as descent with modification, but dismiss the motive engine which Charles Darwin proposed, gradual adaptation via natural selection upon heritable variation.
A major issue in human genomics over the past few years has been the case of the “missing heritability”. Roughly, we know that for many traits, such as height, most of the variation in the trait within the population is controlled by variation in the genes of the population. The height of your parents is an extremely good predictor of your height in a developed nation. If you’re adopted, the height of your biological parents is an extremely good predictor of your height in a developed nation, not the height of your adoptive parents. Though a new paper claims to have resolved some of the difficulty, one of the major issues in human height genetics has been the lack of any large-effect quantitative trait locus. In plain English, a gene which can explain a lot of the variation in the trait. Rather, many have posited that continuous quantitative traits like height are controlled by variation in innumerable common genes of small effect size, or, by innumerable rare genes of large effect size. The same may be an issue with personality genetics, or so is claimed by a recent paper unable to find common variants (though an eminent geneticist pointed out in the comments some problems with the paper itself).
One would assume that the same problem would crop up across the tree of life. But a geneticist once told me that he considered biology the science where all rules have exceptions. Many exceptions. A new paper in PLoS Biology paints a fundamentally different picture of the genetic architecture of many morphological traits in the domestic dog, A Simple Genetic Architecture Underlies Morphological Variation in Dogs:
I knew that Yao Ming’s parents are very tall. Though his father, at 6’7, arguably contributed less than his mother, at 6’3, which is farther above the female mean in standard deviation units. But check this out from Superfusion: How China and America Became One Economy and Why the World’s Prosperity Depends on It:
Yao had essentially been bred. Both his parents played basketball. His 6’2 [different height from Wikipedia -Razib] mother, Fang Fengdi, perhaps the tallest woman in China, had been married to an even taller man. She had served as a Red Guard during the height of the Cultural Revolution and had been an ardent Maoist. She enthusiastically participated in the glorious plan of the local government to use her and her husband to produce a sports superstar. The Shanghai authorities who encouraged the match had gone back several generations to ensure that size was embedded in the bloodline. The result was Yao, a baby behemoth who just kept getting bigger.