When little differences matter a great deal

By Razib Khan | March 31, 2012 11:37 pm

In the comment below Clark alludes to the fact that Jonathan Haidt kept reiterating that even if there were differences between populations due to recent evolution, if it was due to selection on standing variation upon quantitative traits then the between group variation would be dwarfed by within group variation. He didn’t quite say it like that, but I’m sure that’s what he meant. For example, there is now evidence that alleles which can explain the small height difference between Northern and Southern Europeans have been subject to natural selection. Most of the variation obviously remains within the groups; you can’t guess that someone is Italian or Dutch just based on their height. There are many tall Italians, and many short Dutch. But on average there are differences between the groups which can be attributed to genes, and those genes seem to have been targets of selection.

This is good as far as it goes…but a small average difference is not necessarily marginal. That is because sometimes you select from the tails of a distribution. For example, if you want to ascertain which population will produce more N.B.A. players, what matters is less that the average difference is small, so that the populations mostly overlap, than that even a small average difference can produce a large disproportion at the tails of the distributions.

In the context of Jonathan Haidt’s argument, let’s talk about altruism. Imagine that there is an altruism scale from 0 to 200, with a mean of about 100. The standard deviation is 25, which implies that only ~2 percent of the population will be above 150 in altruism, or below 50 (not such a good thing in the latter case). Now let’s call this population A. Imagine a population B, which differs only in that its mean altruism is 110 instead of 100. This is not that large a difference, less than half a standard deviation. But what’s the difference at 2 standard deviations? Below is a plot of the two putative populations, with a line at the 2 standard deviation mark for population A:

As you can see, populations A and B overlap a great deal. But at 150 altruism 2.2 percent of population A is above that threshold, while 5.5 percent of population B is above it. More than a factor of 2 difference. At three standard deviations the difference becomes a factor of 3.5. Why does this matter? Because some models of social change are predicated upon small exceptional minorities. Haidt seems to be minimizing inter-group differences by emphasizing the small aggregate difference. But for many traits the exceptional few matter much more than the banal and pedestrian many. Small differences in distribution might be the difference between the existence or non-existence of these marginal slivers of the distribution.
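These tail fractions are easy to check directly. Below is a minimal sketch in Python (the post’s plot was made in R, but the arithmetic is the same), using the example’s means of 100 and 110 with a shared standard deviation of 25:

```python
from math import erf, sqrt

def normal_tail(threshold, mean, sd):
    """Fraction of a Normal(mean, sd) population above threshold."""
    z = (threshold - mean) / sd
    return 0.5 * (1 - erf(z / sqrt(2)))

# Population A: mean 100; population B: mean 110; both sd 25.
A_150 = normal_tail(150, 100, 25)  # ~0.023, the +2 sd tail of A
B_150 = normal_tail(150, 110, 25)  # ~0.055
A_175 = normal_tail(175, 100, 25)  # ~0.0013, the +3 sd tail of A
B_175 = normal_tail(175, 110, 25)  # ~0.0047

print(B_150 / A_150)  # ~2.4: roughly twofold overrepresentation of B
print(B_175 / A_175)  # ~3.5: the gap widens further out in the tail
```

The further out the cutoff, the larger the disproportion, which is the whole point of the post.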

So, for example, many would suggest that Mother Theresa was a representative of an extreme sort of altruism (yes, I am aware of Christopher Hitchens’ book on this subject; I am referring here to the public perception). She was an ethnic Albanian. One might explain the fact that she was Albanian by supposing that this population is ever so slightly more altruistic than the norm! Where, after all, are the Mother Theresas of the Kalahari Bushmen?*

* For the literal minded (e.g., Onur) I am joking here.

CATEGORIZED UNDER: Anthropology
• Dr Duck

This is an important and under appreciated point in regard to the interpretation of many social data. In some work that I did on disadvantage and crime many years ago, census indicators such as the proportion of single parent families and the proportion of the population who claimed to be indigenous correlated very strongly with crime rates. Not because of a simple causal link, but simply because these were sensitive indicators of the extreme tail of the disadvantage distribution.

• http://emilkirkegaard.com Emil

You appear to have fucked up the numbers. You write that you use a mean of 0, but actually you use a mean of 100 (same as IQ). The mention of -150 (it should be 50) and 150 is particularly strange, as with a mean of 0 and an SD of 25 those values are really, really far out: 6 SD, not just 2 SD.

Anyway, the point you want to make is correct (ofc). This is applicable to the recent discussion about equality in Scandinavia (especially Sweden and Norway) and in the EU, where many feminists want to enforce quotas for women. However, if there is a small sex difference in g, then it looks very different at the extremes (say >2SD). Such a small sex difference appears to exist, with a size of 3-4 points on the g scale. Even if we ignore differences in variation (males seem to have ~15% more variance), there is a huge effect on the relative frequency of women and men in the >2SD region.
Using a difference of 4 points (i.e. means of 98 and 102) and no difference in variance, the relative frequency of men to women above 2SD in g is about 2:1. This fits with data from Mensa memberships in Denmark.
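Emil’s 2:1 figure checks out under his stated assumptions. A sketch using means of 98 and 102, a common standard deviation of 15, and a cutoff 2 sd above the overall mean (IQ 130); the equal-variance assumption is his simplification:

```python
from statistics import NormalDist

men = NormalDist(mu=102, sigma=15)
women = NormalDist(mu=98, sigma=15)
cutoff = 130  # 2 sd above the overall mean of 100

men_tail = 1 - men.cdf(cutoff)      # ~3.1% of men above the cutoff
women_tail = 1 - women.cdf(cutoff)  # ~1.6% of women

print(men_tail / women_tail)  # ~1.9, i.e. roughly 2:1
```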

• Justin Giancola

The end is great! no offense Onur

Here’s the 2009 Edge piece by Haidt on “fast evolution” mentioned in Razib’s earlier post:
http://www.edge.org/q2009/q09_4.html#haidt

And here’s another, somewhat more recent (2010) CHE piece:
http://chronicle.com/article/Fast-Evolution/124128/

The book features more extended discussion (with citations).

• marcel

1) I’m with Emil about your numbers: as expressed the example is strangely (for you, anyway) incoherent. I’m guessing (I emphasize guessing) that lack of sleep from living with and caring for a newborn is having an effect. 😉

2) This is good as fair as it goes…but small average differences may not necessarily be marginal. That is because sometimes you select from the tails of a distribution. For example, if you want to ascertain which population will produce more N.B.A. players, it is less important that there is a small average differences, so the populations mostly overlap, than that that average difference can result in a large disproportion at the tails of the distributions.

This bears some resemblance to the point Larry Summers so inarticulately attempted to articulate, although IIRC he was focusing on differences in variance rather than small differences in the mean. Either way (means, variances), it’s difficult to argue against if you take statistics and probability at all seriously.

A 2nd order point to consider in your example is relative population sizes. One can imagine that Han Chinese may be, on average, slightly shorter than the US sub-population from which NBA players are drawn (after correcting for current environmental influences like standard of living). According to this premise (I hesitate to dignify it by the term hypothesis), even after China is fully developed the average heights of these two groups will differ, with the advantage going to the US. However, if there continue to be so many more Han Chinese, most NBA players may eventually be Han Chinese, because among the tallest people in the world Han Chinese will be the most numerous.
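marcel’s population-size point can be quantified. In the sketch below the population counts are real orders of magnitude, but the mean-height gaps (in sd units) are assumptions for illustration:

```python
from statistics import NormalDist

std_normal = NormalDist()

def tall_count(pop, mean_gap_sd, cutoff_sd=4.0):
    """Expected number of people above an 'NBA-tall' cutoff, measured in sd
    from the taller group's mean, in a population whose own mean sits
    mean_gap_sd below that reference."""
    return pop * (1 - std_normal.cdf(cutoff_sd + mean_gap_sd))

us_tall = tall_count(0.33e9, 0.0)        # the reference population
han_small_gap = tall_count(1.3e9, 0.2)   # slight US advantage: numbers win
han_big_gap = tall_count(1.3e9, 1.0)     # ~1 sd gap: the mean difference wins

print(han_small_gap > us_tall)  # True
print(han_big_gap > us_tall)    # False
```

Whether sheer numbers or the mean gap dominates depends on how large the gap is relative to how far out the cutoff sits.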

• http://blogs.discovermagazine.com/gnxp Razib Khan

#2, oversight with my use of R’s plot(). but i just changed ’em.

• marcel

will be more than 150 in altruism, or -150 in altruism

May have “changed ’em” but still not completely fixed! That you are still lurking here suggests to me that dave chamberlin, pconroy, and antonio pedro are correct.

• Chuck

“then the between group variation would be dwarfed by within group variation.”

What is typically implied when people cite the small between group variation is that the between group differences are also small. In the social sciences, group differences are commonly reported in terms of effect size (e.g., Cohen’s d). Commonly, a Cohen’s d of 0.8 is interpreted as large. Yet, an effect size of 0.8, between equally numerous populations with equal standard deviations, translates to a between group variance of around 15%. So is that variance really small? I find that it is said to be so when genes are discussed, but oddly not when social outcomes differences are. I guess that that’s the politics of quantification.
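Chuck’s conversion is straightforward to verify: for two equally numerous groups with equal within-group standard deviation, the means sit at ±d/2, so the between-group share of total variance is (d/2)² / (1 + (d/2)²):

```python
def between_group_share(d):
    """Share of total variance lying between two equally sized, equal-sd
    groups whose means differ by Cohen's d (within-group variance
    normalized to 1)."""
    between = (d / 2) ** 2
    return between / (1 + between)

print(between_group_share(0.8))  # ~0.138, i.e. roughly the 15% Chuck cites
```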

• Euler

I’ve seen these kinds of arguments about tails before, but there is something that always bothers me about them.

Data often turns out to be normally distributed. The central limit theorem implies that in the limit where a random variable is the sum of an infinite number of infinitely small independent random variables, the distribution of the sum is Gaussian. In real life, a lot of quantitative outcomes are the result of many small inputs that are nearly additive and nearly independent. It is reasonable then to expect a nearly Gaussian distribution.

The problem is that the definition of convergence (convergence in distribution) applies to the absolute difference between the two distributions getting close to zero. If the tails of two distributions are both close to 0, then they will be close to each other, even if one is 1,000,000 times greater than the other (e.g. 10^-100 and 10^-106).

You should not expect the tails to be Gaussian in any meaningful sense. For instance, a Gaussian distribution would imply that there is some chance of a person having a negative height or SAT score. If you decide to quantify a trait in such a way that the distribution is Gaussian in some reference population (e.g. the entire population) by definition, you should still not expect it to be Gaussian in a different population or sub-population. The truth is that you don’t know what the distribution will be like at the tails. The mean and s.d. are not enough information.

In many cases, while the factors affecting the bulk of the population are close to small, additive, and independent, the factors affecting the tail are not. For instance, the tallest person in the world usually has some problem with his pituitary gland, like a tumor, which is a single factor with a massive impact. For this reason, it is not so surprising if the tallest person in the world comes from some country where the average height is lower than in other countries. For instance, there was a time when the world’s tallest person lived in China, where the average is about 2-3 inches less than in the U.S., or about 1 standard deviation. Just the mean and s.d. are not enough to tell you about who will get pituitary gland tumors. The C.L.T. is useful, but it does not reduce all statistical distributions to two numbers.
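Euler’s pituitary example can be made concrete with a toy mixture: a trait that is Gaussian in the bulk, plus a rare large-effect factor. The 1-in-100,000 carrier frequency and the 4 sd shift below are invented for illustration:

```python
from statistics import NormalDist

bulk = NormalDist(mu=0.0, sigma=1.0)
shifted = NormalDist(mu=4.0, sigma=1.0)  # carriers of the large-effect factor
p_rare = 1e-5                            # assumed carrier frequency

def mixture_tail(x):
    """Fraction of the mixed population above x standard units."""
    return (1 - p_rare) * (1 - bulk.cdf(x)) + p_rare * (1 - shifted.cdf(x))

x = 6.0
gaussian_tail = 1 - bulk.cdf(x)
print(mixture_tail(x) / gaussian_tail)  # >100: carriers dominate at +6 sd
```

Near the mean the mixture is indistinguishable from the pure Gaussian, but the far tail is owned by the rare factor, just as record height is owned by pituitary conditions.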

• Chris T

I find that it is said to be so when genes are discussed, but oddly not when social outcomes differences are. I guess that that’s the politics of quantification.

The standard response when this topic comes up is to try to confuse the issue by recasting it in terms of individuals – where intra-group differences are far more important. However, this does not stop policies from being implemented at the population level – where inter-group differences become far more important.

• Bryan

Re: guessing Italian versus Dutch by height alone.

If two groups show significant mean differences (even small ones), then by definition one can predict Italian / Dutch just by height. The accuracy would be less than perfect but better than chance…

B

• Violet

Re: #9 Euler

Following those thoughts, I am also wondering about the limitations of normal distribution in the right hand side tail. With regards to IQ, normal distribution places no limit on the upper bound, but there should be some bound due to the physical limitations such as size of cranium or other brain matter features. No?

• Euler

#12
IQ scores are not scaled to grow linearly with any physical property of the brain, so that would not present a limit to the scale. The maximum on the scale is limited by the number of people who take the test when it is scaled. The way modern scales are defined, the distribution for the whole population is forced into a Gaussian distribution with mean 100 and s.d. 15. (That’s what I had in mind with my comment about the distribution of a reference population being defined as Gaussian.)

That means if, say 26 people are used to norm the test, named A,B,C…Z, and the raw scores come out with A<B<C…<Z, then to convert the raw scores into IQ, M and N are given an IQ close to 100. In the Gaussian distribution 84.1% of people have to get less than 1 s.d. above the mean, so V gets about 115, and D gets 85. With 26 people the most deviation from the mean is about 1.77 s.d., so the highest possible score is about 127 and the lowest is about 73.

In principle, since there are something like 7 billion people in the world, if an IQ test was normed on everyone in the world, the maximum IQ would be around 194.5 and the minimum would be 5.5. This would be the case no matter how well the person with the highest IQ did on the test. In practice IQ tests are not normed with nearly that many people, so IQ scores stop meaning anything well before you hit 194.5.
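Euler’s arithmetic can be reproduced with Python’s `statistics.NormalDist`, using the i/(n+1) plotting position his numbers imply (the choice of plotting position is a convention; this one matches his ~1.77 sd figure for 26 subjects):

```python
from statistics import NormalDist

iq_scale = NormalDist(mu=100, sigma=15)

def norm_iqs(n):
    """IQs assigned to ranks 1..n when a test is normed on n people,
    placing rank i at quantile i / (n + 1)."""
    return [iq_scale.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

scores = norm_iqs(26)
print(round(scores[-1]))  # 127: the ceiling with 26 norming subjects
print(round(scores[0]))   # 73: the floor

# With ~7 billion norming subjects the ceiling rises to the mid-190s:
print(iq_scale.inv_cdf(7e9 / (7e9 + 1)))  # close to Euler's ~194.5
```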

When a population differs in distribution from the one the test was scaled on, the distribution of scores in that population need not be Gaussian. So if you gave the test to, say, only high school graduates, you would likely get a non-Gaussian distribution. The left tail might be truncated, for instance.

This even applies when the tests are re-normed periodically to reflect changes over time in the whole population. For instance, a few years ago James Flynn did a study of British teenagers where a reversal of the Flynn effect was found between 1980 and 2008. You can read about it here
http://www.prospectmagazine.co.uk/2012/03/intelligence-quotient-james-flynn/
It was more pronounced at the higher end: the mean dropped by 2 points, but scores dropped by 6 points in the top half of the curve. This asymmetry means that the 2008 teenagers’ scores had a non-Gaussian distribution on the 1980 scale, but under the 2008 scaling this is put back to Gaussian by definition.
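The truncation case Euler mentions can be sketched directly: if the test is given only to people above the reference population’s cutoff percentile c, then quantile p of the test-takers sits at quantile c + (1 − c)p of the reference, and the resulting IQ quantiles are no longer symmetric. The 30th-percentile cutoff below is an arbitrary choice for illustration:

```python
from statistics import NormalDist

iq_scale = NormalDist(mu=100, sigma=15)

def truncated_iq(p, cut=0.30):
    """IQ at quantile p of a group consisting only of people above the
    reference population's `cut` percentile (e.g. high-school graduates
    as a stand-in for a left-truncated population)."""
    return iq_scale.inv_cdf(cut + (1 - cut) * p)

q25, q50, q75 = (truncated_iq(p) for p in (0.25, 0.50, 0.75))
print(q50 - q25, q75 - q50)  # unequal spacing: non-Gaussian on the old scale
```

A Gaussian has equal quartile spacing; here the upper quartile gap is visibly wider than the lower one.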

• Violet

#13 Thank you for the details on mapping of test scores to IQ numbers.
If I understand correctly, IQ numbers tell us the number of standard deviations one is away from the mean in the normal distribution (instead of using the standard normal distribution and making life easy, it has mean = 100, s.d. = 15, right?). One has to have a huge sample to get a realization far into the tails, say 4-sigma or 5-sigma away from the mean.

Does this imply that the sample used to norm the IQ test is critical? So, any test score -to-IQ conversion is dependent on that reference sample?

I am still a bit confused about the re-normed tests. Would the reference population for test-to-IQ conversion be changed for update?

For example, if the test scores for a large reference sample (1980) follow a lognormal distribution (with the lowest score as zero), then in order to get IQ the lognormal is transformed to normal. If the test scores of a different population (2008) follow a gamma distribution, but the test-to-IQ conversion is done with the assumption that they still follow the lognormal, then the converted IQs won’t follow the normal distribution. Is that what you are saying?

Instead, if the gamma to normal transformation is performed with the 2008 population, then IQ of 2008 population would become Gaussian again.

• Euler

#14
Yes, basically everything you said is right. The only clarification I would make would be in those examples the lognormal and gamma distributions would be for the raw score, and the IQ score then ends up coming out either normal or not normal depending on which reference population is used for the scale. I’m sure that’s what you meant anyway, I just wanted to make explicit when we are talking about a raw score and when we are talking about an IQ score.

• Violet

# 15, Yeah, I meant raw scores when I said test scores.

But this discussion has been an eye-opener for me in seeing where the assumptions about raw score distributions are entering in the IQ comparisons.

Gene Expression