When you know the number of ancestral populations with reasonable accuracy, the program fits the data to the model without computational agony for the user. But this program is not designed to figure out how many clusters a data set contains.

There are other statistical tools that simply look for clusters. PCA combined with eyeballing the data is pretty good, but it has a problem that computer programs can solve: people have a very hard time seeing more than three or four dimensions at once (motion, color and 3D can get you to five), while computers can work in many dimensions at once. When you have good reason to believe that a tree-like model is appropriate, there are some very good statistical programs that arrange the data into tree-like clusters reflecting phylogenetic relationships.
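
To make the PCA step concrete, here is a minimal sketch of what it does before anyone starts eyeballing. It is not how any particular package implements it: the function name `first_pc_scores`, the power-iteration shortcut, and the toy allele frequencies are all assumptions chosen to keep the example short and dependent only on the standard library.

```python
import random

def first_pc_scores(rows, iterations=200):
    """Score each sample (row) on the first principal component.

    Uses power iteration so the sketch needs only the standard library.
    """
    n, m = len(rows), len(rows[0])
    # Center every marker (column) so the component tracks variation, not means.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Fixed, uneven starting direction; power iteration pulls it to the top axis.
    v = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        xv = [sum(a * b for a, b in zip(row, v)) for row in x]          # X v
        w = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(m)]  # X^T (X v)
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(row, v)) for row in x]

# Made-up toy data: 10 samples from each of two populations, 50 markers,
# genotype = number of copies (0, 1 or 2) of one allele at each marker.
rng = random.Random(0)

def sample(freqs):
    return [sum(rng.random() < f for _ in range(2)) for f in freqs]

freqs_a = [0.1] * 25 + [0.9] * 25   # assumed allele frequencies, population A
freqs_b = [0.9] * 25 + [0.1] * 25   # assumed allele frequencies, population B
genotypes = ([sample(freqs_a) for _ in range(10)]
             + [sample(freqs_b) for _ in range(10)])

scores = first_pc_scores(genotypes)
for label, s in zip("A" * 10 + "B" * 10, scores):
    print(label, round(s, 2))
```

With populations this different, the A samples land on one side of zero and the B samples on the other. Real data is rarely so obliging, which is where the eyeballing and the extra dimensions come in.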

Neither admixture analysis nor cluster analysis tells you how closely related the clusters are in an absolute sense, as opposed to relative to one another. Measures like Fst address that aspect.
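
For a sense of what an Fst number means, here is a rough sketch of the textbook heterozygosity form, Fst = (H_T - H_S) / H_T, for a single two-allele marker in two populations. The function name `fst_two_pops` and the equal weighting of the two populations are simplifying assumptions; real estimators correct for sample size and average over many markers.

```python
def fst_two_pops(p1, p2):
    """Fst at one biallelic marker for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if the two populations were pooled into one.
    h_total = 2 * p_bar * (1 - p_bar)
    # Average expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # marker does not vary at all, so there is nothing to partition
    return (h_total - h_within) / h_total

print(fst_two_pops(0.5, 0.5))            # identical frequencies: 0.0
print(fst_two_pops(0.0, 1.0))            # fixed for different alleles: 1.0
print(round(fst_two_pops(0.2, 0.8), 2))  # 0.36
```

The point is that the result is on a fixed 0 to 1 scale, so it says something about absolute differentiation in a way that cluster membership does not.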

There is also nothing wrong with going into statistical analysis with strong Bayesian priors about how you expect the data to come out IF YOUR PRIORS ARE ACCURATE. A lot of the time in anthropology, your priors may actually be more accurate than your main data set. You may know exactly how many ancestral populations there are and when they came along, but not what they looked like genetically.

Indeed, statistics are at their most powerful when you ask them simple questions. For example, hypothesis testing, where one compares the likelihood of a small number of possibilities (e.g. did modern European populations derive predominantly from hunter-gatherer populations, predominantly from LBK agriculturalists, or predominantly from some other source), can have much more power to resolve a question in a way that supersedes your biases about the choices than open-ended questions without clear choices.
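
A toy version of that kind of comparison might look like the sketch below. Every number in it is invented for illustration: the predicted allele frequencies attached to each hypothesis, the sample counts, and the single-marker binomial model are all assumptions, not results from any study.

```python
import math

def binomial_log_likelihood(k, n, p):
    """Log-probability of seeing k copies of an allele among n, given frequency p."""
    return (math.log(math.comb(n, k))
            + k * math.log(p)
            + (n - k) * math.log(1 - p))

# Hypothetical predictions: the allele frequency each candidate source
# population would leave in its descendants (made-up values).
predicted = {"hunter-gatherer": 0.10, "LBK farmer": 0.40, "other": 0.70}

# Made-up sample: 38 copies of the allele among 100 sampled chromosomes.
observed_copies, sampled_chromosomes = 38, 100

log_liks = {
    name: binomial_log_likelihood(observed_copies, sampled_chromosomes, p)
    for name, p in predicted.items()
}
best = max(log_liks, key=log_liks.get)

for name, ll in sorted(log_liks.items(), key=lambda item: -item[1]):
    # Difference from the best-fitting hypothesis, in log-likelihood units.
    print(f"{name}: {ll:.2f} ({ll - log_liks[best]:+.2f} vs best)")
```

Because the question is limited to three named choices, the data only has to rank them, which takes far less information than discovering structure from scratch.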

One problem with estimating divergence dates from genetic mutation rates is that the priors used to calibrate the dating aren’t very good themselves.

]]>As far as the analysis is concerned, in the first series, the pattern of three distinguishable populations is fairly obvious in the first chart and continues in the others. The second group isn’t quite as clear. If the higher Ks are trying to tell me anything, it’s not at all obvious what that should be. On the other hand, I’ve long been an advocate of the viewpoint that staring at anything for too long will show you patterns that simply aren’t there.

]]>In a larger sense isn’t this why we have real statistical tests? One has to =beforehand= decide what they are looking for and come up with some number for a test where they could say “I found it!” or “It isn’t there” or “I can’t tell”. What you are doing by playing is absolutely necessary to get the “some number” I mentioned above. In the end, in most cases, that number and test are really based on playing around.

]]>I was suspicious all along, but since I didn’t have an inside understanding of the science my opinions were worthless. I did guess right though. There was a pattern whereby increasing mathematical virtuosity made the science intelligible to increasingly fewer people. A similar pattern was seen in the business world, where unemployed mathematical physicists were hired by finance to produce increasingly sophisticated financial instruments which no one could understand. It turns out that these instruments were booby-trapped, as we now see. It really happened twice, just recently and in 1998 with Long-Term Capital Management, which had two Nobelists on the board of directors.

But it’s not really true that econ would be a science if political preferences were finally excluded. Political preferences can’t be excluded, after 75 years of trying, and econ is always going to be political, since it is an applied science.

]]>