Last month I noted that a paper making speculative inferences about the phylogenetic origins of Australian Aborigines was weakened in the force of its conclusions because the authors didn’t release the data to the public (more accurately, to their peers). There are likely political reasons for this with regard to Australian Aborigine data sets, so I don’t begrudge them this (well, at least not too much; I’d probably accept the result more readily if I could test drive the data set, but I doubt the authors had any say over the fact that the data had to be private). This is why, when a new paper on a novel phylogenetic inference comes out, I immediately control-f to see if the authors released their data. With regard to genome-wide association studies on medical population panels I can somewhat understand the need for closed data (even though anonymization obviates much of this), but I don’t see this rationale as relevant at all for phylogenetic data (if concerned, one can remove particular functional SNPs).
Yesterday I noticed that PLoS Genetics had published a paper on the genomics of Middle Eastern populations, “Genome-Wide Diversity in the Levant Reveals Recent Structuring by Culture.” The results were moderately interesting (I’ll review the paper in detail later), but bravo to the authors for putting their new data set online. My reason is simple: reading the paper, I wanted to see an explicit phylogenetic tree/graph to go along with their figures (e.g., one generated with TreeMix). Now that I have their data I can do that tonight, time permitting.
One major aspect of science is reproducibility. Because of the capital outlays required, replication is not always viable, and when it does occur it is often haphazard. But with phylogenetics done on a computer this is less of an issue. I have a desktop at home devoted 99% to running data sets, in part for my own interest, and in part because I want to check the robustness of some of the inferences I see in papers like the ones above.