Several companies provide tests that can confirm whether adoptees are related to individuals they already know. Others cast a wider net by plugging DNA results into databases that contain tens of thousands of genetic samples, provided mostly by people searching for their ancestral roots. The tests detect genetic markers that reveal whether people share a common ancestor or relative.
Some experts on adoption and genetics have criticized ancestry and genealogy testing companies, saying they are, at times, connecting people whose genetic links are tenuous — in effect stretching the definition of a relative. Nevertheless, the growing popularity of the tests, combined with social media sites that connect people day to day, has given some adoptees a sense of family that feels tangible, intimate and immediate.
Image Credit: Anirudh Koul
One of the great things about the mass personal genomic revolution is that it gives people direct access to their own information. This is important for the more than 90% of the human population that has sketchy genealogical records. But even where genealogical records exist, there are often omissions and biases in the transmission of information. This is one reason that HAP, Dodecad, and Eurogenes BGA are so interesting: they combine what people already know with scientific genealogy. This intersection can often be very inferentially fruitful.
But what if you had a whole population with rich, robust conventional genealogical records? Combined with the power of the new genomics you could really crank up the level of insight. Where to find these records? A reason that Jewish genetics is so useful and interesting is that there is often a relative dearth of records when it comes to the lineages of American Ashkenazi Jews. Many American Jews even today are sketchy about the region of the “Old Country” from which their forebears arrived. Jews have been interesting from a genetic perspective because of the relative excess of ethnically distinctive Mendelian disorders within their population. There happens to be another group in North America with the same characteristic: the French Canadians. And importantly, in the French Canadian population you do have copious genealogical records. The origins of this group lie in the 17th and 18th centuries, and the Roman Catholic Church has often been a punctilious institution when it comes to recording events under its purview, such as baptisms and marriages. The genealogical archives are so robust that last fall a research group input centuries of ancestry for ~2,000 French Canadians and used it to infer patterns of genetic relationships as a function of geography, as well as long-term contribution by provenance.
Admixed ancestry and stratification of Quebec regional populations:
Population stratification results from unequal, nonrandom genetic contribution of ancestors and should be reflected in the underlying genealogies. In Quebec, the distribution of Mendelian diseases points to local founder effects suggesting stratification of the contemporary French Canadian gene pool. Here we characterize the population structure through the analysis of the genetic contribution of 7,798 immigrant founders identified in the genealogies of 2,221 subjects partitioned in eight regions. In all but one region, about 90% of gene pools were contributed by early French founders. In the eastern region where this contribution was 76%, we observed higher contributions of Acadians, British and American Loyalists. To detect population stratification from genealogical data, we propose an approach based on principal component analysis (PCA) of immigrant founders’ genetic contributions. This analysis was compared with a multidimensional scaling of pairwise kinship coefficients. Both methods showed evidence of a distinct identity of the northeastern and eastern regions and stratification of the regional populations correlated with geographical location along the St-Lawrence River. In addition, we observed a West-East decreasing gradient of diversity. Analysis of PC-correlated founders illustrates the differential impact of early versus latter founders consistent with specific regional genetic patterns. These results highlight the importance of considering the geographic origin of samples in the design of genetic epidemiology studies conducted in Quebec. Moreover, our results demonstrate that the study of deep ascending genealogies can accurately reveal population structure.
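The "genetic contribution" of a founder that the abstract analyzes can be illustrated with a small sketch. It assumes the common genealogical definition, in which a founder's expected contribution to a subject is the sum over all pedigree paths of (1/2) raised to the number of generations on that path. The pedigree, the names in it, and the function `genetic_contribution` are hypothetical and are not taken from the paper.

```python
# Hypothetical toy pedigree: individual -> (father, mother).
# Individuals without an entry are treated as founders.
PEDIGREE = {
    "subject": ("father", "mother"),
    "father": ("founder_a", "founder_b"),
    "mother": ("founder_a", "founder_c"),
}


def genetic_contribution(founder, individual, pedigree=PEDIGREE):
    """Expected share of `individual`'s genome that traces to `founder`.

    Assumed definition: sum over every genealogical path from founder
    to individual of (1/2) ** (generations on that path).
    """
    if individual == founder:
        return 1.0
    parents = pedigree.get(individual)
    if parents is None:
        # A different founder: no path leads back to the one we want.
        return 0.0
    # Each parent passes on half of their own expected contribution.
    return 0.5 * sum(genetic_contribution(founder, p, pedigree) for p in parents)


if __name__ == "__main__":
    for name in ("founder_a", "founder_b", "founder_c"):
        print(name, genetic_contribution(name, "subject"))
```

In this toy case `founder_a` appears on both the paternal and maternal side, so the two paths add up. A founder who recurs in many lines of a regional genealogy contributes disproportionately in the same way, which is the kind of unequal contribution the abstract ties to stratification.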
Last spring I posted ‘Beyond visualization of data in genetics’ in the hope that people wouldn’t take PCA too far by assuming that the method reflects reality in a definite fashion. Remember, PCA visualizations show you two, or at most three, dimensions of genetic variation within the data set at any given time. The fine print is important; e.g., “PC 1 15%”, “PC 2 4.5%”, etc., which points to the magnitude of the dimensions within the data. You see the largest, and likely historically most significant on a population-wide scale, genetic variances, but there’s still a large remainder left over. But when I look at referrals from message boards, people obviously aren’t careful with what PCA is telling them.
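To make the "large remainder" concrete, here is a minimal arithmetic sketch. The 15% and 4.5% figures are the illustrative axis labels quoted above; the function names and the toy eigenvalues in the comment are assumptions added for illustration.

```python
def variance_shares(eigenvalues):
    """Turn PCA eigenvalues into each component's share of total variance."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


def remaining_variance(plotted_percentages):
    """Percent of variance NOT shown on a plot of the given components."""
    return 100.0 - sum(plotted_percentages)


if __name__ == "__main__":
    # Axis labels from the example in the text: "PC 1 15%", "PC 2 4.5%".
    print(remaining_variance([15.0, 4.5]))
    # Toy eigenvalues (made up) showing how such labels are derived.
    print(variance_shares([3.0, 1.0]))
```

With those two axis labels, a two-dimensional plot leaves roughly four fifths of the variation in the data set off the page.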
As an illustration, in the 23andMe user interface you can “compare genes” across people who you “share genes” with. This comparison operates over ~550,000 single nucleotide polymorphisms out of 3 billion base pairs (you can constrain it to traits, but I’m going to talk about the comparison to the whole data set below). For example, a man of European descent shares 83.2% with his daughter, who is Eurasian (the mother is Burmese, with some recent Indian admixture). Another man of European descent shares 84% with his daughter, whose mother is also European (in fact, both parents are western European). The “gene sharing” of these two men with other people of European descent is in the 74-75% range (for reference, a Chinese person is 71%, and a Nigerian 68.5%). On the PCA plot the European man and his Eurasian daughter are very far apart, while the European man and his European daughter cluster together. What you’re seeing on the PCA chart is population-level information, not the genetic uniqueness within families and across parents and offspring.
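A pairwise "percent shared" figure of this kind can be sketched as an identity-by-state comparison across SNPs. This is an assumption about the general approach, not 23andMe's documented method, and the genotypes below are made-up toy data coded as the count (0, 1, or 2) of one allele at each SNP.

```python
def ibs_similarity(genotypes_a, genotypes_b):
    """Fraction of alleles shared identical-by-state across SNPs.

    Each genotype is 0, 1, or 2 copies of a reference allele, so two
    people share 2 - |a - b| alleles at a SNP, out of a possible 2.
    """
    if len(genotypes_a) != len(genotypes_b):
        raise ValueError("both genotype lists must cover the same SNPs")
    shared = sum(2 - abs(a - b) for a, b in zip(genotypes_a, genotypes_b))
    return shared / (2 * len(genotypes_a))


if __name__ == "__main__":
    # Toy data: a child always shares at least one allele per SNP
    # with a parent, so opposite homozygotes (0 vs 2) never occur.
    parent = [0, 1, 2, 1, 0, 2]
    child = [1, 1, 1, 0, 0, 2]
    stranger = [2, 1, 0, 1, 2, 0]
    print(ibs_similarity(parent, child))
    print(ibs_similarity(parent, stranger))
```

A score like this is computed one pair at a time, which is why it picks up the parent-child relationship regardless of where either individual falls on a population-level PCA plot.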