Tag: PCA

A zoom in on Western Eurasia

By Razib Khan | September 27, 2012 2:00 am

CATEGORIZED UNDER: Uncategorized
MORE ABOUT: PCA

Re-imagining genetic variation

By Razib Khan | September 26, 2012 12:39 am

To the left is a PCA from The History and Geography of Human Genes. If you click it you will see a two dimensional plot with population labels. How were these plots generated? In short, they are visual representations of a matrix of genetic distances (generally FST), which L. L. Cavalli-Sforza and colleagues computed from classical autosomal markers. Basically, the distances measure how much populations differ from one another genetically. The unwieldy matrix tables can be visualized as a neighbor-joining tree, or as a two dimensional plot as you see here. But that’s not the end of the story.
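As a rough illustration of how a distance matrix becomes a two dimensional plot, here is a minimal sketch of classical multi-dimensional scaling in Python with NumPy. The four populations and their distances are made-up numbers, not values from the book, and treating FST-like values as if they were ordinary geometric distances is itself an assumption. I am not claiming this is the procedure Cavalli-Sforza and colleagues used.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Embed n items in k dimensions from an n x n distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh returns eigenvalues in ascending order, so pick the k largest
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]
    vals, vecs = vals[order], vecs[:, order]
    # Coordinates are eigenvectors scaled by sqrt of the (non-negative) eigenvalues
    return vecs * np.sqrt(np.clip(vals, 0, None))

# Hypothetical pairwise distances among four populations (invented for illustration)
labels = ["A", "B", "C", "D"]
dist = np.array([
    [0.00, 0.02, 0.15, 0.16],
    [0.02, 0.00, 0.14, 0.15],
    [0.15, 0.14, 0.00, 0.03],
    [0.16, 0.15, 0.03, 0.00],
])
coords = classical_mds(dist, k=2)
for name, (x, y) in zip(labels, coords):
    print(f"{name}: {x:+.3f} {y:+.3f}")
```

Populations that are close in the matrix (A and B, C and D) should land near each other on the plot, which is all the picture is really telling you.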

In the past ten years, with high density SNP-chip arrays, these plots can often illustrate the position of an individual instead of just representing the relationships of populations (the methods differ, from principal components analysis or principal coordinates analysis to multi-dimensional scaling, but the outcomes are much the same).

 


CATEGORIZED UNDER: Genetics, Genomics
MORE ABOUT: PCA

The genetic world in 3-D

By Razib Khan | March 24, 2011 6:44 pm

When Zack first mooted the idea of the Harappa Ancestry Project I had no idea what was coming down the pipe. I wonder if his daughter and wife are curious as to what’s happened to their computer! Since collecting the first wave of participants he’s been a result generating machine. Today he produced a fascinating three dimensional PCA (modifying Doug McDonald’s Javascript) using his “Reference 1” data set. He rescaled the dimensions appropriately so that they reflect how much of the genetic variance they explain. The largest principal component of variance is naturally Africa vs. non-Africa, the second is west to east in Eurasia, and the third is a north to south Eurasian axis.
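To make the rescaling idea concrete, here is a small sketch in Python with NumPy. I don't know exactly how Zack rescaled his axes, so this is only one plausible reading: multiply each unit-length component by its singular value, so that the spread along each axis tracks the share of variance it explains. The data are random toy numbers, not genotypes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 samples, 3 features with very different spreads (made up)
x = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])
x -= x.mean(axis=0)

u, s, vt = np.linalg.svd(x, full_matrices=False)
explained = s**2 / np.sum(s**2)    # fraction of variance per component

unit_axes = u                      # every column has the same length (1.0)
scaled_axes = u * s                # column spread now tracks variance explained

print(np.round(explained, 3))
print(np.round(unit_axes.std(axis=0), 3))
print(np.round(scaled_axes.std(axis=0), 3))
```

With the unscaled columns every axis looks equally important on screen; after scaling, the first axis visibly dominates, which is the honest picture.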

I decided to be a thief and take Zack’s Javascript and resize it a bit to fit the width of my blog, blow up the font size, as well as change the background color and aspects of positioning. All to suit my perverse taste. You see the classic “L” shaped distribution familiar from the two-dimensional plots, but observe the “pucker” in the third dimension of South Asian, and to a lesser extent Southeast Asian, populations.


CATEGORIZED UNDER: Anthropology, Genetics, Genomics

D.I.Y. PCA

By Razib Khan | February 11, 2011 1:50 am

Long time readers know that I have a fixation on people not taking PCA too literally as something concrete. Tonight I finally merged the HGDP data set with some of the HapMap ones I’ve been playing with, and tacked my parents onto the sample. I took the ~50 HGDP populations, added the Tuscans, the two Kenyan groups, and the Gujaratis, and merged them. I thinned the marker set to 105,000 SNPs (I had to flip the HGDP strand too). Then I just let Eigensoft do its magic, and 2 hours on I produced my own plot. I’m still getting the hang of the labeling issues, but first let’s look at what 23andMe produces (I’m green):

Now let’s see what I outputted:

I suspect that the gap between my parents and the main South Asian cluster is just an artifact of the lack of South and East Indians in the sample. Additionally, things would look different if I removed the Africans, since the first principal component would be freed up. More on that later. All in all, still pretty awesome that circa 2011 this sort of thing is just an evening’s concentration.
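Here is a toy sketch, in Python with NumPy, of why removing a very divergent group frees up the first principal component. Everything in it is simulated: three hypothetical populations "A", "B" and "C", invented drift values, and a per-SNP normalization (center, then divide by the square root of p(1-p)) that follows my understanding of what the Eigensoft paper describes. It is not a reproduction of my actual run.

```python
import numpy as np

def genotype_pca(geno, k=2):
    """PCA on an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    g = np.asarray(geno, dtype=float)
    p = g.mean(axis=0) / 2.0                 # per-SNP allele frequency
    keep = (p > 0) & (p < 1)                 # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    # Center each SNP and scale by sqrt(p(1-p)) (assumed normalization)
    z = (g - 2.0 * p) / np.sqrt(p * (1.0 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]                  # sample scores on the top k PCs

def simulate(rng, n_per_pop=20, n_snps=2000):
    """Toy populations: A drifted far from the ancestor, B and C only a little."""
    base = rng.uniform(0.2, 0.8, size=n_snps)
    pops, labels = [], []
    for name, drift in [("A", 0.30), ("B", 0.10), ("C", 0.10)]:
        freq = np.clip(base + rng.normal(0, drift, n_snps), 0.05, 0.95)
        pops.append(rng.binomial(2, freq, size=(n_per_pop, n_snps)))
        labels += [name] * n_per_pop
    return np.vstack(pops), np.array(labels)

def pc1_gap(pcs, labs, a, b):
    """Distance between two group means along the first component."""
    return abs(pcs[labs == a, 0].mean() - pcs[labs == b, 0].mean())

rng = np.random.default_rng(0)
geno, labels = simulate(rng)

pc_all = genotype_pca(geno)                  # all three groups
sub_labels = labels[labels != "A"]
pc_sub = genotype_pca(geno[labels != "A"])   # divergent group removed

print("B-C gap on PC1, all groups:", round(pc1_gap(pc_all, labels, "B", "C"), 2))
print("B-C gap on PC1, A removed :", round(pc1_gap(pc_sub, sub_labels, "B", "C"), 2))
```

With A in the sample, the first component is spent separating A from everyone else and B and C sit nearly on top of each other along it. Drop A and the same two groups pull apart on the first component.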

CATEGORIZED UNDER: Genetics, Genomics
MORE ABOUT: Genetics, Genomics, PCA

Visualizing variation, input → output

By Razib Khan | January 26, 2011 2:11 pm

I have noted a few times that one thing you have to be careful about in two dimensional plots which show genetic variance is that the dimensions onto which the data are projected are often generated from the data itself. So adding more data can change the spatial relationships of previous data points. Additionally, in 23andMe’s global similarity advanced plot you are projected onto the dimensions generated from the HGDP data set. There are some practical reasons for this. First, it’s computationally intensive to recalculate components of variance every time someone is added to the data set. Second, it isn’t as if the ethnic identity of any given individual is validated. What would you do if an alien sent in a kit and spuriously put “French” as their ancestry?
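The difference between projecting onto fixed dimensions and regenerating them can be sketched in a few lines of Python with NumPy. The "reference panel" here is random numbers standing in for genotypes, and the function name `project` is my own invention; I have no knowledge of how 23andMe actually implements this.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical reference panel: 100 samples x 50 features (stand-in for genotypes)
reference = rng.normal(size=(100, 50))
ref_mean = reference.mean(axis=0)

# Fit the axes once, on the reference panel only
_, _, vt = np.linalg.svd(reference - ref_mean, full_matrices=False)
axes = vt[:2].T                       # 50 x 2 loading matrix, fixed from here on

def project(sample):
    """Place a new sample on the existing axes without refitting them."""
    return (np.asarray(sample, dtype=float) - ref_mean) @ axes

ref_coords = (reference - ref_mean) @ axes
new_person = rng.normal(size=50)
print(project(new_person))            # coordinates on the reference axes

# Refitting with the newcomer included gives (slightly) different axes
combined = np.vstack([reference, new_person])
_, _, vt2 = np.linalg.svd(combined - combined.mean(axis=0), full_matrices=False)
print(np.abs(vt2[:2] @ axes))         # close to, but not exactly, what fixed axes would give
```

Projection is one matrix multiplication per customer and leaves everyone else's coordinates untouched. Refitting means redoing the decomposition and letting a single odd sample nudge the axes for the whole panel.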

So, in reply to this comment: “Let me rephrase: is there any difference when you switch to the world-wide plot? I imagine not, or you would’ve mentioned it.” Actually, there is a slight difference. Below on the right you have a “world view,” with my position being marked with green, and on the left a “zoom in” for Central/South Asia in the HGDP data set.


Just pushing buttons

By Razib Khan | August 24, 2010 12:05 am

Mike the Mad Biologist, whose bailiwick is the domain of the small, asks in the comments:

I don’t mean to bring up a tangential point to the post, but why does the field of human genetics use PCA to visualize relationships? When I see plots like those shown here that have a ‘geometric pattern’ to them (the sharp right angles; another common pattern is a Y-shape), that tells me that there are lots of samples with zeros for many of the Y-variables (i.e., alleles that are unique to certain populations). Thus, the spatial arrangement of the points is largely an artifact of an inappropriate method: how does one calculate a correlation matrix when many of the things one is correlating have values of zero?

If one really was keen on using PCA, one could calculate a pairwise distance matrix and then use that instead of the correlation matrix (Principal Coordinates Analysis).

Since I know some human geneticists do read this weblog, I thought it was worth throwing the question out there.
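For anyone who wants to poke at the question numerically, here is a small sketch in Python with NumPy. It uses a made-up 0/1 matrix with lots of zeros and compares PCA on the centered data with Principal Coordinates Analysis on plain Euclidean distances. Two caveats: this is covariance-style PCA (centered but not rescaled), not the correlation-matrix version Mike describes, and the comparison only speaks to Euclidean distances. A different distance measure could give a different picture.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy 0/1 matrix with many zeros, standing in for population-private alleles
x = (rng.random((30, 40)) < 0.15).astype(float)
xc = x - x.mean(axis=0)

# PCA scores via SVD of the centered data
u, s, _ = np.linalg.svd(xc, full_matrices=False)
pca_scores = u[:, :2] * s[:2]

# PCoA: double-center squared Euclidean distances, then eigen-decompose
sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
n = len(x)
j = np.eye(n) - np.ones((n, n)) / n
b = -0.5 * j @ sq @ j
vals, vecs = np.linalg.eigh(b)
top = np.argsort(vals)[::-1][:2]
pcoa_scores = vecs[:, top] * np.sqrt(vals[top])

# Compare coordinates, ignoring the arbitrary sign of each axis
same = np.allclose(np.abs(pca_scores), np.abs(pcoa_scores), atol=1e-6)
print("PCA and Euclidean PCoA agree:", same)
```

So with Euclidean distances the two routes should give the same picture, and any real difference would have to come from choosing another distance.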

CATEGORIZED UNDER: Genetics
MORE ABOUT: Analysis, PCA, Tools
Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!