DISCOVER Magazine. Science, Technology and The Future
Gene Expression

How Columbus was not a seer

A week ago I pointed out that in some visualizations of worldwide population variation South Asians & mestizos seem to overlap with each other to a great extent. The reason for this is that both populations can be modeled as admixtures between two separate, but related, populations. Mestizos are the products of pairings between Europeans and indigenous American populations, while South Asians seem to be a stabilized hybrid population which emerged from the fusion of West Eurasian (closely related to European) and East Eurasian (distantly related to East Asian) populations. The East Eurasian ancestors of South Asians may be only distantly related to indigenous American populations, but on a worldwide scale the relationship is relatively close (i.e., compared to Europeans vs. indigenous Americans). So when mapped onto a plot of genetic variation incorporating worldwide populations South Asians and mestizos naturally resemble each other. That said, a commenter observes:

Great example of how two dimensions lose information.
Given how different the two populations are genetically, guarantee that the third component separates them pretty cleanly.

Correct. A new paper illustrates this. Magnitude of Stratification in Human Populations and Impacts on Genome Wide Association Studies:

Genome-wide association studies (GWAS) may be biased by population stratification (PS). We conducted empirical quantification of the magnitude of PS among human populations and its impact on GWAS. Liver tissues were collected from 979, 59 and 49 Caucasian Americans (CA), African Americans (AA) and Hispanic Americans (HA), respectively, and genotyped using Illumina650Y (Ilmn650Y) arrays. RNA was also isolated and hybridized to Agilent whole-genome gene expression arrays. We propose a new method (i.e., hgdp-eigen) for detecting PS by projecting genotype vectors for each sample to the eigenvector space defined by the Human Genetic Diversity Panel (HGDP). Further, we conducted GWAS to map expression quantitative trait loci (eQTL) for the ~40,000 liver gene expression traits monitored by the Agilent arrays. HGDP-eigen performed similarly to the conventional self-eigen methods in capturing PS. However, leveraging the HGDP offered a significant advantage in revealing the origins, directions and magnitude of PS. Adjusting for eigenvectors had minor impacts on eQTL detection rates in CA. In contrast, for AA and HA, adjustment dramatically reduced association findings. At an FDR = 10%, we identified 65 eQTLs in AA with the unadjusted analysis, but only 18 eQTLs after the eigenvector adjustment. Strikingly, 55 out of the 65 unadjusted AA eQTLs were validated in CA, indicating that the adjustment procedure significantly reduced GWAS power. A number of the 55 AA eQTLs validated in CA overlapped with published disease associated SNPs. For example, rs646776 and rs10903129 have previously been associated with lipid levels and coronary heart disease risk, however, the rs10903129 eQTL was missed in the eigenvector adjusted analysis.


The main point of the paper is to smoke out population substructure which might generate false positives in health-related genome-wide association studies. The problem is pretty obvious. Imagine you have a medical study with a lot of blacks and whites, and you just assume they’re all genetically basically the same. Then you look for associations of particular genetic variants within the population which has disease X. Of course, it could be that blacks or whites tend to have more of disease X than the other population, and it turns out that blacks and whites also tend to differ on a whole lot of genes. Any variant whose frequency differs between the two groups will then look associated with disease X, even if it has nothing to do with the disease. Modern human population genetics might have “disproved race,” but it sure is very interested in “population substructure.”
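A toy simulation makes the problem concrete. The sketch below assumes two hypothetical populations, A and B, with made-up allele frequencies and disease rates; within each population the variant has no effect on disease at all. None of the numbers come from the paper, and the function names are just labels for this example.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration (not from the paper).
ALLELE_FREQ = {"A": 0.2, "B": 0.7}    # frequency of the variant in each population
PREVALENCE = {"A": 0.05, "B": 0.15}   # rate of disease X in each population


def simulate(n_per_pop=20000, seed=0):
    """Return (population labels, genotypes, disease status) for both populations."""
    rng = np.random.default_rng(seed)
    labels, genotypes, disease = [], [], []
    for pop in ("A", "B"):
        # Genotype = number of copies of the variant (0, 1 or 2).
        g = rng.binomial(2, ALLELE_FREQ[pop], size=n_per_pop)
        # Disease is drawn independently of genotype: no causal effect.
        d = (rng.random(n_per_pop) < PREVALENCE[pop]).astype(float)
        labels.append(np.full(n_per_pop, pop))
        genotypes.append(g)
        disease.append(d)
    return np.concatenate(labels), np.concatenate(genotypes), np.concatenate(disease)


def correlation(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


labels, genotypes, disease = simulate()
pooled_r = correlation(genotypes, disease)
within_r = {
    pop: correlation(genotypes[labels == pop], disease[labels == pop])
    for pop in ("A", "B")
}

print(f"pooled sample: r = {pooled_r:.3f}")
for pop, r in within_r.items():
    print(f"within population {pop}: r = {r:.3f}")
```

The pooled correlation should come out clearly positive while the within-population correlations hover around zero, which is the signature of stratification rather than biology.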
Patterns of between-population variation can be visualized by extracting out the independent dimensions of variance, and plotting them against each other. Generally the charts I post on this illustrate the two dimensions which can explain the most variance in the data set (the allele frequencies across all the SNPs in this case), principal components 1 and 2. But the comment above highlights that there are many other dimensions, though they explain less of the variance.
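For readers who want to see the mechanics, here is a minimal sketch of how principal components can be pulled out of a genotype matrix with a singular value decomposition. The three populations and their allele frequencies are simulated, and `principal_components` is just a name for this sketch. Published pipelines usually also rescale each SNP by its allele frequency, a step skipped here for brevity.

```python
import numpy as np


def principal_components(genotypes, n_components=3):
    """PCA of an (individuals x SNPs) matrix via SVD.

    Returns per-individual scores, SNP loadings, and the fraction of
    total variance explained by each retained component.
    """
    centered = genotypes - genotypes.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, vt[:n_components], explained[:n_components]


# Simulated data (an assumption for illustration): 3 populations x 50 individuals, 200 SNPs.
rng = np.random.default_rng(1)
n_pops, n_ind, n_snps = 3, 50, 200
pop_freqs = rng.uniform(0.1, 0.9, size=(n_pops, n_snps))
genotypes = np.vstack([
    rng.binomial(2, pop_freqs[k], size=(n_ind, n_snps)) for k in range(n_pops)
])

scores, loadings, explained = principal_components(genotypes)
for i, frac in enumerate(explained, start=1):
    print(f"PC{i}: {frac:.1%} of variance")
```

Plotting the first column of `scores` against the second gives the familiar PC 1 vs. PC 2 chart; swapping in the third column gives PC 1 vs. PC 3.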
One issue that the authors of the above paper pinpoint is that the nature of these dimensions is sensitive to the populations which you include in your original data set to generate them. They distinguish here between the dimensions generated from the full HGDP data set, which includes ~50 world populations, and visualizations which rely only on one population. In this study they project their own samples of European, African and Hispanic Americans on the dimensions extracted out of the HGDP data set, and also onto dimensions generated from the populations themselves. As an example, consider Hispanic Americans projected upon the dimensions of variation constructed from Asians, Africans and Europeans, or, Hispanic Americans projected upon the dimensions of variation extracted from only the variance extant within their own population. From what I could tell they actually didn’t find that correcting for total genome variation using these two methods was particularly helpful in generating greater clarity as to the role of population substructure in producing false positives. So let’s focus on the visualizations, which go back to the title of the post.
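The difference between the two approaches is easier to see in code. What follows is a rough sketch of the general idea, not the authors' hgdp-eigen implementation: axes are fitted on a reference panel only, and study samples are then projected onto them. The two reference populations, the 50/50 admixed study sample, and the helper names `fit_axes` and `project` are all assumptions made for illustration.

```python
import numpy as np


def fit_axes(reference, n_components=2):
    """Fit principal axes on a reference panel (individuals x SNPs)."""
    mean = reference.mean(axis=0)
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(samples, mean, axes):
    """Project samples onto axes fitted elsewhere, using the reference mean."""
    return (samples - mean) @ axes.T


rng = np.random.default_rng(2)
n_snps = 500
freq_p = rng.uniform(0.1, 0.9, n_snps)   # hypothetical reference population P
freq_q = rng.uniform(0.1, 0.9, n_snps)   # hypothetical reference population Q
ref_p = rng.binomial(2, freq_p, size=(100, n_snps))
ref_q = rng.binomial(2, freq_q, size=(100, n_snps))
reference = np.vstack([ref_p, ref_q])

# Study sample: a 50/50 admixture of P and Q, absent from the reference panel.
admixed = rng.binomial(2, 0.5 * freq_p + 0.5 * freq_q, size=(40, n_snps))

# Reference-panel axes (the analogue of projecting onto HGDP-derived axes).
mean, axes = fit_axes(reference)
pc1_p = project(ref_p, mean, axes)[:, 0].mean()
pc1_q = project(ref_q, mean, axes)[:, 0].mean()
pc1_admixed = project(admixed, mean, axes)[:, 0].mean()
print(f"PC1 means: P = {pc1_p:.2f}, admixed = {pc1_admixed:.2f}, Q = {pc1_q:.2f}")

# "Self" axes: fitted on the study sample alone, with no outside reference.
self_mean, self_axes = fit_axes(admixed)
self_scores = project(admixed, self_mean, self_axes)
```

On the reference axes the admixed individuals fall between P and Q, so the direction and rough magnitude of admixture can be read off the plot. The self-fitted axes only describe variation inside the study sample and say nothing about where it sits relative to anyone else.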
The first chart has PC 1 & PC 2 from the HGDP populations, with their sample of about 50 African Americans projected onto it:
col1.png
Pretty much zero surprise here. I would be willing to assume that the self-identified African American who clusters with Europeans is an error of some sort (e.g., a sample mix-up), but other studies show the same tendency quite frequently. I conclude then that there are actually people who are inadvertently “passing” as black, at least culturally (on the outside they probably look whiter than G. K. Butterfield).
The second chart now has PC 1 & PC 3. So the dimension of variation which explains the second largest proportion of variance has now been replaced by the dimension which explains the third largest proportion.
col2.png
Now Native Americans are distinct from East Asians in the HGDP sample. This is because of PC 3. This goes to the commenter’s point that looking at more dimensions of variation gives us a better sense of real population differences.
Jumping back to PC 1 and 2, but with Hispanics projected onto the HGDP generated space:
col3.png
I don’t know the provenance of the Hispanics, but it looks to me that they’re likely to include many Puerto Ricans, seeing as there’s a large amount of African admixture here. Nevertheless, you still see the overlap between Hispanics and South Asians that you did with the Gujarati-Mexican comparison, though attenuated. So let’s look at PC 1 & PC 3.
col4.png
And yes, all of a sudden mestizos and South Asians do not overlap, and in fact South Asians are further from mestizos than Europeans or Middle Easterners. One could have predicted this from the previous chart.
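The same effect is easy to reproduce with toy numbers. In the sketch below two made-up groups share the same distribution on the two highest-variance axes and differ only on a third, lower-variance one. The standard deviations and offsets are arbitrary choices, not genetic data, and `separation` is a helper invented for this example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200  # individuals per group


def make_group(offset_axis3):
    """Draw one group; only the third axis depends on group membership."""
    return np.column_stack([
        rng.normal(0.0, 10.0, n),           # axis 1: largest variance, shared
        rng.normal(0.0, 5.0, n),            # axis 2: second largest, shared
        rng.normal(offset_axis3, 0.3, n),   # axis 3: small variance, group-specific
    ])


group_a = make_group(+1.5)
group_b = make_group(-1.5)
data = np.vstack([group_a, group_b])

# PCA via SVD of the centered data; columns of `scores` are PC1, PC2, PC3.
centered = data - data.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
scores = u * s


def separation(a, b):
    """Gap between group means on each PC, in pooled standard deviations."""
    pooled_sd = np.sqrt((a.var(axis=0) + b.var(axis=0)) / 2)
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled_sd


sep = separation(scores[:n], scores[n:])
for i, value in enumerate(sep, start=1):
    print(f"PC{i}: groups are {value:.2f} pooled SDs apart")
```

A PC 1 vs. PC 2 plot of these scores would show one undifferentiated cloud, while PC 1 vs. PC 3 would show two clean bands.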
Finally, I want to round out the inspection by looking at two charts which project European Americans onto PC 1, PC 2 and PC 3. The European Americans are black points.
col5.png
col6.png
Note that European American outliers seem to have a bias toward drifting in the direction of the Native Americans and African Americans. I don’t discount the possibility of errors here, but it is important to note that deviations away from the HGDP European cluster in the last chart are toward the two groups which European Americans have historically been in contact with in North America.
Note: The subjects specific to this study seem to have been resident in the eastern half of the United States. This would tend to support my supposition that they are less likely to be Mexican Americans, and more likely to be Puerto Rican or Cuban Americans, if they were Hispanic.
Citation: Hao K, Chudin E, Greenawalt D, Schadt EE (2010) Magnitude of Stratification in Human Populations and Impacts on Genome Wide Association Studies. PLoS ONE 5(1): e8695. doi:10.1371/journal.pone.0008695

January 13th, 2010 by Razib Khan in Genetics

6 Responses to “How Columbus was not a seer”

  1. John Emerson Says:
    January 13th, 2010 at 7:06 am

    I believe that in some contexts this is called “churning”: the loss of information and approach to entropy. I’ve talked to linguists about whether something like this could happen to a language, making its antecedents unrecoverable. Two processes that happen to language are creolization and the development of a “Sprachbund”.
    In the first, a language is stripped down to a minimum for contact with foreigners and made into a pidgin trade language which usually has vocabulary from two or several languages and a structure which is characteristic of most pidgins, but not necessarily of either donor language. The pidgin then becomes a creole when residents of the trade center grow up speaking mostly pidgin and the language develops beyond its rudimentary beginnings.
    The Sprachbund is the trading back and forth of features between neighboring languages which are not historically related (from different language groups). One example is Romanian, which has picked up features from the surrounding Slavic languages (the Balkan Sprachbund). Another is East Asia, where it is now thought that languages from several different unrelated or distantly related language groups (Austronesian, Tibeto-Burman, and maybe others) have picked up enough common features to make up a sort of adoptive family. The Sprachbund wikis are worth reading.
    My theory was that if an unwritten language has endured, say, two cycles each of creolization and sprachbundization during (say) three thousand years (not impossible), ancestors before that time might be unrecoverable. It would essentially be a new isolate (as opposed to a survivor isolate).
    The two processes are not unrelated, either. Every creole language would be part of a sprachbund comprised by its neighboring languages.
    There’s an example of this in the novel “The Good Soldier Schweik”. Over the period of a century or more educated Czechs had picked up a German-type pronoun usage, whereas uneducated Czechs tended to stick to the Czech form, and nationalist Czechs insisted on the old form.
    These factors might throw a monkey-wrench into attempts to build superfamilies larger than the known families (most famously Nostratic). They don’t in any way discredit the established families (Indo-European, Semitic, Bantu, Malayo-Polynesian) but make work going beyond them difficult or impossible. The Turkish-Mongol-Manchu family has been questioned, though, and the language relationships of SE Asia are still up in the air.

  2. bioIgnoramus Says:
    January 13th, 2010 at 10:22 am

    Razib, I wonder whether it would ever be illuminating to plot (some) genetic data on a triangular diagram, rather than on multiple rectangular diagrams?
    http://en.wikipedia.org/wiki/Ternary_plot

  3. razib Says:
    January 13th, 2010 at 2:56 pm

    bio, follow the link to puerto ricans. there’s a ternary plot in there.

  4. Charles Iliya Krempeaux Says:
    January 13th, 2010 at 3:48 pm

    Why don’t people also do 3D plots? With computers, it really isn’t that difficult.
    And if the data isn’t too dense, then taking a snapshot of the 3D plot from a small set of different angles should be quite telling. (Or, you could make interactive 3D plots, that could rotate, hide and show data, etc.)

  5. razib Says:
    January 13th, 2010 at 4:00 pm

    they often don’t look that good on 2-D paper i think. OTOH, seems like there’d be a good place for it in the supplemental information with visualization software.

  6. Melykin Says:
    January 18th, 2010 at 7:28 pm

    Maple can make 3d scatter plots. You can move them around with the mouse to look at them from different angles.





    • About Gene Expression

      Razib Khan’s degrees are in biochemistry and biology. He has blogged about genetics since 2002, previously worked in software development, is an Unz Foundation Junior Fellow and lives in the western US. He loves habaneros.

  • Kalmbach Publishing Co.

    Copyright © 2012, Kalmbach Publishing Co.
