DISCOVER Magazine. Science, Technology and The Future
Gene Expression

The problem of false positives

False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant:

In this article, we accomplish two things. First, we show that despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process.

Since the paper is behind a paywall, I’ve cut and pasted the proposed solutions below:

We propose the following six requirements for authors.

  1. Authors must decide the rule for terminating data collection before data collection begins and report this rule in the article. Following this requirement may mean reporting the outcome of power calculations or disclosing arbitrary rules, such as “we decided to collect 100 observations” or “we decided to collect as many observations as we could before the end of the semester.” The rule itself is secondary, but it must be determined ex ante and be reported.

  2. Authors must collect at least 20 observations per cell or else provide a compelling cost-of-data-collection justification. This requirement offers extra protection for the first requirement. Samples smaller than 20 per cell are simply not powerful enough to detect most effects, and so there is usually no good reason to decide in advance to collect such a small number of observations. Smaller samples, it follows, are much more likely to reflect interim data analysis and a flexible termination rule. In addition, as Figure 1 shows, larger minimum sample sizes can lessen the impact of violating Requirement 1.

  3. Authors must list all variables collected in a study. This requirement prevents researchers from reporting only a convenient subset of the many measures that were collected, allowing readers and reviewers to easily identify possible researcher degrees of freedom. Because authors are required to just list those variables rather than describe them in detail, this requirement increases the length of an article by only a few words per otherwise shrouded variable. We encourage authors to begin the list with “only,” to assure readers that the list is exhaustive (e.g., “participants reported only their age and gender”).

  4. Authors must report all experimental conditions, including failed manipulations. This requirement prevents authors from selectively choosing only to report the condition comparisons that yield results that are consistent with their hypothesis. As with the previous requirement, we encourage authors to include the word “only” (e.g., “participants were randomly assigned to one of only three conditions”).

  5. If observations are eliminated, authors must also report what the statistical results are if those observations are included. This requirement makes transparent the extent to which a finding is reliant on the exclusion of observations, puts appropriate pressure on authors to justify the elimination of data, and encourages reviewers to explicitly consider whether such exclusions are warranted. Correctly interpreting a finding may require some data exclusions; this requirement is merely designed to draw attention to those results that hinge on ex post decisions about which data to exclude.

  6. If an analysis includes a covariate, authors must report the statistical results of the analysis without the covariate. Reporting covariate-free results makes transparent the extent to which a finding is reliant on the presence of a covariate, puts appropriate pressure on authors to justify the use of the covariate, and encourages reviewers to consider whether including it is warranted. Some findings may be persuasive even if covariates are required for their detection, but one should place greater scrutiny on results that do hinge on covariates despite random assignment.
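Requirement 1 is easier to appreciate with a quick simulation. The sketch below is not the authors' code. It is a minimal illustration under simplifying assumptions: both cells are drawn from the same standard normal distribution (so every "significant" result is a false positive), the test is a z-test with the standard deviation assumed known to be 1 rather than the t-test a real study would use, and the names (`two_sided_p`, `false_positive_rate`) and settings (start testing at 20 per cell, give up at 50) are hypothetical. It compares testing once at a fixed sample size with testing after every added pair of observations and stopping at the first p < .05.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the z-test


def two_sided_p(sum_a, sum_b, n):
    """Two-sided z-test p-value for equal means.

    Assumes n observations per cell and a known standard deviation of 1,
    so the difference of the two cell means has variance 2 / n.
    """
    z = (sum_a / n - sum_b / n) / (2.0 / n) ** 0.5
    return 2.0 * STD_NORMAL.cdf(-abs(z))


def false_positive_rate(n_min, n_max, n_sims=2000, alpha=0.05, seed=1):
    """Share of simulated null studies that reach p < alpha.

    The simulated researcher tests after every added pair of observations
    from n_min to n_max per cell and stops at the first significant result.
    Setting n_min == n_max gives a fixed, pre-committed sample size.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sum_a = sum_b = 0.0
        for n in range(1, n_max + 1):
            # Both cells come from the same distribution: no true effect.
            sum_a += rng.gauss(0.0, 1.0)
            sum_b += rng.gauss(0.0, 1.0)
            if n >= n_min and two_sided_p(sum_a, sum_b, n) < alpha:
                hits += 1
                break
    return hits / n_sims


fixed_rate = false_positive_rate(n_min=50, n_max=50)    # rule set in advance
peeking_rate = false_positive_rate(n_min=20, n_max=50)  # flexible stopping
print(f"fixed n = 50 per cell:       {fixed_rate:.3f}")
print(f"peeking from 20 to 50:       {peeking_rate:.3f}")
```

The fixed-rule rate should land near the nominal 5%, while the peeking rate should come out well above it. The exact figures depend on the seed and on how many studies are simulated.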

Guidelines for reviewers

We propose the following four guidelines for reviewers.

  1. Reviewers should ensure that authors follow the requirements. Review teams are the gatekeepers of the scientific community, and they should encourage authors not only to rule out alternative explanations, but also to more convincingly demonstrate that their findings are not due to chance alone. This means prioritizing transparency over tidiness; if a wonderful study is partially marred by a peculiar exclusion or an inconsistent condition, those imperfections should be retained. If reviewers require authors to follow these requirements, they will.

  2. Reviewers should be more tolerant of imperfections in results. One reason researchers exploit researcher degrees of freedom is the unreasonable expectation we often impose as reviewers for every data pattern to be (significantly) as predicted. Underpowered studies with perfect results are the ones that should invite extra scrutiny.

  3. Reviewers should require authors to demonstrate that their results do not hinge on arbitrary analytic decisions. Even if authors follow all of our guidelines, they will necessarily still face arbitrary decisions. For example, should they subtract the baseline measure of the dependent variable from the final result or should they use the baseline measure as a covariate? When there is no obviously correct way to answer questions like this, the reviewer should ask for alternatives. For example, reviewer reports might include questions such as, “Do the results also hold if the baseline measure is instead used as a covariate?” Similarly, reviewers should ensure that arbitrary decisions are used consistently across studies (e.g., “Do the results hold for Study 3 if gender is entered as a covariate, as was done in Study 2?”). If a result holds only for one arbitrary specification, then everyone involved has learned a great deal about the robustness (or lack thereof) of the effect.

  4. If justifications of data collection or analysis are not compelling, reviewers should require the authors to conduct an exact replication. If a reviewer is not persuaded by the justifications for a given researcher degree of freedom or the results from a robustness check, the reviewer should ask the author to conduct an exact replication of the study and its analysis. We realize that this is a costly solution, and it should be used selectively; however, “never” is too selective.
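Guideline 3 asks whether a result survives alternative specifications. Here is a rough sketch of why that question matters, again under assumptions of my own rather than anything taken from the paper's code: two cells of 20 with no true difference, two outcome measures correlated at a hypothetical r = 0.5, known unit variance (so a z-test stands in for a t-test), and a researcher who reports whichever of three analyses (measure 1, measure 2, or their average) happens to reach p < .05. The function names are made up for the example.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_value(mean_diff, variance_of_diff):
    """Two-sided z-test p-value for a mean difference with known variance."""
    z = mean_diff / variance_of_diff ** 0.5
    return 2.0 * STD_NORMAL.cdf(-abs(z))


def flexible_analysis_rates(n=20, r=0.5, n_sims=4000, alpha=0.05, seed=2):
    """Return (single-analysis rate, pick-the-best-analysis rate) under the null."""
    rng = random.Random(seed)
    single = flexible = 0
    for _ in range(n_sims):
        diff1 = diff2 = 0.0  # running (cell A minus cell B) sums per measure
        for _ in range(n):
            for sign in (1.0, -1.0):  # one participant in A, one in B
                y1 = rng.gauss(0.0, 1.0)
                # y2 has unit variance and correlation r with y1
                y2 = r * y1 + (1.0 - r * r) ** 0.5 * rng.gauss(0.0, 1.0)
                diff1 += sign * y1
                diff2 += sign * y2
        d1, d2 = diff1 / n, diff2 / n
        p1 = p_value(d1, 2.0 / n)
        p2 = p_value(d2, 2.0 / n)
        # The average of the two measures has per-observation variance (1 + r) / 2.
        p_avg = p_value((d1 + d2) / 2.0, (2.0 / n) * (1.0 + r) / 2.0)
        if p1 < alpha:
            single += 1
        if min(p1, p2, p_avg) < alpha:
            flexible += 1
    return single / n_sims, flexible / n_sims


single_rate, flexible_rate = flexible_analysis_rates()
print(f"one pre-specified analysis: {single_rate:.3f}")
print(f"best of three analyses:     {flexible_rate:.3f}")
```

If the reported result only holds for one of the three specifications, that is exactly the situation a reviewer's "does it also hold if...?" question is meant to surface.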

To preempt angry and offended psychology professors: this problem is not limited to their discipline. It is probably a bigger problem in medicine, where false positives cost us a lot of money and likely kill people.

November 10th, 2011 Tags: Psychology
by Razib Khan in Uncategorized | 6 comments

6 Responses to “The problem of false positives”

  1. zkkz Says:
    November 10th, 2011 at 9:47 pm

    What about the most effective solution: report Bayesian statistics!

  2. Stephen Bounds Says:
    November 11th, 2011 at 12:13 am

    Awesome. There are many, many disciplines in the social sciences that should adopt these rules.

  3. Lab Lemming Says:
    November 11th, 2011 at 2:03 am

    xkcd covered this problem a while ago:
    http://xkcd.com/882/

  4. ohwilleke Says:
    November 11th, 2011 at 2:45 am

    Mostly good. The most problematic is the small sample size issue. As this blog itself has illustrated, and as examples from archaeology, linguistics, neuroscience and medicine amply show, sample sizes of one are frequently useful and sample sizes of a dozen can often be powerful.

    Sometimes statistical significance isn’t the most important issue. In ethnography or any other kind of research method where you get depth (e.g. each whole genome has thousands of data points), smaller samples can work. Likewise, if you have a good model to fit your data to (e.g. a likely recessive gene pattern) a small sample can say a lot. Strategic sampling can also be quite powerful – e.g., the Dow Jones Industrial Average is remarkably good at matching larger market trends with just twenty carefully chosen data points and losing one or two of those points would still leave an instrument that was almost as good.

    Very frequently, a small number of convincing outliers can make a powerful point. Lots of important neuroscience discoveries are based on samples of one where an outlier individual lacks this or that feature of the brain and the result is described. A single dig can greatly change the dating of an archaeological period.

    More generally, the failure of the guidelines to broadly recognize these issues suggests that the domain of research activities over which the guidelines are useful is narrower than suggested.

  5. Rob Says:
    November 11th, 2011 at 8:58 am

    Paper available here:
    http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1850704

  6. Dick Says:
    November 11th, 2011 at 7:18 pm

    Single subjects can be useful for generating hypotheses, small samples with large effects can be useful for strengthening hypotheses, and very large samples with small differences can mislead by giving clinically meaningless but highly statistically significant differences. In the past I worked with sample sizes of 640,000 to 1,000,000 and was showered with findings at the .00001 level, few of which were meaningful. Therefore sample size in and of itself is not a valid criterion of warranted outcomes.
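The point in the last comment about very large samples can be checked with simple arithmetic. The sketch below assumes a hypothetical true difference of 0.01 standard deviations between two groups (a number chosen for illustration, not taken from the comment), a known standard deviation of 1, and a z-test. It shows how the same negligible effect goes from invisible at 100 per group to overwhelmingly "significant" at 1,000,000 per group.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_and_p(effect_in_sd, n_per_group):
    """Expected z statistic and two-sided p-value for a two-group comparison.

    Assumes the observed mean difference equals the true effect exactly and
    that each observation has standard deviation 1, so the difference of
    means has variance 2 / n_per_group.
    """
    z = effect_in_sd / (2.0 / n_per_group) ** 0.5
    return z, 2.0 * STD_NORMAL.cdf(-abs(z))


tiny_effect = 0.01  # hypothetical: one hundredth of a standard deviation
z_small, p_small = z_and_p(tiny_effect, 100)
z_large, p_large = z_and_p(tiny_effect, 1_000_000)
print(f"n = 100 per group:       z = {z_small:.2f}, p = {p_small:.3g}")
print(f"n = 1,000,000 per group: z = {z_large:.2f}, p = {p_large:.3g}")
```

The effect size is identical in both rows; only the sample size changes, which is why a tiny p-value by itself says little about whether a difference is meaningful.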





    • About Gene Expression

      Razib Khan’s degrees are in biochemistry and biology. He has blogged about genetics since 2002, previously worked in software development, is an Unz Foundation Junior Fellow and lives in the western US. He loves habaneros.

  • Kalmbach Publishing Co.

    Copyright © 2012, Kalmbach Publishing Co.
