Primed by expectations – why a classic psychology experiment isn’t what it seemed

By Ed Yong | January 18, 2012 5:00 pm

In the early 20th century, the world was captivated by a mathematical horse called Clever Hans. He could apparently perform basic arithmetic, keep track of a calendar and tell the time. When his owner, Wilhelm von Osten, asked him a question, Hans would answer by tapping out the correct number with his hoof.

Eventually, it was the psychologist Oskar Pfungst who debunked Hans’ extraordinary abilities. He showed that the horse was actually responding to the expectations of its human interrogators, reading subtle aspects of their posture and expressions to work out when it had tapped enough. The legend of Hans’ intellect was consigned to history. But history, as we know, has a habit of repeating itself.

For the last few decades, psychologists have been using a technique called priming. With subtle hints of words or concepts, they can trigger impressive changes in behaviour. Words of cleanliness can make people behave more morally. Words related to age can slow their bodies. Words of power sharpen our mental abilities. All of these studies have suggested that our behaviour is influenced by subtle things that lie beneath the watch of our conscious awareness.

This view could well be right, but not always in the way that psychologists believe. Stephane Doyen from the Université Libre de Bruxelles has repeated one of the classic experiments in priming and shown that, in this case at least, it’s not the words that create the effect. It’s the experimenters’ expectations.

Back in 1996, John Bargh and his colleagues found that infusing people’s minds with the concept of age could slow their movements. The volunteers in the study had to create a sentence from scrambled words. When these included a word related to being old, the volunteers walked more slowly when they left the laboratory. They apparently didn’t notice anything untoward about the words, but their behaviour changed nonetheless. It was a classic result, and it turned the paper into one of the most widely cited studies in social psychology.

Two other groups have since replicated the effect, but neither stuck to the original set-up. That’s what Doyen wanted to do, but with three important tweaks. First, in Bargh’s study, a researcher timed the volunteers with a stopwatch. This time, Doyen would use infrared sensors for more accurate readings. Second, Bargh recruited 60 volunteers, but Doyen recruited twice as many. Third, Doyen also recruited four experimenters who carried out the study, but didn’t know what the point of it was.

This time, the priming words had no impact on the volunteers’ walking speed. They left the test room neither more slowly nor more quickly than when they arrived. The famous result hadn’t replicated. Why?

Doyen suspected that Bargh’s research team could have unwittingly told their volunteers how they were meant to behave, just as von Osten unconsciously told Clever Hans when to stamp. Perhaps they themselves moved more slowly if they expected the volunteer to do so. Maybe they spoke more languidly, or shook hands more leisurely.

We know that this sort of thing goes on all the time. It’s why people who run medical trials use “double-blind” designs, where neither experimenters nor patients know who is being given what. That wasn’t the case in Bargh’s study. The experimenter who clocked the stopwatch in the corridor didn’t know which volunteers had been primed, but the experimenter in the test room did. They could have unconsciously amplified the effect of their primes. Maybe they were responsible for creating the very behaviour they expected to see.

To test that idea, Doyen repeated his experiment with 50 fresh volunteers and 10 fresh experimenters. The experimenters always stuck to the same script, but they knew whether each volunteer had been primed or not. Doyen told half of them that people would walk more slowly thanks to the power of priming, but he told the other half to expect faster walks. Again, he measured the volunteers’ speed with infrared sensors, but he also gave the experimenters a stopwatch to take some back-up readings.

When Doyen looked at the data from the infrared sensors, he found that the volunteers moved more slowly only when they were tested by experimenters who expected them to move slowly. If Doyen relied on the experimenters’ own stopwatch-based measurements, things were even worse. The ones who anticipated faster walks measured faster walks. The ones that presumed slower walks found those too. Let that sink in: the only way Doyen could repeat Bargh’s results was to deliberately tell the experimenters to expect those results.

It’s a fascinating result, but one that isn’t a deathblow for priming as a method. Note that Doyen isn’t suggesting that Bargh’s team were simply making up their results to fit what they expected. Rather, their expectations affected their behaviour, which then affected the volunteers’ behaviour. The volunteers were still being primed, albeit by the experimenters rather than the word tasks. “Either possibility is a confirmation for the power of priming,” says Tom Stafford from the University of Sheffield.

Bargh himself says, “The basic ‘stereotype-priming of behavior’ effect has been replicated dozens of times. There are many reasons for a study not to work, and as I had no control over [this] attempt, there’s not much I can say.”

Joshua Ackerman, a psychologist from MIT, says, “There have been hundreds, if not thousands, of priming studies conducted, many of which use designs that make experimenter bias essentially impossible. It would be a huge mistake to draw the implication here that [these] studies refute this body of work in any way.”

Doyen’s study doesn’t show a radically new flaw, or one that’s unique to this branch of social psychology. We’ve known for over a century that scientists can very easily bias their own experiments, even in the most carefully controlled cases. “It’s a neat paper that re-emphasises some highly important and widely relevant warnings for everyone who might want to conduct experiments with people,” says Stafford. “Expectations – participants’ and experimenters’ – and inaccurate measurement can combine to give you biased results.” Ackerman adds, “It’s a lesson that behavioural researchers are all trained in, but one that bears repeating from time to time.”

“Our results don’t completely rule out the possibility of unconscious priming,” says Doyen, “but they point to the fact that the (generally weak) effects may also be influenced by many other factors that are almost never controlled in such studies.”

The study also serves as a good reminder about how important it is for scientists to try and repeat each other’s results. “The need for independent replications of important results such as those of Bargh cannot be overstated,” says Doyen. “The literature relies far too much on findings that have been produced using different methods dating back to 30 or 40 years ago.”

Reference: Doyen, Klein, Pichon & Cleeremans. 2011. Behavioral Priming: It’s all in the Mind, but Whose Mind? PLoS ONE

Image by Alex Proimos


Comments (25)

  1. Brian Too

    Double blind is the gold standard. Always has been and always will be.

  2. Unfortunately, it’s very difficult to verify that a study is truly double-blind from beginning to end, just because a researcher says it is (verbal report, including the methodology section of journal articles, is often subject to error, bias, or incompleteness).
    It can even be difficult to discern what a true “independent replication” is when it’s done with different subjects, in a different locale, in different buildings, at a different time of year, etc. etc. and all the ensuing variables that are altered or uncontrolled.

  3. You misspelled ‘impossible.’

    Yet another stark warning to experimenters to look out for known unknowns and the ever present threat of unknown unknowns.

  4. Very interesting, thanks!

  5. Ian

    As a science teacher, the first thing this made me think was that this is really important for designing classroom practicals and demonstrations. We use the idea of priming so that students know what effects to look for, but this inevitably means they ‘see’ effects which are imaginary or noise. Humans are, after all, very good at seeing imaginary patterns, for sound evolutionary reasons.

    Secondly, I’d point out that we talk about ‘double-blind’ as a clear idea. In fact it’s surely an aspiration. I presume (getting to read original papers only rarely) that how scientists *aim* for double blind (and document their methods) is more significant than the simple claim ‘it was double blinded’.

  6. Alison

    Fascinating. I loved the label on your picture ‘Old Men Walking’. From my perspective, while obviously not in their first flush of youth, they didn’t look so old! Another potential confound?

  7. Melissa


    The failure to replicate is a null effect. There are a host of explanations for null effects, only one of which is the non-existence of the prime effect. I agree that ruling out priming is premature. That said, this does serve as a reminder for researchers to employ clean research methods.

  8. Joe

    You know, it’s also “not exactly rocket science” to read the original article on which you’re reporting. *Every study* in the Bargh paper has the experimenter blind to condition, the elderly walking study is replicated *within the same paper* (and subsequently in labs across the world), and the third study uses subliminal priming, meaning that it is effectively double-blind. Whatever else might be wrong with those studies (and other priming studies), it has nothing whatsoever to do with experimenter effects. It’s akin to saying, “Because somebody produced the same effect using a different manipulation, that manipulation is responsible for the original effect.” Absolutely a career-worst for “reporting.”

  9. Hooray! A career-worst, and so early! It can only go up from here.

    From the paper, regarding the “experimenter blind to condition” bit:

    “The purpose of this setup was to ensure that the experiment was following a double-blind principle. No such precautionary measures are reported concerning the experimenter who administered the task to the participants… In Bargh et al.’s study, the experimenter who administered the task could thus very well have been aware of whether the participant was in the prime condition or not and tune his or her behavior accordingly. This possibility was in fact confirmed informally in our own study, as we found that it was very easy, even unintentionally, to discover the condition in which a particular participant takes part by giving a simple glimpse to the priming material.”

    You’ll also note that I sent the study to Bargh and asked for a comment, and got the quote that I included in the piece.

  10. Joe

    Again, read the original paper.

    Page 236 reads “The experimenter kept himself blind to condition by prepackaging the various scrambled-sentence tasks and picking packets randomly when the participant arrived at the laboratory waiting area.” That’s for Studies 2a and 2b.

    Page 234 reads “Neither the experimenter nor the confederate (see below) knew the priming condition to which a particular participant had been assigned until after the experimental session was over.” That’s for Study 1

    Page 238 reads “…keeping the participant’s condition from the experimenter’s knowledge.” That’s for Study 3, and by the way, it’s a mere couple paragraphs from the sentence “completely ruling out experimenter demand or other explanations.”

    Being a responsible and thorough science journalist means *actually reading the material you’re reporting on.* Not simply regurgitating what another author has to say about a paper. You want to make the case in your article that the original effects were due to experimenter demands. To make that case, the first thing you should have done is ask “Does that explanation make any sense given the methodology of the original finding?” You have clearly failed in that respect.

    As for your comment that you included the quote Bargh gave you, if someone came to me and said, “I haven’t bothered to read any of your work but I have a completely hare-brained explanation for your effects.. do you have any comments?,” I think I would probably say “there’s not much I can say” as well.

  11. @Joe: I assume it’s not your intent, but your comments come across as unnecessarily derisive (“it’s also ‘not exactly rocket science’…”) and judgmental. I’ve been reading Not Exactly Rocket Science for some time, and feel strongly that Ed is one of the most thorough and careful science journalists around, and that he’s not at all cavalier about the underlying science he reports on.

    Nonetheless, I do understand that you’ve found some language in the Bargh paper supporting the notion that, notwithstanding the view of the Doyen researchers, the age tests in Bargh may in fact have been double-blind. Very interesting and it’s good you bring this up, although I did look up this language in Bargh and want to note/add a couple of things. First, the quotes from pages 234 and 238 aren’t relevant to the age experiments discussed in Ed’s article or in the Doyen paper, as they relate to experiments 1 and 3, separate studies on rudeness and racial stereotypes. Also, while the language on page 236 is relevant, I’m not sure that it tells the whole story. This quote describes the researcher’s state of knowledge before the participant took the test, but then the procedures section goes on to state that, after giving instructions to the participant, the experimenter left the room until after the test was completed, at which point he “re-entered the lab room and partially debriefed the patient.” I assume – as you apparently do – that the experimenter still was unaware of the participant’s status, but it isn’t really clear. Perhaps this is why the Doyen team decided that the Bargh research wasn’t clear enough, and that it made sense to attempt to replicate the results and focus in on potential priming effects?

    Given this background, given that the Doyen team apparently didn’t feel that it was clear enough that there was no priming by the experimenters, and given that Bargh didn’t add any comments (and I think your characterization may be a bit unfair, as there’s a real difference between a journalist asking a researcher to comment on a directly-relevant, peer-reviewed paper, and someone saying “I haven’t bothered to read any of your work but I have a completely hare-brained explanation for your effects…”), I guess it just doesn’t seem to me that Ed’s approach to this article was particularly careless or inappropriate. Also, in any event, the overall discussion of priming effects and risks in Ed’s piece remains valid and interesting, and the reporting on the Doyen paper is accurate.

  12. Joe


    I’m sorry but I have to disagree with most counts in your post.

    With respect to the blog post itself: Ed is not just reporting on a paper. He’s *advancing one position*: that the original effect was a ‘clever hans’ effect, in which the participants were simply acting out the expectations of the experimenter. That’s the entire framing of the report (e.g., “why a classic psychology experiment isn’t what it seemed”). I don’t fault him as a journalist for advancing a position; all journalists do. What I fault him for is not exercising good judgment and reasoning. If you’re going to advance a position, it’s your responsibility to do the necessary background work to figure out whether that position makes sense and not just to take the word of one person. And if you don’t do that, to at least be a bit more cautious in the certainty with which one reports. (Yes, I know there are some hedging statements… only after all the bold claims are made.)

    With respect to the science: Here again Ed is wrong in his reporting, in terms of how he describes the Doyen finding. In fact it directly contradicts Doyen et al.’s own claims. In the 3rd paragraph of their discussion we find “Our results, however, cannot be explained solely in terms of a pure self-fulfilling prophecy effect [14], as the primed participants did not walk faster when tested by an experimenter who believed they would walk faster.” Read that again: The expectations of the experimenter alone DID NOT have an effect on participants’ walking speed. Thus “primed by expectations” is wrong. Saying history “has a habit of repeating itself” is wrong. Saying “it’s not the words that create the effect. It’s the experimenters’ expectations” is wrong. Saying “Rather, their expectations affected their behaviour, which then affected the volunteers’ behaviour” is wrong. (Indeed, that’s the classic SFP effect.) Relating the findings to Clever Hans is wrong. etc. etc. etc.

    Additionally, if it’s merely an expectation effect, why should the method of timing matter at all? In the Bargh paper it’s clear that the confederate who timed the participant had no way of knowing the condition. Using a stopwatch or infrared sensor shouldn’t make a difference. (I agree with you, by the way, that it is possible that the experimenter, upon returning, looked at the condition and then behaved differently depending on condition. It’s not clear from the report that this didn’t happen. However it’s completely clear that in both Study 1 and 3 the experimenter could not have known the condition.)

    More generally, findings have to be considered in the broader context of other, directly supportive work. That’s why I referenced the other two quotes: it is *LITERALLY IMPOSSIBLE* that experimenter effects can account for the results of Studies 1 or 3 (or my own results and other researchers’ results using subliminal priming methods). Let me repeat that: experimenter effects absolutely cannot account for other findings in which stereotype primes influence behavior. Even those findings in the same paper as the original Bargh paper, and experimenter effects alone can’t even account for Doyen’s own findings, **as he writes in the discussion**.

    I did not mean my comments to come across as unnecessarily derisive. As far as I can tell, however, Ed is wrong in his reporting of this research and, *importantly for a science journalist,* the broader scientific body of results (even just the broader context of the two papers in question) in which this work appears.

  13. @Joe: Thanks for the thoughtful reply. It’s clear that you know a lot about this area and feel strongly about it. Which is good! My initially negative reaction to your comments was based on a sense that you were less interested in setting the record straight than in attacking Ed personally (and with a bit of hyperbole). That said, I totally get it: it can be maddening when you read a piece on a topic where you happen to have a bit of knowledge, and it seems like the writer has screwed it all up. I actually suspect we may not disagree all that much on substance, although I haven’t yet had a chance to go back through your analysis of the Doyen findings.

    I love it when knowledgeable scientists weigh in (it’s a great way for clueless people like me to gain real insight into new areas!), but it’s hard to get the benefit when the insights are transmitted by flamethrower. :-) Maybe it’s a little like road rage: you start watching the spectacle and don’t really care so much about who cut off whom….

    Anyhow, thanks again for the further analysis/thoughts on all of this – gives me something to read and ponder this weekend between football games!

  14. Bobbie

    There is a place to publish these types of “failures to replicate” previous studies:
    (And this study wouldn’t be the first to fail to replicate the Bargh study.)

    Still, we have to agree that one failure to replicate doesn’t mean the original was wrong. But we do need to collect a lot of studies together to figure out what is going on and what is needed for an effect to show up (or not).

  15. Sharon_C

    Joe, you say the walking/elderly study was replicated “…subsequently in labs across the world”. Are any of these replications published, and if so, have you got any references? There are a lot of rumors out there of many failures to replicate by people who believed in the result and wanted to go to work on extending it.


  16. MattK

    Joe, you said “I did not mean my comments to come across as unnecessarily derisive.” (gratuitously mean-spirited and undignified would be more precise, IMHO), but if that is actually the case, maybe you should consider how source delivery style influences message effectiveness.

  17. Tom

    John Bargh has published a reply to the Doyen article (also mentioning this blog post):

  18. Interesting. I note again that I asked John Bargh for a comment when I wrote about this paper, and I contacted other psychologists for their views. If there were problems with the methodology, I would gladly have highlighted them, or changed the focus of the piece. Instead, no one threw up such views and Bargh himself deliberately chose to say very little – note the quote. “There’s not much I can say,” quoth he.

    Well, that’s not really true, is it? Because he has now written a two-page opinion piece outlining the supposed flaws in the Doyen et al. study.

  19. Jane

    John Bargh should not have had to do your work for you- this article far overstates the findings of Doyen et al., and Bargh is not to blame for that. You are.

  20. Chris

    I am a professional research psychologist at a major university and I have heard from more than one lab that couldn’t replicate the original study. These labs didn’t publish the results because the costs of publishing a failure to replicate are high (read: pushback from the authors of the original paper) and more importantly the rewards are low: most of the top-tier for-profit journals are not interested in null results, even when they are failures to replicate.

    Ed – a worthwhile message to push from your pulpit: granting agencies should reward researchers who publish null results and researchers who submit to journals that do not discriminate against null results (such as PLoS ONE, where this study was published).

  21. Eric R.

    Chris wrote: “I have heard from more than one lab that couldn’t replicate the original study.” This points to another reason why it is important that non-replications get peer-reviewed and published. Of course it is essential that researchers get a balanced view on the robustness and scope of an effect. But another thing is that as long as non-replications do not get published, researchers cannot judge the quality of those studies, nor can they try to systematically test and rule out alternative explanations based on those studies. Non-replications that do not get published are doomed to an existence as persistent rumours, and rumours are not science.

  22. Ed, you are wrong about a fact in Bargh’s original paper. You say “Back in 1996, John Bargh and his colleagues found that infusing people’s minds with the concept of age could slow their movements. The volunteers in the study had to pick the odd word from a group of scrambled ones.” A scrambled sentence test is not at all what you describe here: Participants never have to “pick up” the odd word from a group of scrambled ones. If they had to do that, the whole priming procedure would be compromised, because you would be surprised by participants’ ability to detect an objective of a study or a prime. A scrambled sentence test is not an intelligence test in which you need to say which word doesn’t ‘go’ with the others! Instead, in a scrambled sentence test participants need to form grammatically correct sentences with, say, 5 words out of 6. The whole aim is to expose them to primes illustrating a certain theme (the stereotype of the elderly, in Bargh’s 1996 paper) without them noticing this! I am simply surprised that you don’t describe this procedure correctly – because from what you say, readers with no social psychology knowledge understand something completely different and this is misguiding them in forming their impression of this whole controversy.

  23. Fair point. Corrected. Thanks, Gabriela.

  24. Thanks Ed. I think you should also correct it in your rebuttal after Bargh’s post in Psychology Today. In there, you’ve copy-pasted the section where you describe his procedure.

