Data, Truth and Null Results

By Neuroskeptic | June 9, 2017 2:38 pm

Have you heard of the idea that smiling actually makes you joyful? Perhaps you know of the experiment where researchers got people to hold a pen in their mouth, so they had to smile, and it made them find cartoons funnier.


If you’re familiar with this idea, then you’re familiar with the work of German psychologist Fritz Strack, who carried out the famous pen-based grinning study back in 1988.

Now, Strack has just published a new piece, called From Data to Truth in Psychological Science. A Personal Perspective. Ironically, this reflection on recent smiling studies is not a happy one.

Last year, a group of researchers published a registered replication report (RRR) – an attempt to directly reproduce the 1988 pen-in-the-mouth effect on cartoon appreciation. 17 different sets of researchers carried out identical studies in parallel around the world. The total sample size was 1,894 people. The results were firmly null: there was no evidence that the ‘smiling’ pen condition made cartoons seem funnier.

In the new piece, Strack responds to this disappointing result. He says that he “volunteered” to help facilitate the RRR, submitting the experimental materials to make it possible, but he questions the meaning and importance of the null result. Following a rather opaque discussion, Strack seems to come close to suggesting that the truth of claims in science can be known even in the absence of statistical support:

[Scientists whose work fails to replicate] may arrive at the insight that there exists no direct route from data to truth. Instead, they may come to the conclusion that science is about arguments that should be based on empirical evidence whose validity, however, is not merely determined by probabilistic parameters. Although, power, effect size, significance level, etc. provide useful information, they deliver no immediate link to the truth or falsehood of a hypothesis. Instead, they must be critically evaluated (Popper, 1959), not only by statisticians but by scientists who are experts in the field.

The obvious problem here is that Strack himself used hypothesis-testing statistics in the original smiling-pen study. Presumably, if the “probabilistic parameters” of that original study had come out null (non-significant), he would have revised his views on the influence of facial muscle contraction on emotion, at least to some extent. In which case, perhaps he should revise his views now that (much) more evidence has arrived in the form of the RRR.

The alternative is that Strack would not have revised his views if the original smiling-pen study had given a null result, although this would raise the question of what the purpose of it was (and would be rather ironic for someone who cites Popper).

However he might have hypothetically reacted to a null result 30 years ago, Strack today views null results as uninformative:

It has been proposed to preregister the procedures of a study with the editors to assure that the results will be published regardless of their results. Although, this seems impartial, such a publication may not be informative and journals run the risk of becoming mere archives instead of media of the debate. As a consequence, it is not surprising that most journal editors prefer positive outcomes that add something new to what is already known.

This argument, which reminds me of Jason Mitchell’s infamous Emptiness of Failed Replications post, takes aim at study preregistration, an idea that I have long championed as a way to combat publication bias. So, as you’d predict, I strongly disagree with Strack here.
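The worry about publication bias can be made concrete with a small simulation. The sketch below illustrates the general argument and is not taken from Strack’s piece or the RRR. It assumes many small two-group studies of an effect that is truly zero, analysed with a simple z-test (unit SD assumed known), and a journal filter that prints only significant results in the predicted direction. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def z_test_p(diff, n_per_group, sd=1.0):
    """Two-sided p-value for a difference in two group means, assuming a known SD."""
    se = sd * math.sqrt(2 / n_per_group)
    return 2 * (1 - NormalDist().cdf(abs(diff) / se))


def run_literature(n_studies=5000, n_per_group=20, true_effect=0.0, seed=42):
    """Simulate many studies, then 'publish' only significant ones in the predicted direction."""
    rng = random.Random(seed)
    all_effects, published = [], []
    for _ in range(n_studies):
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = fmean(treated) - fmean(control)
        all_effects.append(diff)
        if diff > 0 and z_test_p(diff, n_per_group) < 0.05:
            published.append(diff)  # only these reach the journals
    return all_effects, published


all_effects, published = run_literature()
print(f"mean effect, all studies run:   {fmean(all_effects):+.2f}")
print(f"mean effect, published studies: {fmean(published):+.2f} "
      f"({len(published)} of {len(all_effects)} studies)")
```

Every simulated study is honest; the inflated published effect comes entirely from the filter. Preregistering with editors, so that results are published regardless of outcome, is meant to remove exactly that filter.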

In my view, a well-designed study is one that will be interesting and informative whatever the results are (assuming the study ‘works’ in a technical sense.) Sure, there are cases such as “does this pill cure cancer?” where a positive result would be more exciting than a negative answer. Yet the negative result would still be an informative contribution to the search for a cure for cancer. There is a nice quote about this which I will leave here (h/t @jones_the_chris):

Strack F (2017). From Data to Truth in Psychological Science. A Personal Perspective. Frontiers in Psychology, 8. PMID: 28559859

  • smut clyde

    not only by statisticians but by scientists who are experts in the field.

    One cannot help wondering whether “expertise in the field” is here defined by “agreement with the hypothesis despite evidence against it”.

    Chico Marx had a useful quote here too.

  • smut clyde

    I liked this part:

    … Susan Fiske, Dan Schacter, and Shelley Taylor point out that a replication failure is not a scientific problem but an opportunity to find limiting conditions and contextual effects. … To show that an effect occurs only under one but not under another condition is more informative than simply demonstrating non-effects

    So if evidence rejects a hypothesis, the right response is to keep the hypothesis — propping it up with subsidiary hypotheses, to explain why the purported effect worked in the original experiment but not in all the subsequent ones (weather dependence, temperature dependence, holding-your-mouth-the-right-way dependence). This preserves the original researcher’s reputation, and creates work for other researchers, so it’s a win-win… and the whole paradigm worked so well for medieval scholasticism.

    It’s like they’re trolling Andrew Gelman.

    Strack seems to reject the idea of “falsification”.

    • Neuroskeptic

      Evidence for a hypothesis is simply evidence for the hypothesis.

      Evidence against a hypothesis is complicated.

    • jrkrideau

      So if evidence rejects a hypothesis, the right response is to keep the hypothesis

      Why does the word “economist” keep occurring to me?

      • OWilson

        Most economists, political scientists (and climatologists) make a good living because they are smart enough to give their clients what they want. They are on professionally dangerous ground when asked to predict the future, but they will do it, qualified with a few “ifs”, “ands”, “coulds” and “shoulds”.

        The end users, public advocates, unscrupulous politicians and clickbait advocacy journalists never emphasize the small print, so when the “if” and “then” don’t happen and the theory or model eventually blows up, the outcome is very unsatisfactory for their opponents: nobody is actually to blame!

        The same “experts” are still there “explaining” everything all over again!

    • searcherseeker

      Or… maybe the cartoons weren’t funny!

      • Not_that_anyone_cares, but…

        I find The New Yorker cartoons are not as funny now as they were when I was a freshman in college.

        • smut clyde

          IKR? Punch cartoons stopped being funny some time in the late 1950s.

  • Dorothy Bishop

    My goodness, this really does confirm that some people just don’t get it, that ‘significant’ results can come out of random data. I regard it as crucial in statistical training that students are taught how to generate and analyse random data, so they can grasp this point. For a simple starting set of exercises, see
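In the spirit of that exercise, here is a minimal sketch (an illustration, not one of the exercises Bishop refers to) that draws two groups from the same normal distribution again and again and counts how often a test calls the difference ‘significant’. It assumes a z-test with known unit variance so that it stays within Python’s standard library; `false_positive_rate` is a hypothetical name.

```python
import math
import random
from statistics import NormalDist, fmean


def false_positive_rate(n_sims=10_000, n_per_group=20, alpha=0.05, seed=0):
    """Share of experiments on pure noise that come out 'significant'."""
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)  # SE of a difference in means when SD = 1
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]  # same population as a
        z = (fmean(a) - fmean(b)) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        if p < alpha:
            hits += 1
    return hits / n_sims


rate = false_positive_rate()
print(f"{rate:.1%} of experiments on pure noise were 'significant'")
```

Changing `n_per_group` does not change the answer: with no true effect, about one experiment in twenty still crosses the line, which is exactly what α = 0.05 promises.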

    • Alan_McIntire

      Right you are. After reading about the arcsine law, and an article on “R” posted on a blog by statistician William Briggs, I created a random number generator: +1 for heads, -1 for tails.
      When I took running totals of 50 or more coin flips and checked the significance of their correlation with time, the p value was on the order of 10^-12 or smaller, despite the fact that the data came from randomly generated coin flips.

      Here’s the simple program in R:

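      v <- 2 * rbinom(50, size = 1, prob = 0.5) - 1  # 50 coin flips: +1 heads, -1 tails
      cor.test(cumsum(v), seq_along(v))$p.value      # p value for running totals vs. time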

      Of course, the raw flips showed no significant correlation with time, as was to be expected, but the running totals showed a bogus “high significance” thanks to the arcsine law.
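The same effect can be reproduced without R. The sketch below (an illustration, not the commenter’s program) counts how often 50 raw ±1 flips, and their running totals, look ‘significantly’ correlated with time. It assumes the Fisher z approximation for the p-value of a correlation, which keeps everything in Python’s standard library; the function names are hypothetical.

```python
import math
import random
from itertools import accumulate
from statistics import NormalDist


def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p(r, n):
    """Two-sided p-value for a correlation via the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def share_significant(n_flips=50, n_runs=2000, alpha=0.05, seed=1):
    """How often raw flips, and their running totals, look 'significantly' trended."""
    rng = random.Random(seed)
    steps = list(range(n_flips))
    raw_hits = walk_hits = 0
    for _ in range(n_runs):
        flips = [rng.choice((-1, 1)) for _ in range(n_flips)]  # +1 heads, -1 tails
        walk = list(accumulate(flips))                          # running totals
        if approx_p(pearson_r(steps, flips), n_flips) < alpha:
            raw_hits += 1
        if approx_p(pearson_r(steps, walk), n_flips) < alpha:
            walk_hits += 1
    return raw_hits / n_runs, walk_hits / n_runs


raw_rate, walk_rate = share_significant()
print(f"raw flips vs. time:      {raw_rate:.1%} 'significant'")
print(f"running totals vs. time: {walk_rate:.1%} 'significant'")
```

The raw flips should come out ‘significant’ roughly 5% of the time, as α promises. The running totals do so far more often, because each total contains every earlier flip: neighbouring values are strongly dependent, the series wanders, and the independence assumption behind the test no longer holds.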

  • redlobster

    The real conclusion to be drawn here is that he is a bad scientist.

  • OWilson

    Evolution has given the human a mental shortcut to processing information quickly. It insists on “connecting the dots”.

    Something moves in the forest, is it a tree branch, or a snake? Your brain reacts immediately, as if it is the latter, and tends to keep you safe.

    So we involuntarily see patterns in every random accumulation: Abe Lincoln on a potato chip, Noah’s Ark in a rock outcrop, and a human sculpture on Mars.

    The problem occurs when different folks connect the dots in different ways.

    Some will see a Great Bear in a star formation in the sky, while others see a ladle, or dipper. It’s a human construct born of personal experience.

    An elaborately shaped crop circle can be seen as an alien sign, because it doesn’t exist in nature.

    People can look at a chart and derive any manner of projection they desire to see, it depends on the individual fractal they are looking at. Step back a little and the trend disappears.

    Among my present, post-career job duties in the health and welfare field are charting and statistical analysis.

    We can take any data series and make it look like a positive, negative, or a null trend just by changing the scale of the chart. It is very easily manipulated.

    Some patterns that appear to the observer, and give rise to studies to find cause and correlation, can disappear during rigorous testing, to the dismay of the researcher, who may even then be unwilling to give up her initial premise.

    We have seen many such cases in the study of the climate!

  • polistra24

    If a null result is “uninformative” then there’s exactly no reason to perform or pay for the experiment. Asking a yes/no question is only worth the trouble if you use the answer.

    “Do you want a cup of coffee?”

    “No thanks.”

    “Okay, here’s your coffee.”

    “Hey! Why did you ask me?”

  • P.Mathivanan

    In Natural Science, any number of replications should produce the same result as the original research. It then becomes a law. However, in Social Science, this need not happen because of the change that takes place between the original and the replicated research, irrespective of whether one gives importance to social actors or not. For example, ‘fear of the unknown’ is a known phenomenon in social science. However, the degree of fearfulness may vary from time to time depending upon the events we experience in real life. The strike of a meteorite is not a curse of God if we know the laws of natural science. This, in effect, reduces the degree of fear. However, if you come to know that the strike of a meteorite is the work of aliens, then the degree of fear will go up. People may even believe in the ‘curse of God’ phenomenon.

  • Uncle Al

    “Fritz Strack… famous pen-based grinning study”
    “17 different sets of researchers… around the world… 1894 people”
    “The results were firmly null.” Test blue, red, and yellow pencils of varying lengths, hexagonal, elliptical (carpenter’s), and round, new and sharpened; 9H, 8H, 7H, 6H, 5H, 4H, 3H, 2H, H; F is middle hardness; then HB, B, 2B, 3B, 4B, 5B, 6B, 7B, 8B, 9B (the softest). Track variance.

    First write papers then make observations to maximize DCF/ROI. Social intent cannot be erroneous, but it does require mumble factories for justification. “We have always been at war with Eastasia.”

  • David Lane

    Great article but I think there are exceptions to “In my view, a well-designed study is one that will be interesting and informative whatever the results are (assuming the study ‘works’ in a technical sense.)” One that I can think of is the pioneering work of Garcia on taste aversion in which his results were so counter to the prevailing beliefs that his article was rejected by the most prestigious journal (the editor later stated rejecting the article was his biggest mistake as editor). However, if Garcia’s experiment had produced no evidence for his then “way-out” theory, it’s hard to argue the results would have been interesting and informative. They would have confirmed what everyone already “knew.”

    • Neuroskeptic

      Thanks for the comment. I agree that in a case like Garcia, null results would “merely” confirm the existing theory, but I think this confirmation would still be informative. It would show that the accepted model does apply to a particular case. That might not be very exciting, but it’s not a trivial result (the case might have been an exceptional one – which indeed is what Garcia found).

      I would be more confident in a theory which had been tested and found to hold in all cases by a would-be Garcia, than a theory which had never been tested in such a way.

      • David Lane

        I agree that knowing that kind of null result would be valuable. However, until paper journals that necessarily ration publication space are fully replaced with online journals with negligible marginal costs for articles, null results consistent with conventional wisdom will never see the light of day.

  • Zachary Stansfield

    Glad to see this study being tested directly. I was just reading some of Kahneman’s book Thinking, Fast and Slow, and this was up there with some of the more unbelievable results that should be questioned.

    I think what we’re going to find over the next while in social psychology is that, beyond the obvious problems with poor research practices that produce lots of false positives and a strong pressure to publish non-intuitive findings, there will be a recognition of how little this field has attempted to reconcile its findings with other areas of cognitive science.

    A lot of these social priming studies rely on proposed “associative” effects that don’t really fit with the typical time course over which true priming effects occur, and they often rely on naive assumptions about how the brain works that require interactions between multiple brain circuits in ways that we know are not likely to be true (a great example is the famous semantic priming study where priming the word “old” led people to walk slowly).

    Unfortunately, a lot of people have made their careers on this field, and they are unlikely to give in easily to criticism. In 20 years, however, they will mostly retire and a new group of researchers will undoubtedly replace them with hopefully more valid ideas.

    • smut clyde

      how little this field has attempted to reconcile its findings with other areas of cognitive science.

      Strack would have the interpretation of negative evidence handled “by scientists who are experts in the field”. By true believers, if you like, of proven fealty. No role there for input from scientists who are experts in some related field (like cog.sci.).

  • John F. Bramfeld

    I agree that intuition is extremely important to science. As a non-scientist, when I and millions of others read about something like the smile research, we say “that’s interesting” and move on, never to think of it again. Science 0, Intuition 1.

    Psychologists apparently do that, too. If they didn’t, it would have been necessary to re-examine every therapy based on getting past various kinds of “repression”. How exhausting it would be for them to be real scientists.

  • Jim Croft

    The problem, as I see it, is that human behavior is not really as reliable as, say, the yield from a chemical reaction. The biggest problem is that researchers in social science are required to produce results; they therefore become bullshitters, and it builds up until the pile falls over.
    I read about this microaggression theory and cognitive dissonance: it’s BS on top of BS.
    I wonder how many social scientists have a background in mathematical statistics?

    • GeorgeHanshaw1

      I know a few that can do a Student’s t statistic. They may or may not understand it, but they can get it to run on their laptop at least three times out of five.

  • GeorgeHanshaw1

    This is old news. The very “Bible” of psychologists and psychiatrists, the DSM, is simply based upon contemporary opinion of those who create it, devoid of anything resembling actual research.

    That’s why the diagnoses whipsaw from psychopathology to normal variant and vice versa, depending on the prevailing social norms of those voting. Psychobabble is a science-free zone.

    • Grim Beard

      You might want to learn the difference between psychology and psychiatry.

      • GeorgeHanshaw1

        What makes you believe I DON’T know the difference? In fact, I’ve supervised both. And they both use the DSM. For that matter, so did I, but it was DSM-IV back then.

        • Grim Beard

          What makes me believe you don’t? Your reference to the DSM as the “Bible of psychology”, your dismissal of psychology as “psychobabble”, and now your copy-and-pasted comments about the DSM.

          Most psychologists do *not* use the DSM (or the ICD-10) because most psychologists are not clinical psychologists. For example, I am a cognitive psychologist, for whom the DSM is a “Bible” only in the sense of being largely irrelevant to me.

          Of particular importance is that the article on which you are commenting is nothing to do with clinical psychology. Why you would choose to even mention the DSM, if not through ignorance of the topic, is a mystery.

          • Neuroskeptic

            Grim Beard is right. The DSM is called the “Bible of Psychiatry” because it contains definitions of psychiatric disorders. It is also used by some clinical psychologists, but the majority of psychologists are not clinical psychologists. There is no “Bible” of psychology; psychology is a very diverse field.

          • GeorgeHanshaw1

            Apparently the American Psychological Association disagrees with you concerning the DSM’s importance to the profession. Obviously they must be wrong.

            Do you often have these feelings of grandiosity…?

          • GeorgeHanshaw1


            An excerpt:

            The next DSM
            A look at the major revisions of the Diagnostic and Statistical Manual of Mental Disorders, due out next month.
            By Rebecca A. Clay
            April 2013, Vol 44, No. 4
            Print version: page 26
            After a 14-year revision process and a lot of contentiousness, the latest version of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) debuts May 22. What changes will affect psychologists?
            The new manual reflects the wealth of research and knowledge researchers and clinicians have generated since the last revision, says James H. Scully, MD, medical director and chief executive officer of the American Psychiatric Association, which publishes the DSM. “Our hope is that by more accurately defining disorders, diagnosis and clinical care will be improved and new research will be facilitated to improve our understanding,” he says.
            The updated DSM incorporates new findings while taking into account mental health professionals’ need for consistency, says Chris Hopwood, PhD, an assistant professor of psychology at Michigan State University. “There’s been a tension between the desire to move … toward more evidence-based models on the one hand and the need to not disrupt clinical practice as it stands,” says Hopwood, who will be presenting workshops on the DSM-5 at APA this April and at various state psychological associations throughout the year.

      • GeorgeHanshaw1

        From the American Psychiatric Association website:

        DSM History

        The need for a classification of mental disorders has been clear throughout the history of medicine, but until recently there was little agreement on which disorders should be included and the optimal method for their organization.

        The many different classification systems that were developed over the past 2,000 years have differed in their relative emphasis on phenomenology, etiology, and course as defining features. Some systems included only a handful of diagnostic categories; others included thousands. Moreover, the various systems for categorizing mental disorders have differed with respect to whether their principal objective was for use in clinical, research, or administrative settings.

        Because the history of classification is too extensive to be summarized here, this summary focuses only on those aspects that have led directly to the development of the Diagnostic and Statistical Manual of Mental Disorders (DSM) and to the mental–disorders sections in the various editions of the International Classification of Diseases (ICD).

      • GeorgeHanshaw1

        From the American Psychological Association website:

        APA Recommends
        1. The Diagnostic and Statistical Manual of Mental Health Disorders, 5th ed.
        The Diagnostic and Statistical Manual of Mental Health Disorders, 5th edition (DSM-5) is the standard classification of mental disorders published by the American Psychiatric Association.
        Web Page
        2. The next DSM
        A look at the major revisions of the DSM, including a developmental focus, new diagnostic criteria, emphasis on race and gender, and inclusion of ICD-10 codes.
        Magazine Article (April 2013)

  • Mark Johnson

    The problem to me with this article, with the failed replication, and with many of the comments is that Strack, Martin and Stepper is being evaluated in isolation and out of the broader scientific context. Honestly, did anyone take a second to look at original sources before reporting or commenting?

    First of all, this is conceptual research: the focus is on underlying causal mechanisms; it’s about theory, not the artificial operations used to manipulate those causal variables. The purpose of the research was to investigate “the hypothesis that people’s facial activity influences their affective responses”, taken from the first sentence of the abstract of Strack et al. (1988). Pens in mouths, smiling or frowning: they really are not important, because they are not part of the conceptual relationship being tested. Yes, the pen-in-mouth manipulation attracted media attention because it is somewhat ridiculous, but the study wasn’t conducted to show that holding a pen in your mouth different ways affects your mood. Nobody who cares about understanding causal mechanisms cares whether pens in mouths or some different operation was used to manipulate facial expression.

    Second, it is very important to realize that the Strack, Martin and Stepper study that failed to replicate was a replication itself. What about Laird (1974), Cupchik & Leventhal (1974), Duncan & Laird (1977), and Zuckerman et al. (1981), which all found comparable results? The failed replication article itself (Wagenmakers et al., 2016) notes that the “facial feedback hypothesis is supported by a number of [more recent] related studies (Kraft & Pressman, 2012; Larsen, Kasimatis, & Frey, 1992; Soussignan, 2002)”. So we are talking about a body of interconnected research conducted over several decades, all showing the same results, yet one cherry-picked study seems to fall apart under scrutiny. The Neuroskeptic article and most comments seem to approach this as if Strack decided out of the blue to shove pens in people’s mouths and see what happens, rather than addressing it as part of a line of thinking and empirical study dating back to William James (1884) and continuing to this day.

    Now, I have no idea whether Strack’s results were a Type I error or an artifact, or whether something was amiss with the (multiple) replication attempts; nor do I have a response to Strack’s comments. The facial feedback hypothesis has always been controversial, and well before Wagenmakers et al. (2016), some studies failed to find evidence in support of the theory (e.g., Buck, 1980; Ellsworth & Tourangeau, 1981; Matsumoto, 1987). Fiske & Taylor (1991) postulated some time ago that contradictory findings might be due to differences between asking people to exaggerate spontaneous expressions and asking them to pose expressions.

    At the end of the day, the Wagenmakers et al. (2016) failed replication study is pretty meaningless. A failed replication doesn’t simply cancel out a significant effect; it does not necessarily falsify anything, nor is the most recent study’s result automatically the correct one. Enough failed replications can indicate that a theory is not correctly articulated and needs to be modified, but unless the failed replication research can explain why significant effects were found in the first place, a failed replication does not undo them. [This is true when a body of empirical literature already exists; it is a different case entirely when the research being challenged is based on only one or two studies and does not have a large base of interconnected research supporting it.]

    Failing to replicate a study does not mean the original was based on bad science. But it is bad science not to consider the broader body of research that led up to the original study being challenged in the first place.

    • smut clyde

      Strack, Martin and Stepper is being evaluated in isolation and out of the broader scientific context.

      NS did not evaluate Strack, Martin & Stepper. The topic of the post was Strack (2017), and Strack’s response to Acosta et al. Honestly, did you take a second to look at original sources before reporting or commenting?

      the Wagenmakers et al. (2016) failed replication study is pretty meaningless.

      If a massive study with 17 research groups and 1894 subjects is “meaningless”, then we’re all wasting our time doing research and we might as well go home and wash the cat.

      • Uncle Al

        Stanley Milgram/Yale randomly dropping unstamped addressed envelopes in random places. That remains solid. The pencil study is another cherished psychological avatar. It is conclusively crap.

        Falsification is absolute (Galileo, Popper). If it empirically fails, it is wrong. Dump the garbage or your discipline stinks. Stinky disciplines fear, abhor, and deny falsification. Psychology deeply stinks, hurting untold numbers of people while stinking.

    • PsyoSkeptic

      No one was saying the original was based on bad science nor that it directly addresses the facial feedback hypothesis when smiles are used. This was to address one of many possible reasons for that feedback effect. It wasn’t supported. Strack should have conceded instead of being ridiculous.

      The bad science is not Strack’s initial study, it’s his unscientific response to the RRR.

  • hownow

    …one version of old saying…don’t believe anything you hear and only half of what you read/see…might need an update…don’t believe anything you hear, less than half of what you read/see, and be skeptical of that…

  • Jenny H

    I think the really big trouble is that only ‘successful’ (aka those that confirm the researcher’s hypothesis) trials make it into the research paper :-(
    Who in their right mind would write, we had to redesign this study 10 (or more) times before we got a ‘result’??
    It all reminds me of my Chem II titrations. Instructions: repeat the titrations 3 times, then average the results. Actuality: repeat the titration as many times as you need to get three readings that are reasonably close together.
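    A rough worked figure shows why the untold redesigns matter. Assume, for simplicity, that each attempt is an independent test of an effect that is truly zero, at the conventional α = 0.05 (real redesigns are rarely fully independent, so treat this as an approximation). The chance that at least one of k attempts comes out ‘significant’ is then:

$$
P(\text{at least one } p < \alpha) = 1 - (1 - \alpha)^{k},
\qquad
k = 10:\; 1 - 0.95^{10} \approx 1 - 0.599 = 0.401
$$

    Ten unreported tries thus turn a nominal 5% false-positive risk into roughly 40%.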

  • Jenny H

    Not to mention: Charles Darwin.
    “I had . . . followed a golden rule, namely, that whenever a published fact, a new observation or thought came across me, which was opposed to my general results, to make a memorandum of it without fail and at once; for I had found from experience that such facts and thoughts were far more apt to escape from the memory than favourable ones. Owing to this habit, very few objections were raised against my views which I had not at least noticed and attempted to answer.”

    • OWilson

      The logic of a genius.

      They don’t come along often, these days!

  • Fritz Strack

    Thanks for all the attention. But keep your gunpowder dry.
    There is more to come 😉

    • Neuroskeptic

      Thanks for the comment! I will be waiting with interest

      • Fritz Strack

        On Andrew Gelman’s blog, EJ Wagenmakers says:
        July 9, 2017 at 9:33 am
        There is an interesting new development here. I recently read on Twitter that Tom Noah and his advisor have conducted an experiment that purports to show that the effect is present when there are no cameras to monitor whether the pen is held correctly. I have not seen the data and I find the hypothesis (i.e., that the cameras somehow make the effect disappear) implausible. I’m also not sure how many other experiments have been conducted that do not find the result (and we don’t learn about). But if I can’t find a serious flaw in the design I’ll certainly propose an adversarial collaboration to sort this out. The effect is in most psych textbooks so it deserves to be looked at from all angles. I have been told that Tom shared his materials on the OSF, and I’m excited about such a high level of transparency.




No brain. No gain.

About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.



@Neuro_Skeptic on Twitter

