Inherited Memories: Too Good To Be True?

By Neuroskeptic | October 16, 2014 3:18 pm

In December last year, researchers Brian Dias and Kerry Ressler made a splash with a paper seeming to show that memories can be inherited.

This article, published in Nature Neuroscience, reported that if adult mice are taught to be afraid of a particular smell, then their children will also fear it. Which is pretty wild. Epigenetics was proposed as the mechanism.

Now, however, psychologist Gregory Francis says that the data Dias and Ressler published are just too good to be true, in a commentary titled "Too much success for recent groundbreaking epigenetic experiments".

Francis notes that the Dias and Ressler paper reported many individual experiments on mouse behavior, and each one found statistically significant evidence of inherited fear. However,

The probability of a set of 10 behavioral experiments like these all succeeding is the product of the probabilities: 0.023. This value is an estimate of the reproducibility of the statistical outcomes for these behavioral studies.

That 0.023 means that even if the epigenetic memory effect is real and just as the paper claims, the probability of getting uniformly positive results is 2.3%. Francis later writes that if you also consider the neuroanatomical evidence presented in the paper (likewise all positive, he claims), the probability drops to 0.004, or 0.4%.
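To get a feel for what that product implies, here is a rough sketch in Python. It uses only the two figures quoted above (10 behavioral experiments, joint probability 0.023). The assumption that every experiment has the same success probability is mine, purely for illustration; Francis estimated a separate value for each experiment, and those individual values are not reproduced here.

```python
import math

# Figures quoted from Francis's commentary, as reported above.
n_experiments = 10
joint_success = 0.023  # product of the ten estimated success probabilities

# Hypothetical simplification (not Francis's actual per-experiment values):
# treat every experiment as having the same success probability p,
# so that p ** n_experiments == joint_success.
implied_per_experiment = joint_success ** (1 / n_experiments)

# Rebuild the product from the implied value to confirm the round trip.
rebuilt = math.prod([implied_per_experiment] * n_experiments)

print(f"implied per-experiment success probability: {implied_per_experiment:.3f}")
print(f"product over {n_experiments} experiments: {rebuilt:.3f}")
print(f"roughly 1 in {1 / joint_success:.0f}")
```

Under that equal-probability assumption, each experiment would have had a success probability of about 0.69. That is respectable for a single study, but ten such studies in a row would all succeed only about once in 43 attempts.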

How could the findings of Dias and Ressler (2014) have been so positive with such low odds of success? Perhaps there were unreported experiments that did not agree with the theoretical claims; perhaps the experiments were run in a way that improperly inflated the success and type I error rates, which would render the statistical inferences invalid.

Researchers can unintentionally introduce these problems with seemingly minor choices in data collection, data analysis, and result interpretation.

Dias and Ressler aren’t going down without a fight, however. In their reply, they say that they did include some negative results, in the Supplementary Material, and accuse Francis of ignoring these. They also protest that:

We wholeheartedly disagree with the shadow that Francis… casts on our experimental design and data analysis. All experiments conducted were reported in the article, which means that no experimental data were excluded.

When one conducts transgenerational studies that are dependent on the vagaries of breeding and husbandry, one uses all that are given and is never wasteful… we stand by our results as robust, reproducible, and verified by blinded assessment.

I’m not sure what to make of this. Francis’s method is a good one. If there were a paper whose results were (say) so perfect that they were one in a million, that would make me seriously concerned.

But it’s all a matter of degree. In a case like this, with a probability of about 1 in 300, I don’t know what to think. Some people are worried that, for all we know, Francis might be calculating these probabilities for lots of papers and only reporting the lowest ones – which might be chance findings.

After all, this isn’t the first time Francis has made the ‘too much success’ argument. In the past two years he’s written five articles much like the latest one (1,2,3,4,5), taking aim at various psychology papers. All of these analyses have revealed low probabilities. Francis has never (AFAIK) announced that a given paper is not suspect. That’s 6/6 “positive results”. Too good to be true?

Francis G (2014). Too much success for recent groundbreaking epigenetic experiments. Genetics, 198 (2), 449-51 PMID: 25316784

  • Philippe Belanger

    “That’s 6/6 “positive results”. Too good to be true?” No because there’s a process of self-selection at work here.

    • Jonathan Mace

      I think the author was trying to be flippant there…

      • Neuroskeptic


        “A process of self-selection” is what I was flippantly suggesting.

  • Guest

    The issue that I’m having thinking through this is that it’s boiled down the results to exists/doesn’t exist, which I think is really missing something. This manifests in the statement “The probability of a set of 10 behavioral experiments like these all succeeding is the product of the probabilities: 0.023.” Do all 10 need to “succeed” (really end up at p < .05) to prove that this effect is real? I think "yes" could be a valid answer here, but I'm not sure how I'd answer it.

    I have no problem with the general message that some of the inferences are underpowered, but I'm not sure I would jump to "this is too good to be true," but rather to "perhaps this effect size is smaller than we think."

  • Felonious Grammar

    “We wholeheartedly disagree with the shadow that Francis… casts on our experimental design and data analysis. ”

    That’s not an emotional response, is it?

    Yes. Yes, it is an emotional response.

    No “shadow casting”, fellas. It’s unprofessional.

    • A Vulcan

      It is natural to have an emotional response to someone outside your field of study attempting to discredit you. An argument is not discredited by the accompaniment of an emotional response. To suggest otherwise is illogical.

  • Jespersen

    Oh well. With a claim this controversial, surely some other team will go ahead and replicate Dias and Ressler’s research sooner or later? I find it odd that a mechanism this useful could be so obscure as to be overlooked until now.

  • Kyle Jasmin

    It would be a real shame if it turns out Francis had selected multiple papers for analysis and only reported extreme ones. That is reporting bias as well: a “too much too much success” effect…

    I hope the authors and others will try to replicate this work, because it is extremely interesting if it’s real.

    • Neuroskeptic

      It would indeed be a shame. There is no evidence that he is, AFAIK, but equally there is nothing to stop him.

  • Jayarava

    It’s only half a response to question it on statistical grounds. Indeed it sounds like lazy science to me. The criticism ought to have been accompanied by an attempt to reproduce the results based on the hypothesis that they were a statistical blip. If the critic could not reproduce the results with the same experimental set up, then he’d have a case.

    • Greg Francis

      I do not study genetics, epigenetics, or mice, so I am not scientifically qualified to run these kinds of experiments. If I were so qualified, I think the statistics I reported would convince me that it was not worth the expense to run a replication study of these findings, but other scientists are welcome to have a different interpretation.

      Moreover, your faith in replication seems unwarranted in this situation. If we accepted the Dias and Ressler findings as being valid, then an experiment that reproduced the findings in their Figure 1a should only have around a 50% chance of producing a successful outcome. An unsuccessful outcome for that study (with its design and sample sizes) should be common. Showing such an outcome in a single experiment should hardly be convincing.

      • Jayarava

        “I do not study genetics, epigenetics, or mice, so I am not scientifically qualified to run these kinds of experiments.”

        If you don’t understand the experiments how are you qualified to estimate the likelihood or otherwise of their results? I’m not the only one asking this question.

        • Greg Francis

          I understand the experiments (to a large extent they are classic behavioural techniques, and I am a psychologist). But it is another step in expertise to know how to run an experiment of this type (e.g., I don’t know what kind of chow to feed mice or where to buy acetophenone). Dias & Ressler used classic statistical techniques to make their scientific arguments. Those techniques I understand well.

          I do not think this is an uncommon situation. For example, I can detect that a table is not level even though I cannot build a level table.

          • Neuroskeptic

            I’m with Greg on this one: effect sizes are effect sizes, whether they are from epigenetics, behaviours, or anything else.

  • Greg Francis

    Thanks for sharing my article with your readers and for raising interesting questions about the analysis and its interpretation.

    I wanted to address a few points.

    1) Dias & Ressler claim that they did include some negative results. Interpreting this claim requires understanding what counts as a “negative result”, which is always relative to the proposed theory. The non-significant effects reported by Dias & Ressler were not characterised by them as being “unsuccessful” but were either integrated into their theoretical ideas or were deemed irrelevant (some were controls that helped them make other arguments). Of course scientists have to change theories to match data, but if the data are noisy then this practice means the theory chases noise (and the findings show excess success relative to the theory).

    2) I think Dias & Ressler misunderstand the nature of my critique. They say that they stand by their results as being robust and reproducible, but that seems difficult to accept given that their own data suggests that, with experiments similar to those they reported, multiple successful outcomes should be quite rare. Moreover, their opening paragraph reports that they have replicated the findings multiple times within their laboratory. In general, this makes their situation even worse because such replication efforts should have produced some unsuccessful outcomes. The success probability of their original experiments, low as it was, must be larger than the success probability for the original experiments and additional successful replications.

    3) Neuroskeptic suggested that one in a million results would lead to serious concerns but that maybe 1 in 300 is not so bad. I think each scientist can set up their own criteria for such judgments; but these odds are estimates of replication success, and it seems that most scientists want odds better than 1 in 2. Many scientists become concerned when a single effect does not replicate even once, even though for some such cases one would estimate success for only 2 in 3 such experiments. At any rate, the onus is on the original scientists to provide data that convinces readers to believe their theoretical ideas. If the support for such ideas comes from experiments that should only occur once out of 300 studies I don’t think many scientists are going to be convinced by the theoretical ideas. The 1 in 300 odds might be high enough to imply that Dias & Ressler were simply (un)lucky with their experimental findings, but their data still does not provide good support for their theoretical ideas.

    4) Neuroskeptic noted that there might be bias in my reports of publication bias, which I think represents a common misunderstanding about what the analysis concludes. It certainly is true that (in my one-off analyses) I have not published many analyses of articles that pass the test for excess success (TES). [There is one example in Francis 2012, Psychonomic Bulletin & Review, 975-991.] This selective publishing means that if someone tries to use my one-off analyses to estimate the proportion of articles that pass the TES, then they will get a biased estimate. The solution is simple: do not use those reports to estimate the proportion of articles that pass the TES. However, the selective reporting does not change the conclusion about an article that fails the TES. An article that fails the TES does so regardless of the status (or reporting) of other articles. This issue is discussed in depth in:

    Francis, G. (2013). Replication, statistical consistency, and publication bias. Journal of Mathematical Psychology, 57, 153-169.

    and in accompanying commentaries from several statistically-minded scientists and a reply from me.

    5) An estimate of the proportion of articles that pass/fail the TES can be produced by systematically applying the analysis to a sample of articles. I did this for articles in the journal Psychological Science, and the results are reported in

    Francis, G. (2014). The frequency of excess success for articles in Psychological Science. Psychonomic Bulletin & Review, 21, 1180-1187.

    The take home message is that 82% of the analysed articles fail the TES.

    • Neuroskeptic

      Thanks very much for the reply!

  • Anonymouse

    I commend Francis for his scrupulousness in attempting to assess these results, but I feel that the stats are somewhat flawed, and I imagine Genetics will receive some correspondence on this.

    Post-hoc power is almost always a bad idea, because the probability of obtaining a significant result is realised as either 1 or 0 once the study has been performed. It is not clear what exactly a power calculation using the obtained effect sizes actually means, conceptually.

    Which brings me to the key unknown in Francis’ analysis – using the effect size estimates presented. These are of course estimates of real population effects, and there is the tendency (noted in Francis’ article) for regression to the mean, i.e. the true population effects are smaller. But equally, the effects might also be larger than the observed ones. This might be less likely, but it is not clear how much less likely it is.

    It is impossible to say, without a thorough knowledge of the underlying biology, of the correlations between the various effects, and of the magnitude of the population effects, what the post-hoc probability of observing the reported results is. I’m sorry to say this is a bit of a “what if?” study.

    • Greg Francis

      Indeed, it is silly to compute the probability of an event that has already happened (such as the significance outcome of a reported experiment). What I am doing is estimating the probability of success for experiments like those originally reported. Another concern with power is that it provides no new information beyond the other statistics (such as the sample sizes and the t-value). The observation is valid, but sometimes we want to organize information in a way that promotes understanding of certain patterns, and power (or success probability) is one such organization. Looking at the distribution of p-values is another approach.

      Regarding the use of the effect sizes from the original experiments, a variation of the test for excess success (TES) supposes that the reported effect size describes a distribution and then uses the distribution to compute expected power (or something similar). As it turns out, this typically gives smaller power values than just using the point estimates (power calculations are not symmetric). With a few exceptions (see Francis, 2014; as referenced in my earlier comment), the TES is extremely conservative about judging an article to have excess success. Even with the 0.1 criterion that is typically used, the probability of concluding a problem when everything was actually done properly is close to 0.01.

      If the analyses reported by Dias and Ressler were valid for their data sets, then it is not necessary to know the underlying biology and so forth to estimate success probability. It’s just mathematics. On the other hand, if their analyses were not valid (e.g., there really were correlations between some samples), then scientists should be skeptical about their presented findings/theory regardless of the TES analysis. That does not mean the effects do not exist, but that the presented findings do not provide a convincing argument for their existence.

      • Anonymouse

        Thanks for replying. There’s a lot to think about in your response. I’m not sure this is the best format to have the discussion in, either.

        But to take your first point – that you are not calculating post-hoc power for this study, but rather the power for “experiments like those originally reported”, i.e. with the same sample size and design. Of course, if someone intended to attempt a replication of the exact set of results obtained by Dias & Ressler, your findings show that it would not be worthwhile. Not without increasing the sample size, which any sensible scientist would then proceed to do.

        But I don’t really think that is your intention, is it? To power future studies? Even the title of your paper implies that you are interested in *this particular study* which has “too much success”.

        I’m not surprised that a TES variation assuming a distribution rather than point estimates of the observed effect sizes would give lower power, since you’re adding in extra variability. I guess that wouldn’t work very well since you have only one realisation to work from, unless you are meta-analysing multiple experiments and estimating the variance of a random (study) effect?

        I guess this really isn’t the right forum – if I have a doubt about your method, I should go through the same peer review process you had to! :)

        But I would really query your claim that knowing the biology isn’t important. I can believe that the D&R study had insufficient sample size, and that that implies that positive findings are (post hoc) more likely to be false positives than true positives. You can estimate the family-wise type I error rate by assuming all nulls are true, you can estimate the “family-wise power” by assuming all the nulls are false and the estimates are true population parameters. But what if some nulls are true and some false? Biology might guide that assessment.

        If there is some real and strong (large effect size) phenomenon, then the significance pattern reported by Dias & Ressler might be very likely, even if only one contrast reflects a “real” effect, and the others are piggybacking due to correlation. (of course the converse is true too for a false positive) Your TES probability didn’t take any correlation into account (because it’s unknown, not your fault!), which would have an unpredictable effect.

        OK, I sense my arguments are all over the place, I will stop here. Good discussion, though.

        • Greg Francis

          I would be happy to continue the conversation off-line. You can easily find my email. Just a few comments.

          The majority of experimental results are interesting precisely because of what they tell us about future events. Thus, all inference has this interesting aspect of flipping back and forth between the present findings and possible future outcomes. That’s a philosophical discussion that probably does not play out well in web comments.

          One curious aspect of the Dias & Ressler (2014) experiments is that they did not follow your advice to increase their sample sizes. For example, the findings in Figure 1a rejected the null with sample sizes of 16 and 13 in each group. The findings reported in Figure 5a were based on a similar (not identical) experiment but used the same sample sizes (13 and 16). Maybe the experiments were not run in the order they are presented in the paper, but some early experiments should have informed some later experiments.

          Regarding knowing the biology, I am only saying that it does not matter for the analysis of the findings reported by Dias and Ressler relative to their theory. Part of the reason their findings appeared (at first) convincing was that they reported (seemingly) independent findings. If those findings were not really independent (and a PubMed comment suggests they might not be), then their findings are less convincing because they may (sort of) just be showing multiple representations of an effect that happened to be in one sample. To sort that all out definitely requires knowledge in biology (and related fields). Indeed, a biologically-driven correlation across samples (e.g., pups and parents are related) might be a source of the apparent excess success. But then the analysis methods used by Dias and Ressler were not appropriate. As you noted, if they had considered and reported those correlations, then I could include them in my analysis. As it is, I can only do the analysis based on what Dias and Ressler actually did.

      • Emkay

        stats are always flawed…depends on who writes them up and who reads them…

  • Alex Trotter

    My question here is, if Mr. Francis is only a statistician with no knowledge of biology or genetics, why is he so intent on trying to discredit these researchers? This seems a rather misguided way to further one’s own career.

    • Greg Francis

      In general, I do not think my intent is relevant (it does not change the mathematics). Regarding my career, it helps that I am already a full professor with tenure.

      It was not my intent to discredit Dias and Ressler as researchers, and the statistical analysis does not at all indicate dishonesty on their part. There are many ways for experimental findings to have an excess success. (I have argued in other papers that standard scientific methods can easily generate excess success.) Nevertheless, the findings published by Dias & Ressler (2014) do not seem to provide a good scientific argument for their theory. As a scientist, I have a responsibility to analyze and critique other work and to accept criticism (Dias and Ressler recognise this in their reply). Scientists do not have to do that for every finding they encounter, but there is no question that the findings in Dias & Ressler (2014) are important, if they are true. Future scientists were going to spend a lot of time, money, and resources pursuing these experimental techniques. Maybe they still will, but the statistical properties of the findings in Dias & Ressler (2014) are something scientists should consider before committing such resources.

      • Emkay

        tenure? isn’t that kinda like a labor union, where you don’t really have to work hard or even care if you do a good job?

  • jvkohl

    Is there a null hypothesis that has been tested to provide biologically-based experimental evidence of cause and effect for comparison? If not, their results attest to the likelihood that conserved molecular mechanisms link the epigenetic landscape to the physical landscape of DNA in the organized genomes of species from microbes to man via amino acid substitutions that differentiate cell types.

    Perhaps what we will see next is experimental evidence that transgenerational epigenetic inheritance is linked via amino acid substitutions to learning and memory in other model organisms, like nematodes, honeybees, birds, and other vertebrates.

    For example, see: Oppositional COMT Val158Met effects on resting state functional connectivity in adolescents and adults

    “The A or Met allele is associated with lower enzymatic activity (due to thermoinstability), and with exploratory behaviour.”

    What good is your exploratory behavior if what you learn about what to eat and who to mate with is not epigenetically inherited by your offspring? How could biodiversity arise in grazing and predatory nematodes if the molecular mechanisms for the basis of Dias & Ressler’s claims were not conserved across species?

  • disqus_mSbMpSZflX

    I really hope we can get some more research on this done by other scientists. It would be really cool if it could be confirmed.



About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.
