OXTR gene produces differences in kind behaviour that people can spot in 20 seconds

By Ed Yong | November 15, 2011 9:00 am

Update: I’ve amended this post following some harsh critical comments on the study from geneticists on Twitter, which I really should have noted while going through the paper.

Our genes can influence our behaviour in delicate ways, and these effects, while subtle, are not undetectable. Scientists can pick them up by studying large groups of people, but individuals can sometimes be sensitive to these small differences.

Consider the OXTR gene. It creates a docking station for a hormone called oxytocin, which has far-ranging effects on our social behaviour. Each copy of OXTR comes in either an A or a G version, depending on the “letter” that appears at a particular spot along its length. People with two G-copies tend to be more empathic, sociable and sensitive than those with at least one A-copy. These differences are small, but according to a new study from Aleksandr Kogan at the University of Toronto, strangers can pick up on them after watching people for just 20 seconds.

Kogan filmed 23 people as they listened to their partners describe a time of personal suffering. He then asked 116 volunteers to watch clips of these conversations and judge the listening partner on how trustworthy, compassionate and kind they seemed. The clips were each 20 seconds long and had no sound, so the viewers could only judge their targets by what they did, rather than what they said.

These listeners differed in whether they had the A or G versions of OXTR, but neither the viewers nor the people who made the clips knew who had which copies. Nonetheless, Kogan found that people with two G-copies came across better than their peers, regardless of gender. Of the ten most trusted listeners, six were double G-carriers, while nine of the ten least trusted listeners had at least one A-copy.

When Kogan asked two independent people to analyse the clips, it became clear why the listeners gave different impressions. Those with two G-copies made more physical social cues, such as head nods, eye contact, and open arms. Through these gestures, they wordlessly portrayed a kinder and more trustworthy social style. As Kogan points out, OXTR is almost certainly just one of many genes that can affect our behaviour. His study is less about the gene’s power and more about how exquisitely sensitive people are to social cues, such that even slight genetic variations can communicate information through “a few brief moments of behaviour”.

But the paper has drawn harsh criticism from geneticists because of its small sample size. Daniel MacArthur, who blogs at Genetic Future, said, “[A sample size of] 23 for genetics means the paper might as well not exist. It carries no useful info. Without a larger sample and independent replication, it’s safest to simply assume these results are false.” Joe Pickrell from Harvard Medical School agreed: “If the sample size is 23… there’s no way that’s a real association.” And Chris Gunter from the HudsonAlpha Institute for Biotechnology added, “The literature is full of behavioural genetics studies which don’t replicate, with similarly small numbers.”
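A rough power calculation helps explain why a sample of 23 worries geneticists. What follows is a Python sketch under stated assumptions, not a reanalysis of the paper. It treats the question as a simple correlation between genotype and a per-person score. It assumes a true correlation of r = 0.17, roughly 3% of variance, which is an illustrative stand-in for the small effects single gene variants typically have. It also ignores the study's many raters per target. The Fisher z-transformation is a standard textbook approximation, and the function names are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power to detect a true correlation r with n people,
    using the Fisher z-transformation (standard error 1 / sqrt(n - 3))."""
    if n <= 3:
        raise ValueError("need more than 3 people for the Fisher z approximation")
    norm = NormalDist()
    # Expected value of the test statistic if the true correlation is r
    z_effect = atanh(r) * sqrt(n - 3)
    # Two-sided critical value, e.g. about 1.96 for alpha = 0.05
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Probability the statistic lands beyond either critical value
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)


def n_for_power(r: float, target: float = 0.8, alpha: float = 0.05) -> int:
    """Smallest sample size whose approximate power reaches the target."""
    n = 4
    while correlation_power(r, n, alpha) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical effect size: r = 0.17, i.e. about 3% of variance explained
    print(f"power with 23 people: {correlation_power(0.17, 23):.2f}")
    print(f"people needed for 80% power: {n_for_power(0.17)}")
```

Under these assumptions, a study of 23 people would catch a real effect of this size only about one time in eight. Reaching the conventional 80% power would take roughly 270 people.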

We don’t really know how the OXTR gene, or the hormone that it interacts with, exerts its influence. For a start, oxytocin has a misleadingly rosy reputation. This simple chemical has been caricatured as the “love hormone” or “cuddle chemical”, after several studies showed that a sniff of oxytocin could boost trust, cooperation, generosity and empathy. But more recently, other studies have found that under certain circumstances, it can also make people more distrusting, uncooperative, biased and envious.

Rather than universally promoting the better angels of our nature, oxytocin has broader effects on our behaviour. It probably either makes people more alert to social cues in their environment, or motivates them to seek social connections with others. These effects manifest in different ways, both positive and negative, depending on the person and the situation.

Culture matters too – it sets the stage on which OXTR plays out its role. Last year, Heejung Kim showed that American G-carriers are more likely than A-carriers to seek emotional support from their friends in times of need. But in Korea, where it’s more taboo to trouble your peers with your personal burdens, G-carriers are slightly less likely to turn to their friends.

Kim looked at a specific version of the OXTR gene, whose carriers are allegedly more social and sensitive. But this link between gene and behaviour depends on culture; it exists among American people, who tend to look for support in troubled times, but not in Korean cultures, where such support is less socially acceptable. In both cases, the G-carriers became more socially sensitive, but that led to very different behaviour depending on the norms of their own cultures. In Kogan’s study, everyone was a young, white American and as he rightly says, “More work is necessary to replicate and extend the present results to a larger, more diverse sample.”

Reference: Kogan, Saslow, Impett, Oveis, Keltner & Saturn. 2011. Thin-slicing study of the oxytocin receptor (OXTR) gene and the evaluation and expression of the prosocial disposition. PNAS. http://dx.doi.org/10.1073/pnas.1112658108

Photo by Valerie Everett

Comments (13)

  1. In addition to culture, I’d think any number of other factors would have confounded this, even with a larger study group. Male-female interaction? The nature of the partner relationship? Bell’s palsy? OK, maybe not the last one, but the use of facial expression is hugely variable, and a recent commentary on the Amanda Knox trial noted that, as a whole, people aren’t actually that good at reading expressions, and are more inclined to confuse them with their own inner voices. Did anyone analyze the SNP variation in the *viewers*?

  2. A note on small sample sizes: as a statistician myself, I completely agree with the criticism. However, saying that the “study should not exist” or “the results should be seen as false” is way too harsh. Unfortunately, although we always prod our collaborators to send us LOTS of data, that is not always possible. Assays don’t always yield, the money doesn’t come in (these studies are expensive!), etc. In those cases, what you do is publish your study, point out in the discussion that yes, the sample size is small (and maybe append a power calculation, which wouldn’t hurt), and state your results as TRENDS. Even with small sample sizes, you may be tapping into something, and your small study may pave the way to larger studies and convince the next committee that your grant is worth more money to get more data.

    Bottom line: yes, the sample size is small. That doesn’t mean it shouldn’t have been published. All it means is “interpret the results with care.” (Disclaimer: I have nothing to do with this paper, no common interest whatsoever; all I wanted to state here is a general comment on sample size and criticism).

  3. floodmouse

    The title of the post references “kind behavior.” The text of the post implies what is actually being measured is “degree of visible social signalling.” In my experience, strong social signallers are not always good at behavioral follow-through. That is, someone might pat you on the hand and say they are SO sorry you are sick, but would the same person walk a mile to bring you a pot of hot soup?

    The terms of the study need to be defined more tightly.

  4. There are a number of serious problems with this paper, most notably in how the statistics are designed and interpreted. The ratings were on a 7-point scale, and in Fig 1 the graph shows only the range 3.2 to 5.2. Assuming that it’s starting on a base-level of 1, that is where the Y axis should be grounded. This is honest data presentation 101. The error bars shown are only at the level of the b value, plus or minus, yet the standard deviations are much much higher than this. The core data is not given, but it strikes me as highly unlikely that a p of <0.001 can be derived for these. Someone with a good grasp of stats might want to have a go at whether the authors' analyses are appropriate. So repeat with a larger group, get all the touchy-feely data in, THEN genotype the "listeners", and we might see whether there is a real effect here. Put me down at the moment as singularly unconvinced.

    But maybe that's because I'm an A/A homozygote, and therefore a callous bastard…

  5. DK

    Kinda OT but something else that stands out in this paper: author attributions. The first author is listed as one who designed and performed research, analyzed the data, and wrote the paper. There are five more co-authors and none of them are listed as having contributed to the actual research work beyond generic “designed research” and being involved in writing the paper. Not even “analyzed data” (the typical attribution for gratuitous authorship). Some people have no shame.

    Joe Pickrell: “If the sample size is 23… there’s no way that’s a real association.”

    That’s unfortunate wording. There are ways, of course (luck, strong effect). But with N=23 there is no reason to believe that it is real.

  6. This is early research, and more is needed as people have already discussed – bigger studies and more cultural diversity – but it’s interesting how we pick up on behavioural clues and how these *might* be influenced by our genes. We’ve written about it and linked to this post at http://www.genome-engineering.com/are-you-nice-%E2%80%93-or-is-it-just-your-genes.html

  7. We greatly appreciate the debate our study has generated and agree with many of the insights offered by our colleagues across academia. Sample size in particular was a prime concern for us from the inception of the study. Generally, when you find a significant effect with a sample of 23, the effect size must be quite large–something that is highly unlikely in this type of genetics research. So if an effect is found, the usual reason is that the particular members of the sample were highly unusual and, by chance, other factors that also contribute to the phenotype (here, prosociality) happened to align with genotype.

    But there are two major points working in favor of our study. First, every target was rated by over 100 observers, creating a nested structure in the data (observations of targets nested within individuals). Through this design, we were able to actually have a highly sensitive test that does not require a large effect size to be in place to be detectable because our effective N became over 2000. But concern can still be raised that the targets are unusual and not representative of the general population. Here, we feel encouraged by three things: (a) over a dozen studies with very large samples have now demonstrated a result that converges with ours (people with the GG genotype tend to be more socially attuned than carriers of the A allele); (b) the effect size in our study was reasonably small (target genotypes explained roughly 3% of the variance in observer ratings of the targets’ prosociality), which is consistent with previous findings and with what we might expect from a single SNP linked to any complex phenotype; and (c) the targets’ self-reports of compassion and sympathy also correlated reasonably (i.e. with a small effect size), though non-significantly (because the effective N for that association is 23).

    Thus our methodology allowed us to magnify any real effect that was present–that is one of the major methodological strengths of our design. But we strongly, strongly agree that replication is absolutely necessary to verify our result and we are currently attempting to do just that with two new datasets with bigger numbers of targets from new studies. We also strongly encourage outside teams to independently verify our results and test the complex moderations that are sure to be in place. We therefore view our result as preliminary, though promising because of its consistency with previous research on bigger samples and its methodological strength in detecting even small effect sizes.

  8. “Kogan flimed 23 people talking to their partner about”

    -> I believe it should be filmed, not flimed

    (you don’t have to approve this…just pointing it out for a correction)

  9. Interpreting the results “with care” is a bit disingenuous though, since basically the implication of a small sample size is that any conclusion could happen by chance.

    What you could say is that the methodology shows promise for evaluation in a further, larger study.

    But I would say that calling the study “false” or saying it “should not exist” is closer to the truth than saying that the results are a “trend” or should be “interpreted with care”.

  10. rory


    How does repeatedly measuring the same thing give you another data point? That is not another data point, it is another measure of the same data point. You will get a more precise estimate of that one data point, but it is still just one data point.

  11. gillt

    But there are two major points working in favor of our study. First, every target was rated by over 100 observers, creating a nested structure in the data (observations of targets nested within individuals). Through this design, we were able to actually have a highly sensitive test that does not require a large effect size to be in place to be detectable because our effective N became over 2000

    Like PCR, it appears you’ve amplified your N value rather than, as you claim, made it effectively 2000. That would suggest magic.

  12. tim

    So… new paper in PNAS shows how this paper misanalysed the data
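The exchange between the authors (comment 7) and rory and gillt (comments 10 and 11) turns on what 100-plus ratings of each person are actually worth. Kish's design effect is one standard way to put a number on that, and the Python sketch below applies it. It rests on several illustrative assumptions. All 116 viewers are taken to have rated each of the 23 listeners. The intraclass correlation (ICC) values are made up, where ICC measures how strongly ratings of the same listener agree. The sketch also ignores the fact that the same viewers rated every listener. The function name is hypothetical.

```python
def effective_n(n_targets: int, raters_per_target: int, icc: float) -> float:
    """Kish design effect: how many independent ratings the clustered
    ratings are worth, given the intraclass correlation (icc) of ratings
    of the same target. Assumes every target gets the same number of raters."""
    total_ratings = n_targets * raters_per_target
    design_effect = 1 + (raters_per_target - 1) * icc
    return total_ratings / design_effect


if __name__ == "__main__":
    # Hypothetical ICC values; the paper's actual rater agreement is not used here
    for icc in (0.0, 0.1, 0.3, 0.6, 1.0):
        print(f"ICC={icc:.1f}: effective N ~ {effective_n(23, 116, icc):.0f}")
```

The figure of over 2,000 is approached only when ratings of the same person are barely more alike than ratings of different people (ICC near zero). As raters agree more, the effective N falls toward 23. However many raters there are, genotype itself was sampled from just 23 listeners, which is rory's point.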




Not Exactly Rocket Science

Dive into the awe-inspiring, beautiful and quirky world of science news with award-winning writer Ed Yong. No previous experience required.
