Do You Know What’s Good For You?

By Neuroskeptic | July 9, 2013 5:16 pm

This post draws on the results of the controversial PACE Trial (2011), which compared the effects of four different treatment regimes for chronic fatigue syndrome (CFS).

However, this post isn’t about CFS. What interests me about PACE is that it illuminates a general psychological point: the limited nature of self-knowledge.

Patients in PACE were randomized to get one of four treatments. One was called APT. People randomized to this therapy did no better, in terms of symptoms, than people assigned to the “do nothing in particular” control condition, SMC.

However, people on APT said they were much more satisfied with their treatment than did the ones on SMC (85% satisfied vs 50%).

Two other conditions, CBT and GET, were associated with better symptom outcomes than the other two. People in these groups were satisfied – but no more so than the people on APT, which, remember, was quite a bit worse.

Here are the symptom scores: APT was close to the dotted line, meaning no effect.

So satisfaction was unrelated to efficacy – even though it was the very same people judging both: the symptom outcomes were self-rated.

“How could anyone feel equally satisfied with a treatment that works and one that doesn’t work?”, you might ask. But no-one got the chance to do that, because no patient was in a position to compare two treatments. They got one each.

So they had no independent yardstick against which to measure the treatment they got. All they had was their own mental yardstick: their perceptions and expectations of what a satisfactory treatment should do (and if it does less, it’s unsatisfactory).

People in the APT arm evidently had a different mental yardstick to those in the two more effective treatment conditions, because they were all equally satisfied despite different outcomes. Why that might be is another story.

We all have mental yardsticks as to what we ‘should’ feel, what is ‘normal’ as opposed to ‘too much’ or ‘too little’ in different situations.

They’re the barely-acknowledged foundation stone of modern psychiatry: psychiatrists use theirs to judge patients’ minds, and patients use theirs to judge their own.

But where do these yardsticks come from?

And should we trust them?

White PD, Goldsmith KA, Johnson AL, Potts L, Walwyn R, DeCesare JC, Baber HL, Burgess M, Clark LV, Cox DL, Bavinton J, Angus BJ, Murphy G, Murphy M, O’Dowd H, Wilks D, McCrone P, Chalder T, Sharpe M, & PACE trial management group (2011). Comparison of adaptive pacing therapy, cognitive behaviour therapy, graded exercise therapy, and specialist medical care for chronic fatigue syndrome (PACE): a randomised trial. Lancet, 377 (9768), 823-36. PMID: 21334061

  • Valentijn

    All of the yardsticks in that study are solely the perceptions of the patients and the study’s staff on questionnaires. There were no objectively measured outcomes used in determining “recovery”.

    The more interesting question is “why were patients equally satisfied while reporting different levels of recovery?” The answer would likely be in the treatments themselves – CBT/GET patients are encouraged to think of themselves as not being ill, and studies indicate that they have an increased tendency to say they’re better for up to a year or so after the intensive treatment/brainwashing.

    Oh, and the physical functioning and fatigue questionnaires? Patients could stay about the same (or even get a bit worse) and be classified as recovered. The entry “disabled” requirements and exit “recovered” definition have a fair bit of overlap.

  • adrian

    The graphs do not show symptom scores; they show statistics taken from questionnaires which, as with the satisfaction ratings, are all concerned with patient perceptions (different yardsticks).

    It is worth digging further into what these statistics on questionnaire scores actually represent.

    Firstly there is the fatigue mean difference. The fatigue questionnaire covers both mental and physical fatigue which, although loosely correlated, change independently. Hence it’s an aggregate of two figures with an implied utility function based on the different numbers of questions. Without explicitly validating this utility function (as is done for the EQ-5D scale), it is hard to extract any meaning from the mean and standard deviation. There is another problem with it as a scale: it is not clear that a person originally at point x who scores y after treatment has had the same improvement as someone starting at point q and improving by (y − x). That is, the scale has not been demonstrated to be linear. Looking at the questions, it is hard to imagine that it is.

    Physical function is measured using the SF-36 physical function scale, which has similar problems in terms of linearity. To my mind it is not clear that there is a single concept of physical function, which is demonstrated by looking at questionnaire scores (at least for the general population), where there is a vast array of different orderings of question answers — suggesting different underlying issues in physical function.

    So we have three very subjective scales and are comparing them using invalid summary statistics. It is perhaps interesting that the results are inconsistent, but this may just point to issues with the subjective nature of all the measurement systems and different amounts of placebo effect in each treatment arm. To draw strong conclusions, it is up to the authors to demonstrate that this is not the case.

    It should be noted that experiments on placebo effects in asthma showed that people reported reduced symptoms for sham treatments when filling out questionnaires, while objective measures showed no improvement. This demonstrates the dangers of drawing strong conclusions from subjective data. The only objective data published from the PACE trial thus far (employment data and the six-minute walking test) do not back up the more subjective questionnaire data. If you dig around a bit and find accelerometer data for similar trials, it tends to show no change in activity levels even though self-reported questionnaire scores improve.

    It is an interesting question to look at the structure of the various questionnaire measurement systems, how people answer them, and what that actually means.

    When quoting statistics it is important to understand both the structure of the questionnaire and the distribution of the data. In one of the follow-up papers to this one, on recovery, the authors use the mean and standard deviation of a survey of the UK population to conclude that over half the adult working-age population would score less than 85 on the SF-36 physical function scale — yet the median is 100. This suggests they and others have not thought carefully about the statistics of what they report.
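    The mean-versus-median pitfall is easy to reproduce with toy numbers. The scores below are invented for illustration (they are not the actual UK survey data), but they share the key feature of the SF-36 physical function scale in the general population: a hard ceiling at 100 and a long lower tail:

```python
import statistics

# Invented SF-36-style physical function scores (n = 100): ceiling-skewed,
# like the real general-population distribution, but NOT the actual survey data.
scores = ([100] * 60 + [95] * 15 + [90] * 10 + [85] * 5 +
          [70] * 5 + [40] * 3 + [20] * 2)

mean = statistics.mean(scores)
sd = statistics.pstdev(scores)
median = statistics.median(scores)
below_85 = sum(s < 85 for s in scores)  # n = 100, so this count is also a percentage

print(f"mean = {mean:.1f}, sd = {sd:.1f}")   # the tail drags the mean down
print(f"median = {median}")                  # but the typical score sits at the ceiling
print(f"actually below 85: {below_85}%")     # the real lower tail is small
```

    Normal-theory reasoning from the mean and SD here (mean ≈ 92.6, SD ≈ 15.9) would predict roughly a third of the sample scoring below 85, yet only 10% actually do, and the median is 100. With real, even more skewed survey data, the distortion is worse — which is exactly why the median and percentiles are the more honest summaries for such a scale.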

    So what should we learn from this:

    1) There is a need to publish all data (i.e. questionnaire answers) rather than summary statistics that are unreliable or simply wrong. The median and percentiles are generally more robust.

    2) It’s too easy to develop a questionnaire and abstract it into a ‘scale’. Many systems break due to failed assumptions made as such abstractions happen.

    3) Objective measures need to be given to back up subjective measures particularly when they contradict each other (e.g. satisfaction, fatigue and physical function).

    4) Don’t trust trial results where the authors don’t publish all the data or don’t stick to the original protocol in their reporting.

  • andrewkewley

    That is why we should rely on objective measures, such as neuropsychiatric testing and actometer data. At baseline, those measures when compared with matched controls show lower levels of performance (greater latency for example) and low activity levels.

    I recently wrote an invited commentary on this topic (in press), but the point is that so far, CBT has not shown improvement in these measures in RCTs.

    It is not rocket science, objective yardsticks do exist.

    • tomkindlon

      Yes, it is different from many other areas where CBT might be used. If one wants to measure physical functioning and investigate whether it has improved, relying on self-report measures such as the SF-36 physical functioning (SF-36 PF) scale — one of the primary outcome measures in this study — isn’t necessarily ideal. Self-report measures can be inaccurate and don’t necessarily correlate well with actual abilities. Particular biases may exist with therapies such as CBT and GET for CFS, which are designed to increase people’s confidence in their ability to exercise.

      For example, CBT and GET did well on self-report measures, but
      “CBT and GET did not significantly reduce employment losses, overall service costs, welfare benefits or other financial payments”.
      There was no difference in six-minute walking distances/improvements between the CBT, APT and control groups, even though the CBT group reported improvements on the SF-36 PF.

  • tomkindlon

    Interesting point.

    A couple points come to mind.

    (I) Would CBT, GET and APT all have got such high satisfaction scores with better instruments to measure satisfaction? “Results of single-item ratings or overall satisfaction surveys are over-optimistic and do not represent the true indication of care” (Heidegger et al., 2006).

    (II) Would the satisfaction have been maintained over longer periods? I have come across people who were satisfied soon after a course of CBT or GET but not in the long term. These therapies for CFS (as can be seen from the manuals, for example) give the impression that people will get back to full functioning. Filling people with hope and optimism that a therapy can do this may lead to satisfaction. When, as is very often the case, people don’t get back to full functioning, they may be less satisfied with the therapies. Indeed, they may be quite dissatisfied that the therapies were misleading, dissatisfied that this is all that is offered to them, and dissatisfied that some professionals give the impression that this is all that is required — which can mean other angles for treating the condition are not sufficiently investigated.

    Heidegger T, Saal D, Nuebling M. Patient satisfaction with anaesthesia care: what is patient satisfaction, how should it be measured, and what is the evidence for assuring high patient satisfaction? Best Pract Res Clin Anaesthesiol. 2006 Jun;20(2):331-46.

  • Doug Fraser

    I was interested to read in the trial protocol that: “A measurement of participant *satisfaction with the trial* will also be taken at 52 weeks” (under secondary outcome measures).

    Rather differently, the Lancet article states: “At 52 weeks, participants rated *satisfaction with treatment* received on a 7-point scale, condensed into three categories to aid interpretation (satisfied, neutral, or dissatisfied)”.

    The possibilities across that scale would consist of the following:

    Very satisfied / Moderately satisfied / Slightly satisfied / Neutral / Slightly dissatisfied / Moderately dissatisfied / Very dissatisfied

    This is also described in the Lancet report as: “Scored 1–7 (1 = poor, 7 = excellent)”.
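    The condensing described in the Lancet quote can be sketched in code. The cut-points below are an assumption for illustration — the paper says only that the scale was “condensed into three categories to aid interpretation”, not where the boundaries fell:

```python
# Hypothetical condensing of the 7-point satisfaction rating into the three
# reported categories. The boundaries (<=3 dissatisfied, 4 neutral, >=5
# satisfied) are assumed, not stated in the paper.
POINTS = ["Very dissatisfied", "Moderately dissatisfied", "Slightly dissatisfied",
          "Neutral", "Slightly satisfied", "Moderately satisfied", "Very satisfied"]

def condense(rating: int) -> str:
    """Map a 1-7 rating to a coarse reported category (assumed cut-points)."""
    if rating <= 3:
        return "dissatisfied"
    if rating == 4:
        return "neutral"
    return "satisfied"

# Show what each point on the scale collapses into:
for i, label in enumerate(POINTS, start=1):
    print(f"{i}: {label} -> {condense(i)}")
```

    The point of the sketch is that the condensing discards exactly the distinction raised below: a cohort that was only “slightly satisfied” produces the same headline “satisfied” percentage as a delighted one.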

    The article claims “high rates of participant satisfaction”.

    There seems to be more than a trial-arm’s-worth of data missing around this claim, and I cannot identify a way of assessing it (“Percentages exclude missing data”), although it’s probably obvious to the trained scientist.

    It might be that 59% of the people in the Standard Medical Care arm of the trial were very dissatisfied (or even worse) with their two or three appointments with an experienced doctor, and one would obviously want to know why; but it could also be that the missing 67/160 SMC entries reported people as being very satisfied (giving 89% rather than the 50% you mention).

    Those on the (illogical to me, and clearly not normal “pacing”) APT arm and the other arms (CBT and GET) might only have been slightly satisfied with their fourteen or fifteen appointments, and strangely, there don’t appear to be any neutrals. The scale was “condensed into three categories to aid interpretation (satisfied, neutral, or dissatisfied)”.

    It seems unlikely to me that even the most polite of twenty- to forty-year-old individuals on the trial would register much, if any, satisfaction with treatments that left them — after a year of exposure and constant homework — functioning physically at a level one might expect of a frail eighty-year-old. Presumably there’s some misunderstanding over what’s being referred to on the satisfaction scale.

    The evidence from the PACE trial results undermines the premises of CBT and GET for CFS (“perpetuated by reversible physiological changes of deconditioning and avoidance of activity”), so it should provide quite a stimulus to alter some of the more sclerotic-seeming yardsticks; but that would take courage, independence, and honesty.

    • Doug Fraser

      Neuroskeptic wrote: “We all have mental yardsticks as to what we ‘should’ feel, what is ‘normal’ as opposed to ‘too much’ or ‘too little’ in different situations.

      They’re the barely-acknowledged foundation stone of modern psychiatry: psychiatrists use theirs to judge patients’ minds, and patients use theirs to judge their own.

      But where do these yardsticks come from?

      And should we trust them?”


      The first time you’re confronted with evidence that there’s a glaring difference between what you think you feel you’re doing and what you’re actually doing might come as quite a shock to some people, I’d imagine.

      It could come to you in the form of a video showing that your golf swing is a mess, although it’s always felt right.

      But the more you fancy yourself as a bit of an expert, the greater the shock, and perhaps the greater the temptation to avoid looking at the video again. Greater still, maybe, is the temptation to avoid mentioning the troubling evidence to your long-suffering wife of many years.

      Maybe you were badly taught from an early age, or maybe you just gradually slipped into the current habitual mess without really noticing.

      Perhaps there had been no feedback, or perhaps you refused feedback from your friends, because although you were aware that your game wasn’t exactly the best in town, what you did and continue to do feels so right, and hence their criticism must be wrong.

      Maybe your friends became scared of you after you stomped off the golf course in a rage (again), and they didn’t dare offer you any useful feedback from that point on.

      But maybe they weren’t really your friends at all, and only needed you (and your connections) to advance their own interests; they tended to heap praise on you for your golfing skills, and were just simulating that warmth and empathy.

      And so now you’re convinced that you’re fully clad after all, and you like the sound of your own voice, so you’re off travelling the world spouting forth on golf, mainly in poor countries and dictatorships, billed as the world’s leading theoretical golfer.

      Then one day a company on the smooch for cheap labour, who happen to notice who you’re rubbing shoulders with and like your patter, make you an offer that you really just can’t refuse… and so on and so forth, and that’s one helluva bent yardstick you’ve got.

      But of course there are no yardsticks judging minds, just people who believe they’re accurately judging their fellow humans, whose shoes they don’t inhabit.





About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.

