Eteplirsen: A Curious Scientific Controversy

By Neuroskeptic | August 25, 2017 2:30 pm

In April 2016, an FDA committee voted not to recommend acceptance of eteplirsen, a drug designed to treat muscular dystrophy. In September, however, the FDA did approve the drug, following a heated internal debate.


This wasn’t the end of the story, however. What followed was an unusual scientific controversy that played out in the peer-reviewed literature, discussed in a Retraction Watch post this week.

Following the approval of eteplirsen, Ellis Unger and Robert Califf wrote a letter to the journal Annals of Neurology expressing concern over a paper about the drug published in that journal in 2013. This was a remarkable intervention, given that Califf was at the time head of the FDA, while Unger had led the FDA’s eteplirsen review team.

The paper Unger and Califf criticized was one they were well-acquainted with, because the case for the approval of eteplirsen had largely rested on it. Authored by Mendell et al., the 2013 paper reported on a clinical trial of eteplirsen in 12 children suffering from Duchenne muscular dystrophy (DMD). Mendell et al. reported that eteplirsen was able to increase levels of dystrophin, the protein that is deficient in DMD, as well as producing clinical improvement.

In their letter, the two FDA critics focussed on how Mendell et al. measured dystrophin. According to the 2013 paper, muscle biopsy samples were stained for dystrophin and then “evaluated by blinded expert muscle pathologists” (note the plural) to count the percentage of dystrophin-positive muscle cells.

Unger and Califf, however, say that an FDA lab inspection revealed that all of the biopsy stains had been evaluated by a single individual. They describe this person as a ‘technician’, a word that implies someone more junior than a ‘pathologist’. They quote from an FDA report of the lab visit:

The immunohistochemistry images were only faintly stained, and had been read by a single technician using an older liquid crystal display (LCD) computer monitor in a windowed room where lighting was not controlled. (The technician had to suspend reading around mid-day, when brighter light began to fill the room and reading became impossible.)

Further, the FDA learned of problems with the blinding. The technician rating the images was blinded to treatment group (drug or placebo), but he or she was aware of when each biopsy had been performed. In Mendell et al.’s design, all patients received the drug at the final, 48 week timepoint. Thus, Unger and Califf say, the large increases in dystrophin expression seen at 48 weeks could have arisen “simply by having a lower threshold for calling fibers ‘positive’ at later time points in the study.”

Unger and Califf reveal that, in light of the limitations of Mendell et al.’s analysis, the FDA encouraged the researchers to re-analyze the biopsy data, with three independent, fully blinded pathologists as raters. The re-analysis found much lower percentages of dystrophin-positive fibers, and no evidence of a treatment effect. This image shows the difference between the old and the new analysis:


In his rebuttal to the Unger and Califf letter, Mendell said that the lower dystrophin levels in the re-analysis were not unexpected, because the FDA told the raters to use more stringent criteria when classifying cells as dystrophin-positive:

In the recount, three independent pathologists reported the results using the newly established criteria that excluded any muscle fibers with partial dystrophin staining (borderline positivity) and fibers with membrane staining that touched the borders of the image.

Mendell goes on to say that:

The independent pathologists performing the recount using the more-conservative scoring protocol confirmed the increase in dystrophin-positive fibers in the treated samples, with a mean of 16.27% increase in the number of dystrophin-positive fibers (p ≤ 0.001), and a 15-fold increase between the pretreatment and post-treatment samples… the finding that the treated patients had 16.2% dystrophin-positive fibers confirmed unequivocally that eteplirsen can restore dystrophin to levels that have been associated with milder phenotypes

I have to say that Mendell’s response struck me as unconvincing. For one thing, there is really no excuse for relying on a single rater in a study of this kind, especially given how much was at stake: DMD is currently an incurable disease. The efficacy of eteplirsen, or lack of it, is of huge clinical importance.

Using multiple independent raters from the start would have increased accuracy and also allowed inter-rater reliability to be assessed. (Mendell in fact says that rating was done by “an expert pathologist with the assistance of an experienced staff member”, but I don’t think this refers to two independent raters.)
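For readers wondering what checking inter-rater reliability looks like in practice, here is a minimal sketch using Cohen's kappa, a standard chance-corrected agreement statistic for two raters. The fiber calls below are made up purely for illustration (1 = dystrophin-positive, 0 = negative); they are not data from the eteplirsen trial, and I am not suggesting this is the statistic the FDA or Mendell et al. used.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# HYPOTHETICAL calls on ten fibers by two raters (not real trial data).
rater_1 = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
rater_2 = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]

kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

With these invented calls the raters agree on 8 of 10 fibers, but kappa is noticeably lower than 0.8 because some of that agreement would be expected by chance. With a single rater, no such check is possible at all.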

I’m also not sure what Mendell is talking about when he refers to a “15-fold increase” between the pre- and post-treatment samples in the new analysis. Such an effect would be very impressive, but no group showed such a dramatic increase according to Unger and Califf’s graph above. At best, the 30 mg/kg group showed about a 2-fold increase.

This study was also extremely small, with only four patients receiving placebo. I wonder if this is one of those studies that is so small that it is more likely to mislead than to inform us. I know that it’s very expensive to conduct a study like this, and I’m sure the researchers did everything in their power to increase the numbers. But it makes little sense to talk about p-values like p<0.001 when there are only a handful of datapoints.
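To make the point about tiny samples concrete, here is a rough sketch of an exact permutation test for a hypothetical between-group comparison with 4 placebo and 8 treated patients (the group sizes implied by 12 children with four on placebo). This is my own illustration, not the analysis in the paper: Mendell's quoted p-value may well come from a different test, such as a within-patient pre/post comparison, and the numbers below are invented.

```python
from itertools import combinations
from math import comb

def exact_one_sided_p(placebo, treated):
    """One-sided exact permutation p-value for 'treated higher than placebo'."""
    pooled = list(placebo) + list(treated)
    k = len(placebo)
    observed = sum(placebo)
    # The pooled total is fixed, so a smaller placebo sum always means a
    # larger treated-minus-placebo difference in means.
    as_extreme = sum(
        1
        for idx in combinations(range(len(pooled)), k)
        if sum(pooled[i] for i in idx) <= observed
    )
    return as_extreme / comb(len(pooled), k)

# HYPOTHETICAL values chosen so the groups are perfectly separated,
# i.e. the most favourable outcome a 4-vs-8 comparison could produce.
placebo = [1, 2, 3, 4]
treated = [10, 11, 12, 13, 14, 15, 16, 17]

print(comb(12, 4))                          # number of possible group labelings
print(exact_one_sided_p(placebo, treated))  # smallest attainable p-value
```

There are only 495 ways to pick 4 patients out of 12, so even with perfect separation this particular test cannot return a p-value below 1/495, or roughly 0.002. A parametric test can report something smaller, but only by leaning on distributional assumptions that a dozen datapoints cannot really check.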

  • smut clyde

    “evaluated by blinded expert muscle pathologists”

    Being dazzled by background daylight reflecting off the screen is a kind of blinding.

    • Neuroskeptic

      The rater was blinded, between 11 am and 2 pm (weather permitting.)

      There were multiple raters, whenever the technician was standing in front of a mirror.

      • Sys Best

        multiple personality?

  • Uncle Al

    Re Lipinski’s rule of five, “a molecular mass less than 500 daltons.” Eteplirsen has MW = 10,305.738, C364H569N177O122P30
    (“We’re gonna need a bigger vial.”)

    • Erik Bosma

      That’s some big molly. I wonder how they all stick together. Maybe they come with 2 AAA batteries.

  • Erik Bosma

    So then partially approve it and consequently produce a much larger sample size. After a predetermined length of time, stop the treatment and redo the analysis. If eteplirsen (sounds like a Norwegian word) was effective THEN approve it for permanent use.

    • Neuroskeptic

      This might work but I don’t think the FDA is allowed to conditionally approve drugs. Once something is approved, it is supposed to remain approved unless new evidence comes in suggesting it’s unsafe (AFAIK)

  • Dr__P

    It makes little sense to talk about p values in the first place. Talk about effect sizes and PRACTICAL significance instead


