How Did Researchers Manage to Read Movie Clips From the Brain?

By Valerie Ross | September 28, 2011 2:06 pm

What’s the News: In a study published last week, researchers showed they could reconstruct video clips by watching viewers’ brain activity. The video of the study’s results, below, is pretty amazing, showing the original clips and their reconstructions side by side. How does it work, and does it mean mind-reading is on its way in?

How to Read Movies From the Brain, in 4 Easy Steps:

1) Build the Translator. The researchers first had three people watch hours of movie trailers while an fMRI scanner tracked blood flow in their brains. Blood flow is linked to what the neurons are up to, as active neurons use more oxygen from the bloodstream. (All three subjects were part of the research team; over the course of the study, they had to be in the scanner for a looong time.) The team focused on brain activity in a portion of each person’s visual cortex, compiling information about how 4,000 different spots in the visual cortex responded to various simple features of a movie clip. “For each point in the brain we measured, we built a dictionary that told us what oriented lines and motions and textures in the original image actually caused brain activity,” says Jack Gallant, the UC Berkeley neuroscientist who led the study. “That dictionary allows us to translate between things that happen in the world and things that happen in each of the points of the brain we measure.”

2) Test the Translator. The study participants watched yet more video clips, and the team double-checked that their dictionary (a statistics-based computer model) worked for the new clips, too.

3) Add More Words to the Dictionary. The researchers wanted a larger database of clips-to-brain-activity translation, so they collected 18 million seconds of video from randomly selected YouTube clips. They then ran the movies through the computer model, generating likely brain activation responses for each second of video.

4) Translate! Initially, the “dictionary” was an encoding model, translating from a movie clip into brain activity. From there, it was a theoretically simple, though practically laborious, endeavor to make a decoding model, based on Bayesian probability, to translate brain activity into a clip. (Think turning an English-French dictionary into a French-English one; you have all the information you need, but there’s a lot of reshuffling to do.) Each subject then watched a new set of second-long video clips they’d never before seen. The computer model selected the 100 clips (from those 18 million seconds of YouTube) that would produce brain activity most similar to the activity evoked by the second-long clip the subject had just seen. It then averaged the clips together, hence the blurry quality of the videos. (You can see a video showing all three reconstructions, one made from the brain activity of each subject, here.) If the team had been after clarity, rather than proof of concept, they could’ve made the images at least somewhat crisper, Gallant says, by putting programming muscle into it. They could have set it up so that if 90 of the 100 most similar clips had faces, for instance, it would match up the eyes, nose, and mouth of each face before averaging the videos, leading to a clearer picture.
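
For readers who think in code, the four steps above can be sketched as a toy Python example. This is not the researchers' code, and nearly everything in it is an assumption made for illustration: the "clips" are random 20-number vectors that double as their own visual features, the "brain" is simulated, the encoding model is a plain ridge regression, and a simple correlation ranking stands in for the Bayesian decoding the article describes. The names (fit_encoding_model, reconstruct, TOP_K, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PIXELS = 20    # hypothetical: each toy "clip" is 20 numbers, used directly as features
N_VOXELS = 200   # hypothetical: measured spots (the article mentions about 4,000)
TOP_K = 10       # the article describes averaging the 100 best-matching clips
NOISE = 1.0      # hypothetical measurement noise in the simulated scanner


def fit_encoding_model(features, responses, alpha=1.0):
    """Step 1: ridge regression from clip features to voxel responses."""
    gram = features.T @ features + alpha * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ responses)  # (features, voxels)


def predict_responses(weights, features):
    """Steps 2 and 3: predicted brain activity for any set of clips."""
    return features @ weights


def reconstruct(observed, library_predicted, library_clips, top_k=TOP_K):
    """Step 4: average the library clips whose predicted activity
    correlates best with the observed activity."""
    obs = observed - observed.mean()
    lib = library_predicted - library_predicted.mean(axis=1, keepdims=True)
    scores = (lib @ obs) / (np.linalg.norm(lib, axis=1) * np.linalg.norm(obs))
    best = np.argsort(scores)[::-1][:top_k]
    return library_clips[best].mean(axis=0), best


def simulate_brain(clips, true_weights):
    """Stand-in for the scanner: a hidden linear response plus noise."""
    return clips @ true_weights + NOISE * rng.normal(size=(len(clips), N_VOXELS))


true_weights = rng.normal(size=(N_PIXELS, N_VOXELS))  # unknown to the model

# Step 1: build the translator from training clips and recorded activity.
train_clips = rng.normal(size=(500, N_PIXELS))
weights = fit_encoding_model(train_clips, simulate_brain(train_clips, true_weights))

# Step 2: check that it predicts activity for clips it has not seen.
test_clips = rng.normal(size=(100, N_PIXELS))
test_responses = simulate_brain(test_clips, true_weights)
predicted = predict_responses(weights, test_clips)
test_corr = np.corrcoef(predicted.ravel(), test_responses.ravel())[0, 1]

# Step 3: predict activity for a large library of other clips.
library_clips = rng.normal(size=(2000, N_PIXELS))
library_predicted = predict_responses(weights, library_clips)

# Step 4: decode one new clip (not in the library) from its brain activity.
viewed = rng.normal(size=(1, N_PIXELS))
observed = simulate_brain(viewed, true_weights)[0]
reconstruction, best = reconstruct(observed, library_predicted, library_clips)
recon_corr = np.corrcoef(reconstruction, viewed[0])[0, 1]

print(f"held-out prediction correlation: {test_corr:.2f}")
print(f"reconstruction vs. viewed clip:  {recon_corr:.2f}")
```

Because the reconstruction is an average of several imperfect matches, it resembles the viewed clip without reproducing it exactly, which is the toy equivalent of the blurriness in the real videos.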

What’s the Context:

  • This isn’t the first time researchers have looked inside the brain to see what someone else is seeing. A number of scientists, including Gallant, have been working on “neural decoding” (i.e., mind-reading) techniques like this one for over a decade. They’re slowly getting better at decoding what we’ve seen, advancing from distinguishing between types of images (face vs. landscape, for instance) to reconstructing still images to reconstructing moving video clips.
  • Decoding what someone sees is different from decoding what they’re thinking. The researchers were just looking at low-level visual processing (what lines, textures, and movements people saw), not higher-level thought like what the clips reminded them of, whether they recognized the actors, or whether they wanted to see the movies they watched trailers for. Those are far more complicated questions to tease out, and can’t be tracked feature-by-feature as easily as visual processing.
  • fMRI has a built-in time lag; the level of oxygen in the blood doesn’t change until about 4 seconds after neuron activity, since blood flow is a slow process compared to neurons’ electrical firing. By building specific lag times into their model (not just what part of a clip an area responded to, but how long after the clip the response occurred), the researchers could track brain activity in much closer to real time.
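
One common way to build lag times into a model like this, shown here as a rough Python sketch rather than as the study's actual method, is to give the regression several time-shifted copies of each feature and let it learn which delay matters. The sketch assumes one measurement per second, a simulated voxel that responds to a single feature exactly 4 seconds late, and a hypothetical helper named add_lags.

```python
import numpy as np


def add_lags(features, lags):
    """Stack time-shifted copies of a (time, feature) matrix side by side.

    Column block i holds the features as they were lags[i] seconds earlier;
    rows with no earlier data are left at zero.
    """
    n_time, n_feat = features.shape
    lagged = np.zeros((n_time, n_feat * len(lags)))
    for i, lag in enumerate(lags):
        lagged[lag:, i * n_feat:(i + 1) * n_feat] = features[:n_time - lag]
    return lagged


rng = np.random.default_rng(1)
n_seconds, n_features, true_lag = 300, 3, 4  # all hypothetical toy values

# Simulated stimulus features, one row per second of video.
features = rng.normal(size=(n_seconds, n_features))

# Simulated voxel: echoes feature 0 four seconds late, plus a little noise.
voxel = np.zeros(n_seconds)
voxel[true_lag:] = features[:-true_lag, 0]
voxel += 0.1 * rng.normal(size=n_seconds)

# Fit one weight per (lag, feature) pair with ordinary least squares.
lags = [0, 1, 2, 3, 4, 5, 6]
design = add_lags(features, lags)
weights, *_ = np.linalg.lstsq(design, voxel, rcond=None)
weights = weights.reshape(len(lags), n_features)

# The lag with the largest weight on feature 0 is the recovered delay.
best_lag = lags[int(np.argmax(np.abs(weights[:, 0])))]
print("recovered lag in seconds:", best_lag)
```

The fitted weights show when a spot in the brain responds as well as what it responds to, so activity measured now can be lined up with the second of video that caused it.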

The Future Holds: How Close Are We to Reading Images From Everyone’s Brain?

  • Such brain-decoding technologies may ultimately be helpful for communicating with people who can’t otherwise communicate, due to locked-in syndrome or a similar condition. “I think that’s all possible in the future,” Gallant says, “but who knows when the future’s going to be, right?” Such advances could easily be decades away because of the complex, very specific nature of these models.
  • The brain has between 200 and 500 functional areas in total, Gallant says, about 75 of which are related to vision, and to translate what’s happening in a new area, you’d need a new dictionary. It’s not just a matter of the time and effort involved in making new models, either; we need to understand the brain better first. Scientists know a lot more about how basic visual processing works than higher-level functions like emotion or memory.

Reference: Shinji Nishimoto, An T. Vu, Thomas Naselaris, Yuval Benjamini, Bin Yu, and Jack L. Gallant. “Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies.”

  • Jen Hawse

    I don’t understand the words that scroll in the videos. Why are they there, what do they represent, is it the brain seeing these words or the models? One says something like, “u gatta…” another refers to the “king” another is in French saying, “tout ce que je…”.

  • Nestor

    Jen, the model reconstructs what it thinks the person is seeing from a library of YouTube videos, so my guess is that those letters are part of those videos and get carried along, a bit like those ransom notes made out of newspaper clippings.

  • Vasanth BR

    Since the raw brain and the nature of its raw functioning are transparent to race, culture and language variations, once the scientists are able to refine their reverse dictionaries from brain activity to visual image, the same dictionary will work for all persons speaking all different languages.

    However a rider.

    When I see on a screen a peacock with spread-out white feathers, I not only see the white peacock but also concurrently think in English (say), “That is nice; but, I have not seen a white peacock in any nature park!” If a Spaniard sees the screen, he may think “Eso es agradable; ¡pero, no he visto un pavo real blanco en ningún parque de naturaleza!” How much does language-dependent thinking affect the pattern of visual brain activity? Hopefully, within a couple of years (has it already been done?), when thoughts in specific languages can be decoded and fed into a loudspeaker, we can not only see what a person sees, but also hear what a person thinks.

    Next is raw primordial emotion, not resting on words of the language. Capture it and along with images and words, input them into another brain.

    Within a decade or two, there will be a need for only one person to experience an event, like climbing Mt Everest, from leaving the base camp to the 3 cheers after returning to it. All that is required is to capture and record all the sensory inputs into his brain, along their channels, and plug the recording directly into the appropriate locations of the recipient’s brain. The recipient will not be able to tell whether he actually climbed Everest or received a synthesised input. While the swerves and slips impacting the balancing muscles can be recorded (as they are felt in the brain) and transferred, can the tiredness and fatigue be recorded and transferred? When the climber’s leg muscle gets tired, even though the tiredness is actually felt in the brain, the leg muscle is also physiologically altered. Is it adequate if only the tiredness signal to the brain is transferred to the recipient, without actually altering his leg muscle physiologically?

