Brain Recordings Capture Musicality of Speech
Neuroscientists decode song from brain recordings, revealing areas dealing with rhythm and vocals
As the chords of Pink Floyd's “Another Brick in the Wall, Part 1” filled the hospital suite, neuroscientists at Albany Medical Center diligently recorded the activity of electrodes placed on the brains of patients being prepared for epilepsy surgery.
The goal? To capture the electrical activity of brain regions tuned to attributes of the music — tone, rhythm, harmony and words — to see if they could reconstruct what the patient was hearing.
More than a decade later, after detailed analysis of data from 29 such patients by neuroscientists at the University of California, Berkeley, the answer is clearly yes.
The phrase "All in all it was just a brick in the wall" comes through recognizably in the reconstructed song, its rhythms intact, and the words muddy, but decipherable. This is the first time researchers have reconstructed a recognizable song from brain recordings.
The reconstruction shows the feasibility of recording and translating brain waves to capture the musical elements of speech, as well as the syllables. In humans, these musical elements, called prosody — rhythm, stress, accent and intonation — carry meaning that the words alone do not convey.
Because these intracranial electroencephalography (iEEG) recordings can be made only from the surface of the brain — as close as you can get to the auditory centers — no one will be eavesdropping on the songs in your head anytime soon.
But for people who have trouble communicating, whether because of stroke or paralysis, such recordings from electrodes on the brain surface could help reproduce the musicality of speech that's missing from today's robot-like reconstructions.
"It's a wonderful result," said Robert Knight, a neurologist and UC Berkeley professor of psychology in the Helen Wills Neuroscience Institute who conducted the study with postdoctoral fellow Ludovic Bellier. "One of the things for me about music is it has prosody and emotional content. As this whole field of brain machine interfaces progresses, this gives you a way to add musicality to future brain implants for people who need it, someone who's got ALS or some other disabling neurological or developmental disorder compromising speech output. It gives you an ability to decode not only the linguistic content, but some of the prosodic content of speech, some of the affect. I think that's what we've really begun to crack the code on."
As brain recording techniques improve, it may be possible someday to make such recordings without opening the brain, perhaps using sensitive electrodes attached to the scalp. Currently, scalp EEG can measure brain activity to detect an individual letter from a stream of letters, but the approach takes at least 20 seconds to identify a single letter, making communication effortful and difficult, Knight said.
"Noninvasive techniques are just not accurate enough today. Let's hope, for patients, that in the future we could, from just electrodes placed outside on the skull, read activity from deeper regions of the brain with a good signal quality. But we are far from there," Bellier said.
Bellier, Knight and their colleagues reported the results today in the journal PLOS Biology, noting that they have added "another brick in the wall of our understanding of music processing in the human brain."
Reading your mind? Not yet.
The brain-machine interfaces used today to help people communicate when they're unable to speak can decode words, but the sentences produced have a robotic quality akin to how the late Stephen Hawking sounded when he used a speech-generating device.
"Right now, the technology is more like a keyboard for the mind," Bellier said. "You can't read your thoughts from a keyboard. You need to push the buttons. And it makes kind of a robotic voice; for sure there's less of what I call expressive freedom."
Bellier should know. He has played music since childhood — drums, classical guitar and piano, at one point performing in a heavy metal band. When Knight asked him to work on the musicality of speech, Bellier said, "You bet I was excited when I got the proposal."
In 2012, Knight, postdoctoral fellow Brian Pasley and their colleagues were the first to reconstruct the words a person was hearing from recordings of brain activity alone.
More recently, other researchers have taken Knight's work much further. Eddie Chang, a UC San Francisco neurosurgeon and senior co-author of the 2012 paper, has recorded signals from the motor area of the brain associated with jaw, lip and tongue movements to reconstruct the speech intended by a paralyzed patient, with the words displayed on a computer screen.
That work, reported in 2021, employed artificial intelligence to interpret the brain recordings from a patient trying to vocalize a sentence based on a set of 50 words.
While Chang's technique is proving successful, the new study suggests that recording from the auditory regions of the brain, where all aspects of sound are processed, can capture other aspects of speech that are important in human communication.
"Decoding from the auditory cortices, which are closer to the acoustics of the sounds, as opposed to the motor cortex, which is closer to the movements that are done to generate the acoustics of speech, is super promising," Bellier added. "It will give a little color to what's decoded."
For the new study, Bellier reanalyzed brain recordings obtained in 2008 and 2015 as patients were played the approximately 3-minute Pink Floyd song, which is from the 1979 album The Wall. He hoped to go beyond previous studies, which had tested whether decoding models could identify different musical pieces and genres, to actually reconstruct music phrases through regression-based decoding models.
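For readers curious what "regression-based decoding" can look like in practice, here is a minimal sketch. It is not the researchers' code or their actual model: it assumes a plain ridge regression that maps one window of electrode activity to one slice of the song's spectrogram, and it runs on synthetic stand-in data. The function names, array sizes and the `alpha` penalty are all illustrative assumptions.

```python
import numpy as np


def fit_ridge_decoder(neural, spectrogram, alpha=1.0):
    """Closed-form ridge regression from neural features to spectrogram bins.

    neural:      (n_samples, n_electrodes) array of recorded activity
    spectrogram: (n_samples, n_freq_bins) array describing the sound
    Returns a weight matrix of shape (n_electrodes, n_freq_bins).
    """
    n_features = neural.shape[1]
    gram = neural.T @ neural + alpha * np.eye(n_features)
    return np.linalg.solve(gram, neural.T @ spectrogram)


def decode(neural, weights):
    """Predict spectrogram slices from neural activity."""
    return neural @ weights


def mean_correlation(predicted, actual):
    """Average Pearson correlation across frequency bins."""
    p = predicted - predicted.mean(axis=0)
    a = actual - actual.mean(axis=0)
    r = (p * a).sum(axis=0) / np.sqrt((p ** 2).sum(axis=0) * (a ** 2).sum(axis=0))
    return float(r.mean())


# Synthetic stand-in data (assumption): real recordings are far noisier,
# and real decoders typically also use time-lagged features.
rng = np.random.default_rng(0)
n_samples, n_electrodes, n_freq_bins = 2000, 40, 16
true_map = rng.normal(size=(n_electrodes, n_freq_bins))
neural = rng.normal(size=(n_samples, n_electrodes))
noise = 0.5 * rng.normal(size=(n_samples, n_freq_bins))
spectrogram = neural @ true_map + noise

# Fit on the first part of the "song", then reconstruct the held-out remainder.
train, test = slice(0, 1500), slice(1500, None)
weights = fit_ridge_decoder(neural[train], spectrogram[train], alpha=10.0)
predicted = decode(neural[test], weights)
score = mean_correlation(predicted, spectrogram[test])
print(f"mean correlation on held-out data: {score:.3f}")
```

The key design point is the held-out split: the decoder is scored on a stretch of audio it never saw during fitting, which is what makes a reconstruction meaningful rather than a memorized copy.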
Bellier emphasized that the study, which used artificial intelligence to decode brain activity and then encode a reproduction, did not merely create a black box to synthesize speech. He and his colleagues were also able to pinpoint new areas of the brain involved in detecting rhythm, such as a thrumming guitar, and discovered that some portions of the auditory cortex — in the superior temporal gyrus, located just behind and above the ear — respond at the onset of a voice or a synthesizer, while other areas respond to sustained vocals.
The researchers also confirmed that the right side of the brain is more attuned to music than the left side.
"Language is more left brain. Music is more distributed, with a bias toward right," Knight said.
"It wasn't clear it would be the same with musical stimuli," Bellier said. "So here we confirm that that's not just a speech-specific thing, but that it's more fundamental to the auditory system and the way it processes both speech and music."
Knight is embarking on new research to understand the brain circuits that allow some people with aphasia due to stroke or brain damage to communicate by singing when they cannot otherwise find the words to express themselves.
Written by Robert Sanders for UC Berkeley News
Photos by Bellier et al. & UC Berkeley