Learning hierarchical sequence representations across human cortex and hippocampus

Sensory input arrives continuously, yet humans experience it as segmented units such as words and events. The brain's ability to discover the regularities that make this segmentation possible is known as statistical learning. Such structure can be represented at multiple levels, including transitional probabilities and the identity of units. In a new report published in Science Advances, Simon Henin and a team of scientists at the New York University School of Medicine, Yale University and the Max Planck Institute in the U.S. and Germany recorded sequence encoding in the cortex and hippocampus of human subjects exposed to auditory and visual sequences with temporal (time-based) regularities. Early processing stages tracked both lower-level features such as syllables and learned units such as words, while later processing stages tracked only learned units. The findings point to multiple parallel computational systems for sequence learning across hierarchically organized cortico-hippocampal circuits.

Understanding the code of speech

We receive continuous input from the world but experience it in digestible chunks. With language, for example, humans can extract meaningful sequences such as sentences, words and phrases from a continuous stream of sounds that has no clear acoustic boundaries or pauses between linguistic elements. This segmentation happens incidentally and effortlessly, and is a core building block of development. The ability of infants and adults to learn transitional probabilities between syllables or shapes is known as “statistical learning”. However, the brain mechanisms that support this cognitive function are poorly understood. Brain regions such as the hippocampus and the inferior frontal gyrus (IFG) are known to contribute to visual and auditory statistical learning. To understand the process, Henin et al. conducted intracranial recordings in 23 human epilepsy patients, seeking mechanistic insight into how cortical areas that respond to the structure of the world support this fundamental form of learning. The findings also highlighted neural frequency tagging (NFT) as a versatile tool to investigate incidental learning in preverbal and nonverbal patient populations.

Behavioral evidence of auditory statistical learning

Henin et al. studied the neural circuits and computations underlying statistical learning by presenting 17 participants with auditory streams of syllables in which they manipulated the structure of the sequence. For the auditory tasks, they generated 12 consonant-vowel syllables using MacTalk and concatenated them in MATLAB to create two kinds of sequence: structured and random. In the structured streams, each syllable was placed into the first, second or third position of a three-syllable word, or triplet, and the transitional probabilities between syllables were set so that four hidden words were embedded in a continuous artificial language stream. In the random streams, the transitional probabilities were low and uniform, with no word level of segmentation. Syllables were presented at a rate of 4 Hz, which put the word rate at 1.33 Hz. The team did not inform the participants of the structure but asked them to perform a cover task instead, in which they indicated syllable repetitions randomly embedded in the auditory streams.
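The difference between the two kinds of stream can be made concrete with a short simulation. The following is a minimal sketch, not the authors' stimulus code: the syllable labels, the rule that a word never repeats back to back, and the stream lengths are all assumptions made for illustration, and the repeated syllables used as cover-task targets are left out.

```python
import random
from collections import Counter

# Hypothetical inventory: 12 placeholder labels standing in for the
# consonant-vowel syllables (the real syllables are not listed in the article).
SYLLABLES = [f"s{i}" for i in range(12)]
# Four hidden "words", each a fixed triplet of syllables.
WORDS = [tuple(SYLLABLES[i:i + 3]) for i in range(0, 12, 3)]

SYLLABLE_RATE_HZ = 4.0
WORD_RATE_HZ = SYLLABLE_RATE_HZ / 3  # three syllables per word, about 1.33 Hz


def structured_stream(n_words, rng):
    """Concatenate randomly chosen words, never the same word twice in a row."""
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        stream.extend(word)
        previous = word
    return stream


def random_stream(n_syllables, rng):
    """Draw syllables one at a time, avoiding immediate repetition."""
    stream = []
    for _ in range(n_syllables):
        options = [s for s in SYLLABLES if not stream or s != stream[-1]]
        stream.append(rng.choice(options))
    return stream


def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) from bigram counts."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


rng = random.Random(0)
tp_structured = transitional_probabilities(structured_stream(2000, rng))
tp_random = transitional_probabilities(random_stream(6000, rng))

# Transitions inside a word versus transitions that cross a word boundary.
within_pairs = {(w[i], w[i + 1]) for w in WORDS for i in range(2)}
within_word = [tp_structured[pair] for pair in sorted(within_pairs)]
across_word = [p for pair, p in tp_structured.items() if pair not in within_pairs]

print("within-word TP (structured):", min(within_word), "to", max(within_word))
print("across-word TP (structured): at most", round(max(across_word), 2))
print("largest TP (random):", round(max(tp_random.values()), 2))
print("word rate in Hz:", round(WORD_RATE_HZ, 2))
```

In this toy version, every transition inside a word has probability 1, transitions across word boundaries sit near one third, and the random stream has uniformly low probabilities. A dip in transitional probability is therefore the only cue that marks where a word ends.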

Neural tracking of auditory statistical learning

Henin et al. obtained direct neurophysiological signals from 1898 intracranial electrodes in 17 participants, comprehensively covering the frontal, parietal, occipital and temporal lobes as well as the hippocampus in both hemispheres. The participants also performed a two-alternative forced choice (2AFC) task in which they listened to two audio segments presented one after the other and selected the one containing a hidden word. Neural responses originated predominantly in somatosensory/motor and temporal cortices. On average, word-rate coherence increased significantly in the structured stream but not in the random stream, supporting NFT (neural frequency tagging) as a sensitive and robust way to assess online statistical learning. Using NFT, the team tracked the representation of segmented units at two hierarchical levels of the stream, testing within-electrode phase coherence in the field potential and in the gamma band for the structured and random streams. Two tuning profiles emerged. Electrodes with significant coherence at both the word and syllable rates were located mainly in the superior temporal gyrus (STG), with smaller clusters in the motor cortex and pars opercularis. Electrodes with significant coherence at the word rate only were located in the inferior frontal gyrus and the anterior temporal lobe (ATL). This anatomical grouping mirrored the neuroanatomy of the auditory processing hierarchy.
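The logic of frequency tagging can be illustrated with synthetic data. The sketch below is an assumption-laden toy, not the authors' pipeline: it simulates epochs that each span ten words, adds a word-rate component only in the "structured" condition, and measures phase coherence across epochs as the length of the mean unit phase vector at each tagged frequency. The sampling rate, epoch length, amplitudes and noise level are all invented for the example.

```python
import cmath
import math
import random

FS = 100.0                   # sampling rate in Hz (assumed)
EPOCH_S = 7.5                # epoch length: 10 words = 30 syllables (assumed)
SYLLABLE_HZ = 4.0
WORD_HZ = SYLLABLE_HZ / 3    # about 1.33 Hz


def simulate_epoch(word_amplitude, rng, noise_sd=2.0):
    """Synthetic signal: syllable-rate response, optional word-rate response, noise."""
    n = int(FS * EPOCH_S)
    return [
        math.sin(2 * math.pi * SYLLABLE_HZ * k / FS)
        + word_amplitude * math.sin(2 * math.pi * WORD_HZ * k / FS)
        + rng.gauss(0.0, noise_sd)
        for k in range(n)
    ]


def fourier_phase(signal, freq_hz):
    """Phase of the signal's Fourier coefficient at a single frequency."""
    coeff = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / FS)
                for k, x in enumerate(signal))
    return cmath.phase(coeff)


def phase_coherence(epochs, freq_hz):
    """Length of the mean unit phase vector across epochs (0 = random, 1 = locked)."""
    vectors = [cmath.exp(1j * fourier_phase(e, freq_hz)) for e in epochs]
    return abs(sum(vectors) / len(vectors))


rng = random.Random(1)
structured = [simulate_epoch(0.5, rng) for _ in range(40)]   # word-rate response present
unstructured = [simulate_epoch(0.0, rng) for _ in range(40)]  # syllable-rate response only

coherence = {
    ("structured", "syllable"): phase_coherence(structured, SYLLABLE_HZ),
    ("structured", "word"): phase_coherence(structured, WORD_HZ),
    ("random", "syllable"): phase_coherence(unstructured, SYLLABLE_HZ),
    ("random", "word"): phase_coherence(unstructured, WORD_HZ),
}
for key, value in coherence.items():
    print(key, round(value, 2))
```

Both conditions show high coherence at the syllable rate, because the sound itself is periodic at 4 Hz. Only the structured condition shows coherence at the word rate, a frequency that is absent from the acoustics and can appear only if the listener has grouped syllables into words.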

Analyzing auditory statistical learning and testing visual statistical learning

To understand the results of neural frequency tagging (NFT), Henin et al. examined which aspect of the stream was driving segmentation. They considered three statistical cues: (1) transitional probabilities, (2) ordinal position and (3) word identity, each of which could support distinct cognitive functions. Alongside the auditory tasks, the team ran visual statistical learning tasks with the patients, building the visual streams from fractal images similar to those used in previous work. As before, the participants were not informed of the structure and performed a cover task. Henin et al. then used NFT to identify the brain areas exhibiting statistical learning in neurophysiological recordings from 1606 intracranial electrodes in 12 patients, covering the frontal, parietal, temporal and occipital cortex. As with auditory statistical learning, they observed anatomical and hierarchical segregation between two temporal tuning profiles of electrodes. One showed significant entrainment at both the fractal and pair rates and was mostly clustered in the occipital and parietal cortex, while the other showed significant entrainment at the pair rate only, in the frontal, parietal and temporal cortex.
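The three cues make different predictions about which items a brain region should treat as similar, which is what makes them separable in principle. The sketch below is an illustration under stated assumptions rather than the authors' analysis: it uses the auditory design of four three-syllable words, writes each cue as a grouping rule over the 12 syllables, and compares the resulting model dissimilarity patterns with a made-up "observed" pattern. The grouping rule for transitional probability (first syllable of a word versus the rest) and the synthetic observed pattern are assumptions.

```python
import math
from itertools import combinations

N_WORDS, WORD_LEN = 4, 3
# Each item is (word index, position within the word): 12 syllables in total.
ITEMS = [(word, pos) for word in range(N_WORDS) for pos in range(WORD_LEN)]

# Hypothetical grouping rule for each cue.
MODELS = {
    # First syllable follows a word boundary (low TP); the others do not.
    "transitional_probability": lambda item: item[1] == 0,
    # Syllables sharing a position (1st, 2nd, 3rd) are grouped together.
    "ordinal_position": lambda item: item[1],
    # Syllables belonging to the same word are grouped together.
    "identity": lambda item: item[0],
}


def model_rdm(label):
    """Dissimilarity per item pair: 0 if the cue groups them together, else 1."""
    return [0 if label(a) == label(b) else 1 for a, b in combinations(ITEMS, 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


rdms = {name: model_rdm(label) for name, label in MODELS.items()}

# Made-up "observed" pattern: the identity model with every tenth entry flipped,
# standing in for a noisy measurement from a region that codes word identity.
observed = list(rdms["identity"])
for i in range(0, len(observed), 10):
    observed[i] = 1 - observed[i]

scores = {name: pearson(observed, rdm) for name, rdm in rdms.items()}
best = max(scores, key=scores.get)
print({name: round(score, 2) for name, score in scores.items()})
print("best-matching cue:", best)
```

Because the three grouping rules carve up the same 12 items differently, a region's similarity structure can favor one cue over the others even though all three cues are present in the same stream.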


In this way, Simon Henin and colleagues used intracranial recordings in humans to describe how the brain tracks and learns structure within sensory information. Statistical learning was accompanied by rapid changes in neural representations, reflected in two functionally and anatomically distinct brain responses. These responses revealed an anatomical hierarchy, with early sensory processing stages mapped onto the superior temporal gyrus and occipital cortex, and late, amodal processing stages mapped onto the inferior frontal gyrus and anterior temporal lobe. The patients' brains extracted and represented nested structures within sensory streams in as little as two minutes, even when the patients were not aware of the process.

The work agrees with previous studies showing that the cortical hierarchy integrates information over progressively longer windows of time. The neural frequency tagging (NFT) technique offers an opportunity to characterize learning trajectories in clinical and healthy populations and in different sensory modalities, and to track the acquisition of knowledge across the lifespan from newborns to the elderly. By combining NFT with representational similarity analysis (RSA), the team provided a powerful toolkit to reveal how the human brain engages in statistical learning at multiple levels of organization.