Parallel and distributed encoding of speech across human auditory cortex

Authors
Liberty S. Hamilton, Yulia Oganian, Jeffery Hall, Edward F. Chang

Correspondence
edward.chang@ucsf.edu

In brief
In the human auditory cortex, the encoding of acoustic and phonetic information of speech is distributed across areas, and the information processing is parallel.

Highlights
- We recorded intracranial signals in human primary and nonprimary auditory cortex
- A superior temporal gyrus onset zone activates in parallel with primary auditory areas
- Stimulation of superior temporal gyrus impairs speech perception
- Stimulation of primary auditory cortex does not affect speech perception

Hamilton et al., 2021, Cell 184, 1–14
September 2, 2021 © 2021 Elsevier Inc.
https://doi.org/10.1016/j.cell.2021.07.019
Please cite this article in press as: Hamilton et al., Parallel and distributed encoding of speech across human auditory cortex, Cell (2021), https://doi.org/10.1016/j.cell.2021.07.019

Article
Parallel and distributed encoding
of speech across human auditory cortex
Liberty S. Hamilton,1,3 Yulia Oganian,1 Jeffery Hall,2 and Edward F. Chang1,4,*
1 Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
2 Department of Neurology and Neurosurgery, McGill University Montreal Neurological Institute, Montreal, QC, H3A 2B4, Canada
3 Present address: Department of Speech, Language and Hearing Sciences, Department of Neurology, The University of Texas at Austin, Austin, TX 78712, USA
4 Lead contact
* Correspondence: edward.chang@ucsf.edu
https://doi.org/10.1016/j.cell.2021.07.019

SUMMARY

Speech perception is thought to rely on a cortical feedforward serial transformation of acoustic into linguistic
representations. Using intracranial recordings across the entire human auditory cortex, electrocortical stim-
ulation, and surgical ablation, we show that cortical processing across areas is not consistent with a serial
hierarchical organization. Instead, response latency and receptive field analyses demonstrate parallel and
distinct information processing in the primary and nonprimary auditory cortices. This functional dissociation
was also observed causally: stimulation of the primary auditory cortex evokes auditory hallucinations but does not distort or interfere with speech perception, whereas opposite effects were observed during stimulation of nonprimary cortex in the superior temporal gyrus. Ablation of the primary auditory cortex does not affect speech
perception. These results establish a distributed functional organization of parallel information processing
throughout the human auditory cortex and demonstrate an essential independent role for nonprimary audi-
tory cortex in speech processing.

INTRODUCTION

Speech contains myriad acoustic cues that give rise to the rich experience of spoken language. Listening to speech activates populations of neurons in multiple functional regions of the auditory cortex, including primary and nonprimary cortical fields. The computations carried out in these areas allow for extraction of meaningful linguistic content, such as consonants and vowels, or prosodic rhythm and intonation from the spectrotemporal acoustics of speech.

Many previous functional neuroimaging studies (e.g., functional magnetic resonance imaging [fMRI] and magnetoencephalography [MEG]) suggest a progressive selectivity for more complex sound types from primary toward nonprimary, higher-order areas (Brodbeck et al., 2018; Chevillet et al., 2011; de Heer et al., 2017; Rauschecker and Scott, 2009; Wessinger et al., 2001). For example, stronger responses are found in high-order auditory areas to complex, natural stimuli, such as speech and music, than to simple artificial stimuli, such as pure tones (Leaver and Rauschecker, 2010; Norman-Haignere et al., 2015; Schönwiesner and Zatorre, 2009). These results are typically interpreted in the framework of an anatomical-hierarchical model, where simple acoustic features are transformed into more complex and speech-specific representations across the auditory cortex. However, the limited spatial or temporal resolution of noninvasive imaging precludes mapping of locally heterogeneous neural response types, leaving fundamental questions about speech sound representations unanswered.

Understanding sound representations in the brain has been advanced by the high spatiotemporal resolution of intracranial recordings (Berezutskaya et al., 2017; Chang, 2015; Flinker et al., 2011; Howard et al., 2000; Lachaux et al., 2012; Nourski et al., 2012; Ozker et al., 2017). The human auditory cortex lies across an expansive surface of the temporal lobe, including the superior temporal plane and the adjacent superior temporal gyrus (STG; Figure 1) on its lateral aspect. STG is exposed on the lateral temporal lobe and thus accessible by surface electrocorticography (ECoG) recording methods. In contrast, most of the temporal plane, including the tonotopic "auditory core" on the posteromedial portion of Heschl's gyrus (HG), is buried deep within the lateral fissure, which separates the temporal lobe from the frontal and parietal lobes (Brewer and Barton, 2016). These anatomical constraints make the temporal plane difficult to access for neurophysiological recordings. Therefore, functional parcellations beyond the core and their relation to anatomical landmarks are not well understood. In the current study, the high temporal resolution of ECoG allows us to probe the extent to which a serial hierarchy from primary auditory cortex on HG to the STG is consistent with data from speech and tone listening. The high spatial resolution similarly provides a more complete picture of how different functional and anatomical regions in the human auditory cortex interact.


Figure 1. Anatomical parcellations of temporal lobe regions of the human auditory cortex and electrode coverage
(A) Anatomical regions of interest on the left hemisphere temporal lobe of an example participant. STG = superior temporal gyrus, MTG = middle temporal gyrus.
(B) Electrode counts across anatomical areas for all nine participants.
(C) Comparison between onset-only and spectrotemporal models shows a population described by only a single onset feature.
(D) All participants' electrodes projected onto a Montreal Neurological Institute (MNI) atlas brain (cvs_avg35_inMNI152). Electrode size reflects the maximum amount of variance (R2) explained by the encoding models tested in our analyses. Electrode sites are colored according to their anatomical location.
See also Figures S1 and S6 and Table S2.
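The onset population highlighted in (C) and (D) was identified with unsupervised non-negative matrix factorization (NMF) of high-gamma responses (see Results and STAR Methods). A minimal sketch of this style of soft clustering on synthetic data, using scikit-learn; the component count, initialization, and electrode-assignment rule here are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical input: trial-averaged high-gamma responses, shape
# (n_electrodes, n_timepoints); NMF requires non-negative values
# (e.g., rectified z-scores).
rng = np.random.default_rng(0)
n_electrodes, n_timepoints = 50, 300
responses = rng.random((n_electrodes, n_timepoints))

# Factorize responses into k temporal bases; each electrode gets one
# weight per basis. An "onset" basis would show a transient peak after
# sentence onset followed by relative quiescence.
model = NMF(n_components=4, init="nndsvda", max_iter=500, random_state=0)
weights = model.fit_transform(responses)   # (n_electrodes, k)
bases = model.components_                  # (k, n_timepoints)

# Assign each electrode to its dominant basis; electrodes dominated by
# an onset-like basis would form the candidate onset population.
labels = weights.argmax(axis=1)
```

In practice one would inspect the recovered bases to decide which, if any, is onset-like before labeling electrodes.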

While recent advances in human intracranial recordings have revealed specialization for different speech sound features in lateral STG, whether similar feature representations exist on the superior temporal surface, including core auditory cortex, and whether STG responses to non-speech reflect the same computational principles remain open questions. Such an analysis requires sampling of neural responses to natural speech and experimentally controlled simple sound stimuli across all cortical auditory areas simultaneously. Critically, a comprehensive map of feature encoding across primary and higher-order auditory areas is prerequisite to a meaningful evaluation of models of information flow and transformations of cortical representations.

Here, we simultaneously recorded neural activity from multiple subfields of the human temporal lobe auditory cortex using high-density electrode grids. Neurophysiological recordings allowed us to determine the flow of information processing and how cues in the speech signal are mapped across the auditory cortex. Instead of a simple serial hierarchy, we found evidence for distributed and parallel processing, where early latency responses were observed throughout the posterior temporal plane and STG. Furthermore, direct focal electrocortical stimulation (ECS) and an ablation case study provide evidence that HG is neither necessary nor sufficient for speech perception.

RESULTS

The anatomical divisions of the human auditory cortex include the planum temporale (PT), HG (or transverse temporal gyrus), and planum polare (PP) on the superior temporal plane (Hickok and Saberi, 2012; Moerel et al., 2014) and posterior STG (pSTG) and middle STG (mSTG) on its lateral surface (Figure 1A). Here, we define pSTG as the portion of the STG posterior to the lateral exit point of the transverse temporal sulcus (Friederici, 2015; Upadhyay et al., 2008). Recordings from the temporal plane typically use penetrating depth electrodes, which may capture the long axis of HG, but do not have the uniform and widespread cortical surface coverage offered by electrode grids (Brugge et al., 2003; Griffiths et al., 2010; Nourski et al., 2014; Steinschneider et al., 2014). Few studies also record from the PT and PP, so these areas remain relatively understudied (Besle et al., 2008; Griffiths and Warren, 2002; Bidet-Caulet et al., 2007; Liégeois-Chauvel et al., 1999). Placing grid electrodes in this area requires meticulous surgical dissection of the Sylvian fissure and is only possible in cases of opercular/insular surgery (Bouthillier et al., 2012; Malak et al., 2009).

We acquired intracranial recordings from 636 electrode sites in the left temporal plane and STG in nine participants, including both grid and depth electrodes (Figures 1A, 1B, and S1), as participants listened to speech and pure tone stimuli. As in previous work (Hamilton et al., 2018), we identified an onset-specific region using non-negative matrix factorization (NMF; black electrodes in Figures 1C and 1D; n = 15 electrodes, 13 of which were located in the anatomically defined pSTG area; see STAR Methods). This onset-selective region, which exhibited strong transient responses at the onset of sentences followed by relative quiescence, was observed only in a localized region of the lateral STG and not on the superior temporal plane.

Topography of response latencies for information processing

The classical hierarchical model of speech perception assumes that sound information is first received in the primary auditory cortex on HG and then transformed into more complex


Figure 2. The onset of fast-latency responses in the pSTG is indistinguishable from the onset of responses in primary auditory areas
(A) Z-scored high-gamma-amplitude (HGA) responses during speech listening for two example sentences, split by single electrodes in each region of interest (ROI)
and ordered by average latency. Response latencies are marked as dashed lines and were measured as the maximum derivative of the high-gamma response.
(B) The high-gamma-derived latencies for each electrode across all participants on an atlas brain. PT, pmHG, and pSTG onset electrodes are outlined in black.
(C) Comparison of onset latencies across brain regions. Only latencies

representations via corticocortical connections with the lateral STG. To test this assumption, we assessed the relative timing of responses to speech across the auditory cortex. Taking advantage of simultaneous recordings in multiple participants, we performed a latency analysis on trial-averaged high-gamma data within each region of interest (ROI) and then used a cross-correlation analysis to determine lead-lag relationships. We saw fast-latency responses to sentences in posteromedial HG (pmHG), pSTG onset, and PT electrodes, and longer latency responses in anterolateral HG (alHG), pSTG non-onset, mSTG, and PP electrodes (Figures 2A and 2B). Average high-gamma responses within each ROI showed short, primary-like latencies in pSTG onset, PT, and pmHG, as compared to slower responses anterolaterally both on the temporal plane (alHG, PP) and lateral STG (pSTG non-onset, mSTG) (Figure 2C, main effect of ROI: p = 4.4 × 10⁻¹⁶, degrees of freedom [df] = 6). Latencies did not significantly differ between pSTG onset, pmHG, and PT sites (p > 0.05, post-hoc Wilcoxon rank sum tests, Figure 2C), but were significantly different between all other areas (p < 0.05, Wilcoxon rank sum tests).

When comparing the lead-lag relationships between simultaneously recorded sites in each participant, we found evidence of strong coactivation of pmHG and pSTG onset electrodes, with a maximum cross-correlation at 0 lag for these areas (Figure 2D). We saw parallel lag times between PT and pSTG onset, pmHG and pSTG onset, and alHG and pSTG other. The fast responses in PT, pmHG, and pSTG onset preceded non-onset regions of the pSTG and mSTG as well as PP (Figures 2D and 2E), supporting a posterior-to-anterior latency gradient. The latency and lag analyses suggest that the posterior STG and posterior temporal plane may have separate, parallel inputs. While this latency analysis places constraints on information flow, it is strictly correlational, so it cannot definitively prove transfer of information from one area to another. Thus, we address causality in subsequent stimulation experiments and in a related ablation case study.

Dissociation of tone and speech encoding in medial and lateral auditory cortex

Part of the argument for serial processing hierarchies in the auditory system is that tuning for simple features such as tones may then be combined at higher stages of the pathway to generate more complex representations (e.g., phonetic) (Okada et al., 2010). To probe this, we used both simple pure tones as well as natural speech sentences as stimuli. We wanted to determine which areas were more responsive to speech and whether responses to tones could predict responses to the spectral cues in speech. Pure tones are typically used to identify tonotopic gradients associated with auditory "core" regions, as they robustly activate frequency-specific regions across the auditory pathway (Brewer and Barton, 2016; Moerel et al., 2012; Wessinger et al., 1997, 2001). Combining tone and speech stimuli within the same patient group allowed us to compare the representations of simple and complex sound stimuli throughout the auditory cortical hierarchy (see Figure 3A for an example of electrode coverage). Figures 3C and 3D show example tone- and speech-derived receptive fields from representative electrodes on the temporal plane and lateral STG. We derived these speech receptive fields using a nonlinear maximally informative dimensions (MID) method, since this was shown to improve performance over a linear ridge spectrotemporal receptive field (STRF) model in our current data and in previous datasets (Hullett et al., 2016). Similar results were observed using a linear ridge STRF model (Figure S2A).

We found clear evidence for opposing gradients for speech and tone response magnitudes from medial to lateral areas (Figure 3B; Table S1). Overall, responses to speech increased from medial to lateral regions, whereas responses to tones decreased (Figures 3F, S3, and S4). This difference in selectivity is consistent with previous reports of selective preference for complex/speech stimuli in STG and strong tuning to pure tone frequencies on the superior temporal plane (Binder et al., 2000; Démonet et al., 1992; Leaver and Rauschecker, 2010, 2016; Nourski et al., 2012; Steinschneider et al., 2013, 2014).

Despite confirmation of simple representations in core auditory cortex and more complex representations laterally, our results in the pSTG did not align entirely with evidence for a transformation from medial to lateral areas (and from simple to complex). For example, we found significantly more pure-tone receptive fields within pSTG onset sites than in surrounding pSTG non-onset sites. However, these representations themselves were not "simple" like the strong, single-peaked, V-shaped classical receptive fields observed in PT and HG (Figures 3C and 3G). In fact, most of the pure-tone receptive fields (RFs) in pSTG onset areas were complex and multi-peaked (Figure 3G). The proportion of non-tone-responsive, single-, and multi-peaked pure-tone RFs differed significantly across all seven anatomical areas (χ² = 105.7, df = 12, p = 4.3 × 10⁻¹⁷). Despite fast response latencies, representations in the pSTG onset area differed from both the narrow-frequency tuning in the posteromedial temporal plane and the preference for complex speech stimuli in the surrounding lateral STG.

Tuning for pure tones does not predict responses to speech outside of the core

We next asked whether frequency tuning curves were similar for speech, since previous studies have shown that responses to simple, synthesized stimuli may not predict responses to more complex, natural stimuli (Hamilton and Huth, 2020; Portfors et al., 2009; Schneider and Woolley, 2011; Theunissen et al., 2000, 2001). By directly comparing responses to pure tones and English sentences in the same electrodes, we found a significant difference between core and the lateral auditory areas. Frequency tuning for speech and tones was highly similar in PT and pmHG but diverged for other areas (Figure 3H). Electrodes in PT showed overlapping narrow-band responses to speech and tones (Figure 3E, electrodes 3 and 4), while electrodes in mSTG and pSTG showed very different tuning curves depending

(E) Lag-correlation-based connections shown schematically for each of the seven ROIs. Arrows point from the leading ROI to the lagging ROI, color indicates the
time delay (lag), and width of the arrow indicates the strength of the cross-correlation. Latency patterns suggest parallel information processing in pSTG onset
area and posteromedial temporal plane.
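The two measures behind Figure 2 (latency as the time of the maximum derivative of a trial-averaged high-gamma trace, and lead-lag as the peak of the cross-correlation between ROI-averaged traces) can be sketched as follows. This is a minimal illustration; the function names, z-score normalization, and sampling conventions are assumptions, not the authors' code:

```python
import numpy as np

def response_latency(hg, fs):
    # Latency = time (s) of the maximum derivative of a trial-averaged
    # high-gamma trace (the dashed-line markers in Figure 2A).
    return np.argmax(np.diff(hg)) / fs

def peak_lag(x, y, fs):
    # Lag (s) at which the cross-correlation of two z-scored,
    # ROI-averaged traces peaks. 0 indicates coactivation; a positive
    # value means x leads y.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    cc = np.correlate(y, x, mode="full")
    lags = np.arange(-len(x) + 1, len(y))
    return lags[np.argmax(cc)] / fs
```

For example, two Gaussian-bump traces offset by 100 ms would yield a peak lag of 0.1 s, with the earlier trace leading.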


Figure 3. Regional selectivity for speech and pure tones; divergence of tuning curves to simple and complex acoustic inputs in the human
auditory cortex
(A) Example temporal plane and lateral temporal cortical grids for one participant. Inset shows the whole brain.
(B) Comparison between normalized response magnitudes to speech and pure tone stimuli across all participants on atlas brain. Electrodes are colored ac-
cording to the normalized magnitude of the pure-tone response (blue) and speech response during sentence listening (red). Purple indicates mixed selectivity.
(C) Pure-tone receptive fields for electrodes shown in (A).
(D) Speech spectrotemporal receptive fields using maximally informative dimensions (MID1 STRFs) for electrodes in (A).
(E) Comparison of normalized pure-tone and speech tuning curves from sites in (C) and (D).
(F) Maximum pure-tone response and speech responses by area.


on the stimulus type (Figure 3E, electrodes 5–7). The correlation between pure-tone and speech frequency tuning was significantly different across anatomical regions (Kruskal-Wallis ANOVA, χ² = 12.61, df = 6, p = 0.049). Post-hoc tests using Tukey HSD correction for multiple comparisons showed that correlations between PT speech and tone receptive fields were higher than in PP (p = 0.03). A similar relationship was observed when comparing the correlation between frequency tuning from the pure-tone RF and linear model filters (Figure S2), though with these models, pmHG tuning for tones and speech was more similar than with the nonlinear model.

These findings demonstrate that speech responses in core areas can be well predicted from their responses to pure tones, whereas outside the core, they cannot and are likely tuned to more complex combinations of spectral features. Crucially, the pSTG onset area, while weakly responsive to tones at fast latencies, does not share similar stimulus representations with the auditory core, again corroborating a distinct role in sound processing. That is, the pSTG onset area is not simply another A1-like area, despite its fast latency. On the contrary, there are multiple very fast response areas, some of which are tonotopic and some of which are onset driven and not tonotopic. These findings suggest that there are fundamentally different representations in the lateral STG compared to medial HG/PT. While this has traditionally been interpreted as evidence of hierarchical processing, another possibility, supported by our earlier latency analysis, is that these areas process different information and receive parallel inputs.

Anatomical separation of pitch, phonetic, and onset information in the auditory cortex

While the tone and speech comparisons showed greater selectivity for speech in the lateral STG, the features of speech that are being encoded there are not clear, especially since they were poorly predicted by pure-tone responses. We therefore performed an encoding model analysis that focused on acoustic and phonetic features of speech. Most of these features have been observed in the STG, but whether these cues are also represented on the temporal plane has been controversial.

We fit models describing tuning for spectrotemporal features (Figure 2D) as well as mixed-feature representations, as depicted in Figure 4A. This analysis revealed that neural populations captured by single electrodes selectively encode only some, but not other, features in the speech signal, with a wide diversity of responses across the human auditory cortex. Figure 4B shows receptive fields for four example electrodes in different participants. E1 encodes mainly phrase-level temporal structure cued by speech onsets, and E2 encodes mainly syllable-level temporal structure (peakRate) and phonetic features. In contrast, absolute-pitch features explained significant variance in E3, while E4 encoded relative-pitch features. Across participants, this analysis revealed a functional and anatomical localization of temporal speech features and relative pitch to the STG, with onsets again confined to a zone in pSTG, whereas PT and HG were dominated by representations of spectrotemporal sound acoustics and absolute pitch (Figure 4C).

We compared the full spectrotemporal model to smaller models that contain binary speech features but no information about the precise spectrotemporal structure of the speech stimulus. Comparing a reduced model containing only an onset predictor to a full spectrotemporal model, we found that the additional spectrotemporal information did not improve model fits for a subset of electrodes located in posterior STG (Figure 4D), in line with our unsupervised NMF analysis to uncover onset selectivity (Figures 1D and S6). Notably, none of these electrodes were located in PP or HG. A similar comparison showed that for a group of electrodes in lateral STG, a temporal and phonological feature model (including binary onset, peakRate, and phonological feature predictors) outperformed the full spectrotemporal model (Figures 4A and 4E). We combined phonological features and peakRate together in this step, as more detailed model comparisons showed these features were represented in mostly overlapping sets of electrodes.

Feature-tuned electrodes were located predominantly in middle STG and not on the temporal plane (Figures 4C and 4E), supporting the role of lateral STG in speech processing. To identify encoding of pitch features, we next tested whether the addition of pitch features to the onset, phonological feature, and peakRate model would improve model performance. We observed a clear separation between absolute pitch encoding on the medial temporal plane (specifically HG and PT) and relative pitch encoding mostly in mid to anterior STG, alHG, and PP (Figures 4C, 4F, and S4). Complex representations of onsets, phonetic features, envelope, and relative pitch are specific to the lateral STG. In contrast, simple representations, including absolute pitch, were predominantly encoded on HG and PT.

Focal ECS reveals double dissociation of regions critical for speech and non-speech processing

The neurophysiological recordings so far suggest a distributed encoding of sound properties of speech, with evidence of early independent processing in both HG and the onset zone of the STG. These results are not consistent with the classical model of a simple serial cortical processing hierarchy for speech, from the core auditory cortex to the surrounding belt and parabelt auditory cortex. To causally evaluate the hypothesis of parallel processing in these areas, we used focal ECS on the medial and lateral parts of the auditory cortex in seven participants (Fenoy et al., 2006; Leonard et al., 2016; Sinai et al., 2009). We also investigated differences in artificially evoked sound patterns in HG and STG. A cortical feedforward serial model would predict that stimulation in either region would distort, modulate, or interrupt the perception of spoken words, perhaps at different levels of representation. In contrast, if lateral and medial areas are part

(G) Percentage of sites with significant receptive fields by anatomical area, as measured by the significance of within-receptive-field (RF) responses compared to responses outside the RF. Percentages are split into single- versus multi-peaked RFs.
(H) Correlation between frequency tuning for pure tones (as in D) and frequency tuning for speech (from MID1-STRF, as in E). Boxplot boxes show the 25th and 75th
percentiles and the median. Whiskers show extreme non-outlier values. * = p-value
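The spectrotemporal receptive fields referenced in (D), (E), and (H) are conventionally estimated as a regularized linear mapping from a time-lagged spectrogram to the neural response. The sketch below illustrates that general recipe only; it is not the authors' code, and the function names (`lag_matrix`, `fit_strf`), the toy data, and all parameter choices are our own illustrative assumptions.

```python
import numpy as np

def lag_matrix(spec, n_lags):
    """Stack time-lagged copies of a (time x freq) spectrogram so that
    row t contains the n_lags frames up to and including time t."""
    T, F = spec.shape
    X = np.zeros((T, n_lags * F))
    for lag in range(n_lags):
        X[lag:, lag * F:(lag + 1) * F] = spec[:T - lag]
    return X

def fit_strf(spec, resp, n_lags=10, alpha=1.0):
    """Ridge-regression STRF: a linear filter mapping the lagged
    spectrogram to the response, returned as (n_lags, n_freq)."""
    X = lag_matrix(spec, n_lags)
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ resp)
    return w.reshape(n_lags, spec.shape[1])

# Toy check: a synthetic response driven by frequency band 3 at a 5-frame lag.
rng = np.random.default_rng(0)
spec = rng.standard_normal((2000, 16))
resp = np.roll(spec[:, 3], 5)  # band 3, delayed by 5 frames
strf = fit_strf(spec, resp, n_lags=10)
peak = np.unravel_index(np.abs(strf).argmax(), strf.shape)
print(tuple(map(int, peak)))  # → (5, 3)
```

The recovered filter peaks at the planted lag and frequency, which is the sense in which an STRF summarizes an electrode's frequency tuning for a stimulus.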
Figure 4. Regional selectivity for speech features
(A) Speech features tested in feature model comparisons for an example sentence.
(B) Receptive fields for example electrodes. The feature with maximal unique R² is indicated for each electrode. Electrodes were chosen for which distinct sets of features explain a large portion of variance. For example, the best model for electrode 1 includes onset, peakRate, and phonetic features, but 50% of overall explained variance is attributed to the onset predictor.
(C) Location of electrodes primarily coding for speech onsets, phonetic features and peakRate, relative pitch, and absolute pitch. Electrodes in (B) are circled.
(D and E) Onset (D) and feature-encoding electrodes (E) are defined as those for which the respective model outperforms a spectrotemporal model.
(F) Anatomical distribution of electrodes coding for different features. Absolute pitch dominates representations in PT and pmHG, and relative pitch is primarily
represented in PP, alHG, and STG (χ² = 62.5, p < 10⁻⁹). Orange = both relative and absolute pitch contribute unique variance, permutation p < 0.05. Onset-encoding electrodes are mostly located in pSTG, whereas phonological features and peakRate are represented in posterior and anterior lateral STG.
Related to Figure S5.
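The "unique R²" attribution used in (B) and (F) is typically computed by variance partitioning: a full encoding model is compared to reduced models with one feature group withheld. The snippet below shows the idea on synthetic data. It is not the authors' pipeline: the feature groups are invented, and a rigorous version would use cross-validated rather than in-sample R².

```python
import numpy as np

def r_squared(X, y, alpha=1.0):
    """In-sample R^2 of a ridge regression of y on X (for illustration)."""
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
    return 1 - (y - X @ w).var() / y.var()

def unique_r2(groups, y):
    """Unique variance of each named feature group: full-model R^2
    minus the R^2 of a model with that group left out."""
    full = r_squared(np.hstack(list(groups.values())), y)
    out = {}
    for name in groups:
        reduced = np.hstack([g for k, g in groups.items() if k != name])
        out[name] = full - r_squared(reduced, y)
    return out

# Toy electrode whose response is driven by the 'onset' features only.
rng = np.random.default_rng(1)
groups = {"onset": rng.standard_normal((500, 2)),
          "peakRate": rng.standard_normal((500, 2)),
          "phonetic": rng.standard_normal((500, 4))}
y = groups["onset"] @ np.array([1.0, -1.0]) + 0.1 * rng.standard_normal(500)
uR2 = unique_r2(groups, y)
print(max(uR2, key=uR2.get))  # → onset
```

Dropping the driving feature group costs nearly all explained variance, while dropping an irrelevant group costs almost none, so the electrode is labeled by the group with maximal unique R².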

of independent parallel processing streams, one would predict dissociable effects. We hypothesized that stimulating core HG would evoke simple sound percepts, whereas stimulating STG would evoke complex or speech/phonemic sounds, based upon their receptive field properties. As part of a clinical mapping protocol, we stimulated each site multiple times with increasing current amplitude to carefully assess its function.

   First, participants were asked to report what they perceived while cortical sites on the temporal plane and lateral STG were stimulated (Video S1; Table S2). An immediate sound hallucination was evoked by stimulation of sites on the temporal plane (20 electrode sites; Figure 5A, white electrodes), but not by stimulation on the lateral STG (17 electrode sites; Figure 5A, red electrodes). The sound percept was described as occurring near the contralateral ear, starting at very low stimulation thresholds (1–2 mA, 50 Hz). During HG stimulation, participants described sounds like "running water," "moving gravel," "fan sound," "tapping," and "buzzing." Further inquiry led to descriptions like "fast modulation of sound," "like waves of sound," or "b-b-b-b." Hallucinations were reliably evoked with every trial of stimulation (applied 5–25 times at each site). We further characterized the effects of different stimulation parameters on a subset of sites, as permitted by the clinical protocol (11 sites). When current amplitude was increased by 1–2 mA, participants reported "louder" sounds at every cortical site where sound could be evoked. When current frequency was increased from 50 Hz to 100 Hz, they also reported louder and "higher-pitch" sounds. One participant reported hearing two tones during 100-Hz stimulation: one at the original pitch heard with 50-Hz stimulation and an additional, superimposed higher-pitch tone. The converse was observed when we reduced the current amplitude by 1 mA or the frequency to 10 Hz, with participants reporting "more quiet now" and "slower" sounds. When stimulation duration was extended to 5 s, participants could accurately report the start and end, suggesting a sustained evoked percept (two sites tested).

   To assess how stimulation affected speech perception, ECS was synchronized with single-word presentation, and participants were asked to repeat what they heard. Interestingly, participants did not have any difficulty hearing words during stimulation of temporal plane sites, including HG. Stimulation of sites that evoked a sound hallucination had no effect on speech intelligibility, nor did stimulation affect the perceived sound quality of the spoken words (0 out of 20 sites tested; see also Video S1). Importantly, participants frequently reported hearing both the evoked sound percepts and the spoken words independently and without distortion of either (12 sites). Two participants also reported bilateral attenuation of sounds upon stimulation of sites in the anterolateral HG (Figure 5A, green electrodes). In these cases, they reported that environmental sounds, such as fan noise and the beeping of hospital equipment, were "muffled," but speech was clear (two sites).

   Electrical stimulation of the lateral STG had a completely different effect (14 sites). Participants were unable to detect when electrical stimulation was applied; that is, there was no evoked sound hallucination with stimulation, despite the fact that the same sites exhibited robust and clear tuning to speech features. However, when stimulation was applied during spoken words, participants experienced significant impairments in speech perception. They commonly reported: "I can't hear" or "I could hear you speaking but can't make out the words." One participant reported that syllables in the word seemed "swapped." They did not report that speech was quieter or muffled. This is consistent with a previous report that demonstrated impairments in phonetic discrimination for speech sounds but no effect on tone discrimination (Sinai et al., 2009). We observed occasional paraphasic errors when participants repeated words, with phonemic substitutions or deletions. These effects could not be predicted by evoked responses alone, as sites with induced

Figure 5. Electrocortical stimulation of Heschl's gyrus and superior temporal gyrus
(A) Focal electrocortical stimulation shows a double dissociation between the effects of stimulation on HG and lateral STG. Stimulation in HG evoked auditory hallucinations but did not interfere with word perception and repetition. Participants could not perceive words during stimulation on lateral STG, but no additional sound hallucinations were evoked.
(B) Z-scored high gamma amplitude (HGA) responses to sentences on the electrodes in (A) did not differ between sites with different stimulation effects. Black traces show the evoked sentence response in electrodes where stimulation caused an auditory hallucination but no change in word perception. Red traces are sites where no hallucination occurred but word perception was interrupted with stimulation.
Related to Figure S5 and Video S1.

hallucination or word perception effects showed no difference in high-gamma responses (Figure 5B).

   We observed an unexpected and striking double dissociation of the effects of stimulation on medial HG and lateral STG. Stimulation of HG evoked clear contralateral sound percepts, without interruption or distortion of word perception. In contrast, stimulation of the lateral STG did not evoke any sounds but interfered significantly with speech processing. These results suggest that medial and lateral auditory areas may be part of parallel sound analysis pathways, rather than part of a commonly assumed single serial, hierarchical pathway. HG does not appear to be required for speech perception, whereas STG does.

Focal ablation of the left HG does not affect speech comprehension
Additional causal evidence was sought for evaluating the role of HG in speech perception. Injury to the posterior STG is strongly associated with impairments in speech perception and comprehension, as well as in language production (Hillis et al., 2017). The consequences of lesioning HG are less clear, as selective strokes or surgical resections there are extraordinarily rare. We present a case study of a patient who underwent a selective ablation of HG.

   The patient is a 33-year-old right-handed man with a history of refractory seizures with auditory auras. The seizure semiology was described as a "ringing, high-pitched sound from around the right ear." As the seizure progressed, the sound would increase in loudness and could last tens of seconds or minutes. At the start, he could hear and comprehend speech from others while concurrently experiencing the high-pitched sound aura. For example, he could speak to his mother, who would then verbally instruct him to lie down before the seizure became worse. Per his report, hearing words spoken by others was normal and did not sound distorted during the aura. In most episodes, the seizure self-resolved. Occasionally, ongoing seizure propagation was associated with an inability to comprehend language. His speech became "incoherent" with "mixed up words," and afterward, it evolved into a secondarily generalized convulsive seizure. High-resolution MRI of his brain was normal.

   To localize his seizure onset zone, he underwent stereotactic implantation of two parallel multi-electrode depth leads, placed longitudinally along the long axis of the left HG. Ten additional leads were placed in other brain areas. His seizures were found to originate from the HG electrodes. Bedside electrical stimulation mapping of the HG electrodes reproduced his auditory auras with sound hallucination. Furthermore, he was able to comprehend speech and had fluent language during stimulation mapping of HG.

   Because he had preserved language functions during stimulation mapping, the decision was made to proceed with thermocoagulation of HG (see Figure 6). This was carried out through the same indwelling electrode leads (Bourdillon et al., 2017) while the patient was fully awake and carefully assessed throughout ablations at each site along the lead. He tolerated this procedure well, without changes in speech or language comprehension. Post-treatment MRI showed excellent ablation of the left HG with preservation of the adjacent lateral STG. Formal audiometry after the procedure demonstrated normal bilateral audiograms, as well as normal speech comprehension and production. He was seizure free for 1 year after the HG thermocoagulation ablation. This case study provides additional causal evidence that sound processing in left HG is not necessary for speech comprehension.

Figure 6. Focal ablation of the left HG without damage to the pSTG has no effect on speech perception or language comprehension
Magnetic resonance (MR) image shows the extent of the surgical ablation in the axial plane along the axis of the HG, sparing pSTG. Image is shown in radiological orientation.

DISCUSSION

The human auditory system decomposes the speech signal into components that are relevant for perception. In this study, we provide a characterization of speech responses across the human auditory cortex, including HG, the surrounding areas of PT and PP, and the posterior to middle extent of the lateral STG. Microsurgical access to the Sylvian fissure provided dense simultaneous recordings of the highly heterogeneous responses to speech from all regions of the human auditory cortex, in contrast to previous intracranial approaches that relied on only piecemeal sampling from one of these regions at a time. We evaluated the role of each area in the processing of speech sounds using converging experimental approaches, probing the timing and order of activation, the nature of simple and complex sound representations in each area, and their causal role in speech comprehension using functional and surgical ablation.

   Our findings on the functional organization of the human primary and parabelt auditory cortex are in line with previous studies that employed depth recording electrodes in HG to show that core auditory cortical areas show tonotopic organization and fast-latency responses to click trains and can track pitch changes in pure tones (Brugge et al., 2009; Griffiths et al., 2010;

Howard et al., 1996; Steinschneider et al., 2014), as well as with noninvasive tonotopic mapping using fMRI (Barton et al., 2012; Da Costa et al., 2011; Dick et al., 2017; Humphries et al., 2010; Leaver and Rauschecker, 2016; Saenz and Langers, 2014; Schönwiesner et al., 2015; Talavage et al., 2004; Wessinger et al., 1997; Woods and Alain, 2009; Woods et al., 2010). In contrast, responses in PP were slower and more similar to higher-order areas of mid- to anterior STG, with different receptive fields for tones and speech, little frequency selectivity, and encoding of relative rather than absolute pitch in speech.

   Neural responses to speech were strongest in lateral STG, where selectivity was greater for acoustic-phonetic and prosodic features than for pure tones. In addition to replicating the existence of an onset zone in pSTG (Hamilton et al., 2018), we found functionally distinct, but anatomically interleaved, populations in mid-STG representing different linguistically relevant features in speech. These included phonological features, acoustic onset edges that cue syllables (Oganian and Chang, 2019), and relative pitch, which is the main cue to intonational prosody (Tang et al., 2017). While relative pitch was also represented in PP, peakRate and phonetic features were represented predominantly in middle STG, supporting its role in speech processing. Of note, this is consistent with a spatial population code for speech cues that are both short (phoneme segment length; e.g., consonant features) and relatively long (suprasegmental; e.g., prosodic cues) in duration.

   Processing of absolute versus relative pitch was distinctly regionalized. In our previous work in the STG, absolute-pitch responses were rare compared to relative-pitch and phonological representations (Tang et al., 2017). Here, we found that absolute-pitch selectivity dominated in the temporal plane (HG and PT). Absolute-pitch sensitivity has been observed in nonhuman primates at the anterolateral border of the auditory core (Bendor and Wang, 2005, 2010) and fits well with the narrow, low spectral tuning for pure tones and speech vocal pitch in these areas. In contrast, relative-pitch representations dominated in mid-anterior lateral STG and PP. This is consistent with previous human studies, in which sounds with pitch activate more of lateral HG than sounds without pitch, and sounds with pitch variation activate regions of PP and anterior STG (Patterson et al., 2002). Such selectivity may be analogous to voice-selective areas on the anterior temporal plane in the macaque (Perrodin et al., 2011; Petkov et al., 2008, 2009).

   On the other hand, our response latency analysis challenges current models of information flow from primary to parabelt auditory cortex (Brodbeck et al., 2018; Hickok and Poeppel, 2007; Jasmin et al., 2019; Rauschecker and Tian, 2000; Saenz and Langers, 2014). For example, the comparably short response latencies in posteromedial HG, PT, and the pSTG onset area, and the differences in representational content of these areas, do not support simple serial processing. The pSTG onset area responded to onsets exclusively, with a broad spectral but narrow temporal response, a pattern that was not seen in temporal plane regions. This region is not purely speech selective and also responds to non-speech and synthetic sound onsets (Hamilton et al., 2018). In contrast, HG/PT STRFs were more spectrally selective. The similar response timescales in these areas indicate that the onset zone in pSTG reflects early processing occurring in parallel with the computations performed by circuits on the temporal plane itself (Nourski et al., 2014), with a general pattern that the fastest and earliest activations occur across the entire posterior aspect of the human auditory cortex.

   Despite the similarities in latency, ECS of contacts in lateral and superior temporal areas revealed a striking functional and anatomical double dissociation. Stimulation of sites in the posteromedial temporal plane induced vivid sound hallucinations on the contralateral side but no impairment of speech perception, whereas stimulation of lateral STG had the opposite effect. This aligns with clinical studies showing that resection of HG does not result in speech comprehension deficits (Russell and Golfinos, 2003; Sakurada et al., 2007; Silbergeld, 1997). In contrast, damage to left lateral STG results in severe speech comprehension (and language production) deficits (Butler et al., 2014; Wernicke, 1874). Together, this suggests that the primary auditory cortex on HG is not the main source of input to the entire STG. Rather, we propose that pSTG receives direct input from outside the auditory core.

   "Core" auditory cortex is defined as the heavily myelinated, tonotopic region that receives thalamic projections from the ventral medial geniculate body (vMGB) via the lemniscal auditory pathway (Bartlett, 2013; Dick et al., 2012; Galaburda and Sanides, 1980; Hackett, 2011; Hackett et al., 2001, 2007; Scott et al., 2017a). However, it is largely underappreciated that the "parabelt" auditory cortex in the STG receives direct and distinct thalamic projections via the nonlemniscal pathway, from the medial and dorsal divisions of the MGB and the pulvinar (Bartlett, 2013; Hackett et al., 1998; Scott et al., 2017a). Of note, the medial and dorsal divisions of the MGB are significantly larger relative to the ventral division in humans compared to nonhuman primates (Brugge and Howard, 2002; Winer, 1984). Functionally, sound representation in core and non-core areas is dramatically different; core receptive fields have narrow frequency tuning and are tuned to contralateral sounds (Bitterman et al., 2008; Khalighinejad et al., 2021), whereas lateral STG has complex spectrotemporal but poor spatial selectivity. A parsimonious interpretation is that distinct parallel thalamocortical pathways process different aspects of the speech signal. We speculate that the non-lemniscal pathway plays an essential role in speech perception and therefore may not require core A1 processing. This is an alternative to the mainstream model of the core-belt-parabelt cortical pathway that is thought to underlie hierarchical processing of progressively complex, abstract features. However, it is likely that the processing of some relevant aspects of speech (i.e., localization or acoustic quality) occurs via cortico-cortical interactions between areas with different thalamocortical inputs.

   Thus, the auditory cortical system may have parallel organization to a greater extent than the visual system (Rauschecker et al., 1997). For example, in the ventral stream for object recognition, feedforward connections relay information hierarchically from the lateral geniculate nucleus (LGN) to V1 and onward to V4, through successively larger retinotopically organized receptive fields (Felleman and Van Essen, 1991; Sereno et al., 1995). While increasing evidence shows additional parallel processing within the visual system, injury to V1 still has

immediate and long-standing effects on visual processing. In contrast, the auditory system is heavily parallel throughout: A1 (but not nonprimary STG) is organized tonotopically, and isolated A1 injury has no clear consequences for speech perception or intelligibility.

   Our results demonstrate a comprehensive cortical map of the acoustic and phonetic representations underlying human speech perception. Speech representations are distributed across the human auditory cortex, with clear parallel processing as well as potential serial processing at longer latencies in the anterior and middle STG. Overall, our findings speak to a distributed mosaic of specialized processing units, each representing different acoustic and phonetic cues in the speech signal, the combination of which creates the rich experience of natural speech comprehension.

Limitations of the study
A limitation of the current study is that our stimulation and ablation results involved unilateral and not bilateral HG. While neural activity during speech processing is largely bilateral (Cogan et al., 2014), in clinical language mapping, unilateral stimulation of STG alone on the language-dominant side results in comprehension deficits. Thus, the other hemisphere is unable to compensate for this disruption. Previous work has shown that ipsilateral primary auditory cortex connects to contralateral primary auditory cortex, but not to contralateral STG (and vice versa) (Hackett et al., 1999; Kaas and Hackett, 2000). Thus, it is not possible for pSTG to receive inputs through contralateral A1 directly. Information might instead travel from contralateral A1 to contralateral STG and then to ipsilateral STG. It is also possible that cross-hemispheric connectivity could be present between STG and alHG, but based on our latency analysis, this seems unlikely. Future studies incorporating bilateral stimulation may be able to uncover to what extent pSTG and HG are truly functionally independent.

STAR+METHODS

Detailed methods are provided in the online version of this paper and include the following:

d KEY RESOURCES TABLE
d RESOURCE AVAILABILITY
   B Lead contact
   B Electrocortical stimulation (ECS)
   B Thermoablation case study
d QUANTIFICATION AND STATISTICAL ANALYSIS

SUPPLEMENTAL INFORMATION

Supplemental information can be found online at https://doi.org/10.1016/j.cell.2021.07.019.

ACKNOWLEDGMENTS

The authors would like to thank Matthew Leonard and Brian Malone for helpful comments on the manuscript. The authors also thank Michael T. Lawton, Neal Fox, Matthew Leonard, Matthias Sjerps, Kunal Raygor, and Leah Muller for assistance with intraoperative recordings and Patrick Hullett for assistance with MID analysis and comments on the manuscript. This work was supported by grants from the NIH (F32 DC014192-01 to L.S.H. and R01-DC012379 and U01-NS117765 to E.F.C.). This research was also supported by Bill and Susan Oberndorf, the Joan and Sandy Weill Foundation, and the William K. Bowes Foundation. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.

AUTHOR CONTRIBUTIONS

L.S.H. and E.F.C. conceived the neurophysiology recording experiments. L.S.H., Y.O., E.F.C., and others collected the data. L.S.H. and Y.O. analyzed the data. J.H. contributed data from the thermoablation case study. L.S.H., Y.O., and E.F.C. wrote and revised the paper. E.F.C. performed the surgeries, designed the stimulation experiments, and supervised the project.

DECLARATION OF INTERESTS

The authors declare no competing interests.

Received: August 15, 2020
Revised: February 11, 2021
Accepted: July 19, 2021
Published: August 18, 2021

REFERENCES

Aertsen, A.M., and Johannesma, P.I.M. (1981). The spectro-temporal receptive field. A functional characteristic of auditory neurons. Biol. Cybern. 42, 133–143.

Atencio, C.A., Sharpee, T.O., and Schreiner, C.E. (2008). Cooperative nonlinearities in auditory cortical neurons. Neuron 58, 956–966.

Bartlett, E.L. (2013). The organization and physiology of the auditory thalamus
       B Materials availability                                          Brain Lang. 126, 29–48.
       B Data and code availability                                      Barton, B., Venezia, J.H., Saberi, K., Hickok, G., and Brewer, A.A. (2012).
   d   EXPERIMENTAL MODEL AND SUBJECT DETAILS                            Orthogonal acoustic dimensions define auditory field maps in human cortex.
                                                                         Proc. Natl. Acad. Sci. USA 109, 20738–20743.
   d   METHOD DETAILS
                                                                         Bendor, D., and Wang, X. (2005). The neuronal representation of pitch in pri-
       B Neural recordings
                                                                         mate auditory cortex. Nature 436, 1161–1165.
       B Electrode localization
                                                                         Bendor, D., and Wang, X. (2010). Neural coding of periodicity in marmoset
       B Stimuli
                                                                         auditory cortex. J. Neurophysiol. 103, 1809–1822.
       B Tone stimuli
                                                                         Berezutskaya, J., Freudenburg, Z.V., Güçlü, U., van Gerven, M.A.J., and Ram-
       B Latency analysis
                                                                         sey, N.F. (2017). Neural tuning to low-level features of speech throughout the
       B Pure tone receptive fields                                      perisylvian cortex. J. Neurosci. 37, 7906–7920.
       B Nonlinear maximally informative dimensions
                                                                         Besle, J., Fischer, C., Bidet-Caulet, A., Lecaignard, F., Bertrand, O., and Giard,
       B Linear receptive field analysis                                 M.-H. (2008). Visual activation and audiovisual interactions in the auditory cor-
       B Sentence onset feature                                          tex during speech perception: intracranial recordings in humans. J. Neurosci.
       B Unsupervised clustering of LFP time series                      28, 14301–14310.

                                                                                                           Cell 184, 1–14, September 2, 2021 11
Bidet-Caulet, A., Fischer, C., Besle, J., Aguera, P.-E., Giard, M.-H., and Bertrand, O. (2007). Effects of selective attention on the electrophysiological representation of concurrent sounds in the human auditory cortex. J. Neurosci. 27, 9252–9261.

Binder, J.R., Frost, J.A., Hammeke, T.A., Bellgowan, P.S., Springer, J.A., Kaufman, J.N., and Possing, E.T. (2000). Human temporal lobe activation by speech and nonspeech sounds. Cereb. Cortex 10, 512–528.

Bitterman, Y., Mukamel, R., Malach, R., Fried, I., and Nelken, I. (2008). Ultra-fine frequency tuning revealed in single neurons of human auditory cortex. Nature 451, 197–201.

Bourdillon, P., Isnard, J., Catenoix, H., Montavont, A., Rheims, S., Ryvlin, P., Ostrowsky-Coste, K., Mauguiere, F., and Guénot, M. (2017). Stereo electroencephalography-guided radiofrequency thermocoagulation (SEEG-guided RF-TC) in drug-resistant focal epilepsy: Results from a 10-year experience. Epilepsia 58, 85–93.

Bouthillier, A., Surbeck, W., Weil, A.G., Tayah, T., and Nguyen, D.K. (2012). The hybrid operculo-insular electrode: a new electrode for intracranial investigation of perisylvian/insular refractory epilepsy. Neurosurgery 70, 1574–1580, discussion 1580.

Brewer, A.A., and Barton, B. (2016). Maps of the Auditory Cortex. Annu. Rev. Neurosci. 39, 385–407.

Brodbeck, C., Presacco, A., and Simon, J.Z. (2018). Neural source dynamics of brain responses to continuous stimuli: Speech processing from acoustics to comprehension. Neuroimage 172, 162–174.

Brugge, J.F., and Howard, M.A. (2002). Hearing. In Encyclopedia of the Human Brain, V.S. Ramachandran, ed. (Academic Press), pp. 429–448.

Brugge, J.F., Volkov, I.O., Garell, P.C., Reale, R.A., and Howard, M.A., 3rd. (2003). Functional connections between auditory cortex on Heschl's gyrus and on the lateral superior temporal gyrus in humans. J. Neurophysiol. 90, 3750–3763.

Brugge, J.F., Nourski, K.V., Oya, H., Reale, R.A., Kawasaki, H., Steinschneider, M., and Howard, M.A., 3rd. (2009). Coding of repetitive transients by auditory cortex on Heschl's gyrus. J. Neurophysiol. 102, 2358–2374.

Butler, R.A., Lambon Ralph, M.A., and Woollams, A.M. (2014). Capturing multidimensionality in stroke aphasia: mapping principal behavioural components to neural structures. Brain 137, 3248–3266.

Chang, E.F. (2015). Towards large-scale, human-based, mesoscopic neurotechnologies. Neuron 86, 68–78.

Cheung, C., Hamilton, L.S., Johnson, K., and Chang, E.F. (2016). The auditory representation of speech sounds in human motor cortex. eLife 5, 1–19.

Chevillet, M., Riesenhuber, M., and Rauschecker, J.P. (2011). Functional correlates of the anterolateral processing hierarchy in human auditory cortex. J. Neurosci. 31, 9345–9352.

Cogan, G.B., Thesen, T., Carlson, C., Doyle, W., Devinsky, O., and Pesaran, B. (2014). Sensory-motor transformations for speech occur bilaterally. Nature 507, 94–98.

Da Costa, S., van der Zwaag, W., Marques, J.P., Frackowiak, R.S.J., Clarke, S., and Saenz, M. (2011). Human primary auditory cortex follows the shape of Heschl's gyrus. J. Neurosci. 31, 14067–14075.

Dalca, A.V., Danagoulian, G., Kikinis, R., Schmidt, E., and Golland, P. (2011). Segmentation of nerve bundles and ganglia in spine MRI using particle filters. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 537–545.

de Heer, W.A., Huth, A.G., Griffiths, T.L., Gallant, J.L., and Theunissen, F.E. (2017). The Hierarchical Cortical Organization of Human Speech Processing. J. Neurosci. 37, 6539–6557.

Démonet, J.F., Chollet, F., Ramsay, S., Cardebat, D., Nespoulous, J.L., Wise, R., Rascol, A., and Frackowiak, R. (1992). The anatomy of phonological and semantic processing in normal subjects. Brain 115, 1753–1768.

Dick, F., Tierney, A.T., Lutti, A., Josephs, O., Sereno, M.I., and Weiskopf, N. (2012). In vivo functional and myeloarchitectonic mapping of human primary auditory areas. J. Neurosci. 32, 16095–16105.

Dick, F.K., Lehet, M.I., Callaghan, M.F., Keller, T.A., Sereno, M.I., and Holt, L.L. (2017). Extensive Tonotopic Mapping across Auditory Cortex Is Recapitulated by Spectrally Directed Attention and Systematically Related to Cortical Myeloarchitecture. J. Neurosci. 37, 12187–12201.

Edwards, E., Soltani, M., Kim, W., Dalal, S.S., Nagarajan, S.S., Berger, M.S., and Knight, R.T. (2009). Comparison of time-frequency responses and the event-related potential to auditory speech stimuli in human cortex. J. Neurophysiol. 102, 377–386.

Felleman, D.J., and Van Essen, D.C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47.

Fenoy, A.J., Severson, M.A., Volkov, I.O., Brugge, J.F., and Howard, M.A., 3rd. (2006). Hearing suppression induced by electrical stimulation of human auditory cortex. Brain Res. 1118, 75–83.

Fischl, B., Sereno, M.I., Tootell, R.B.H., and Dale, A.M. (1999). High-resolution intersubject averaging and a coordinate system for the cortical surface. Hum. Brain Mapp. 8, 272–284.

Flinker, A., Chang, E.F., Barbaro, N.M., Berger, M.S., and Knight, R.T. (2011). Sub-centimeter language organization in the human temporal lobe. Brain Lang. 117, 103–109.

Friederici, A.D. (2015). White-matter pathways for speech and language processing. In Handbook of Clinical Neurology, Chapter 10, M.J. Aminoff, F. Boller, and D.F. Swaab, eds. (Elsevier), pp. 177–186.

Galaburda, A., and Sanides, F. (1980). Cytoarchitectonic organization of the human auditory cortex. J. Comp. Neurol. 190, 597–610.

Garofolo, J.S., Lamel, L.F., Fisher, W.F., Fiscus, J.G., Pallett, D.S., Dahlgren, N.L., and Zue, V. (1993). TIMIT Acoustic-Phonetic Continuous Speech Corpus (Linguistic Data Consortium).

Griffiths, T.D., and Warren, J.D. (2002). The planum temporale as a computational hub. Trends Neurosci. 25, 348–353.

Griffiths, T.D., Kumar, S., Sedley, W., Nourski, K.V., Kawasaki, H., Oya, H., Patterson, R.D., Brugge, J.F., and Howard, M.A. (2010). Direct recordings of pitch responses from human auditory cortex. Curr. Biol. 20, 1128–1132.

Hackett, T.A. (2011). Information flow in the auditory cortical network. Hear. Res. 271, 133–146.

Hackett, T.A., Stepniewska, I., and Kaas, J.H. (1998). Subdivisions of auditory cortex and ipsilateral cortical connections of the parabelt auditory cortex in macaque monkeys. J. Comp. Neurol. 394, 475–495.

Hackett, T.A., Stepniewska, I., and Kaas, J.H. (1999). Callosal connections of the parabelt auditory cortex in macaque monkeys. Eur. J. Neurosci. 11, 856–866.

Hackett, T.A., Preuss, T.M., and Kaas, J.H. (2001). Architectonic identification of the core region in auditory cortex of macaques, chimpanzees, and humans. J. Comp. Neurol. 441, 197–222.

Hackett, T.A., De La Mothe, L.A., Ulbert, I., Karmos, G., Smiley, J., and Schroeder, C.E. (2007). Multisensory convergence in auditory cortex, II. Thalamocortical connections of the caudal superior temporal plane. J. Comp. Neurol. 502, 924–952.

Hamilton, L.S., Chang, D.L., Lee, M.B., and Chang, E.F. (2017). Semi-automated anatomical labeling and inter-subject warping of high-density intracranial recording electrodes in electrocorticography. Front. Neuroinform. 11, 62.

Hamilton, L.S., Edwards, E., and Chang, E.F. (2018). A spatial map of onset and sustained responses to speech in the human superior temporal gyrus. Curr. Biol. 28, 1860–1871.e4.

Hamilton, L.S., and Huth, A.G. (2020). The revolution will not be controlled: natural stimuli in speech neuroscience. Lang. Cogn. Neurosci. 35, 573–582.

Hickok, G., and Poeppel, D. (2007). The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402.

Hickok, G., and Saberi, K. (2012). Redefining the Functional Organization of the Planum Temporale Region: Space, Objects, and Sensory-Motor Integration. In The Human Auditory Cortex, D. Poeppel, ed. (Springer), pp. 333–350.