Learning unacceptability: Repeated exposure to acceptable sentences improves adult learners' recognition of unacceptable sentences

Page created by Hazel Dean
 

                                              Abstract
     People who learn a new language as adults tend to judge unconventional utterances more
     leniently than native speakers do, while both groups’ ratings on acceptable utterances tend
     to align more closely. Experiment 1 confirms this asymmetry with 61 English-speaking
     undergraduate students enrolled in Spanish classes. The finding that unconventional
     utterances are particularly hard for learners to fully appreciate raises the possibility that
     conventional utterances may not statistically preempt unconventional paraphrases for adult
     learners. To investigate this, we report a preregistered study that provides the undergraduates
     learning Spanish with three days of exposure to conventional Spanish sentences involving
     one of two sets of constructions. They performed self-paced reading initially and after
     exposure. Native Spanish speakers displayed the expected slow-down when reading
     the unconventional sentences (Exp 2), but the learners did not, regardless of exposure or
     proficiency. At the same time, judgment data reveal that even beginning learners at the initial
     assessment explicitly rate unconventional sentences somewhat lower than conventional
     sentences, and the recognition of unconventionality increases with proficiency. Moreover,
     the judgment data reveal an effect of statistical preemption, particularly on intermediate
     learners, as predicted: repeatedly witnessing conventional sentences significantly impacted
     subsequent ratings of unconventional paraphrases. Collectively, the current findings indicate
     that adult learners do take advantage of statistical preemption to identify unacceptable
     sentences, but their ability to recognize unacceptability in real-time lags far behind.

Introduction
Learning multiple languages is highly beneficial in our multicultural world, as bilingualism is
valued by immigrants, schools, travelers, and all heterogeneous communities. Yet learning a new
language as an adult is slow and difficult for most people. The task is challenging because each
language involves a unique and complex system of generalizations, subregularities, and
idiosyncrasies. For instance, native English speakers judge the sentences in (8a)-(11a) to be
strongly dispreferred in comparison to the conventional alternatives in (8b)-(11b):

   8a. ?Lisa filled water into the cup. (Ambridge & Brandt, 2013)
   9a. ?The magician disappeared the rabbit. (Robenalt & Goldberg, 2016)
   10a. ?Amber explained Zach the answer. (Tachihara & Goldberg, 2020)
   11a. ?Dan forced that Helen plays tennis. (Tachihara & Goldberg, 2020)

   8b. Lisa filled the cup with water.
   9b. The magician made the rabbit disappear.
   10b. Amber explained the answer to Zach.
   11b. Dan forced Helen to play tennis.

The types of utterances in (8b)-(11b) can be viewed as statistically preempting the corresponding
utterance types in (8a)-(11a) for native English speakers, who tend to strongly prefer to use the
formulations in (8b)-(11b) instead of any of the formulations in (8a)-(11a) to express the intended
messages.

For adults learning a new language, avoiding the types of unconventional sentences in (8a)-
(11a) is especially challenging. Several studies have demonstrated that adult learners are markedly
more tolerant of unconventional sentences in comparison to native speakers (Ambridge & Brandt,
2013; Brooks et al., 1999; Robenalt & Goldberg, 2016; Tachihara & Goldberg, 2020). In fact,
there is a significantly larger difference between native speakers’ and learners’ acceptability
ratings on unconventional sentences in comparison to conventional sentences (like the [b]
sentences): judgments on conventional sentences tend to align more closely with native speakers’.
Tachihara & Goldberg (2020) found that adult learners’ more lenient judgments on unconventional
formulations were remarkably robust, regardless of their language background, and even though
most learners were able to correctly recognize that a conventional alternative was somewhat
preferred over an unconventional formulation. The unacceptability of the sentences in (8a)-(11a)
may be particularly difficult to fully appreciate because each of the utterances is easy to interpret.
Moreover, the same constructions are fully acceptable when used with different verbs, as
illustrated in (8c)-(11c):

   8c. Lisa poured water into the cup.
   9c. The magician hid the rabbit.
   10c. Amber told Zach the answer.
   11c. Dan thought that Helen plays tennis.

The type of semi-arbitrary unacceptability evident in (8a)-(11a) appears to exist in every language,
as languages are conventional systems of communication, which develop over time in complex
ways. We will see, for example, that native Spanish speakers judge (12a)-(13a) to be markedly
less acceptable in comparison to (12b)-(13b), while the same constructions, unacceptable in the
(a) sentences, are fully acceptable when used with different main verbs (12c)-(13c). That is,
languages can be quite picky about which combinations of verbs and “argument structure”
constructions are conventional and acceptable, for reasons that are not always transparent to
researchers, let alone language learners.

   12a. ?Rafael obligó que ellos irían al cine.
      “Rafael forced that they go to the movies.”
   13a. ?Maya creyó a Javier a esperar por ella.
      “Maya believed Javier to wait for her.”

   12b. Rafael los obligó a ir al cine.
      “Rafael forced them to go to the movies.”
   13b. Maya creyó que Javier esperaría por ella.
      “Maya believed that Javier would wait for her.”

   12c. Rafael pensó que ellos irían al cine.
      “Rafael thought that they would go to the movies.”
   13c. Maya obligó a Javier a esperar por ella.
      “Maya forced Javier to wait for her.”

Adult learners are known to produce utterances like those in (8a)-(13a). This is not in itself
problematic, as subcommunities of speakers routinely develop their own vernaculars and there is
no evidence that any vernacular is “better” than any other, so we might consider adult language
learners to simply develop their own conventional ways of speaking (Ortega, 2013; Cheng et al.,
2021). In fact, as long as what is uttered is interpreted as intended, any utterance can legitimately
be said to be good enough (Goldberg & Ferreira, 2022). At the same time, many adult learners aim
to reproduce a vernacular they perceive to be standard, to avoid implicit (or explicit) bias (Gluszek
& Dovidio, 2010). To the extent that adult learners share this goal, the question arises as to why
adult learners show a propensity to rate unconventional language leniently.
        We do not need to endorse a prescriptivist attitude toward differences between native
speakers and adult learners’ speech to find the differences compelling1. Our own interest lies in
the mechanisms involved in language learning by adults and children. It is striking that child
learners inevitably learn the nuanced conventions of their language(s), and as adults, generally
recapitulate the conventional, colloquial language they witness being used by their peers. As adult
native speakers, they rarely produce unconventional utterances like the (a) examples above and, if
asked, reliably judge such examples to be quite unacceptable.1 How do child learners come to avoid
producing unconventional sentences, and to rate them so strongly unfavorably, in their native
language(s)?
     Studies have repeatedly found that caregivers do not reliably provide explicit corrections, as
they are more concerned with the content of the child’s speech than its form. Even when
corrections are offered, they appear to often be misinterpreted or ignored (Baker, 1979; Bowerman,
1988; Golinkoff & Hirsh-Pasek, 1996; Pinker, 1989). Presumably, child learners come to avoid
unconventional formulations because they repeatedly witness conventional alternatives which
come to suppress any unconventional ways of expressing the identical message over years of
exposure (unless there is a particular reason to produce a novel formulation). This occurs through
a process of competition or statistical preemption (e.g., Ambridge et al., 2018; Boyd & Goldberg,
2011; Perek & Goldberg, 2017; Goldberg, 1995; 2011; 2006; see also Chouinard & Clark, 2003
for a related discussion of recasts).
        There exist two possible interpretations of how statistical preemption works. One
possibility assumes that an unconventional formulation needs to be predicted or activated before
the conventional formulation is witnessed. Alternatively, statistical preemption may only require
that a conventional formulation be activated at the same time or even after an unconventional
formulation is witnessed. In either case, statistical preemption relies on the idea that a more
conventional formulation competes with any unconventional formulation as a means of expressing
the same message. The more often a conventional formulation is witnessed, the stronger it becomes,
and the more strongly it comes to suppress any unconventional potential alternative.
        Evidence that native speakers learn what not to say via statistical preemption comes from
production studies (Boyd & Goldberg, 2011), corpus data (Goldberg, 2011), and mini-artificial
language paradigms (Perek & Goldberg, 2017). Other evidence comes from the fact that native
speakers tend to judge novel sentences with readily available conventional paraphrases to be less

1
  We use the term, “adult language learners” to refer to adults who are not yet proficient in a new language. Other
terms used in the literature are “second language (L2) speakers” or “nonnative speakers.” We prefer “language
learners” to highlight the dynamic nature of language learning. We use “native speakers” to refer to fluent speakers
who grew up speaking the language. While we acknowledge that using the term, “native speakers” may exclude
potential participants who are fluent expert speakers (Cheng et al., 2021), we use this term because we recruited
participants for the current study by asking for “native speakers of .”

acceptable than novel sentences for which no readily available paraphrase exists, where the
strength of a readily available paraphrase is estimated by the degree of convergence of speakers
upon the same paraphrase (Robenalt & Goldberg, 2015). For instance, if asked to paraphrase an
unconventional utterance like (8a): ?Lisa filled water into the cup, more than half of the native
English speakers spontaneously suggest the same paraphrase, (8b), Lisa filled the cup with water.
Relatedly, combinations of verb and argument structure that appear more frequently in naturalistic
data provide more readily accessible paraphrases, and several studies have found that judgments
on novel utterances vary inversely with the frequency of the conventional paraphrase (Ambridge et
al., 2008; Brooks & Tomasello, 1999; Robenalt & Goldberg, 2015; 2016; Theakston, 2004). For
example, 14a is judged less acceptable than 15a, presumably because the combination of verb and
construction in 14b is more frequent than that in 15b: i.e., 14b more strongly preempts 14a, than
15b preempts 15a.

     14a. ??The magician disappeared the rabbit.
     15a. ?The magician vanished the rabbit.

     14b. MAKE  DISAPPEAR
     15b. MAKE  VANISH

The effect of paraphrases’ frequency on (un)acceptability indicates that repeated experiences are
key to statistical preemption. With sufficient exposure to a conventional formulation, it should be
possible to learn that a novel alternative is unacceptable without the need for explicit correction.
The current work tests whether statistical preemption plays a role in adult language learning.
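
The convergence measure mentioned above, the share of speakers who offer the same paraphrase, can be illustrated with a short Python sketch. The function name and the sample responses are hypothetical, made up here for illustration; they are not data or code from Robenalt & Goldberg (2015).

```python
from collections import Counter


def paraphrase_convergence(paraphrases):
    """Return the most common paraphrase and the share of responses it accounts for."""
    if not paraphrases:
        raise ValueError("need at least one paraphrase")
    # Normalize case and surrounding whitespace so trivially different
    # responses count as the same paraphrase.
    counts = Counter(p.strip().lower() for p in paraphrases)
    top, n = counts.most_common(1)[0]
    return top, n / len(paraphrases)


# Hypothetical responses to "?Lisa filled water into the cup."
responses = [
    "Lisa filled the cup with water.",
    "Lisa filled the cup with water.",
    "Lisa filled the cup with water.",
    "Lisa poured water into the cup.",
]
print(paraphrase_convergence(responses))
```

On this view, a higher share indicates a more readily available competitor for the unconventional formulation.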
         Few previous studies have addressed this question directly, and the few that have, have
drawn different conclusions. Adult language learners’ greater acceptance of unconventional
formulations suggests that statistical preemption may not work the same way in child and adult
language learners except for adults at the highest levels of expertise in the target language
(Ambridge & Brandt, 2013; Navarro-Torres et al., to appear; Robenalt & Goldberg, 2016;
Treffers-Daller & Calude, 2015). For instance, Zhang and Mai (2018) compared two groups of
highly proficient English speakers whose first language was Chinese on judgments of English
denominal verbs (e.g., shirt the model; brick the path). One group consisted of English
majors in their fourth year of study; the other consisted of professional English
teachers. Only the teachers showed an effect of statistical preemption: they were more likely to
accept a denominal verb when a preempting verb was of lower frequency, whereas the English majors
were not. Tachihara & Goldberg (2020) probed this phenomenon in a series of 5 studies, testing
980 adult learners of English who had lived in the US and were reasonably proficient in English.
The study confirmed, as already discussed, that learners generally knew which of the two
formulations was preferable, but nonetheless tended to be more lenient than native English
speakers when asked to rate unconventional sentences. In an attempt to better align learners’
judgments of unacceptable sentences with native speakers’, one study in Tachihara & Goldberg
(2020) exposed adult learners to a conventional formulation immediately before they were asked
to rate the unconventional paraphrase. The single exposure had no effect on learners’ acceptability
ratings.
         In accounting for adult learners’ challenge in fully appreciating unconventional language
as unacceptable, the question of possible interference or “transfer” from learners’ more dominant
language naturally arises. Yet experienced language teachers have long suspected that the production of
unconventional expressions (like those in (8a)-(13a)) does not result from transfer effects (Borg,
2003). An analysis in Tachihara & Goldberg (2020) confirms that transfer effects cannot be wholly
responsible for the higher tolerance of unconventional sentences by adult learners. In particular,
two constructions of similar complexity were examined in a group of adult Spanish speakers
learning English: the double object construction and the clausal complement construction. Spanish
does not share the same double-object construction as English does, but the clausal complement
constructions in the two languages are quite parallel. Verbs that are translational equivalents are
equally (un)acceptable with tensed clausal complements. For instance, Dan obligó que Helen
juegue is the word-for-word translation of Dan forced that Helen play tennis, and both sentences
are unacceptable. If judgments on English sentences were influenced by transfer from participants’
native Spanish, we would expect more nativelike judgments on the construction that was parallel
in the two languages in comparison to the construction that differed. Yet that was not what was
found: Spanish-speaking learners of English judged unconventional instances of both
constructions equally more leniently in English than native English speakers did.

        In the current work, the target language being learned is Spanish. We explore the possibility
that adult language learners can benefit from statistical preemption if exposure to conventional
formulations is repeated over a series of days. The motivation for including multiple days of
exposure stems, not only from a lack of evidence for an immediate effect of statistical preemption
in prior work, but also from positive findings in word learning. Gaskell & Dumay (2003), for
instance, taught adults novel words that partially overlapped formally with existing words (e.g.,
cathedruke, which overlaps with cathedral). After participants learned the novel words, the effect
of phonological competition was measured, using a lexical decision task. In this paradigm, longer
reaction times for trials containing familiar words (e.g., cathedral) are interpreted as resulting from
lexical competition from the newly learned words (e.g., cathedruke). Participants recognized the
novel words immediately after exposure, but the anticipated competition effect did not emerge
until after the 4th day of repeated exposure. This finding, along with other studies comparing immediate
and delayed testing, suggests that lexical competition requires a period of consolidation during sleep
(Dumay & Gaskell, 2007; Gais, Lucas, & Born, 2006; Lindsay & Gaskell, 2010; Mattys & Clark,
2002). While phonological competition in a lexical decision task differs from the sentence-level
competition in a judgment task, we hypothesize that the competition involves the same or related
processes. Therefore, we use repeated exposure over three days in order to test for an effect of
competition between conventional and unconventional formulations.
        Because the strength of memory depends on experience, one needs sufficient experience
to successfully access a memory in context. Students who are just learning a new language may
have had too little experience to activate competing formulations upon witnessing a particular
utterance. On the other hand, once learners are highly proficient, we can expect them to behave
similarly to native speakers (Navarro-Torres et al., to appear; Robenalt & Goldberg, 2016;
Tachihara & Goldberg, 2020). Thus, we predict that intermediate speakers, who have had enough
experience to activate sentence formulations, but not enough experience to necessarily recognize
competitive relationships, should benefit the most from exposure.
        In addition to the acceptability rating task, we included a self-paced reading task as an
implicit, online measure of sentence processing. Self-paced reading can reveal whether or not
participants are detecting anomalies in real-time as they read each word (Jegerski, 2014). The task
has been used with native speakers and language learners in order to assess their implicit
understanding of sentence structure (for reviews see Clahsen & Felser, 2006; VanPatten & Jegerski,
2010). The reason to include both explicit ratings and an implicit reading-time measure is that it is
possible that the exposure may only have an impact on implicit knowledge. Or the opposite may
be true: learners may be able to make explicit judgments before implicit measures are influenced.
The language learners in the current study were all recruited from Spanish classrooms, so they
may be especially interested in gaining explicit knowledge (Larsen-Freeman, 2000). Thus, by
including both explicit ratings and the more implicit measure of reading times, we can explore
how explicit judgments and implicit online processing may differ. To summarize, the current work
investigates how adults come to appreciate what is unconventional in the language they are
currently learning: Spanish.
        In what follows, two studies set the stage for our primary manipulation. To foreshadow
results, Experiment 1 confirms that learners differ more from native speakers on judgments of
unconventional sentences than on conventional sentences for 5 different construction types in the
target language of Spanish. We then report a reading-time study that confirms that the stimuli we
assume to be unacceptable in Spanish evoke the expected slow-down in reading times for native
Spanish speakers. Experiment 3 is our key manipulation. Groups of undergraduates learning
classroom Spanish are provided with exposure to one of two sets of fully conventional Spanish
sentences for each of 3 days. On the fourth day, we compare acceptability ratings and reading
times on unconventional paraphrases in order to see whether exposure to conventional sentences
impacts the Spanish learners’ ratings or reading times. We find an effect of exposure in the
judgment data, albeit not in reading times. As predicted, the effect on judgments is focused on
Spanish learners at the intermediate level. We end with a discussion of the difference between
judgments and reading times, the effect of proficiency, the effect of sleep, the importance of
generalizability, and suggestions for future studies.

Preregistration
We preregistered the current studies on AsPredicted.org before data collection and included our
hypotheses, the dependent measures, the data collection process with restrictions on participants
and intended sample sizes, and all statistical analyses unless specified as exploratory. Deviations
from the preregistration are described and explained in the text with additional references to
supporting information (Experiment 1, https://aspredicted.org/J9V_TCN; Experiment 2,
https://aspredicted.org/CJB_BSN; Experiment 3, https://aspredicted.org/KJF_KGC).

Experiment 1
Experiment 1 tests whether native speakers and language learners of Spanish differ systematically
in their acceptability ratings of conventional and unconventional sentences, as has been previously
reported for native speakers and language learners of English (Robenalt & Goldberg, 2016; Tachihara &
Goldberg, 2020).

Method
Participants
70 native Spanish speakers living in Spanish-speaking countries and 70 Spanish learners in the US
were recruited online through Cloud Research (Litman, Robinson, & Abberbock, 2017). For the
second group, participants responded that their “native primary language is English” and “English
is my first language” and rated their proficiency in Spanish to be less than 85 on a scale of 0-100.

Procedure
The consent form was written in English for English-speaking learners of Spanish, and in Spanish
for native Spanish speakers. Then a message explained that all following instructions and questions
would be in Spanish. Participants were instructed to exit the survey without penalty if they did not
know Spanish. Participants rated the acceptability of each sentence on a gradient scale from 0 to
100 (100 being fully acceptable). The order of sentence presentation was randomized for each
participant. Participants were provided with two examples to clarify the task: one unacceptable
sentence (A mí me gusto la película, assigned a low rating) and one acceptable sentence (Yo vivo aquí,
assigned a high rating).

Stimuli
We created 42 sentences that included 5 types of variation in the constructions used: copula choice,
adjective position, grammatical gender, the double object construction, and the clausal
complement construction (see Table 1). For each construction, half of the sentence stimuli were
conventional sentences and half were unconventional sentences. The first four distinctions are
unique to Spanish while the last is parallel in English and Spanish. Eight sentences were created
for each of the first four constructions, and 10 for the last. The complete list of stimuli is available
in Supporting information 1.
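
As a quick check on the counts above, the following Python sketch tallies the stimuli per construction. The dictionary keys and function names are shorthand labels chosen here for illustration; they are not identifiers from the study materials.

```python
# Sentences per construction, as described in the Stimuli section.
# Keys are illustrative labels, not names used in the study materials.
SENTENCES_PER_CONSTRUCTION = {
    "copula_choice": 8,
    "adjective_position": 8,
    "grammatical_gender": 8,
    "double_object": 8,
    "clausal_complement": 10,
}


def total_stimuli(counts):
    """Total number of sentences across all constructions."""
    return sum(counts.values())


def split_by_conventionality(counts):
    """Half of each construction's sentences are conventional, half unconventional."""
    return {name: (n // 2, n // 2) for name, n in counts.items()}


print(total_stimuli(SENTENCES_PER_CONSTRUCTION))
print(split_by_conventionality(SENTENCES_PER_CONSTRUCTION))
```

The total comes to 42 sentences, with four conventional and four unconventional items for each of the first four constructions and five of each for the clausal complement construction.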

Table 1. Construction types and sample stimuli used in Experiments 1-3
 Construction types                  Unconventional formulation /              Translation into English
                                     Conventional formulation
 1. ser vs. estar                    ?La estación de tren es en esta calle.
                                     La estación de tren está en esta calle.   “The train station is on this street.”
 2. pre vs. post nominal adjectives  ?El viejo hermano de Lola es guapo.
                                     El hermano viejo de Lola es guapo.        “Lola’s older brother is handsome.”
 3. el vs. la                        ?Usamos la mapa para encontrar la casa.
                                     Usamos el mapa para encontrar la casa.    “We use the map to find the house.”
 4. double object                    ?Estella envió su madre una carta.
                                     Estella le envió a su madre una carta.    “Estella sent her mom a letter.”
 5. que vs. a                        ?Rafael obligó que ellos irían al cine.
                                     Rafael obligó a ellos a ir al cine.       “Rafael forced them to go to the movie.”

Results
Figure 1 displays the descriptive results. As is evident, both native speakers and adult language
learners recognize that conventional sentences are more acceptable than unconventional sentences.
This is confirmed with a linear mixed effects model using judgment scores as the outcome variable,
conventionality as the fixed effect, and the maximal converging random effect structure (in this case,
random intercepts and slopes for subjects and random intercepts for items): β = 8.38, t = 4.36, p <
0.001. At the same time, the difference between native speakers and learners is larger for
unacceptable sentences than it is for acceptable sentences. Specifically, a linear mixed effects
model was fit to the data with judgment scores as the outcome and Speaker_group and
Conventionality as fixed interacting effects. Random slopes and intercepts were included for
subjects and items. Results show a main effect of Speaker_group (β = -25.94, t = -6.08, p < 0.0001),
a main effect of Conventionality (β = 8.38, t = 7.40, p = 0.003), and most importantly, the predicted
interaction (β = 36.45, t=7.40, p
random intercepts and slopes for subject and item. We find a main effect of Speaker group (β = -
26.77, t = -6.29, p < 0.001) but no main effect of construction (β= 7.48, t = 1.38, p = 0.22) and no
interacting effect of speaker and construction (β = -0.37, t = -0.09, p = 0.93). This means that the
discrepancy in judgments between language learners and native speakers for the double object
construction did not differ from that of the clausal complement. Language learners found it just as
challenging to detect unacceptability in a construction that behaved similarly to their dominant
language as they did for a construction that behaved differently. This suggests that transfer is not
wholly responsible for the higher tolerance of unconventional sentences by language learners.
   Finally, unsurprisingly, we find a negative correlation between judgment scores and self-rated
proficiency of language learners, with more proficient learners aligning more with native speakers
(r = -0.11, p < 0.001).
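
For readers who want the speaker group by conventionality analysis from these Results in symbols, one way to write such a model is given below. This is a sketch under assumptions rather than the authors' exact specification: the symbol names, the 0/1 predictor coding, and the particular random slopes shown are illustrative choices that the text does not spell out.

```latex
% y_{si}: rating (0-100) given by subject s to item i
% G_s:    speaker group of subject s (assumed 0/1 coding)
% C_i:    conventionality of item i (assumed 0/1 coding)
% u, w:   by-subject and by-item random effects (assumed structure)
\begin{align*}
y_{si} &= \beta_0 + \beta_1 G_s + \beta_2 C_i + \beta_3\,(G_s \times C_i) \\
       &\quad + u_{0s} + u_{1s} C_i + w_{0i} + w_{1i} G_s + \varepsilon_{si}
\end{align*}
```

In this notation, $\beta_3$ is the interaction term: it captures how much the rating difference between conventional and unconventional sentences changes from one speaker group to the other.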

Experiment 2
A self-paced reading time study was conducted with native Spanish speakers to confirm that the
experimental procedure and the stimuli would work as expected. Based on prior work with self-
paced reading, we expected that participants would show a slow-down for unconventional
sentences compared to conventional sentences, demonstrating that they are able to detect
unacceptability during online comprehension (Jegerski, 2014).

Methods
Participants
100 native Spanish speakers living in Spanish-speaking countries were recruited through
CloudResearch, and all are included in the analysis.

Stimuli
The stimuli consisted of a total of 20 unconventional sentences, 20 conventional sentences, and 20
conventional filler sentences. The target sentences were based on the same 5 constructions from
Experiment 1. To shorten the length of the experiment, each participant read half of the target
sentences, with the other half counterbalanced across participants. Thus, each participant saw 10
unconventional sentences, 10 conventional sentences, and 20 conventional filler sentences. Note
that unconventional sentences made up only 25% of the stimuli in an effort to mitigate participants’
expectation of reading unconventional sentences during the task. Some conventional filler
sentences were followed by yes-or-no comprehension questions about the content of the sentence,
used to encourage and assess participants’ attention. The target sentences, filler sentences, and 13
comprehension questions appeared in randomized lists.
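
The per-participant composition described above can be verified with a few lines of Python. The function names are hypothetical; the counts are the ones stated in the text.

```python
from fractions import Fraction


def per_participant_counts(n_unconventional=20, n_conventional=20, n_filler=20):
    """Each participant reads half of the target sentences and all of the fillers."""
    return {
        "unconventional": n_unconventional // 2,
        "conventional": n_conventional // 2,
        "filler": n_filler,
    }


def share_unconventional(counts):
    """Fraction of a participant's sentences that are unconventional."""
    return Fraction(counts["unconventional"], sum(counts.values()))


counts = per_participant_counts()
print(counts, share_unconventional(counts))
```

Each participant therefore reads 40 sentences, of which 10 (one quarter) are unconventional.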
    For each sentence, a target region was identified prior to data collection. This included the first
word at which one can detect that an unconventional sentence is unacceptable and the next two
words to allow for possible spillover effects. The target regions for conventional sentences and
unconventional sentences were as close as possible to facilitate comparison. For example, given
the unconventional sentence, La estación de tren es en esta calle, the target region was es + en +
esta. For the conventional sentence, La estación de tren está en esta calle, the target region was
está + en + esta. The target region never included the first or the last word in the sentence. All
stimuli and target regions are available in Supporting information 5.
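
To make the definition of the target region concrete, here is a minimal Python sketch. It assumes the position of the critical word is supplied by hand (in the study, regions were identified by the researchers before data collection); the function name and the 0-based indexing are illustrative choices.

```python
def target_region(sentence, critical_index, spillover=2):
    """Return the critical word plus the next `spillover` words.

    `critical_index` is the 0-based position of the first word at which the
    unconventional version can be detected. It is an input chosen by the
    researcher, not something this function computes.
    """
    words = sentence.rstrip(".").split()
    # The regions in the study never include the first or last word.
    if critical_index == 0 or critical_index + spillover >= len(words) - 1:
        raise ValueError("target region would include the first or last word")
    return words[critical_index:critical_index + 1 + spillover]


print(target_region("La estación de tren es en esta calle.", 4))
print(target_region("La estación de tren está en esta calle.", 4))
```

For the example pair above, the two calls return es + en + esta and está + en + esta, so the regions differ only in the critical word.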

Procedure
We created a cumulative self-paced reading task (Just, Carpenter & Woolley, 1982) in Inquisit 6.
Each sentence appeared one word at a time as participants pressed the space bar. The words remained
on the screen until the end of the trial, such that the whole sentence was visible at the last word.
When each sentence ended, participants clicked a button that appeared on the bottom right corner
of the screen. To familiarize participants with self-paced reading, the first 25% of sentences were
filler sentences.
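
The cumulative display described above can be sketched in Python as a generator that yields what is on screen after each key press. This is only an illustration of the presentation logic, not the Inquisit script used in the study; the function name is hypothetical.

```python
def cumulative_displays(sentence):
    """Yield the text visible after each key press in a cumulative display.

    Earlier words stay on screen, so the final display is the whole sentence.
    """
    words = sentence.split()
    for i in range(1, len(words) + 1):
        yield " ".join(words[:i])


for screen in cumulative_displays("Yo vivo aquí"):
    print(screen)
```

In a non-cumulative (moving-window) variant, earlier words would be masked again; the cumulative version keeps them visible until the trial ends.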

Results
As expected, native speakers display a small but significant slow-down in the target region of
unconventional sentences. A linear mixed-effects model with log sum of reading time over target
region as the outcome and conventionality as the fixed effect was fit to the data. Random intercepts
for subjects and items were included.2 Native speakers are slower to read the target region in
unconventional sentences compared to conventional sentences (β = -0.13, t = -8.15, p < 0.0001)
(Figure 2). Thus, native speakers can detect (un)acceptability during online sentence
comprehension for this set of sentences using a self-paced reading paradigm.
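
The outcome variable used in this analysis, the log of the summed reading time over the target region, can be computed as in the Python sketch below. The reading times are made-up values in milliseconds, and the use of the natural logarithm is an assumption, since the text does not state the base.

```python
import math


def log_sum_rt(word_rts_ms, region_start, region_length=3):
    """Log of the summed reading times (ms) over the target region.

    Uses the natural logarithm, which is an assumption of this sketch.
    """
    region = word_rts_ms[region_start:region_start + region_length]
    if len(region) != region_length:
        raise ValueError("target region extends past the end of the sentence")
    return math.log(sum(region))


# Hypothetical per-word reading times for an eight-word sentence.
rts = [300, 310, 290, 305, 400, 350, 330, 320]
print(log_sum_rt(rts, region_start=4))
```

Summing before taking the log yields a single value per trial, which can then serve as the outcome in the mixed-effects model.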

Figure 2: Mean sum of reaction time over target region for unconventional and conventional sentences.
Error bars represent standard error. Circles represent the mean score for each participant.

Experiment 3

2
  Our preregistration specified analysis of only the first word of the target region, rather than the sum over the
target region, and the use of a maximal fitting model. The first-word analysis also revealed a significant effect of
conventionality for native speakers (β = -0.04, t = -2.39, p = 0.02). We report the longer time window to maximize the possibility of
finding a slow-down among the language learners. See Supporting information 2 for the original analysis and an
explanation of the random effects structure.

                                                                                                                  10
To find out if Spanish learners show heightened sensitivity to unacceptable target sentences via
statistical preemption, we provided them with multiple days of exposure to conventional
(acceptable) paraphrases of the target sentences. Specifically, conventional sentences were provided
over 3 days to determine whether this led to better recognition that unconventional sentences are
unacceptable. The same judgment task from Experiment 1 and the self-paced reading task from
Experiment 2 were repeated with participants in Experiment 3.

Methods
Participants
We preregistered a plan to analyze the data of 100 participants enrolled in Spanish classes at
Princeton University. The recruitment of classroom learners served 2 objectives: participants were
active learners of Spanish and their course level was known, providing an objective measure of
proficiency. We recruited participants from all levels of Spanish instruction classes. A total of 128
participants took part in the study over 2 semesters of recruiting; however, only 73 completed
the critical final assessment. All participants were native English speakers or highly proficient
English speakers (all native languages are provided in Supporting information 3). Two participants
were excluded because they indicated that their native language included Spanish and they rated
their Spanish proficiency to be at ceiling. Ten additional participants were excluded for not passing
the preregistered 75% threshold on comprehension questions. Participants were classified as
Beginner, Intermediate, or Advanced, according to the placement test created and scored by the
Spanish Department (specific tracks for the Spanish classes are provided in Supporting
information 4). The breakdown of the final 61 participants by level is listed in Table 2. Their data
are analyzed here.

Table 2: Participants’ proficiency grouped by class level.
 Proficiency                        N
 Beginner (SPA 101, 102, 103)       19
 Intermediate (SPA 105, 107)        19
 Advanced (SPA 108, 200, 300)       23
 Total                              61
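
The exclusion steps described above can be sketched in Python as follows. This is an illustration under stated
assumptions, not the authors' analysis script: the record fields, the participant ids, and the function name
apply_exclusions are hypothetical, and the 75% threshold is assumed to be inclusive (an accuracy of exactly
0.75 passes).

```python
def apply_exclusions(participants, accuracy_threshold=0.75):
    """Return the participants who pass every exclusion step."""
    kept = []
    for p in participants:
        if not p["completed_final"]:
            continue  # did not complete the critical final assessment
        if p["native_spanish_at_ceiling"]:
            continue  # native language includes Spanish, self-rated at ceiling
        if p["comprehension_accuracy"] < accuracy_threshold:
            continue  # below the preregistered comprehension threshold
        kept.append(p)
    return kept


# Hypothetical records, for illustration only
sample = [
    {"id": "p01", "completed_final": True, "native_spanish_at_ceiling": False,
     "comprehension_accuracy": 0.95},
    {"id": "p02", "completed_final": False, "native_spanish_at_ceiling": False,
     "comprehension_accuracy": 0.90},
    {"id": "p03", "completed_final": True, "native_spanish_at_ceiling": True,
     "comprehension_accuracy": 0.98},
    {"id": "p04", "completed_final": True, "native_spanish_at_ceiling": False,
     "comprehension_accuracy": 0.70},
    {"id": "p05", "completed_final": True, "native_spanish_at_ceiling": False,
     "comprehension_accuracy": 0.75},
]
kept_ids = [p["id"] for p in apply_exclusions(sample)]

# Counts reported in the text: 73 completers, 2 and 10 exclusions, 61 analyzed
n_analyzed = 73 - 2 - 10
n_table2 = 19 + 19 + 23  # Beginner + Intermediate + Advanced
```

Both n_analyzed and n_table2 equal 61, matching the total in Table 2.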

Procedure
The experiment was administered on six days within an 8-day window (Table 3). During a pretest,
participants registered for the experiment and responded to a questionnaire about their language
backgrounds. During the initial and final assessments (days 2 and 6), participants read a
combination of conventional and unconventional sentences in a self-paced reading task and
completed a judgment task on the same set of sentences. During the intervening three days of
exposure (days 3-4-5), participants read only conventional sentences, also using a self-paced
reading task. All sessions except the pretest questionnaire included comprehension questions,
which were used to encourage and assess participants’ attention. As mentioned, comprehension
questions served as the preregistered exclusion criterion (75% accuracy required). In an additional
effort to engage participants, we included 16 non-linguistic encouragement GIFs (e.g., Jennifer
Lopez clapping), which appeared at random intervals throughout the tasks.

Table 3. Summary of experiment tasks with example stimuli provided on each day.

    Day     Phase                  Task                     Example stimuli
    Day 1   Pretest                Questionnaire
    Day 2   First assessment       Self-paced reading       ?La estación de tren es en esta calle.
                                   & Judgment               El baño está en ese piso.
    Day 3   Exposure               Self-paced reading       La estación de tren está en esta calle.
    Day 4   Exposure               Self-paced reading       La tienda que le gusta a Daria está en ese bloque.
    Day 5   Exposure               Self-paced reading       El taxí está en la calle incorrecta.
    Day 6   Final assessment       Self-paced reading       ?La estación de tren es en esta calle.
                                   & Judgment               El baño está en ese piso.
                                                            ?La iglesia es en camino a la escuela.

        We divided participants into two subgroups. On days 3-4-5, one group of participants was
exposed to ser vs. estar and prenominal vs. postnominal adjectives; the other group was exposed
to el vs. la and que complements vs. a complements. This design allows us
to compare the effect of exposure on particular constructions between subgroups while controlling
for the delay between initial and final assessments.3 All stimuli can be found in SI 5.
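
The counterbalancing just described determines, for every participant and construction, whether the
construction counts as exposed. A minimal Python sketch of that coding follows; the group labels "A" and "B",
the construction identifiers, and the function name was_exposed are hypothetical and chosen only for
illustration.

```python
# Constructions each (hypothetically labeled) subgroup read on days 3-4-5
EXPOSURE_SETS = {
    "A": {"ser_vs_estar", "adjective_position"},
    "B": {"el_vs_la", "que_vs_a_complement"},
}

ALL_CONSTRUCTIONS = set().union(*EXPOSURE_SETS.values())


def was_exposed(group, construction):
    """True if this subgroup read conventional sentences of this construction."""
    if construction not in ALL_CONSTRUCTIONS:
        raise ValueError(f"unknown construction: {construction}")
    return construction in EXPOSURE_SETS[group]


# Each construction is exposed for exactly one subgroup, so every participant
# contributes both exposed and unexposed items at the final assessment.
exposed_counts = {
    c: sum(was_exposed(g, c) for g in EXPOSURE_SETS) for c in ALL_CONSTRUCTIONS
}
```

Because each construction is exposed for one subgroup and unexposed for the other, the exposure comparison is
made at the same delay and with the same task familiarity.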

Assessments (Days 2 and 6)
The format of the judgment task was the same as used in Experiment 1, and the format of the self-
paced reading task was the same as used in Experiment 2. The initial assessment consisted of 20
unconventional sentences, 20 conventional sentences, and 40 conventional filler sentences. Four
unconventional and four conventional sentences were double object constructions; these appeared
only in the initial assessment because that construction was not part of the manipulation. The final
assessment included 16 additional novel unconventional sentences based on the same construction
types but including different words. This was done to determine whether any effect of exposure
would generalize beyond the particulars of the sentences witnessed.
      Thus, the final assessment consisted of 32 unconventional sentences, 16 conventional
sentences, and 64 conventional filler sentences. Unconventional sentences made up 25% of the
stimuli, much like in Experiment 2, in order to mitigate participants’ expectations of reading an
unconventional sentence. Due to experimenter error, the judgment task at the final assessment
consisted of a random subset of 40 unconventional and conventional sentences instead of 48. As
in Experiment 2, filler sentences made up the first 25% of the task, so that participants could become
familiar with self-paced reading. For the rest of the task, the order of sentences was
randomized for each participant.
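
The ordering constraint described above (fillers in the first 25% of the task, the remainder randomized per
participant) can be sketched in Python as follows. The text does not say how the opening fillers were chosen,
so drawing them at random is an assumption, as are the function name order_trials and the tuple labels.

```python
import random


def order_trials(fillers, critical, seed=None):
    """Put fillers in the first 25% of trials and shuffle the rest."""
    rng = random.Random(seed)
    fillers = list(fillers)
    n_total = len(fillers) + len(critical)
    n_lead = n_total // 4  # first 25% of the task
    if n_lead > len(fillers):
        raise ValueError("not enough fillers to cover the first 25% of trials")
    rng.shuffle(fillers)  # assumption: opening fillers are drawn at random
    lead, remaining_fillers = fillers[:n_lead], fillers[n_lead:]
    rest = remaining_fillers + list(critical)
    rng.shuffle(rest)  # per-participant randomization of the remaining trials
    return lead + rest


# Final assessment counts from the text: 64 fillers, 32 + 16 critical sentences
fillers = [("filler", i) for i in range(64)]
critical = [("unconventional", i) for i in range(32)]
critical += [("conventional", i) for i in range(16)]
trials = order_trials(fillers, critical, seed=1)
```

With these counts the task has 112 trials, so the first 28 are fillers and the remaining 84 are shuffled.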

Exposure phase (Days 3-4-5)
During the 3 days of exposure, participants only read conventional sentences. Each day they
witnessed 8 conventional sentences and 8 conventional filler sentences. On the first day of the
exposure, the conventional sentences directly competed with unconventional sentences in the
assessment: the conventional sentences included the same verbs and noun phrases as the
unconventional paraphrases learners had to read and rate. On the following two days of
exposure, participants read different conventional sentences of the same construction types
(these conventional sentences included distinct verbs and noun phrases). Recall Table 3 for an overview
of the types of sentences participants read each day.

3 We excluded the double object construction from exposure and final assessment in order to have an even number of
constructions and reduce the amount of time of the experiment.

Results
Judgment results at initial assessment
We first tested whether learners’ ratings distinguished conventional from unconventional
sentences at the initial assessment. As expected, they did, replicating the finding in Experiment 1
as well as prior work on English (Robenalt & Goldberg, 2016; Tachihara & Goldberg, 2020). That
is, participants knew Spanish well enough to assign higher acceptability ratings to conventional
than to unconventional sentences. Specifically, a linear mixed model confirms that conventionality
predicted acceptability judgments for the learners of Spanish, with random intercepts for subjects
and items included, even at the initial assessment (β = 24.15, t = 7.41, p < 0.0001) and at final
assessment (β = 23.65, t = 6.74, p < 0.0001).
         To examine the role of proficiency, we first analyze acceptability scores at the initial
assessment as a function of class level. We ran a mixed-effects model with conventionality and
class as interacting fixed effects and random intercepts for subjects and items. We found a
significant interaction, meaning that as the proficiency increases, the difference in judgment scores
between conventional and unconventional sentences also increases (β = 4.59, t = 6.69, p < 0.0001).
Figure 3 displays each class from lower to higher proficiency. As is visible, the effect was driven
by the unconventional sentences (orange bars); the same model confirms a significant effect of
class on the unconventional sentences, but not for the conventional sentences (unconventional: β
= -3.61, t = -3.93, p = 0.0002; conventional: β = 1.07, t = 1.65, p = 0.11). In other words, as
proficiency increases, judgments for unconventional sentences decrease while judgments on
conventional sentences remain largely unchanged.
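
For concreteness, the interaction model described in this paragraph can be written out as below. This is a
sketch of one standard formulation, not a specification taken from the analysis scripts: s indexes subjects
and i indexes items, conv and class are assumed to be numerically coded predictors (the coding is not stated
in the text), and the symbols u, w and ε are introduced here for the random intercepts and the residual.

```latex
\begin{align*}
\text{rating}_{si} &= \beta_0 + \beta_1\,\text{conv}_i + \beta_2\,\text{class}_s
  + \beta_3\,(\text{conv}_i \times \text{class}_s) + u_s + w_i + \varepsilon_{si},\\
u_s &\sim \mathcal{N}(0, \sigma_u^2), \qquad
w_i \sim \mathcal{N}(0, \sigma_w^2), \qquad
\varepsilon_{si} \sim \mathcal{N}(0, \sigma^2).
\end{align*}
```

Under this formulation, the reported interaction estimate (β = 4.59) corresponds to β_3: the amount by which
the conventional-unconventional difference in ratings changes per step in class level.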

Figure 3. Mean acceptability scores for unconventional and conventional sentences by class levels in increasing
proficiency from left to right. Error bars represent standard error. Circles represent the mean score for each
participant.

Judgment results at final assessment
        Our main aim is to investigate whether exposure to conventional paraphrases impacts
judgments on unconventional sentences. This would constitute evidence of statistical preemption
among L2 learners of Spanish. Results confirm just this: exposure to conventional sentences
lowered learners’ subsequent ratings of unconventional paraphrases at the final assessment. Recall
that the set of constructions witnessed was counterbalanced across participants, so that we
could directly test for the effect of exposure while controlling for the delay and general
familiarity with the task. We ran a linear mixed-effects model with judgment scores in the final
assessment as the outcome and exposure as the fixed effect, including random intercepts for
subjects and items. Exposure significantly impacted judgments: participants gave lower ratings to
unconventional paraphrases after repeatedly reading the conventional sentences (β = -3.88,
t = -2.79, p = 0.0053). In other words, we find evidence of learning through statistical preemption in
language learners: reading conventional sentences led participants to rate unconventional
paraphrases as appropriately less acceptable. To make sure that the effect was not driven by a
single construction, we ran the model with an added random effect of construction type in an
exploratory analysis and again found an effect of exposure (β = -3.90, t = -2.81, p = 0.005). There
was no significant influence of exposure on conventional sentences (β = 0.29, t = 0.16, p = 0.87).
        Recall that we had preregistered an expectation that the effect of statistical preemption
would be strongest for the intermediate-level learners. Indeed, the effect of exposure is significant
only for intermediate-level learners of Spanish (β = -6.25, t = -3.05, p = 0.002), and not for
beginner-level learners (β = 0.68, t = 0.34, p = 0.74) or advanced-level learners (β = -2.11,
t = -1.16, p = 0.25). This is displayed by course level in Figure 4, which shows judgments for
unconventional sentences only, by whether participants were exposed to the corresponding
conventional formulations or not. Of interest is
whether unconventional sentences with exposure to the corresponding conventional formulations
are rated lower than unconventional sentences without such exposure. As illustrated in Figure 4,
they were, particularly for intermediate-level speakers, as hypothesized.

Figure 4. Mean acceptability scores for unconventional sentences with and without exposure to the conventional
alternative by class levels in increasing proficiency from left to right. Error bars represent standard error. Circles
represent the mean score for each participant.

         Recall that in the final assessments, half of the unconventional sentences that participants
rated had appeared in the initial assessment (e.g., ?La estación de tren es en esta calle), and were
very close paraphrases of the specific conventional sentences the same participants had witnessed
on the first day of exposure (La estación de tren está en esta calle). The other half of the
unconventional sentences in the final assessment were entirely new but involved the same
constructions (?La iglesia es en camino a la escuela). This allows us to investigate how generally
statistical preemption applied: was the effect restricted narrowly to the specific content witnessed
during the exposure, or did it apply to the new unconventional sentences that shared the same type
of conventional paraphrase? To test this, we compared judgment scores for repeated and new
unconventional sentences to see whether the effect of statistical preemption generalized beyond
specific utterances. An exploratory mixed-effects model with judgment scores on the
unconventional sentences in the final assessment as our outcome and repeated vs. new as the fixed
effect was fit to the data, with random intercepts for subjects and items. We find that scores on the
repeated and new unconventional sentences did not differ significantly (β = -2.12, t = -0.61, p = 0.55),
suggesting that participants generalized the preemptive exposure beyond the specific sentences
they had witnessed.
         We had preregistered a plan to analyze the effect of exposure on the difference in judgment
scores between initial and final assessments for unconventional sentences. The change in judgment
score was calculated by subtracting the initial assessment score from the final assessment score for
each item. Because half of the items in the final assessment were new items that were not tested in
the initial assessment, only the items that were repeated in the initial and the final assessment were
included for the analysis using the change in judgment score. We ran a mixed-effects model with
change in judgment score as the outcome and exposure as the fixed effect on the repeated
unconventional sentences. Random intercepts for subjects and items were included. We did not
find a significant effect of exposure in this case (β = -2.65, t = -1.19, p = 0.23). This suggests not
only that new sentences are comparable to repeated unconventional sentences, but also that the new
sentences may be necessary for the analysis to have sufficient power to detect an effect of exposure.
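
The change score used in this analysis can be sketched in Python as follows: the initial rating is subtracted
from the final rating, and only items rated at both assessments are retained. The data layout (dictionaries
keyed by participant and item), the identifiers, and the ratings are hypothetical and serve only to
illustrate the computation.

```python
def change_scores(initial, final):
    """Final minus initial rating for every (participant, item) rated twice.

    Items that are new at the final assessment have no initial rating and
    are therefore dropped, as in the analysis described above.
    """
    return {key: final[key] - initial[key] for key in final if key in initial}


# Hypothetical ratings keyed by (participant, item)
initial = {("p01", "estacion_es"): 60, ("p01", "bano_esta"): 90}
final = {
    ("p01", "estacion_es"): 45,  # repeated unconventional item
    ("p01", "bano_esta"): 92,    # repeated conventional item
    ("p01", "iglesia_es"): 50,   # new at final assessment, no change score
}
deltas = change_scores(initial, final)
```

A negative change score means the sentence was rated as less acceptable at the final assessment than at the
initial one.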

Self-paced reading task
We found no evidence that language learners slowed down during the key window when reading
unconventional sentences. Specifically, we ran the same mixed-effects model that was used to
demonstrate a slow-down among native Spanish speakers on the same unconventional sentences
in Experiment 2: log sum of reading time over target region as our outcome, conventionality as
the fixed effect, and random intercepts for subjects and items. But adult learners showed no
slow-down either at the initial assessment (β = 0.003, t = 0.031, p = 0.98) or at the final
assessment (β = 0.019, t = 0.35, p = 0.73) (see SI 6). The same null effect was found when
exposure was taken into account: we ran a mixed-effects model with log sum of reading time over
target region as our outcome and exposure as the fixed effect on the unconventional sentences
from the final assessment, with random intercepts for subjects and items included. Participants did
not slow down when reading unconventional sentences after being exposed to conventional
paraphrases (β = -0.21, t = -1.52, p = 0.13) (SI 7). The accuracy on the comprehension questions
for these sentences was high (M = 94.83%), indicating that participants were paying attention and
understood the sentences. Additional analyses for self-paced reading can be found in Supporting
information 8 & 9.

Discussion
Language learners have specific difficulty identifying unconventional sentences as unacceptable,
providing a likely reason that they commonly continue to produce sentences that native speakers
strongly judge to be unacceptable, even at relatively high proficiency levels (Bley-Vroman & Joo,
2001; Bley-Vroman & Yoshinaga, 1992; Hubbard & Hix, 1988; Inagaki, 1997; Martinez-Garcia
& Wulff, 2012; Oh, 2010). In Experiment 1, Spanish learners displayed a larger discrepancy from
native speakers on ratings of unconventional sentences than on conventional sentences. We saw that
transfer from participants’ dominant language (English) is unlikely to be responsible, since
learners showed the same pattern, regardless of whether the acceptability pattern was the same in
English and Spanish. Experiment 2 confirmed, as expected, that native Spanish speakers recognize
unconventional sentences immediately upon encountering them, as they slowed down in the target
region during the self-paced reading task.
         In Experiment 3, our goal was to determine whether statistical preemption is operative in
adult learners. Indirect support that adult learners can make use of statistical preemption is the
simple fact that awareness of unacceptable sentences in Spanish increases with proficiency (recall
Figure 3). Whereas previous results likewise indicate that L2 learners at the highest proficiency
judge unacceptable sentences much like native speakers do, presumably by learning through
statistical preemption, the current work is the first that we know of that manipulates exposure to
statistical preemption across proficiency levels and finds evidence of statistical preemption.
Specifically, results on the judgment task demonstrate that repeated exposure to conventional
sentences helps language learners more fully appreciate that unconventional paraphrases are
markedly unacceptable, particularly at the intermediate level, as predicted. Since judgments were
compared after exposure, as a function of which set of constructions participants had been
exposed to, we can be confident that the exposure to conventional sentences led learners to
ultimately rate unconventional sentences as less acceptable.
         By contrast, the results from the self-paced reading time measure are incongruent with
those of the judgment task. Recall that even at the initial assessment, the judgment data revealed that
undergraduate learners already recognized some distinction between conventional and
unconventional sentences. And we have seen that exposure led to an even greater distinction
between conventional and unconventional sentences. Yet in the self-paced reading task, learners
showed no slow-down during the initial or final assessment nor any effect of exposure. That is,
comparisons of reading times for conventional and unconventional sentences reveal no evidence
that the undergraduate learners detected any difference between them, despite judgment evidence
that they did.
         Judgment data reveal evidence of a growing awareness of unacceptability, while reading
times do not. This dissociation suggests that adult learners, at the levels of proficiency included here,
do not detect unacceptability in real time but require the extra time afforded in the off-line
judgment task. If detecting unacceptability requires speakers to access an acceptable paraphrase,
we speculate that the lack of a slow-down during online reading may reflect a relative
inaccessibility of conventional paraphrases during real-time processing. That is, if it takes learners
more time to access a conventional paraphrase than it does native speakers, they may only notice
that a sentence is unacceptable after it has been completed. Studies using electroencephalogram
(EEG) to measure brain responses of L2 speakers show that language learners sometimes show a
reduced event-related potential (ERP) when reading anomalous sentences compared to native
speakers (Rossi et al., 2006; Foucart & Frenck-Mestre, 2012). These effects depend on the
proficiency and/or age of acquisition of the participants. These patterns of results are consistent
with the idea that the accessibility of a conventional paraphrase determines whether
unacceptability is detected during online comprehension.
         Alternatively, it may be that recognition of unacceptability requires an explicit task, such
as a judgment task. Because the participants in this study were recruited from language instruction
classes and were actively trying to learn Spanish, there was likely a conscious effort to learn the
(un)acceptability of sentences. Indeed, many of the constructions used in this task, such as ser vs.
estar, are explicitly taught in Spanish classes. It is important to note, however, that while the
judgment task itself was explicit, no feedback was provided at any point in the experiment. The
learning evident in the judgment task occurred without explicit feedback, recasts, or corrections.
This means that language learners are able to learn to judge unacceptability through statistical
preemption, a type of indirect negative evidence, but the extent to which unacceptability is
recognized in real time remains unclear.
         We also find that learning through statistical preemption depended on the proficiency level
of the participant. Specifically, only intermediate-level speakers showed an effect of exposure
when rating the acceptability of unconventional sentences. Successful learning through statistical
preemption requires competition between conventional and unconventional sentences. A learner
must activate sentence structures that differ from what they are currently reading for competition
to take place. The amount of experience with the language can affect how successfully one can
activate sentence structures. Past literature has shown that language learners appear to be less likely
than native speakers to predict upcoming forms (Grüter, Hurtado, Marchman, & Fernald, 2014;
Ito, Martin, & Nieuwland, 2017; Kaan, 2014; Kaan, Dallas, & Wijnen, 2010; Kaan, Kirkham, &
Wijnen, 2016; Lew-Williams & Fernald, 2010; Martin et al., 2013). This suggests that they would
also be less likely to activate sentence formulations that are not in front of them. Because
beginning-level students have had little experience with the language, they likely have trouble
activating sentence formulations and learning unacceptability from competition. Intermediate-
level learners, who are likely to have had enough experience to activate sentence formulations,
were the ones to benefit from exposure to conventional sentences, as we had predicted. Advanced
students were already more closely aligned with native speakers, so there was less possibility for
the exposure to show an effect. Thus, learning unacceptability through a competition mechanism
like statistical preemption may require an optimal level of sentence activations.

Limitations
The current study was designed to be relatively easy to complete and to take a short amount of time.
This was done in an effort to recruit as many students as possible across different
proficiency levels. One limitation was that we were not able to recruit our target sample size of
100, although we were able to recruit roughly equally across proficiency levels. Because the recruitment
periods occurred over two semesters during the COVID-19 pandemic, when classes were only
offered online, it is possible that students were more reluctant to participate in an online study. The
short format of the study also meant that we were limited in the number of sentences in our stimuli.
Future studies with a larger sample and a larger set of sentences would be beneficial because they
would allow for additional analyses comparing proficiency and sentence types.
        We designed our current study such that repeated exposure occurred over days to allow for
memory consolidation to take place during sleep (Dumay & Gaskell 2007; Gais, Lucas, & Born,
2006; Lindsay & Gaskell 2010; Mattys & Clark 2002). The lack of an effect of statistical
preemption reported by Tachihara & Goldberg (2020) may be due to the fact that only a single
conventional sentence was provided for each unconventional sentence, or to the fact that the study
was conducted in a single session, so participants had no opportunity to sleep. The current design
cannot disentangle the effects of repeated exposure and sleep, but future work should investigate
each independently.

Generalizability
Another difference between the single exposure in Tachihara & Goldberg (2020) and the current
study is the variability of conventional sentences used during exposure. Because there were three
separate exposure sessions, learners received exposure to three different conventional sentences of
the same construction. It is possible that this variability helped augment the effect of exposure. We
found that learners are able to generalize within the same construction since they rated
unconventional sentences as unacceptable regardless of whether they were repeated at the initial
and final assessments or were new sentences. Variability of sentences could increase
generalizability and make statistical preemption more effective in learning unacceptability. In
other words, it may be not only that learners can generalize within the same construction, but also
that they need variability in the input for statistical preemption to be effective.
         Generalizability is important in the practical use of statistical preemption as a pedagogical
tool. If learners were only able to reduce acceptability for sentences in which they had read the
exact paraphrase, statistical preemption would not be a viable tool for learning. However, since
learners are able to generalize unacceptability, repeated exposure to the conventional construction
is sufficient to lead to knowledge about the unacceptability of the unconventional construction.

Transfer effect
Transfer from the native language is a commonly assumed cause of mistakes in language learners. However, findings
from this paper and recent evidence suggest that transfer effects are not wholly responsible for
higher tolerance for unconventional sentences in language learners. Specifically, we tested two
constructions of similar complexity, the double object construction, which behaves differently
between English and Spanish, and the clausal complement construction, which behaves in the same
way between English and Spanish. This means that the same verbs that are novel and unacceptable
in Spanish are also unacceptable in English. For example, Dan obligó que Helen juegue, the literal translation
of Dan forced that Helen play tennis, is unacceptable in both languages. Thus, if learners were using
knowledge of their native language, English, and transferring it onto the new language, Spanish,
they should show a higher tolerance for the double object, but not for the clausal complement. Yet,
there was no difference between constructions, supporting the idea that transfer effects are not the
cause of higher tolerance. In Tachihara & Goldberg (2020), we asked native Spanish speakers who
spoke English as a nonnative or less dominant language to judge English sentences in the double
object construction and the clausal complement construction. We found the same results, with
higher tolerance of unacceptable formulations in both constructions and, importantly, no difference
between the two constructions. Taken together, we conclude that transfer effects are not the cause
of the higher acceptability of unacceptable sentences in language learners. Consistent with this
idea, experienced language teachers have long noted that most mistakes are not a result of
transfer (Borg, 2003).

Reduced sensitivity to competition
