Curbing Feature Coding: Strictly Local Feature Assignment

Thomas Graf
Department of Linguistics
Stony Brook University
Stony Brook, NY 11794, USA
mail@thomasgraf.net

Abstract

Graf (2017) warns that every syntactic formalism faces a severe overgeneration problem because of the hidden power of subcategorization. Any constraint definable in monadic second-order logic can be compiled into the category system so that it is indirectly enforced as part of subcategorization. Not only does this kind of feature coding deprive syntactic proposals of their empirical bite, it also undermines computational efforts to limit syntactic formalisms via subregular complexity. This paper presents a subregular solution to feature coding. Instead of features being a cheap resource that comes for free, features must be assigned by a transduction. In particular, category features must be assigned by an input strictly local (ISL) tree-to-tree transduction, defined here for the first time. The restriction to ISL transductions correctly rules out various deviant category systems.

1 Introduction

Theoretical and computational linguists both strive to identify limited models of language that furnish sufficient power without allowing for excessive overgeneration. Recently, Graf (2017) noted that the findings of Graf (2011) and Kobele (2011) point towards a major loophole in all current theories of syntax. The category system can be abused to encode additional information about the syntactic tree, and the usual subcategorization requirements can then be used to enforce a certain kind of synchronization between parts of the tree. For instance, the category DP may be split into DP[+NPI] and DP[−NPI] depending on whether the DP is an NPI, and the category X of each selecting head becomes X[+NPI] if the argument it selects contains an unlicensed NPI. This simple strategy has been known for a long time but did not raise serious concerns as it is widely accepted that all grammar formalisms "leak" in the sense that they also allow for some unnatural patterns. But the extent of the problem for linguistic theory has not been fully appreciated. Graf (2017) shows how this strategy can be generalized to flout all island constraints, enforce constraints that lack any notion of locality, and even add highly unnatural counting requirements to the grammar. Every constraint that can be defined in monadic second-order logic is expressible through category refinement. This allows for very unnatural constraints, e.g. enforcing verb-second word order iff the sentence contains exactly three relative clauses or both a Principle A violation and a word in which unbounded tone plateauing is not obeyed. The only way to preclude this is to restrict the shape of category systems, but Graf (2017) argues that the usual linguistic requirements on syntactic categories are insufficient. Hence every syntactic formalism lacks a key mechanism to distinguish natural patterns from unnatural ones, resulting in massive overgeneration.

This paper proposes a computational solution to this problem, drawing from recent work on subregular complexity. Features no longer come part and parcel with lexical items, but must be assigned to tree structures by a transduction. An unnatural feature system that keeps track of, say, a counting dependency, requires a very powerful transduction. The category systems of natural languages, on the other hand, can be handled by much simpler means. I argue that these category systems only require inspection of a lexical item's local context. This intuition is formalized by generalizing the input-strictly local (ISL) string-to-string mappings of Chandlee (2014) to ISL tree-to-tree transductions. To the best of my knowledge, this is the first time a subregular transduction class is defined for trees, and I hope it will be a fertile vantage point for mathematical and empirical work alike.
The paper deviates slightly from the usual structure. Since the problem is also of interest to theoretical linguists and the proposed solution is fairly intuitive, the first half focuses on the big picture and keeps formal concepts to a minimum (§2). The mathematical aspects are then worked out in §3, the most important of which is the formal definition of ISL tree transductions (§3.2).

2 Problem and Solution: Informal Sketch

The power of category systems and subcategorization is best illustrated with an example (§2.1). This makes it clear what unnatural category systems may look like, and in what respects they clearly differ from natural ones (§2.2). The problem of category abuse in syntax is actually an instance of the more general phenomenon of feature coding, which also appears in the domain of subregular complexity (§2.3). But subregular complexity also provides a way of measuring the complexity of feature systems via transductions. With strict limits on the power of these transductions, many of the unnatural category systems are correctly ruled out (§2.4) while it becomes possible to formulate new syntactic universals (§2.5).

2.1 A Grammar with Odd/Even Counting

Let us start with a toy example from Minimalist grammars (MGs; Stabler, 1997, 2011) that illustrates the power of syntactic categories. MGs are closely modeled after Minimalist syntax, and subcategorization is encoded via category and selector features that drive the operation Merge. A head with selector feature X+ can only be merged with a phrase whose head has category feature X−. This matching of features is called feature checking. The category feature of a lexical item l can only be checked once all selector features of l have been checked. While exceedingly simple, this system is already too powerful as a model of subcategorization in natural languages.

Consider the MG G where the only pronounced lexical items are foo and bar, which may have the category features E− or O−. By default, the category feature is O−. But if the lexical item carries a selector feature O+ or E+, the category feature must be the opposite of that selector feature (E− or O−, respectively). Hence foo and bar may have the feature strings O−, E+ O−, or O+ E−. Besides foo and bar, the MG only has an unpronounced C-head, which must always be the last lexical item to be merged. The C-head carries the selector feature O+. Overall, G consists of the following lexical items:

(1) MG G with even/odd alternation
    ε :: O+ C−    foo :: E+ O−    foo :: O−
                  foo :: O+ E−
                  bar :: E+ O−    bar :: O−
                  bar :: O+ E−

[Figure 1: Derivation tree for foo bar bar foo bar]

The MG generates any string over foo and bar whose length is odd. The reasoning for this is as follows: the derivation must start with either foo :: O− or bar :: O−. From this point on, selecting heads alternate between E− and O−, but only a head carrying O− can be selected by the C-head to end the derivation. The end result is that the number of pronounced lexical items in the tree must be odd, as is also illustrated in Fig. 1. The MG above thus instantiates a simple case of modulo counting at the string level.
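To make the modulo-counting behavior of G concrete, here is a minimal sketch in Python that enumerates its derivations. The lexicon and the Merge logic follow (1); the encoding of lexical items as triples and the helper names are choices of this sketch, not part of the MG formalism. The asserts check that every derived string has odd length and that the string of Fig. 1 is among them.

```python
# A minimal sketch of the MG G in (1); the triple encoding and helper names are
# illustrative choices of this sketch, not part of the formalism itself.
LEXICON = [
    ("foo", None, "O"), ("bar", None, "O"),   # foo :: O-, bar :: O-
    ("foo", "E", "O"), ("foo", "O", "E"),     # foo :: E+ O-, foo :: O+ E-
    ("bar", "E", "O"), ("bar", "O", "E"),     # bar :: E+ O-, bar :: O+ E-
]

def derivations(max_items):
    """Enumerate chains of lexical items in which each item's selector feature
    is checked by the category feature of the item it selects."""
    chains = [[item] for item in LEXICON if item[1] is None]  # selector-less items end a chain
    yield from chains
    for _ in range(max_items - 1):
        chains = [[(p, s, c)] + chain
                  for chain in chains
                  for (p, s, c) in LEXICON
                  if s == chain[0][2]]        # Merge: selector matches head category
        yield from chains

# The C-head (ε :: O+ C-) can only select a chain whose topmost category feature is O-.
strings = {tuple(item[0] for item in chain)
           for chain in derivations(7) if chain[0][2] == "O"}
assert all(len(s) % 2 == 1 for s in strings)              # only odd lengths are derivable
assert ("foo", "bar", "bar", "foo", "bar") in strings     # the string of Fig. 1
```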
2.2 (Un)Naturalness of the Example MG

The example grammar G in (1) is highly unnatural in several respects. First of all, string length does not seem to be a relevant criterion for natural language syntax. This definitely holds for modulo counting, which is unheard of. But even absolute size requirements are hard to come by unless one abandons the well-motivated competence-performance distinction. A potential counterexample is Heavy NP-shift, which is sensitive to a constituent's size and thus, possibly, its string length. But even here processing provides a more plausible explanation (cf. Liu, 2018). Syntax itself seems to be completely blind to size, be it string length or the size of a tree.

Perhaps even more important is the fact that O− and E− do not convey intrinsic information of the lexical item l that carries them. Instead, these categories represent properties of the whole subtree. Hence the category is highly context-dependent. If one wanted to insert another instance of foo or bar in the subtree headed by l, one would also have to change the category of l because of how O− and E− have to alternate. The change of l's category then requires changing the category of l's selector, the selector of l's selector, and so on. This directly contradicts a basic principle of selection: a lexical item selects for its argument, not the argument(s) of its argument. A verb selecting a PP may restrict the shape of the P-head, but not the DP inside the PP. And no lexical item can freely select any head of any category as long as the selected subtree satisfies some other property. Subcategorization enforces head-head dependencies, not head-subtree dependencies, and any category system that allows the latter to be reduced to the former is missing a key aspect of natural language.

2.3 The Full Extent of the Problem

As was already mentioned in the introduction, the example above is but the tip of the iceberg. Without restrictions on the category system, any arbitrary constraint can be enforced as long as it is definable in monadic second-order logic. Graf (2017, p. 22–24, p. 27f) gives several illustrative examples of overgeneration and explains in detail why the usual heuristics (e.g. syntactic distribution, morphological inflection) are not sufficient to distinguish natural from unnatural category systems. Beyond modulo counting, this kind of feature coding also allows for, among other things, strange constraint interactions ("Satisfy either verb-second or Principle A, but not both"), symmetric counterparts of existing constraints (Reverse Principle A: every reflexive must c-command a suitable R-expression), and displacement mechanisms that do not use movement and hence bypass island constraints. All of this becomes possible because feature coding abuses categories as a local buffer for non-local information, erasing all locality and complexity differences between constraints.

The potential abuse of syntactic categories is actually an instance of a more general problem that has to be carefully avoided in subregular phonology. Subregular phonology (see Heinz 2018 and references therein) has identified very restricted subclasses of the regular string languages that still furnish enough power for phonology. Crucially, though, these claims depend on the choice of features because every regular pattern can be made subregular by introducing additional features. In formal terms: every recognizable set is a projection of a local set (cf. Rogers, 1997).

For instance, the regular string language of odd-length strings over a can be pushed into the extremely weak subclass of strictly 2-local string languages if one introduces a feature [±odd]. A string like a a a a a would then be represented as a[+odd] a[−odd] a[+odd] a[−odd] a[+odd]. The language with the diacritic [±odd] feature is strictly 2-local because it can be expressed in terms of constraints that involve at most two segments:

(2) Strictly 2-local constraints
    a. Every string must start with a[+odd] and end with a[+odd].
    b. a[+odd] must not follow a[+odd].
    c. a[−odd] must not follow a[−odd].

The example in §2.1 is a syntactic analog of this trick, with O− and E− filling the roles of [±odd]. In all these cases, feature coding obfuscates subregular complexity by precompiling complex dependencies into an invisible alphabet of features and diacritics.
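The trick can be replayed in a short sketch. The constraints in (2) are restated here as the complementary set of permitted bigrams over boundary-padded strings, which is an encoding choice of this sketch; annotated strings of a's then pass the strictly 2-local grammar exactly when their length is odd.

```python
# A minimal sketch of how the [±odd] diacritic turns the odd-length-only language
# over {a} into a strictly 2-local (SL-2) language. Boundary markers and the
# permitted-bigram encoding are assumptions of this sketch.
LEFT, RIGHT = "⋊", "⋉"   # word boundaries

def annotate(n):
    """a^n with alternating [±odd] diacritics, counting from the left edge."""
    return ["a[+odd]" if i % 2 == 0 else "a[-odd]" for i in range(n)]

# Constraints (2a-c), stated as their complement: the set of *permitted* bigrams.
PERMITTED = {
    (LEFT, "a[+odd]"), ("a[+odd]", RIGHT),            # (2a) edge requirements
    ("a[+odd]", "a[-odd]"), ("a[-odd]", "a[+odd]"),   # (2b/c) no identical neighbours
}

def sl2_ok(symbols):
    padded = [LEFT] + symbols + [RIGHT]
    return all(bigram in PERMITTED for bigram in zip(padded, padded[1:]))

for n in range(1, 10):
    # Annotated strings pass the SL-2 grammar exactly when n is odd.
    assert sl2_ok(annotate(n)) == (n % 2 == 1)
```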
The feature coding problem is less severe in subregular phonology thanks to the restriction to articulatory features, which can usually be replaced by the actual segments without changing anything substantive about the analysis.¹ In syntax, features play a much more vital role as two representations may look exactly the same except for their feature make-up.

¹ One notable exception is Baek (2018). She adds a limited number of structural features to define a subregular class that lies strictly between the classes TSL (Heinz et al., 2011) and ITSL (De Santo and Graf, 2019).

For instance, Fig. 2 gives an MG dependency tree representation for the gardeners water their flowers, while adding the movement features top− to their and top+ to water yields the MG dependency tree representation of the very different topicalization sentence their flowers, the gardeners water. The movement features are an essential part of the representation. Similarly, category and selector features can be crucial for head-argument relations in MG derivation trees. In Fig. 1, switching the feature strings of the bottom-most foo and bar would yield a new string foo bar bar bar foo. This is because derivation trees encode head-argument relations only via Merge features, not via dominance or linear order. It is not surprising, then, that all the recent work extending the subregular perspective from phonology to syntax relies on features in one way or another (Graf, 2018; Graf and Shafiei, 2019; Graf and De Santo, 2019; Vu, 2018; Vu et al., 2019).

[Figure 2: Feature assignment as a transduction problem between a feature-annotated MG dependency tree (left) and its feature-free counterpart (right)]

But even if features could be done away with, that would be too extreme a step as they can still be useful. Consider once more the case of topicalization movement. This involves three computational steps: I) identifying the mover and the target site, II) determining whether topicalization movement is licit, and III) displacing the topicalized phrase. Without features, the first two steps would have to be handled by the same computational device, which first makes a non-deterministic choice as to what should move where, and then decides whether this instance of movement obeys all relevant constraints. By making features an integral part of the representation, we factor out the first step in order to isolate the complexity of the second step. But without a restrictive theory of features, there is the risk of factoring out more than intended. This would lead to misleading claims about subregular complexity that are merely artifacts of feature coding. Subregular syntax thus finds itself in a precarious situation where the very thing it depends on also threatens to undermine all its findings.

The original problem of syntactic categories thus is but a piece of the larger puzzle of how to avoid feature coding. The brute force solution of shunning features altogether is not workable in syntax. Features distinguish otherwise identical representations; theoretical and computational linguists alike are too accustomed to thinking in terms of features; and features do allow for insightful factorizations of complexity. The problem is not features as such, it is the lack of a measuring rod for how much complexity has been shifted into the feature system.

2.4 Solution: Strictly Local Feature Assignment

Features come for free under current models of complexity because they are representational devices. Subregular complexity takes the representations for granted and then investigates how hard a given dependency would be to enforce over these representations. In order to assess the complexity of feature systems, we have to decouple them from the representations. Intuitively speaking, we want to measure the complexity of constructing a feature-annotated representation from its feature-free counterpart.

Formally, this takes the shape of a transduction problem. For strings, transductions are a formal counterpart of rewrite rules, and for trees they are similar to syntactic transformations in the sense of Chomsky (1965). Chandlee (2014) defines a particularly weak kind of string transductions known as input strictly local (ISL). An ISL transduction considers only the local context of a symbol when deciding how it should be rewritten. Word-final devoicing and intervocalic voicing are examples of ISL transductions in phonology, whereas long-distance sibilant harmony would not be ISL because the rewriting of a sibilant can depend on other segments that are arbitrarily far away. ISL can be lifted from strings to trees: a node in a tree may be rewritten in various ways depending on its local context in the tree. A transduction is ISL-k iff all local contexts can be limited to at most k levels (a mother-daughter configuration, for example, involves two levels).

Figure 2 illustrates the approach with a feature-annotated MG dependency tree for the gardeners water their flowers. The question at hand is whether the familiar categories D, N, and V can be assigned by an ISL transduction. We take a feature-annotated representation like the one on the left and remove all category and selector features. Then we have to define an ISL transduction that takes us back to the original representation. If this can be done with any well-formed tree, then the whole feature system is ISL recoverable.
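The following sketch illustrates the recovery problem for the tree in Fig. 2. The candidate feature strings and the disambiguation conditions are hypothetical toy assumptions tailored to this single example rather than an analysis of English; the point is merely that the exponents of a node's mother and daughters suffice to resolve the ambiguity of water and flowers.

```python
# A sketch of the recovery problem for the Fig. 2 dependency tree. Trees are dicts
# from Gorn-style addresses (tuples of integers) to phonetic exponents; the lexicon
# of candidate feature strings below is a hypothetical toy fragment.
CANDIDATES = {
    "the": ["N+ D-"], "their": ["N+ D-"], "gardeners": ["N-"],
    "water": ["D+ D+ V-", "N-"],   # transitive verb 'water' vs. noun 'water'
    "flowers": ["N-", "V-"],       # noun 'flowers' vs. intransitive verb 'flowers'
}

def recover(bare):
    """Pick each node's feature string by inspecting only the exponents of its
    mother and daughters (an ISL-2 amount of context in a dependency tree)."""
    annotated = {}
    for addr, phon in bare.items():
        mother = bare.get(addr[:-1]) if addr else None
        daughters = [bare[addr + (i,)] for i in range(1, 10) if addr + (i,) in bare]

        def fits(f):
            # Toy disambiguation conditions, one per ambiguous candidate.
            if f == "N-":        return mother in ("the", "their")          # nouns sit under D
            if f == "D+ D+ V-":  return mother is None and len(daughters) == 2
            if f == "V-":        return mother is None and not daughters
            return True                                                     # unambiguous items

        feats = [f for f in CANDIDATES[phon] if fits(f)]
        assert len(feats) == 1, f"still ambiguous at {addr}"
        annotated[addr] = f"{phon} :: {feats[0]}"
    return annotated

bare = {(): "water", (1,): "the", (1, 1): "gardeners", (2,): "their", (2, 1): "flowers"}
print(recover(bare))   # water :: D+ D+ V-, the :: N+ D-, gardeners :: N-, ...
```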
For the specific tree in Fig. 2, we need an ISL-2 transduction. The feature annotations for the, their, and gardeners can be recovered without any further context information just from the phonetic exponents. That is the case because there simply are no alternative feature annotations for these lexical items in English. With water and flowers, on the other hand, there is ambiguity as each one of them could be either a noun or a verb. But in both cases a minimum amount of context is sufficient to disambiguate their categories. Since flowers is selected by their, which can only be a determiner, flowers must be a noun. Similarly, water must be a verb because it selects the and their, neither one of which could be an argument of the noun water. Inspecting the daughters or the mother of a node requires a context with two levels, so the transduction is ISL-2 for this specific example. The complexity of the whole feature system corresponds to the weakest transduction that works for all well-formed trees (usually there will be infinitely many of those; therefore, conclusive complexity results require proofs rather than examples).

[Figure 3: Modulo systems are not ISL-k recoverable]

The feature system of the MG G in Sec. 2.1 is not ISL recoverable. This follows from the fact that it is not ISL-k recoverable for any k ≥ 1. For the sake of simplicity, we will once again use a dependency tree format as in Fig. 2 instead of the derivation tree format in Fig. 1. Now suppose that the features for the left tree in Fig. 3 could be correctly assigned from the middle tree by an ISL-k transduction. Since the transduction is ISL-k, the features assigned to foo depend exclusively on some context with at most k levels. Crucially, foo will always receive the same features as long as the context remains the same. But now compare this to the tree on the right. Here foo has switched positions with the bar below it, inducing a change in its feature make-up. Yet the locally bounded context for foo has not changed at all — the middle tree could also be a description for the right tree depending on the values of m′ ≥ k and n′ ≥ k. Hence the feature annotation for foo varies despite identical contexts, which proves that the feature system is not ISL recoverable. In fact, no ISL transduction can handle any feature system that involves modulo counting.
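The Fig. 3 argument can be replayed with a few lines of code. The chain encoding and the window function are illustrative choices of this sketch; it exhibits two feature-free chains whose windows around foo are identical even though foo must receive different category features.

```python
# A sketch of the Fig. 3 argument; the list encoding of dependency chains and the
# window function are illustrative choices of this sketch.
def chain(m, n):
    """Feature-free dependency chain with m bars above foo and n bars below it."""
    return ["bar"] * m + ["foo"] + ["bar"] * n

def window(tree, k):
    """k nodes above and k below foo: at least as much as any ISL-k context sees."""
    i = tree.index("foo")
    return tuple(tree[max(0, i - k):i + k + 1])

def required_category(tree):
    """Bottom-up alternation of the grammar in (1): the lowest item is O-, the item
    selecting it is E-, and so on."""
    below = len(tree) - tree.index("foo") - 1
    return "O-" if below % 2 == 0 else "E-"

k = 3
t1, t2 = chain(k + 2, k + 2), chain(k + 1, k + 3)       # foo moved down by one position
assert window(t1, k) == window(t2, k)                   # identical local contexts ...
assert required_category(t1) != required_category(t2)   # ... but different categories
```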
2.5 Some Linguistic Implications

ISL recoverability correctly rules out some of the most egregious patterns and constraints. But we can try to further limit feature systems based on the size of contexts. Instead of ISL recoverability, the relevant restriction would be ISL-k recoverability for some small k.

Note first that the value of k can vary depending on other assumptions. MG derivation trees like the one in Fig. 1 display a greater distance between heads and arguments than MG dependency trees like the one in Fig. 2, so the latter will minimize the value for k. This does not mean that the latter is linguistically preferable, but rather that k cannot be fixed independently of the choice of representation. The value of k will also depend greatly on the shape of the phonetic exponents. Fully inflected forms can provide crucial clues about a lexical item's category that would be missing from the uninflected roots postulated in Distributed Morphology (Halle and Marantz, 1993). It remains to be seen which set of assumptions and parameters will prove most insightful.

At this point, though, I put forward a maximally restrictive conjecture. Based on a preliminary survey of English data and the linguistic bon mot that heads do not select for arguments of arguments, I contend that the category systems of natural languages are maximally simple:

(3) Complexity of category systems
    Given MG dependency trees with uninflected roots as exponents (e.g. √destroy, √water), it holds for every natural language that all its category and selector features are ISL-2 recoverable.

The conjecture in (3) predicts that whenever a lexical item is categorially ambiguous, its category feature can be determined by inspecting the selecting head or the heads of the selected arguments. Even if ISL-2 recoverability ultimately turns out to be too strong an assumption, ISL-k recoverability still rules out many undesirable feature systems and reins in feature coding while allowing for limited categorial ambiguity.

ISL recoverability also has some more indirect consequences. One prediction is that no natural language can have an arbitrarily long sequence x1, . . . , xn such that I) each xi is an empty head, and II) xi selects xi+1 and nothing else (1 ≤ i < n). This prediction follows from the fact that unpronounced lexical items provide no overt clues about their category. If the local context does not furnish any pronounced material, local category inference hinges on structural differences. Since the configuration above is structurally uniform, there is insufficient information to correctly infer the categories of all empty heads. This case is interesting because of the proliferation of empty heads in Minimalist syntax. If there is any clear counterexample to (3), it is likely to involve empty heads.

It should also be noted that ISL recoverability is only expected to hold for category and selector features. Features that participate in long-distance dependencies like movement cannot be reliably assigned by an ISL transduction.² Consider once more our topicalization example from before. Whether water should receive a top+ feature to license topicalization depends on whether there is some head with a matching top− feature to undergo topicalization. In the case at hand, this decision can be made based on the local context alone. But in general, a mover can be arbitrarily far away from its target site, as in this author, John thinks that Bill said that Mary really adores. Correct assignment of top+ thus requires a context of unbounded size, which is impossible with ISL transductions.

² In grammars with adjunction, subcategorization can also become a long-distance dependency depending on one's choice of representation (Graf, 2018). A modified version of (3) would predict that the uninflected root of each adjunct still provides enough information to reliably infer category and selector features. I am much more skeptical that this will turn out to be true across all languages.

Many empirical and theoretical issues remain to be settled. The MG corpus of Torr (2017) may provide valuable clues about the feasibility of conjecture (3), but it must be supplemented by a broad range of typological data. On the formal side, studying the recoverability of movement features will require more powerful extensions of ISL tree transductions. The next section fully formalizes ISL transductions to provide a suitable vantage point for this future work.

3 Formal Definitions

This section puts the informal discussion of the preceding section on a formal footing by defining ISL recoverability in terms of ISL tree relabelings. But in order to simplify future work on feature recoverability, I define the more general class of ISL tree transductions, which ISL tree relabelings are a particularly simple subtype of. The definition of ISL tree transductions differs markedly from that of other tree transductions. Building on Gorn domains and tree contexts (§3.1), I define an ISL tree transducer as a finite set of triples, each one of which maps a node n to a tree context based on the configuration n appears in. The ISL transduction then combines all these tree contexts to yield the final output tree (§3.2). Given this formal apparatus, feature recoverability is easy to state in rigorous terms (§3.3).

3.1 Technical Preliminaries

We define trees as finite, labeled Gorn domains (Gorn, 1967). First note, though, that we use N to denote the set of all positive natural numbers, i.e. {1, 2, 3, . . .} rather than {0, 1, 2, 3, . . .} — this is non-standard, but will slightly simplify the usage of indices in the definition of ISL-k transducers.

A Gorn domain D is a set of strings drawn from N*, which are called (Gorn) addresses, or simply nodes. Every Gorn domain must satisfy two closure properties: for all u ∈ N* and 1 ≤ i ≤ j it holds that uj ∈ D implies both u ∈ D and ui ∈ D. This entails the inclusion of the empty string ε, which denotes the root. Addresses are interpreted such that u immediately dominates each ui, and each ui is the immediate left sibling of u(i + 1).

A Σ-tree is a pair t := ⟨D, ℓ⟩ where D is a finite Gorn domain and ℓ : D → Σ is a total function that maps each address to its label, i.e. a member of the alphabet Σ. The depth of t is equivalent to the length of the longest Gorn address.
A (Σ, n)-context is a Σ-tree whose leaf nodes may also have labels drawn from the set {∗1, . . . , ∗n} of ports, which must be disjoint from Σ. Suppose we are given a (Σ, n)-context c := ⟨Dc, ℓc⟩ with m ≤ n ports labeled ∗i at addresses a1, . . . , am, as well as a tree (or context) s := ⟨Ds, ℓs⟩. Then we use c[∗i ← s] to denote the result of substituting s for each ∗i in c. This is a new tree t := ⟨D, ℓ⟩ such that

  • D := Dc ∪ {aj·d | 1 ≤ j ≤ m, d ∈ Ds}, and

  • for every b ∈ D, ℓ(b) := ℓs(d) if b = aj·d for some 1 ≤ j ≤ m and d ∈ Ds, and ℓ(b) := ℓc(b) otherwise.

The construction also generalizes to multiple simultaneous substitutions, as in c[∗i ← s, ∗j ← t]. If c contains no node labeled ∗i, then c[∗i ← s, ∗j ← t] = c[∗j ← t] (and c[ ] = c).

If S is a set, then substitution can apply in two ways. With synchronous substitution, t[∗i ← S] := {t[∗i ← s] | s ∈ S}. Asynchronous substitution, denoted t[∗i ⇐ S], yields {t[∗i1 ← s1, . . . , ∗in ← sn] | s1, . . . , sn ∈ S}, assuming that t contains exactly n occurrences of ∗i. Substitution with sets and multiple simultaneous substitutions will be crucial for ISL transductions.
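The substitution operation can be sketched in the same dict-based encoding; representing ports as labels of the form ("*", i) is an assumption of the sketch. Substituting into a context with two occurrences of the same port illustrates the synchronous case, where every occurrence receives the same tree.

```python
# A sketch of port substitution c[∗i ← s] over Gorn-domain trees: trees and contexts
# are dicts from address tuples to labels, and ports are labels of the form ("*", i).
def substitute(c, i, s):
    """Replace every port ∗i in context c by the tree (or context) s."""
    ports = [a for a, lab in c.items() if lab == ("*", i)]
    t = {a: lab for a, lab in c.items() if lab != ("*", i)}   # ℓ(b) = ℓc(b) otherwise
    for a in ports:
        for d, lab in s.items():
            t[a + d] = lab                                    # ℓ(a·d) = ℓs(d)
    return t

# c = +(∗1, ∗1): a context with two occurrences of the same port.
c = {(): "+", (1,): ("*", 1), (2,): ("*", 1)}
s = {(): "+", (1,): "2", (2,): "3"}
# Synchronous substitution puts the same tree into every occurrence of ∗1.
print(substitute(c, 1, s))
# {(): '+', (1,): '+', (1, 1): '2', (1, 2): '3', (2,): '+', (2, 1): '2', (2, 2): '3'}
# Asynchronous substitution with a set S would instead allow a (possibly) different
# member of S at each occurrence of the port.
```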
3.2 ISL Transductions

Chandlee (2014) defines ISL string-to-string mappings in terms of deterministic, finite-state string-to-string transducers. Even though the definition does not provide an explicit look-ahead component, ISL mappings can emulate finitely bounded look-ahead via a delayed-output strategy. Suppose, for instance, that a is rewritten as b before d, as c before e, and just as a before f. This is emulated by deleting a and rewriting the next symbol as either bd, ce, or af. Later works define ISL functions in terms of local contexts (not to be confused with (Σ, n)-contexts), and those definitions make look-ahead a standard component to simplify practical work (Chandlee and Heinz, 2018; Graf and Mayer, 2018; De Santo and Graf, 2019).

With tree transducers, the emulation of finitely bounded look-ahead is a much more complex affair that depends on various parameters such as directionality (top-down or bottom-up), totality, and determinism. For this reason, I explicitly add finite look-ahead in the subsequent definitions. I will also allow for non-determinism as future work may require transductions that can handle optionality (e.g. whether a node should receive a movement feature to undergo topicalization).

For the sake of generality and as a starting point for future work, I first define a version of ISL tree transductions that allows for non-determinism, deletion, and copying, and that can run in two different modes of operation (synchronous or asynchronous). This is subsequently limited to the special case of ISL relabelings, which are the formal counterpart of the feature assignment discussed in §2.4.

Definition 1. For k ≥ 1, an ISL-k tree transducer from Σ-trees to Ω-trees is a finite set τ of ISL-k rewrite rules ⟨s, a, t⟩, where

  • s is a Σ-tree of depth i < k,

  • a is a node (i.e. a Gorn address) of s with d ≥ 0 daughters,

  • and t is an (Ω, d)-context.

Definition 2 (Synchronous ISL transduction). The transduction realized by an ISL-k transducer in synchronous mode is defined in a recursive fashion. First, a node b in tree u can be targeted by an ISL-k rewrite rule ⟨s, a, t⟩ iff there is some p ∈ N* such that

node match: b = pa, and

label match: for all nodes g of s, ℓs(g) = ℓu(pg),

full-width match: for all nodes gi of s with g ∈ N* and i ∈ N, if pgj is a node of u (j > i), then gj is a node of s.

Now suppose furthermore that b in u has d ≥ 0 daughters. Given an ISL-k tree transducer τ, we use τ(u, b) to denote the set of all trees t[∗1 ← τ(u, b1), . . . , ∗d ← τ(u, bd)] such that there is a rewrite rule ⟨s, a, t⟩ in τ that targets node b in u. If this set is empty, τ(u, b) is undefined. For any Σ-tree t, we may simply write τ(t) instead of τ(t, ε). For any tree language L, the transduction computed by τ in synchronous mode is τ(L) := {⟨i, o⟩ | i ∈ L, o ∈ τ(i)}. A transduction is synchronous input strictly k-local (sISL-k) iff it can be computed by some ISL-k transducer in synchronous mode. It is synchronous input strictly local (sISL) iff it is sISL-k for some k ≥ 1.

The definition of asynchronous input strictly k-local (aISL-k) transductions is exactly the same, except that τ is replaced by τ⃖ such that τ⃖(u, b) denotes the set t[∗1 ⇐ τ⃖(u, b1), . . . , ∗d ⇐ τ⃖(u, bd)]. ISL is used as a shorthand for sISL or aISL, ignoring transduction mode.

The definition of ISL transductions differs from that of other tree transductions in that the input tree is not altered incrementally to yield the output tree. Instead, each node in the input contributes a context to the output, or rather, a range of possible contexts in the case of a non-deterministic transduction. The transduction then stitches these contexts together in order to arrive at a single tree structure. This stitching is accomplished by the recursive step of mapping τ(u, b) to t[∗1 ← τ(u, b1), . . . , ∗d ← τ(u, bd)]. Each τ(u, bi) (1 ≤ i ≤ d) corresponds to a context produced from the i-th daughter of the node b in u, and these contexts are inserted into the appropriate ports of the context t produced from b. If b is a leaf node, its output structure is a tree instead of a context. This ensures that the recursion step terminates eventually.

Example. Figure 4 specifies a fragment of an ISL-3 transducer for translating multiplication trees to addition trees (assuming no numbers larger than 3). For simplicity, the ISL rewrite rules are written in a context-free format with a box around the node to be rewritten. An underscore is used to match any arbitrary node label. On the right, a particular input-output mapping is shown with the transducer running in asynchronous mode. In synchronous mode, all ∗i in the output of rule G would have to be replaced by the same tree.

[Figure 4: A non-deterministic ISL-3 tree transducer (left) for converting a tree with addition and multiplication to addition only (right). The workspace depicts I) how each node is rewritten as one or more contexts, and II) all possible options for combining these contexts into a particular output tree via substitution steps that are based on the structure of the input tree. Dashed arrows are annotated with the corresponding rewrite rules. The transducer is assumed to operate in asynchronous mode. The output displays one out of 2³ = 8 options that differ in when and where rewrite rules C and F are used. In synchronous mode, there would be only two distinct outputs.]

It is easy to see that every ISL-k string transduction is an ISL-k tree transduction over unary branching trees. This shows that ISL-k tree transducers are a natural generalization of ISL for strings. However, the current definition goes far beyond ISL string mappings in that it allows for non-determinism and copying.

Definition 3 (Transducer subtypes). An ISL-k tree transducer τ is

deterministic iff it holds for every Σ-tree u that no node of u can be targeted by more than one rewrite rule of τ,

linear/non-deleting iff all rewrite rules ⟨s, a, t⟩ of τ are such that if the node at address a in s has d ≥ 1 daughters, then t contains every port ∗i at most once/at least once (1 ≤ i ≤ d),

structure preserving iff all rewrite rules ⟨s, a, t⟩ of τ are such that t is of the form ω(∗1, . . . , ∗d) (ω ∈ Ω).

A deterministic, structure preserving ISL-k transducer is called an ISL-k relabeling.

A structure preserving ISL transducer never changes the structure of the input tree. Structure preservation thus entails linearity, which is why the latter is not mentioned in the definition of relabelings. Linearity in turn removes the distinction between synchronous and asynchronous mode as no ∗i ever has more than one occurrence. Only this very limited type of ISL transducers is relevant for feature recoverability.

3.3 ISL Feature Recoverability

We are finally in a position to define the notion of ISL recoverability that was informally discussed in §2.4. In order to clearly separate features from other parts of the alphabet, we have to track them in a separate component. MGs make this split fully explicit, with a lexical item's phonetic exponent a member of Σ and its feature annotation a string over an entirely separate set of features. Other formalisms such as TAG or GPSG can also be recast along these lines.

Definition 4. Let F be a set of features. An F-annotated Σ-tree is a tree whose labels are drawn from Σ × F*.

Definition 5. Let F be a set of features and e a function that maps each ⟨σ, f⟩ ∈ Σ × F* to σ. Then F is ISL-k recoverable with respect to language L of F-annotated Σ-trees iff there is an ISL-k transducer τ such that τ(e(t)) = t for all t ∈ L.

Note that feature recoverability can vary depending on the particulars of the tree languages. A feature that may not be recoverable with respect to L may be recoverable with respect to L′. Consider once more the grammar in §2.1. If foo always had to carry O−, and bar always had to carry E−, then those category features would be recoverable even though they still encode an even/odd alternation. In this hypothetical scenario, the alternation is tied to overt exponents, reducing modulo counting to a strictly 2-local alternation of lexical items. In the other direction, even the simplest (non-trivial) category system cannot be recovered from a language where all lexical items are unpronounced. And as a reviewer correctly points out, if one puts no restrictions on the use of empty heads, features can be encoded in terms of specific structural configurations with empty heads. Feature recoverability thus is a fluid notion that depends equally on the nature of Σ, the syntactic assumptions about structure and phonetic exponents, and the overall complexity of the tree language.
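As a sanity check on Definition 5, the following sketch, my own construction rather than code from the paper, tests a necessary condition for ISL-2 recoverability over a finite sample: nodes whose erased trees present identical 2-local contexts must carry identical feature strings. The even/odd grammar of §2.1 already fails this test on two short dependency chains, in line with Fig. 3.

```python
# A sketch of a necessary condition for ISL-2 recoverability (Definition 5): across a
# sample of F-annotated trees, nodes with identical 2-local contexts in the erased
# trees must carry identical feature strings. Trees are dicts from Gorn addresses
# (tuples) to (exponent, feature string) pairs.
def erase(tree):
    """The function e of Definition 5, applied label by label."""
    return {addr: phon for addr, (phon, _) in tree.items()}

def context2(bare, addr):
    """2-local context of a node: its own exponent plus those of mother and daughters."""
    mother = bare.get(addr[:-1]) if addr else None
    daughters = tuple(bare[addr + (i,)] for i in range(1, 10) if addr + (i,) in bare)
    return (mother, bare[addr], daughters)

def isl2_consistent(sample):
    seen = {}                                  # 2-local context -> feature string
    for tree in sample:
        bare = erase(tree)
        for addr, (_, feats) in tree.items():
            if seen.setdefault(context2(bare, addr), feats) != feats:
                return False                   # same context, different features
    return True

# Dependency chains of 3 and 5 bars from the grammar of §2.1 (cf. Fig. 3): the middle
# positions look alike locally but need different categories, so the test fails.
odd3 = {(): ("bar", "E+ O-"), (1,): ("bar", "O+ E-"), (1, 1): ("bar", "O-")}
odd5 = {(): ("bar", "E+ O-"), (1,): ("bar", "O+ E-"), (1, 1): ("bar", "E+ O-"),
        (1, 1, 1): ("bar", "O+ E-"), (1, 1, 1, 1): ("bar", "O-")}
assert isl2_consistent([odd3]) and not isl2_consistent([odd3, odd5])
```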
4 Conclusion

ISL tree transductions — or more precisely, ISL tree relabelings — offer a reasonable approximation of the limits of category systems in natural languages. I conjecture that all natural languages are such that the category of a lexical item can be inferred from its local context in a tree without any feature annotations. In combination with standard assumptions about linguistic structure, feature recoverability is a powerful restriction that eliminates many of the undesirable cases of feature coding identified in Graf (2017). It also makes strong empirical predictions that merit further investigation by linguists.

Many questions had to remain open. On the formal side, this includes abstract characterizations as well as core properties of ISL tree transductions, e.g. (non-)closure under intersection, union, and composition. The relations to other transduction classes are largely unknown. I conjecture that (deterministic) synchronous/asynchronous ISL transductions are subsumed by (deterministic) bottom-up/top-down transductions with finite look-ahead. Linear ISL transductions should be subsumed by both. The movement features of MGs will require a more powerful kind of transduction, possibly based on the string class TSL (Heinz et al., 2011). There also seems to be a deep connection between feature recoverability and the notion of inessential features (Kracht, 1997; Tiede, 2008).

From a linguistic perspective, one pressing question is to what extent feature recoverability depends on whether syntax uses fully inflected lexical forms or underspecified roots. If fully inflected lexical items do not reduce the complexity of the ISL transduction, or allow for unnatural constraints that would not be possible otherwise, that would be a powerful argument that syntax indeed has no need for anything beyond simple roots.

Acknowledgments

The work reported in this paper was supported by the National Science Foundation under Grant No. BCS-1845344. This paper benefited greatly from the feedback of Jeffrey Heinz, Dakotah Lambert, and three anonymous reviewers. I am indebted to the participants of the University of Tromsø's workshop Thirty Million Theories of Syntactic Features, which lit the initial spark that grew into the ideas reported here.
References

Hyunah Baek. 2018. Computational representation of unbounded stress: Tiers with structural features. In Proceedings of CLS 53, pages 13–24.

Jane Chandlee. 2014. Strictly Local Phonological Processes. Ph.D. thesis, University of Delaware.

Jane Chandlee and Jeffrey Heinz. 2018. Strict locality and phonological maps. Linguistic Inquiry, 49:23–60.

Noam Chomsky. 1965. Aspects of the Theory of Syntax. MIT Press, Cambridge, MA.

Aniello De Santo and Thomas Graf. 2019. Structure sensitive tier projection: Applications and formal properties. In Formal Grammar, pages 35–50, Berlin, Heidelberg. Springer.

Saul Gorn. 1967. Explicit definitions and linguistic dominoes. In Systems and Computer Science, Proceedings of the Conference held at University of Western Ontario, 1965, Toronto. University of Toronto Press.

Thomas Graf. 2011. Closure properties of Minimalist derivation tree languages. In LACL 2011, volume 6736 of Lecture Notes in Artificial Intelligence, pages 96–111, Heidelberg. Springer.

Thomas Graf. 2017. A computational guide to the dichotomy of features and constraints. Glossa, 2:1–36.

Thomas Graf. 2018. Why movement comes for free once you have adjunction. In Proceedings of CLS 53, pages 117–136.

Thomas Graf and Aniello De Santo. 2019. Sensing tree automata as a model of syntactic dependencies. In Proceedings of the 16th Meeting on the Mathematics of Language, pages 12–26, Toronto, Canada. Association for Computational Linguistics.

Thomas Graf and Connor Mayer. 2018. Sanskrit n-retroflexion is input-output tier-based strictly local. In Proceedings of SIGMORPHON 2018, pages 151–160.

Thomas Graf and Nazila Shafiei. 2019. C-command dependencies as TSL string constraints. In Proceedings of the Society for Computation in Linguistics (SCiL) 2019, pages 205–215.

Morris Halle and Alec Marantz. 1993. Distributed morphology and the pieces of inflection. In Ken Hale and Samuel J. Keyser, editors, The View from Building 20, pages 111–176. MIT Press, Cambridge, MA.

Jeffrey Heinz. 2018. The computational nature of phonological generalizations. In Larry Hyman and Frank Plank, editors, Phonological Typology, Phonetics and Phonology, chapter 5, pages 126–195. Mouton De Gruyter.

Jeffrey Heinz, Chetan Rawal, and Herbert G. Tanner. 2011. Tier-based strictly local constraints in phonology. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 58–64.

Gregory M. Kobele. 2011. Minimalist tree languages are closed under intersection with recognizable tree languages. In LACL 2011, volume 6736 of Lecture Notes in Artificial Intelligence, pages 129–144.

Marcus Kracht. 1997. Inessential features. In Alain Lecomte, F. Lamarche, and G. Perrier, editors, Logical Aspects of Computational Linguistics, pages 43–62. Springer, Berlin.

Lei Liu. 2018. Minimalist parsing of heavy NP shift. In Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation (PACLIC 32). Association for Computational Linguistics.

James Rogers. 1997. Strict LT2 : Regular :: Local : Recognizable. In Logical Aspects of Computational Linguistics: First International Conference, LACL '96 (Selected Papers), volume 1328 of Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence, pages 366–385. Springer.

Edward P. Stabler. 1997. Derivational Minimalism. In Christian Retoré, editor, Logical Aspects of Computational Linguistics, volume 1328 of Lecture Notes in Computer Science, pages 68–95. Springer, Berlin.

Edward P. Stabler. 2011. Computational perspectives on Minimalism. In Cedric Boeckx, editor, Oxford Handbook of Linguistic Minimalism, pages 617–643. Oxford University Press, Oxford.

Hans-Jörg Tiede. 2008. Inessential features, ineliminable features, and modal logics for model theoretic syntax. Journal of Logic, Language and Information, 17:217–227.

John Torr. 2017. Autobank: a semi-automatic annotation tool for developing deep Minimalist grammar treebanks. In Proceedings of the Demonstrations at the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 81–86.

Mai Ha Vu. 2018. Towards a formal description of NPI-licensing patterns. In Proceedings of the Society for Computation in Linguistics, volume 1, pages 154–163.

Mai Ha Vu, Nazila Shafiei, and Thomas Graf. 2019. Case assignment in TSL syntax: A case study. In Proceedings of the Society for Computation in Linguistics (SCiL) 2019, pages 267–276.