Zero-Shot Cross-lingual Semantic Parsing


Tom Sherborne and Mirella Lapata
School of Informatics, University of Edinburgh
tom.sherborne@ed.ac.uk, mlap@inf.ed.ac.uk

arXiv:2104.07554v1 [cs.CL] 15 Apr 2021

Abstract

Recent work in cross-lingual semantic parsing has successfully applied machine translation to localize accurate parsing to new languages. However, these advances assume access to high-quality machine translation systems, and tools such as word aligners, for all test languages. We remove these assumptions and study cross-lingual semantic parsing as a zero-shot problem, without parallel data for 7 test languages (DE, ZH, FR, ES, PT, HI, TR). We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-logical form paired data and unlabeled, monolingual utterances in each test language. We train an encoder to generate language-agnostic representations jointly optimized for generating logical forms or utterance reconstruction and against language discriminability. Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty. Experimental results on Overnight and a new executable version of MultiATIS++ find that our zero-shot approach performs above back-translation baselines and, in some cases, approaches the supervised upper bound.

1   Introduction

Executable semantic parsing translates a natural language utterance to a logical form for execution in some knowledge base to return a denotation. The parsing task realizes an utterance as a semantically identical, but machine-interpretable, expression grounded in a denotation. The transduction between natural and formal languages has allowed semantic parsers to become critical infrastructure in the pipeline of human-computer interfaces for question answering from structured data (Berant et al., 2013; Liang, 2016; Kollar et al., 2018).

Sequence-to-sequence approaches have proven capable of producing high-quality parsers (Jia and Liang, 2016; Dong and Lapata, 2016), with further modeling advances in multi-stage decoding (Dong and Lapata, 2018; Guo et al., 2019), schema linking (Shaw et al., 2019; Wang et al., 2019) and grammar-based decoding (Yin and Neubig, 2017; Lin et al., 2019). In addition to modeling developments, recent work has also expanded to multilingual parsing. However, this has primarily required that parallel training data is either available (Jie and Lu, 2014) or generated through professional translation (Susanto and Lu, 2017a). This creates an entry barrier to localizing a semantic parser which may not be necessary. Sherborne et al. (2020) and Moradshahi et al. (2020) explore the utility of machine translation, as a cheap alternative to human translation, for creating training data, but found translation artifacts to be a performance-limiting factor. Zhu et al. (2020) and Li et al. (2021) both examine zero-shot spoken language understanding and observed a significant performance penalty from cross-lingual transfer from English to lower-resource languages.

Cross-lingual generative semantic parsing, as opposed to the slot-filling format, has been under-explored in the zero-shot case. This challenging task combines the outstanding difficulty of structured prediction, for accurate parsing, with a latent-space alignment requirement, wherein multiple languages should encode to an overlapping semantic representation. Prior work has identified this penalty from cross-lingual transfer (Artetxe et al., 2020; Zhu et al., 2020; Li et al., 2021), which is insufficiently overcome with pre-trained models alone. While there has been some success in machine-translation-based approaches, we argue that inducing a shared multilingual space without parallel data is superior because (a) this nullifies the introduction of translation or word alignment errors and (b) this approach scales to low-resource languages without reliable machine translation.

In this work, we propose a method of zero-shot executable semantic parsing using only monolingual data for cross-lingual transfer of parsing knowledge.
Our approach uses paired English-logical form data for the parsing task and adapts to additional languages using auxiliary tasks trained on unlabeled monolingual corpora. Our motivation is to accurately parse languages for which paired training data is unavailable, and to examine whether any translation is required for accurate parsing. The objective of this work is to parse utterances in some language, l, without observing paired training data, (x_l, y), or suitable machine translation, word alignment or bilingual dictionaries between l and English. Using a multi-task objective, our system adapts pre-trained language models to generate logical forms from multiple languages with a minimized penalty for cross-lingual transfer from English to German (DE), Chinese (ZH), French (FR), Spanish (ES), Portuguese (PT), Hindi (HI) and Turkish (TR).

2   Related Work

Cross-lingual Modeling  This area has recently gained increasing interest across a range of natural language understanding settings, with benchmarks such as XGLUE (Liang et al., 2020) and XTREME (Hu et al., 2020) providing classification and generation tasks across a range of languages (Zhao et al., 2020). There has been significant interest in cross-lingual approaches to dependency parsing (Tiedemann et al., 2014; Schuster et al., 2019), sentence simplification (Mallinson et al., 2020) and spoken-language understanding (SLU; He et al., 2013; Upadhyay et al., 2018). Within this, pre-training has been shown to be widely beneficial for cross-lingual transfer using models such as multilingual BERT (Devlin et al., 2019) or XLM-RoBERTa (Conneau et al., 2020a).

Broadly, pre-trained models trained on massive corpora purportedly learn an overlapping cross-lingual latent space (Conneau et al., 2020b), but have also been identified as under-trained for some tasks (Li et al., 2021). A subset of cross-lingual modeling has focused on engineering the alignment of multilingual word representations (Conneau et al., 2017; Artetxe and Schwenk, 2018; Hu et al., 2021) for tasks such as dependency parsing (Schuster et al., 2019; Xu and Koehn, 2021) and word alignment (Jalili Sabet et al., 2020; Dou and Neubig, 2021).

Semantic Parsing  For parsing, there has been recent investigation into multilingual parsing with multiple-language ensembles using hybrid generative trees (Lu, 2014; Susanto and Lu, 2017b) and LSTM-based sequence-to-sequence models (Susanto and Lu, 2017a). This work has largely affirmed the benefit of "high-resource" multi-language ensemble training (Jie and Lu, 2014). Code-switching in multilingual parsing has also been explored through mixed-language training datasets (Duong et al., 2017; Einolghozati et al., 2021). For adapting a parser to a new language, machine translation has been explored as a mostly reasonable proxy for in-language data (Sherborne et al., 2020; Moradshahi et al., 2020). However, machine translation, in either direction, can introduce limiting artifacts (Artetxe et al., 2020), and generalisation is limited due to a "parallax" error between gold test utterances and "translationese" training data (Riley et al., 2020).

Zero-shot parsing has primarily focused on 'cross-domain' challenges to improve parsing across varying query structures and lexicons (Herzig and Berant, 2018; Givoli and Reichart, 2019) or different databases (Zhong et al., 2020; Suhr et al., 2020). The Spider dataset (Yu et al., 2018) formalises this challenge with unseen tables at test time for zero-shot generalisation. Combining zero-shot parsing with cross-lingual modeling has been examined for the UCCA formalism (Hershcovich et al., 2019) and for task-oriented parsing (see below). Generally, we find that cross-lingual executable parsing has been under-explored in the zero-shot case.

Executable parsing contrasts with more abstract meaning expressions (AMR, λ-calculus) or hybrid logical forms, such as decoupled TOP (Li et al., 2021), which cannot be executed to retrieve denotations. Datasets such as ATIS (Dahl et al., 1994) exist in both an executable parsing and a spoken language understanding format. We focus on only the former, as a generation task, and note that results for the latter classification task are not comparable.

Dialogue Modeling  Usage of MT has also been extended to a task-oriented setting using the MTOP dataset (Li et al., 2021), employing a pointer-generator network (See et al., 2017) to generate dialogue-act-style logical forms. This has been combined with adjacent SLU tasks, on MTOP and MultiATIS++ (Xu et al., 2020), pairing cross-lingual task-oriented parsing with SLU classification. Generally, these approaches additionally require word alignment to project annotations between languages.
Prior zero-shot cross-lingual work in SLU (Li et al., 2021; Zhu et al., 2020; Krishnan et al., 2021) similarly identifies a penalty for cross-lingual transfer and finds that pre-trained models and machine translation can only partially mitigate this error.

Compared with similar explorations of cross-lingual parsing, such as Xu et al. (2020) and Li et al. (2021), the zero-shot case is our primary focus. Our study assumes a case of no paired data in the test languages, and our proposed approach is more similar to Mallinson et al. (2020) and Xu and Koehn (2021) in that we treat the convergence of pre-trained representations as an objective for a downstream task. Our approach is also similar to work in zero-resource (Firat et al., 2016) or unsupervised machine translation with monolingual corpora (Lample et al., 2018). Contrastingly, our approach is not pairwise between languages, owing to our single multilingual latent representation.

3   Problem Formulation

The primary challenge for cross-lingual parsing is learning parameters that can parse an utterance, x, from any test language. Typically, a parser trained on language l, or multiple languages {l_1, l_2, ..., l_N}, can handle only these languages and performs poorly outside this set. For a new language, conventional approaches require parallel datasets and models (Jie and Lu, 2014; Haas and Riezler, 2016; Duong et al., 2017).

In our work, zero-shot parsing refers to parsing utterances in languages without paired data during training. For some language, l, there exists no pairing of x_l to a logical form, y, except for English. During training, the model only generates logical forms conditioned on English input data. Other languages are incorporated into the model using auxiliary objectives detailed in Section 4. Zero-shot parsing happens at test time, when a logical form is predicted conditioned upon an input question from any test language.
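To make the supervision setting concrete, the sketch below summarizes which data is available during training and testing (hypothetical example utterances and data structures, shown for illustration only):

    # Illustrative sketch of the zero-shot supervision setting (hypothetical examples).
    # Paired (utterance, logical form) data exists for English only; every other
    # language contributes unlabeled utterances via the auxiliary objectives of Section 4.
    train_data = {
        "en_paired":    [("list flights from boston to denver", "SELECT ... FROM flight ...")],
        "de_unlabeled": ["welche fluege gehen von boston nach denver"],
        "zh_unlabeled": ["..."],   # analogous unlabeled utterances for each target language
    }

    # At test time, a logical form must be predicted for an utterance from ANY test
    # language, although no (x_l, y) pair for that language was ever observed.
    test_input = "zeige mir fluege von boston nach denver"   # German question -> SQL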
Our approach improves the zero-shot case using only monolingual objectives to parse additional languages beyond English¹ without semantic parsing training data. We explore the hypothesis that a multilingual latent space can be learned through auxiliary tasks in tandem with logical form generation. We desire to learn a language-agnostic representation space to minimize the penalty of cross-lingual transfer and improve parsing of languages without training data. To generate this latent space, we posit that only unpaired monolingual data in the target language, and some pre-trained encoder, are required. We remove the assumption that machine translation is suitable and study the inverse case wherein only paired data in English and monolingual data in target languages are available. This frames the cross-lingual parsing task as one of latent representation alignment only, to explore a possible upper bound of parsing accuracy without errors from external dependencies.

¹ English acts as the "source" language in our work, as it is the source language for all explored datasets. We refer to all other languages as "target" languages.

The benefit of our zero-shot method is that our approach requires only external corpora in the test language. Using only English paired data and monolingual corpora, we can generate logical forms above back-translation baselines and compete with fully supervised in-language training. For example, consider the German test set of the Overnight dataset (Wang et al., 2015; Sherborne et al., 2020), which lacks German paired training data. To competitively parse this test set, our approach minimally requires only the original English training data and a collection of unlabeled German utterances.

4   Method

We adopt a multi-task sequence-to-sequence model (Luong et al., 2016) combining logical form generation with two auxiliary objectives. The first is a monolingual reconstruction loss, similar to domain-adaptive pre-training (Gururangan et al., 2020), and the second is a language identification task. We describe each model component below:

Generating Logical Forms  Predicting logical forms is the primary output objective for the system. For an utterance x = (x_1, x_2, ..., x_T), we desire to predict a logical form y = (y_1, y_2, ..., y_M) representing the same meaning in a machine-executable language. We model this transduction task using a sequence-to-sequence neural network (Sutskever et al., 2014) based upon the Transformer architecture (Vaswani et al., 2017).

The sequence x is encoded to a latent representation z = (z_1, z_2, ..., z_T) through Equation (1) using a stacked self-attention Transformer encoder, E, with weights θ_E. The conditional probability of the output sequence y is expressed as Equation (2), as each token y_i is autoregressively generated based upon z and prior outputs, y_{<i}. This generation is modeled in Equation (3) using the Transformer decoder for logical forms, DLF, with associated weights θ_DLF.
    z = E(x | θ_E)                                          (1)

    p(y | x) = ∏_{i=1}^{M} p(y_i | y_{<i}, z)               (2)
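As a minimal sketch of this factorization (illustrative module sizes, vocabularies, and helper names; the actual system uses the pre-trained mBART50 encoder and the hyper-parameters of Section 6), the model can be written in PyTorch as:

    import torch
    import torch.nn as nn

    class Seq2SeqParser(nn.Module):
        """Sketch of Equations (1)-(2): encode x into z, then decode y autoregressively."""

        def __init__(self, src_vocab, lf_vocab, d_model=256, nhead=8, layers=3):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, d_model)
            self.lf_emb = nn.Embedding(lf_vocab, d_model)
            self.encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), layers)  # E(. | theta_E)
            self.decoder = nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), layers)  # D_LF
            self.lm_head = nn.Linear(d_model, lf_vocab)

        def forward(self, x, y_in):
            z = self.encoder(self.src_emb(x))                            # Eq. (1): z = E(x)
            T = y_in.size(1)
            causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
            h = self.decoder(self.lf_emb(y_in), z, tgt_mask=causal)      # condition on y_<i and z
            return self.lm_head(h)                                       # logits for p(y_i | y_<i, z), Eq. (2)

    # The logical-form loss L_LF is token-level cross-entropy under teacher forcing:
    model = Seq2SeqParser(src_vocab=1000, lf_vocab=500)
    x = torch.randint(0, 1000, (2, 12))        # toy batch of utterances (token ids)
    y = torch.randint(0, 500, (2, 8))          # toy batch of logical forms (token ids)
    logits = model(x, y[:, :-1])
    loss_lf = nn.functional.cross_entropy(logits.reshape(-1, 500), y[:, 1:].reshape(-1))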
translation objective, using bitext, to future work.

    L_NL = − ∑_x log p(x | x̃)                               (8)

Language Prediction  We augment the model with a third objective designed to encourage
    ∂L/∂θ_E = α_LF ∂L_LF/∂θ_E + α_NL ∂L_NL/∂θ_E − λ α_LP ∂L_LP/∂θ_E      (12)

    λ = 2 / (1 + e^{−10p}) + 1                               (13)

During the backward pass, each output head backpropagates the gradient signal from its respective objective function. For the encoder, these signals are combined as in Equation (12), where α_{LF, NL, LP} are loss weightings and λ is the reversed-gradient scheduling parameter from Ganin et al. (2016). The value of λ increases with training progress p according to Equation (13) to limit the impact of noisy predictions during early training.

Our intuition is that the parser will adapt and recognize an encoding from an unfamiliar language through jointly training to generate logical forms and our auxiliary objectives using monolingual data. This sequence-to-sequence approach is highly flexible and may be useful for zero-shot approaches to additional generation tasks.
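The reversed gradient for the language prediction head can be implemented with a gradient reversal layer in the style of Ganin et al. (2016). The sketch below is illustrative (helper names are ours; the schedule and weightings follow Equation (13) and Section 6):

    import math
    from torch.autograd import Function

    class GradReverse(Function):
        """Identity in the forward pass; scales the gradient by -lambda in the backward pass."""

        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lam * grad_output, None   # no gradient w.r.t. lam

    def lambda_schedule(p):
        """Equation (13): lambda grows with training progress p."""
        return 2.0 / (1.0 + math.exp(-10.0 * p)) + 1.0

    # Combined encoder gradient (Equation (12)): the total loss is a weighted sum, and the
    # language prediction loss is computed on reversed features so its gradient reaches the
    # encoder with a flipped sign, pushing z towards language-agnostic encodings.
    alpha_lf, alpha_nl, alpha_lp = 1.0, 0.33, 0.1          # loss weightings from Section 6

    def combined_loss(loss_lf, loss_nl, loss_lp):
        return alpha_lf * loss_lf + alpha_nl * loss_nl + alpha_lp * loss_lp
    # usage sketch: z_rev = GradReverse.apply(z, lambda_schedule(p)); loss_lp = ce(lang_head(z_rev), lang_ids)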
5   Data & Resources

5.1   Semantic Parsing

We analyze our model using two datasets to examine a broader language ensemble and a three-language multi-domain parsing benchmark. The former is a new version of the ATIS dataset of travel information (Dahl et al., 1994). Starting with the English utterances and simplified SQL queries from Iyer et al. (2017), using the same dataset split as Kwiatkowski et al. (2011), we align these to the parallel utterances from the MultiATIS++ dataset for spoken language understanding (Xu et al., 2020). Therefore, we add executable SQL queries to 4,473 training, 493 development, and 448 test utterances in Chinese (ZH), German (DE), French (FR), Spanish (ES), and Portuguese (PT) that were previously constructed for the slot-filling format of ATIS. Additionally, we align the test set to the Hindi (HI) and Turkish (TR) utterances from Upadhyay et al. (2018). We can now predict SQL for the ATIS dataset from eight natural languages. This represents a significant improvement over two- or three-language approaches (Sherborne et al., 2020; Duong et al., 2017). Note that the MultiATIS++ Japanese set is excluded, as the utterance alignment between this language and the others was not recoverable.

We also examine Overnight (Wang et al., 2015), an eight-domain dataset covering the Basketball, Blocks, Calendar, Housing, Publications, Recipes, Restaurants, and Social Network domains. This dataset comprises 13,682 English utterances paired with λ-DCS logical forms, executable in SEMPRE (Berant et al., 2013), split into 8,754/2,188/2,740 for training/validation/test respectively. The Overnight training data is only available in English and we use the Chinese and German test set translations from Sherborne et al. (2020) for multilingual evaluation. Each domain employs some distinctive linguistic structures, such as spatial relationships in Blocks or temporal relationships in Calendar, and our results on this dataset permit a detailed study of how cross-lingual transfer applies to various linguistic phenomena.

Reconstruction Data  For the reconstruction task, we use questions from the MKQA corpus (Longpre et al., 2020), a translation of 10,000 samples from NaturalQuestions (Kwiatkowski et al., 2019) into 26 languages. We posit that MKQA is suitable for our auxiliary objective as (a) the utterances, as questions, closely model our test set while being highly variable in domain, (b) the corpus includes all training languages in our experiments, and (c) the balanced data between languages limits overexposure effects to any one test language to the detriment of others. For evaluating ATIS, we use 60,000 utterances from MKQA in languages with a training set (EN, DE, ZH, FR, ES, PT) for comparison. For Overnight, we use only 30,000 utterances in EN, DE, and ZH.

5.2   Pre-trained Models

The model described in Section 4 relies on some encoder model, E, to generate latent representations amenable to both semantic parsing and our additional objectives. We found our system performs poorly without using some pre-trained model within the encoder to provide prior knowledge from large external corpora. Prior work observed improvements using multilingual BERT (Devlin et al., 2019) and XLM-RoBERTa (Conneau et al., 2020a). However, we find that these models under-performed at cross-lingual transfer into parsing, and we instead report results using the encoder of the mBART50 pre-trained sequence-to-sequence model (Tang et al., 2020). This model extends mBART (Liu et al., 2020), trained on a monolingual de-noising objective over 25 languages, to 50 languages using multilingual bitext.
We note that the pre-training data is not balanced across languages, with English as the highest-resource language and Portuguese as the lowest.

6   Experiments

Setting  The implementation of our model, described in Section 4, largely follows parameter settings from Liu et al. (2020) for a Transformer encoder-decoder model. The encoder, E, the decoders, {DLF, DNL}, and the embedding matrices all use a dimension size of 1024, with a self-attention projection of 4096 and 16 heads per layer. Both decoders are 6-layer stacks. Weights were initialized by sampling from the normal distribution N(0, 0.02). When using mBART50 to initialize the encoder, we keep all 12 pre-trained layers frozen and append one additional layer. The language prediction network is a two-layer feed-forward network projecting from z to 1024 hidden units and then to |L| for L languages.
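One way to realize this encoder with the HuggingFace checkpoint is sketched below (a minimal sketch assuming the facebook/mbart-large-50 weights; the wrapper class, pooling, and activation choice are illustrative rather than our exact implementation):

    import torch.nn as nn
    from transformers import MBartModel

    class FrozenMBartEncoder(nn.Module):
        """Sketch: frozen 12-layer mBART50 encoder plus one trainable layer and the
        two-layer language prediction head described above."""

        def __init__(self, n_languages, d_model=1024):
            super().__init__()
            self.pretrained = MBartModel.from_pretrained("facebook/mbart-large-50").get_encoder()
            for p in self.pretrained.parameters():        # all 12 pre-trained layers stay frozen
                p.requires_grad = False
            self.extra_layer = nn.TransformerEncoderLayer(
                d_model, nhead=16, dim_feedforward=4096, batch_first=True)  # appended trainable layer
            self.lang_head = nn.Sequential(               # z -> 1024 -> |L| (activation choice illustrative)
                nn.Linear(d_model, 1024), nn.ReLU(), nn.Linear(1024, n_languages))

        def forward(self, input_ids, attention_mask=None):
            hidden = self.pretrained(input_ids=input_ids,
                                     attention_mask=attention_mask).last_hidden_state
            z = self.extra_layer(hidden)                  # latent representation used by both decoders
            # In the full model, a gradient reversal layer sits between z and this head (Section 4).
            lang_logits = self.lang_head(z.mean(dim=1))   # mean-pooled over tokens
            return z, lang_logits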
For the reconstruction noising function, we use token masking to randomly replace u tokens in x with "<mask>". u is sampled from U(0, v) and we optimize v = 3 as the best setting.
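A minimal sketch of this noising function (token handling and the mask symbol shown are illustrative):

    import random

    def mask_noise(tokens, v=3, mask_token="<mask>"):
        """Randomly replace u tokens with the mask symbol, where u ~ U(0, v)."""
        u = random.randint(0, v)                                   # number of tokens to mask
        noised = list(tokens)
        for position in random.sample(range(len(noised)), k=min(u, len(noised))):
            noised[position] = mask_token
        return noised

    # The reconstruction objective trains the decoder DNL to recover the
    # original utterance from its noised version.
    utterance = ["what", "flights", "leave", "boston", "after", "noon"]
    print(mask_noise(utterance))   # e.g. ['what', '<mask>', 'leave', 'boston', '<mask>', 'noon']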
The system was trained using the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 1 × 10^−4 and a weight decay factor of 0.1. We use a "Noam" schedule for the learning rate (Vaswani et al., 2017) with a warmup of 5,000 steps. Loss weighting values for α_{LF, NL, LP} were set to {1, 0.33, 0.1} respectively. Batches during training were of size 50 and homogeneously sampled from either S_LF or S_NL, with an epoch consuming one pass over both. Models were trained for a maximum of 100 epochs with early stopping. Model selection and hyperparameters were tuned on the S_LF validation set in English. Test predictions were generated using beam search with 5 hypotheses. Evaluation is reported as denotation accuracy, computed by comparing the denotation retrieved from the knowledge base for the prediction, ŷ, to the denotation from executing the gold-standard logical form.
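Concretely, denotation accuracy can be sketched as below, where execute is a hypothetical stand-in for a SEMPRE or SQL execution call:

    def denotation_accuracy(predictions, golds, execute):
        """Share of examples whose predicted logical form returns the same denotation
        as the gold logical form when executed against the knowledge base."""
        correct = 0
        for y_hat, y_gold in zip(predictions, golds):
            try:
                match = execute(y_hat) == execute(y_gold)
            except Exception:        # un-executable predictions count as errors
                match = False
            correct += int(match)
        return correct / len(predictions)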
Each model is trained on 1 NVIDIA RTX 3090 GPU. All models were implemented using AllenNLP (Gardner et al., 2018) and PyTorch (Paszke et al., 2017), using pre-trained models from HuggingFace (Wolf et al., 2019).

Baseline  We compare to a back-translation baseline for both datasets. Here, the test set in all languages is translated to English using Google Translate (Wu et al., 2016) and input to a reference sequence-to-sequence model trained on only English queries. We consider improving upon this "minimum effort" approach as a lower bound for justifying our approach. Additionally, we compare to an upper bound of professional training-data translation for MultiATIS++ in DE, ZH, FR, ES, and PT. This represents the "maximum effort" strategy that our system approaches.

7   Results

We implement the model described in Section 4, with hyper-parameters described in Section 6, and report our results for ATIS in Table 1 and Overnight in Table 2. Additional results for Overnight are reported in Tables 3-5 in Appendix A. For both datasets, we present ablated results for a parser without auxiliary objectives (DLF only), using only parsing and reconstruction (DLF + DNL), and finally our full model using two auxiliary objectives with parsing (DLF + DNL + LP).

Comparing to baselines, we find that, generally, the approach without auxiliary objectives performs worse than back-translation. This is unsurprising, as this initial approach relies only on information captured during pre-training to be capable of parsing non-English input. This result is similar to Li et al. (2021) in observing a large penalty from cross-lingual transfer. However, our approach is differentiated in that we now improve upon this with our multi-task model. Comparing our auxiliary objectives, we broadly find more benefit from monolingual reconstruction than from language prediction. Four ATIS test languages improve by > 10% using the reconstruction objective, whereas the largest improvement from adding language prediction is +8%.

ATIS  We identify improvements in accuracy across all languages by incorporating our reconstruction objective, and then further benefit with the language prediction objective. The strongest improvements are for Chinese (+20.7%) and Portuguese (+18.0%) from Model (1) to Model (3). This is a sensible result for Portuguese, given that this language has comparatively very little data (49,446 sentences) during pre-training in Tang et al. (2020). Chinese is strongly represented during this same pre-training, with 10,082,367 sentences; however, Model (1) performs extremely poorly here. Our observed improvement for Chinese could be a consequence of this language being less orthographically similar to other pre-training languages. This could result in poorer cross-lingual information sharing during pre-training and therefore leave more to be gained in our downstream task.
Model                      EN     DE     ZH     FR     ES     PT     HI     TR
Baseline
  Monolingual training     77.2   66.6   64.9   67.8   64.1   66.1   -      -
  Back-translation         -      56.9   51.4   58.2   57.9   57.3   52.6   52.7
Our model
  (1) DLF only             77.2   50.2   38.5   61.3   46.5   42.5   40.4   37.3
  (2) DLF + DNL            77.7   61.1   51.2   62.7   58.2   57.5   49.5   44.7
  (3) DLF + DNL + LP       76.3   67.1   59.2   66.9   58.2   60.5   54.1   47.1

Table 1: Denotation accuracy for ATIS (Dahl et al., 1994) in English (EN), German (DE), Chinese (ZH), French (FR), Spanish (ES), Portuguese (PT), Hindi (HI) and Turkish (TR), using data from Upadhyay et al. (2018) and Xu et al. (2020). We report results for multiple systems: (1) using only English semantic parsing data, (2) multi-task combined parsing and reconstruction, and (3) multi-task parsing, reconstruction and language prediction. We compare to a back-translation baseline and monolingual upper-bound results where available.

Model                      EN     DE     ZH
Baseline
  Back-translation         -      60.1   48.1
Our model
  (1) DLF only             80.5   58.4   48.0
  (2) DLF + DNL            81.3   62.7   49.5
  (3) DLF + DNL + LP       81.4   64.3   52.7

Table 2: Average denotation accuracy across all domains for Overnight (Wang et al., 2015) for English (EN), German (DE) and Chinese (ZH). We report results for multiple systems: (1) using only English semantic parsing data, (2) multi-task combined parsing and reconstruction, and (3) multi-task parsing, reconstruction and language prediction. We compare to a back-translation baseline for both ZH and DE.

We find less improvement across our models in languages closer to English, finding only a +5.6% improvement for French and +11.7% for Spanish. However, only for these languages do we approach the supervised upper bound, with −0.9% error for French and a +0.5% gain for German. Given we used no semantic parsing data in these languages, this represents significant competition to methods requiring dataset translation for success.

We also note that our system improves parsing on Hindi and Turkish, even though these are absent from the auxiliary training objectives. The model is not trained to reconstruct or identify either of these languages; however, we find that our system improves parsing regardless. By adapting our latent representation to multiple generation objectives and encouraging language-agnostic encodings, we find the model can generate better latent representations for two typologically diverse languages without guidance. This result suggests that our approach has a wider benefit to cross-lingual transfer beyond the languages we explicitly desire to transfer towards. We find our system improves above our baseline for Hindi but not Turkish. This could be another consequence of unbalanced pre-training, given the 1,327,206 Hindi sentences compared to 204,200 for Turkish.

Overnight  Table 2 similarly identifies improvement for both German and Chinese using the monolingual reconstruction and language prediction objectives. We observe smaller improvements between Model (1) and Model (3) compared to ATIS, +5.9% for German and +4.7% for Chinese, which may be a consequence of increased question complexity across Overnight domains. Results for individual domains are given in Appendix A for brevity; however, we did not observe any strong trends across domains in either language. Comparatively, our best system is more accurate for German than Chinese by 9.6%, which may be another consequence of orthographic dissimilarity between languages during pre-training and our approach.

Finally, we also note that the capability of our model for English appears mostly unaffected by our additional objectives. For both ATIS and Overnight, we observe no catastrophic forgetting (McCloskey and Cohen, 1989) for the source language and, in some cases, a marginal performance improvement from our multi-task objectives. While semantic parsing for English is well studied and not the focus of our work, this result suggests there is minimal required compromise in maintaining parsing accuracy for English while transferring this capability to other languages.
8   Conclusion

We present a new approach to zero-shot cross-lingual semantic parsing for accurate parsing of languages without paired training data. We define and evaluate a multi-task model combining logical form generation with auxiliary objectives that require only unlabeled, monolingual corpora. This approach minimizes the error from cross-lingual transfer and improves parsing accuracy across all test languages. We demonstrate that an absence of semantic parsing data can be overcome through aligning latent representations, and extend this to examine languages that are also unseen during our multi-task alignment training. In the future, we plan to explore additional objectives, such as translation, in our model and consider larger and more diverse corpora for our auxiliary objectives.

References

Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2020. Translation artifacts in cross-lingual transfer learning. arXiv preprint arXiv:2004.04721.
Mikel Artetxe and Holger Schwenk. 2018. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. ArXiv, abs/1812.10464.
Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1533–1544, Seattle, Washington, USA.
Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020a. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8440–8451, Online. Association for Computational Linguistics.
Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2017. Word translation without parallel data. arXiv preprint arXiv:1710.04087.
Alexis Conneau, Shijie Wu, Haoran Li, Luke Zettlemoyer, and Veselin Stoyanov. 2020b. Emerging cross-lingual structure in pretrained language models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6022–6034, Online. Association for Computational Linguistics.
Deborah A. Dahl, Madeleine Bates, Michael Brown, William Fisher, Kate Hunicke-Smith, David Pallett, Christine Pao, Alexander Rudnicky, and Elizabeth Shriberg. 1994. Expanding the scope of the ATIS task: The ATIS-3 corpus. In Proceedings of the Workshop on Human Language Technology, HLT '94, pages 43–48, Stroudsburg, PA, USA.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota.
Li Dong and Mirella Lapata. 2016. Language to Logical Form with Neural Attention. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 33–43, Stroudsburg, PA, USA.
Li Dong and Mirella Lapata. 2018. Coarse-to-Fine Decoding for Neural Semantic Parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 731–742, Melbourne, Australia.
Zi-Yi Dou and Graham Neubig. 2021. Word alignment by fine-tuning embeddings on parallel corpora.
Long Duong, Hadi Afshar, Dominique Estival, Glen Pink, Philip Cohen, and Mark Johnson. 2017. Multilingual Semantic Parsing And Code-Switching. In Proceedings of the 21st Conference on Computational Natural Language Learning, pages 379–389, Vancouver, Canada.
Arash Einolghozati, Abhinav Arora, Lorena Sainz-Maza Lecanda, Anuj Kumar, and Sonal Gupta. 2021. El volumen louder por favor: Code-switching in task-oriented semantic parsing. In EACL 2021. Association for Computational Linguistics.
Orhan Firat, Baskaran Sankaran, Yaser Al-Onaizan, Fatos T. Yarman Vural, and Kyunghyun Cho. 2016. Zero-Resource Translation with Multi-Lingual Neural Machine Translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 268–277, Stroudsburg, PA, USA.
Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. 2016. Domain-adversarial training of neural networks. J. Mach. Learn. Res., 17(1):2096–2030.
Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson F. Liu, Matthew E. Peters, Michael Schmitz, and Luke S. Zettlemoyer. 2018. AllenNLP: A deep semantic natural language processing platform. ArXiv, abs/1803.07640.
Ofer Givoli and Roi Reichart. 2019. Zero-shot semantic parsing for instructions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4454–4464, Florence, Italy. Association for Computational Linguistics.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc.
Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. 2019. Towards complex text-to-SQL in cross-domain database with intermediate representation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4524–4535, Florence, Italy. Association for Computational Linguistics.
Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. 2020. Don't stop pretraining: Adapt language models to domains and tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8342–8360, Online. Association for Computational Linguistics.
Carolin Haas and Stefan Riezler. 2016. A Corpus and Semantic Parser for Multilingual Natural Language Querying of OpenStreetMap. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 740–750, Stroudsburg, PA, USA.
Xiaodong He, Li Deng, Dilek Hakkani-Tür, and Gokhan Tur. 2013. Multi-style adaptive training for robust cross-lingual spoken language understanding. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
Dan Hendrycks and Kevin Gimpel. 2020. Gaussian error linear units (GELUs).
Daniel Hershcovich, Zohar Aizenbud, Leshem Choshen, Elior Sulem, Ari Rappoport, and Omri Abend. 2019. SemEval-2019 task 1: Cross-lingual semantic parsing with UCCA. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 1–10, Minneapolis, Minnesota, USA.
Jonathan Herzig and Jonathan Berant. 2018. Decoupling Structure and Lexicon for Zero-Shot Semantic Parsing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1619–1629, Brussels, Belgium.
Junjie Hu, Melvin Johnson, Orhan Firat, Aditya Siddhant, and Graham Neubig. 2021. Explicit Alignment Objectives for Multilingual Bidirectional Encoders. In NAACL 2021.
Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, and Melvin Johnson. 2020. XTREME: A massively multilingual multi-task benchmark for evaluating cross-lingual generalization.
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer. 2017. Learning a neural semantic parser from user feedback. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 963–973, Vancouver, Canada.
Masoud Jalili Sabet, Philipp Dufter, François Yvon, and Hinrich Schütze. 2020. SimAlign: High quality word alignments without parallel training data using static and contextualized embeddings. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1627–1643, Online. Association for Computational Linguistics.
Robin Jia and Percy Liang. 2016. Data Recombination for Neural Semantic Parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12–22, Stroudsburg, PA, USA.
Zhanming Jie and Wei Lu. 2014. Multilingual Semantic Parsing: Parsing Multiple Languages into Semantic Representations. In Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers, pages 1291–1301, Dublin, Ireland.
Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR, abs/1412.6980.
Thomas Kollar, Danielle Berry, Lauren Stuart, Karolina Owczarzak, Tagyoung Chung, Lambert Mathias, Michael Kayser, Bradford Snow, and Spyros Matsoukas. 2018. The Alexa meaning representation language. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers), pages 177–184, New Orleans, Louisiana.
Jitin Krishnan, Antonios Anastasopoulos, Hemant Purohit, and Huzefa Rangwala. 2021. Multilingual code-switching for zero-shot cross-lingual intent prediction and slot filling.
Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Matthew Kelcey, Jacob Devlin, Kenton Lee, Kristina N. Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. 2019. Natural Questions: A benchmark for question answering research. Transactions of the Association for Computational Linguistics.
Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2011. Lexical Generalization in CCG Grammar Induction for Semantic Parsing. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1512–1523, Edinburgh, Scotland, UK.
Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc'Aurelio Ranzato. 2018. Unsupervised Machine Translation Using Monolingual Corpora Only. In ICLR 2018.
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for Computational Linguistics.
Haoran Li, Abhinav Arora, Shuohui Chen, Anchit Gupta, Sonal Gupta, and Yashar Mehdad. 2021. MTOP: A comprehensive multilingual task-oriented semantic parsing benchmark.
Percy Liang. 2016. Learning executable semantic parsers for natural language understanding. Commun. ACM, 59(9):68–76.
Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Bruce Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, and Ming Zhou. 2020. XGLUE: A new benchmark dataset for cross-lingual pre-training, understanding and generation.
Kevin Lin, Ben Bogin, Mark Neumann, Jonathan Berant, and Matt Gardner. 2019. Grammar-based neural text-to-SQL generation. CoRR, abs/1905.13326.
Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and Luke Zettlemoyer. 2020. Multilingual denoising pre-training for neural machine translation.
Shayne Longpre, Yi Lu, and Joachim Daiber. 2020. MKQA: A linguistically diverse benchmark for multilingual open domain question answering.
Wei Lu. 2014. Semantic parsing with relaxed hybrid trees. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1308–1318, Doha, Qatar.
Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. 2016. Multi-task sequence to sequence learning. In International Conference on Learning Representations.
Jonathan Mallinson, Rico Sennrich, and Mirella Lapata. 2020. Zero-shot crosslingual sentence simplification. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5109–5126, Online. Association for Computational Linguistics.
Michael McCloskey and Neal J. Cohen. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. Psychology of Learning and Motivation - Advances in Research and Theory, 24(C):109–165.
Mehrad Moradshahi, Giovanni Campagna, Sina Semnani, Silei Xu, and Monica Lam. 2020. Localizing open-ontology QA semantic parsers in a day using machine translation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5970–5983, Online. Association for Computational Linguistics.
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop.
Parker Riley, Isaac Caswell, Markus Freitag, and David Grangier. 2020. Translationese as a language in "multilingual" NMT. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7737–7746, Online. Association for Computational Linguistics.
Tal Schuster, Ori Ram, Regina Barzilay, and Amir Globerson. 2019. Cross-lingual alignment of contextual word embeddings, with applications to zero-shot dependency parsing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1599–1613, Minneapolis, Minnesota. Association for Computational Linguistics.
Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In ACL.
Peter Shaw, Philip Massey, Angelica Chen, Francesco Piccinno, and Yasemin Altun. 2019. Generating logical forms from graph representations of text and entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 95–106, Florence, Italy. Association for Computational Linguistics.
Tom Sherborne, Yumo Xu, and Mirella Lapata. 2020. Bootstrapping a crosslingual semantic parser. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 499–517, Online. Association for Computational Linguistics.
Alane Suhr, Ming-Wei Chang, Peter Shaw, and Kenton Lee. 2020. Exploring unexplored generalization challenges for cross-database semantic parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8372–8388, Online. Association for Computational Linguistics.
Raymond Hendy Susanto and Wei Lu. 2017a. Neural Architectures for Multilingual Semantic Parsing. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 38–44, Stroudsburg, PA, USA.
Raymond Hendy Susanto and Wei Lu. 2017b. Semantic parsing with neural hybrid trees. In AAAI Conference on Artificial Intelligence, San Francisco, California, USA.
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. CoRR, abs/1409.3215.
Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, and Angela Fan. 2020. Multilingual translation with extensible multilingual pretraining and finetuning.
Jörg Tiedemann, Željko Agić, and Joakim Nivre. 2014. Treebank translation for cross-lingual parser induction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning, pages 130–140, Ann Arbor, Michigan. Association for Computational Linguistics.
Shyam Upadhyay, Manaal Faruqui, Gokhan Tur, Dilek Hakkani-Tur, and Larry Heck. 2018. (Almost) zero-shot cross-lingual spoken language understanding. In Proceedings of the IEEE ICASSP.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.
Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, and Matthew Richardson. 2019. RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers.
Yushi Wang, Jonathan Berant, and Percy Liang. 2015. Building a Semantic Parser Overnight. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), volume 1, pages 1332–1342, Stroudsburg, PA, USA.
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. ArXiv, abs/1910.03771.
Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144.
Haoran Xu and Philipp Koehn. 2021. Zero-shot cross-lingual dependency parsing through contextual embedding transformation.
Weijia Xu, Batool Haider, and Saab Mansour. 2020. End-to-end slot alignment and recognition for cross-lingual NLU. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5052–5063, Online. Association for Computational Linguistics.
Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 440–450, Vancouver, Canada.
Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium.
Mengjie Zhao, Yi Zhu, Ehsan Shareghi, Roi Reichart, Anna Korhonen, and Hinrich Schütze. 2020. A closer look at few-shot crosslingual transfer: Variance, benchmarks and baselines.
Victor Zhong, Mike Lewis, Sida I. Wang, and Luke Zettlemoyer. 2020. Grounded adaptation for zero-shot executable semantic parsing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6869–6882, Online. Association for Computational Linguistics.
Qile Zhu, Haidar Khan, Saleh Soltan, Stephen Rawls, and Wael Hamza. 2020. Don't parse, insert: Multilingual semantic parsing with insertion based decoding. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 496–506, Online. Association for Computational Linguistics.

A   Additional Results
EN                     Avg.    Ba.     Bl.    Ca.    Ho.     Pu.    Res.    Rec.    So.
           DLF only               80.5   90.0    66.7    82.7   76.7    75.8   87.7    83.3   80.9
           DLF + DNL              81.3   88.5    60.9    83.3   78.8    83.9   88.6    83.8   82.6
           DLF + DNL + LP         81.4   87.7    64.4    85.1   77.8    80.1   88.0    85.6   83.0

Table 3: Denotation accuracy for Overnight (Wang et al., 2015) using the source English data. Domains are
Basketball, Blocks, Calendar, Housing, Publications, Recipes, Restaurants and Social Network.

           DE                     Avg.    Ba.     Bl.    Ca.    Ho.     Pu.    Res.    Rec.    So.
            Baseline
           Back-translation       60.1   75.7    50.9    61.0   55.6    50.4   69.9    46.3   71.4
            Our model
           DLF only               58.4   70.3    51.1    61.9   54.0    49.7   65.4    42.1   73.1
           DLF + DNL              62.7   73.1    56.1    66.1   58.7    49.7   70.2    57.9   70.1
           DLF + DNL + LP         64.3   78.8    56.6    68.5   58.7    46.6   70.8    59.3   75.5

Table 4: Denotation accuracy for Overnight (Wang et al., 2015) using the German test set from Sherborne et al.
(2020). Domains are Basketball, Blocks, Calendar, Housing, Publications, Recipes, Restaurants and Social Net-
work.

           ZH                     Avg.    Ba.     Bl.    Ca.    Ho.     Pu.    Res.    Rec.    So.
            Baseline
           Back-translation       48.1   62.3    39.6    49.8   43.1    48.3   51.4    29.2   61.2
            Our model
           DLF only               48.0   53.7    49.6    53.0   50.8    36.0   52.1    23.1   65.3
           DLF + DNL              49.5   56.5    49.4    55.4   56.1    35.4   54.2    24.9   64.1
           DLF + DNL + LP         52.7   59.1    50.1    67.9   54.5    41.6   59.6    20.8   67.6

Table 5: Denotation accuracy for Overnight (Wang et al., 2015) using the Chinese test set from Sherborne et al.
(2020). Domains are Basketball, Blocks, Calendar, Housing, Publications, Recipes, Restaurants and Social Net-
work.