A Survey on Retrieval-Augmented Text Generation

Huayang Li♥,∗ Yixuan Su♠,∗ Deng Cai♦,∗ Yan Wang♣,∗ Lemao Liu♣,∗
♥ Nara Institute of Science and Technology  ♠ University of Cambridge
♦ The Chinese University of Hong Kong  ♣ Tencent AI Lab
li.huayang.lh6@is.naist.jp, ys484@cam.ac.uk, thisisjcykcd@gmail.com, brandenwang@tencent.com, lemaoliu@gmail.com

arXiv:2202.01110v1 [cs.CL] 2 Feb 2022
Abstract

Recently, retrieval-augmented text generation has attracted increasing attention from the computational linguistics community. Compared with conventional generation models, retrieval-augmented text generation has remarkable advantages and, in particular, has achieved state-of-the-art performance in many NLP tasks. This paper aims to conduct a survey of retrieval-augmented text generation. It first highlights the generic paradigm of retrieval-augmented generation, and then reviews notable approaches according to different tasks, including dialogue response generation, machine translation, and other generation tasks. Finally, it points out some important directions on top of recent methods to facilitate future research.

1 Introduction

Retrieval-augmented text generation, as a new text generation paradigm that fuses emerging deep learning technology and traditional retrieval technology, has achieved state-of-the-art (SOTA) performance in many NLP tasks and attracted the attention of the computational linguistics community (Weston et al., 2018; Dinan et al., 2018; Cai et al., 2021). Compared with its generation-based counterpart, this new paradigm has some remarkable advantages: 1) the knowledge need not be implicitly stored in model parameters, but can be explicitly acquired in a plug-and-play manner, leading to great scalability; 2) instead of generating from scratch, the paradigm generates text from retrieved human-written references, which potentially alleviates the difficulty of text generation.

This paper aims to review many representative approaches for retrieval-augmented text generation tasks, including dialogue response generation (Weston et al., 2018), machine translation (Gu et al., 2018) and others (Hashimoto et al., 2018). We first present the generic paradigm of retrieval-augmented generation as well as three key components under this paradigm, which are retrieval sources, retrieval metrics and generation models.

Then, we introduce notable methods for retrieval-augmented generation, organized with respect to different tasks. Specifically, on the dialogue response generation task, exemplar/template retrieval as an intermediate step has been shown to benefit informative response generation (Weston et al., 2018; Wu et al., 2019; Cai et al., 2019a,b). In addition, there has been growing interest in knowledge-grounded generation exploring different forms of knowledge such as knowledge bases and external documents (Dinan et al., 2018; Zhou et al., 2018; Lian et al., 2019; Li et al., 2019; Qin et al., 2019; Wu et al., 2021; Zhang et al., 2021). On the machine translation task, we summarize the early work on how the retrieved sentences (called translation memory) are used to improve statistical machine translation (SMT) (Koehn et al., 2003) models (Simard and Isabelle, 2009; Koehn and Senellart, 2010), and in particular we highlight several popular methods for integrating translation memory into NMT models (Gu et al., 2018; Zhang et al., 2018; Xu et al., 2020; He et al., 2021). We also review the applications of retrieval-augmented generation in other generation tasks such as abstractive summarization (Peng et al., 2019), code generation (Hashimoto et al., 2018), paraphrasing (Kazemnejad et al., 2020; Su et al., 2021b), and knowledge-intensive generation (Lewis et al., 2020b). Finally, we point out some promising directions for retrieval-augmented generation to push forward future research.

∗ All authors contribute equally.
[Figure 1: The overview of this survey. Retrieval sources (Sec. 2.2: training corpus, external data, unsupervised data) and retrieval metrics (Sec. 2.3: sparse-vector, dense-vector, and task-specific retrieval) produce the retrieval memory from the input; integration models (Sec. 2.4: data augmentation, attention mechanisms, skeletons & templates) combine the memory with the generation model to produce the output; task-specific discussions follow for dialogue generation (Sec. 3), machine translation (Sec. 4) and other tasks (Sec. 5).]

2 Retrieval-augmented Paradigm

2.1 Formulation and Motivation

Most text generation tasks can be formulated as a mapping from an input sequence x to an output sequence y: y = f(x). For example, x and y could be the dialogue history and its response in dialogue generation, or the source-language and target-language sentences in machine translation, and so on.

Recently, some researchers have suggested endowing models with the capability to access external memory via information retrieval techniques, so that they can acquire more information during the generation process (Gu et al., 2018; Weston et al., 2018; Cai et al., 2019b). Retrieval-augmented generation can then be formulated as:

    y = f(x, z)    (1)

where z = {⟨x^r, y^r⟩} is a set of relevant instances retrieved from the original training set or from external datasets. The main idea of this paradigm is that y^r may benefit the generation of the response if x^r (or y^r) is highly relevant to the input x. It is worth noting that x^r = ∅ when unsupervised retrieval sources are used. More details about how to obtain z are discussed in §2.3.
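To make Eq. (1) concrete, the sketch below spells out the retrieval-then-generation interface in Python. The word-overlap retriever and the placeholder generator are illustrative assumptions standing in for the metrics of §2.3 and the models of §2.4, not components of any particular cited system.

```python
from typing import List, Optional, Tuple

# A memory item is a pair <x_r, y_r>; x_r may be None for unsupervised sources.
MemoryItem = Tuple[Optional[str], str]

def retrieve(x: str, corpus: List[MemoryItem], k: int = 3) -> List[MemoryItem]:
    """Return the k memory items most relevant to x.
    Simple word overlap stands in for TF-IDF/BM25/dense retrieval (Sec. 2.3)."""
    def overlap(item: MemoryItem) -> int:
        x_r = item[0] if item[0] is not None else item[1]
        return len(set(x.lower().split()) & set(x_r.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def generate(x: str, z: List[MemoryItem]) -> str:
    """Placeholder for the generation model f(x, z): a real system conditions a
    seq2seq model on both the input and the retrieved memory (Sec. 2.4)."""
    references = " | ".join(y_r for _, y_r in z)
    return f"<response conditioned on '{x}' and references: {references}>"

def retrieval_augmented_generation(x: str, corpus: List[MemoryItem]) -> str:
    z = retrieve(x, corpus)   # z = {<x_r, y_r>}
    return generate(x, z)     # y = f(x, z), Eq. (1)
```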
In this section, we briefly introduce some basic IR techniques. In general, the retrieval memory can come from three kinds of sources: the training corpus, external datasets in the same format as the training corpus, and large-scale unsupervised corpora (§2.2). Metrics that evaluate the relevance between texts also vary; in §2.3 we divide them into three categories: sparse-vector retrieval, dense-vector retrieval, and training-based retrieval. Finally, how to integrate the retrieval memory into the generation model is also crucial; we introduce some popular integration approaches in §2.4.

2.2 Retrieval Sources

Most previous studies search the external memory from the training corpus itself (Song et al., 2016; Gu et al., 2018; Weston et al., 2018). At inference time, retrieved examples with high relevance scores can be regarded as extra references and reduce the model's uncertainty in generation. The main motivation of these works is to store knowledge not only in the model parameters but also in an explicit and accessible form, so that the model can re-access it during inference.

Some researchers also propose to retrieve relevant samples from external datasets (Su et al., 2021c; Xiao et al., 2021). In these studies, the retrieval pool differs from the training corpus and can therefore provide additional information that the training corpus does not contain. This is especially beneficial for applications such as domain adaptation and knowledge update. For example, Khandelwal et al. (2020a) and Zheng et al. (2021a) employ an in-domain dataset as the external memory to achieve fast domain adaptation for machine translation.

One limitation of the previous two sources is that the datasets have to be supervised, consisting of aligned input-output pairs. For machine translation, Cai et al. (2021) propose a cross-lingual retriever that directly retrieves target sentences from an unsupervised monolingual corpus. The main idea is to align source-side sentences and the corresponding target-side translations in a dense vector space, i.e., to align x and y^r when x^r is absent. As a result, the retriever directly connects the dots between the source-side input and target-side translations, enabling monolingual data in the target language to be used alone as memories.
2.3 Retrieval Metrics

Given an input sequence x and a retrieval corpus, the retrieval model aims to retrieve a set of relevant examples z = {⟨x^r, y^r⟩} from the corpus. When a supervised corpus is used, {⟨x^r, y^r⟩} is retrieved by measuring the similarity between x and x^r.

For similarity measurement, sparse-vector retrieval methods such as TF-IDF and BM25 (Robertson and Zaragoza, 2009) are widely used. They match keywords efficiently with an inverted index. However, these methods prefer examples with similar surface forms and may fail to retrieve examples that are only semantically relevant.
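As an illustration of sparse-vector retrieval, the snippet below implements BM25 scoring (Robertson and Zaragoza, 2009) over a toy tokenized corpus with the usual k1 and b defaults. It is a minimal sketch: a real system would score candidates through an inverted index instead of looping over every document.

```python
import math
from collections import Counter
from typing import List

def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every tokenized document in `docs` against the tokenized `query`."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency of each query term.
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    # Smoothed inverse document frequency.
    idf = {t: math.log(1 + (N - n + 0.5) / (n + 0.5)) for t, n in df.items()}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            f = tf.get(t, 0)
            s += idf[t] * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```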
To alleviate this problem, some studies (Cao and Xiong, 2018) attempt to retrieve in a dense-vector space instead of relying on lexical overlap. Recent work (Lee et al., 2019) makes use of pre-trained language models, encoding the text into low-dimensional dense vectors via BERT-based encoders. The retrieval score is then computed as the inner product between vectors.
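Dense-vector retrieval can be sketched as follows. The encode function is only a stand-in for a BERT-based sentence encoder (the hash-based token embeddings and the dimensionality are arbitrary assumptions); what matters is that relevance is the inner product between the query vector and pre-computed corpus vectors, from which the top-k examples are returned.

```python
import hashlib
import numpy as np

def encode(texts, dim=16):
    """Stand-in for a BERT-based sentence encoder: each text is mapped to a
    dense vector by summing deterministic pseudo-random token embeddings."""
    def token_vec(tok):
        seed = int(hashlib.md5(tok.encode()).hexdigest(), 16) % (2 ** 32)
        return np.random.default_rng(seed).standard_normal(dim)
    return np.stack([sum(token_vec(t) for t in text.lower().split()) for text in texts])

def dense_retrieve(query, corpus, k=2):
    """Return the k corpus entries with the highest inner-product score."""
    corpus_vecs = encode(corpus)      # usually pre-computed and indexed (e.g. with FAISS)
    query_vec = encode([query])[0]
    scores = corpus_vecs @ query_vec  # inner product as relevance score
    top = np.argsort(-scores)[:k]
    return [(corpus[i], float(scores[i])) for i in top]
```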
Similarity-based retrieval rests on a simple heuristic: the more x^r resembles x, the more likely x^r and y^r will help the generation. However, the example that is most similar under a universal textual similarity does not necessarily serve the downstream model best. Ideally, the retrieval metric would be learned from data in a task-dependent way: we wish to consider a memory only if it can indeed boost the quality of the final generation. Cai et al. (2021) propose to unify the memory retriever and its downstream NMT model into a single learnable whole, so that memory retrieval is optimized end-to-end for task-specific objectives.
2.4 Integration

There are several ways to integrate the retrieved external memory into generation. One straightforward way is data augmentation, which constructs augmented inputs by concatenating spans from {⟨x^r, y^r⟩} with the original input x. By training on the augmented inputs, a generation model implicitly learns how to integrate the retrieved information. Despite its simplicity, this kind of method works well on many tasks (Song et al., 2016; Weston et al., 2018; Bulte and Tezcan, 2019).
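A minimal sketch of this data-augmentation style of integration is shown below: the retrieved target side y^r is concatenated to the source input x with a separator symbol, and the generator is trained on the augmented pairs. The separator token and the number of appended memories are illustrative choices, not settings prescribed by the cited works.

```python
from typing import List, Tuple

SEP = " <SEP> "  # assumed separator token added to the model vocabulary

def augment_input(x: str, memories: List[Tuple[str, str]], max_mem: int = 1) -> str:
    """Concatenate retrieved target sides y_r to the source input x."""
    mems = [y_r for _, y_r in memories[:max_mem]]
    return x + SEP + SEP.join(mems) if mems else x

def build_training_pairs(data, retriever):
    """Construct augmented (input, output) pairs; the generator trained on these
    pairs implicitly learns when to copy from the memory and when to ignore it."""
    return [(augment_input(x, retriever(x)), y) for x, y in data]
```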
Another integration method is based on the attention mechanism (Bahdanau et al., 2014). The main idea is to adopt additional encoders (of various architectures) to encode the retrieved target sentences and to integrate them through attention (Cao and Xiong, 2018; Gu et al., 2018; Bapna and Firat, 2019). Since the attention mechanism (Bahdanau et al., 2014; Vaswani et al., 2017) has become a key module in many NLP models, integrating retrieved memory through attention is a very natural and efficient choice.
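The attention-based integration can be pictured with the small sketch below: an extra encoder produces states for the retrieved target sentence, a decoder state attends over them, and the resulting memory context is combined with the decoder state before prediction. Single-head dot-product attention and the tiny dimensions are simplifications for illustration only.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def attend(decoder_state, memory_states):
    """Dot-product attention of one decoder state over encoded memory tokens.
    memory_states: (m, d) matrix produced by an extra encoder run on y_r."""
    scores = memory_states @ decoder_state      # (m,)
    weights = softmax(scores)                   # attention over memory tokens
    return weights @ memory_states              # (d,) memory context vector

# Example: fuse the memory context with the decoder state before the output layer.
d = 4
decoder_state = np.random.randn(d)
memory_states = np.random.randn(6, d)           # 6 encoded tokens of a retrieved sentence
memory_context = attend(decoder_state, memory_states)
fused = np.concatenate([decoder_state, memory_context])
```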
In the previous two methods, an NLP model implicitly learns how to filter out irrelevant or even harmful information from the retrieved examples. There are also works that try to explicitly extract the useful information, i.e., to extract a skeleton, from the retrieved memory (Cai et al., 2019a; Wu et al., 2019; Cai et al., 2019b). For example, a skeleton can be a part of a whole utterance with the irrelevant content masked, and the generation model only integrates this skeleton in the generation process.
3 Dialogue Response Generation

Background Dialogue systems can be grouped into two categories: chit-chat systems and task-oriented systems. While task-oriented dialogue systems are designed to accomplish specific user tasks such as booking air tickets, chit-chat dialogue systems aim to give a meaningful and fluent response to any dialogue history in the open domain. Dialogue response generation in chit-chat systems is challenging partly due to the diversity of possible responses to a single dialogue history (i.e., the one-to-many problem). The dialogue history alone cannot determine a meaningful and specific response. Moreover, external knowledge that is not present in the dialogue history is often necessary for avoiding safe but boring responses. We focus on recent efforts that tackle these challenges to develop chit-chat dialogue systems.

Most modern chit-chat dialogue systems fall into two classes, namely retrieval-based models and generation-based models. Retrieval-based models (Ji et al., 2014; Hu et al., 2014) directly copy an existing response from curated dialogue corpora (i.e., the retrieval pool) when receiving a response request. The retrieved responses are often informative and grammatical, as they are collected from real-world conversations and possibly post-edited by humans. However, such systems perform poorly when a given dialogue history is substantially different from those in the retrieval pool. On the other hand, generation-based models (Shang et al., 2015; Vinyals and Le, 2015; Li et al., 2016a) generate a new utterance from scratch and generalize better to unseen dialogue contexts. Nevertheless, the generated utterances are inclined to be dull and non-informative (e.g., "I don't know", "I think so", "Me too", etc.) (Li et al., 2016a).

Shallow Integration As discussed, retrieval-based models may give informative but inappropriate responses, while generation-based models often do the opposite. It is desirable to combine the best of both worlds. Early work (Qiu et al., 2017) attempts to re-rank the outputs of both models. For a deeper integration, Song et al. (2016) and Yang et al. (2019) extend the standard Seq2Seq encoder-decoder model (Bahdanau et al., 2014) with an extra encoder for the retrieval result; the output of the extra encoder, along with the output of the original encoder for the dialogue history, is fed to the decoder. Weston et al. (2018) use a single encoder that takes the concatenation of the original dialogue history and the retrieved response as input. Wu et al. (2019) note that the retrieved information should be used in awareness of the context difference, and further propose to construct an edit vector by explicitly encoding the lexical differences between the input dialogue history and the retrieved dialogue history. Pandey et al. (2018) further propose to weight different training instances by context similarity.

Deep Integration To prevent the inflow of erroneous information, Cai et al. (2019a) propose a general framework that first extracts a skeleton from the retrieved response and then generates the response based on the extracted skeleton. This framework has also been adopted for stylistic response generation (Su et al., 2021c). Gupta et al. (2021) suggest using the semantic structure of an exemplar response, instead of its tokens, to guide generation. Despite their differences, these methods share a common issue: the generation model easily learns to ignore the retrieved response entirely and collapses into a vanilla Seq2Seq model. This happens because of improper training instances: due to the one-to-many nature of dialogue, it frequently happens that a retrieved response (or extracted skeleton) is suitable for responding to the query but inconsistent with the current target response.

Earlier studies (Weston et al., 2018; Wu et al., 2019; Cai et al., 2019a) alleviate this problem by putting hard constraints on the data (e.g., discarding instances where the retrieved response has low similarity with the target response), which, however, greatly reduces the amount of usable data. Cai et al. (2019b) instead employ a random mechanism for generating the skeletons used during training, extracting skeletons from the corresponding responses with some deliberate disturbance. Paranjape et al. (2021) propose to model the retriever after the posterior distribution of retrieval given both the input and the target output, and to train it jointly with the standard retriever and the generator by maximizing the evidence lower bound (ELBo) in expectation over retrieval.

Knowledge-Enhanced Generation The aforementioned work demonstrates that retrieval-based dialogue systems can be used to build better generation-based models. In general, this is done by conditioning the generation on retrieved responses. To infuse the response with external knowledge, the retrieval pool need not be a dialogue corpus. In fact, knowledge-grounded dialogue response generation exploring different forms of knowledge, such as knowledge bases and external documents, has been actively studied (Dinan et al., 2018; Zhou et al., 2018; Lian et al., 2019; Li et al., 2019; Qin et al., 2019; Wu et al., 2021; Zhang et al., 2021; Komeili et al., 2021).

Limitations We note three major limitations in existing work on dialogue response generation. First, current methods only use one retrieved response for generation; combining multiple retrieved responses could be more beneficial, but this is difficult due to the one-to-many nature of dialogue response generation. Second, current methods use a universal relevance score for retrieval; a more customized retrieval metric could be more effective, especially for controlled dialogue response generation (e.g., persona, emotion, etc.). Third, the retrieval pool of existing methods is limited to dialogue corpora (context-response pairs) or documents; it might be useful to enlarge the retrieval pool with corpora from other domains or other modalities. As discussed, plenty of directions remain to be explored in the future.
4 Machine Translation

Retrieval-augmented translation originates from human translation scenarios (Somers, 2003). When translating an input source sentence x into ŷ, a human translator typically uses a search engine to retrieve similar sentences {⟨x^r, y^r⟩} from a bilingual database. Such a technique, called translation memory, helps improve the translation quality and efficiency of human translators (Dillon and Fraser, 2006). With the development of machine translation techniques, there has been a surge of interest in improving machine translation models with translation memory. In the rest of this section, we review translation memory for both statistical machine translation (SMT) and neural machine translation (NMT).

4.1 Translation Memory in SMT

Generally, SMT consists of three key components organized in a pipeline: phrase table extraction, parameter tuning and decoding (Koehn et al., 2003; Chiang, 2007). As a result, many efforts have been made to exploit translation memory (TM) on top of each component.

Constrained Decoding with TM Constrained decoding is the most straightforward way to integrate TM into SMT (Smith and Clark, 2009; Koehn and Senellart, 2010; Zhechev and Van Genabith, 2010; Ma et al., 2011). Its basic idea is to reuse the useful segments in y^r while translating the other segments with SMT. Specifically, the approach consists of three steps: 1) identify the unmatched segments in both x^r and x through the edit-distance algorithm; 2) identify the unmatched segments in y^r, each of which is aligned to one unmatched segment in x^r by a word alignment algorithm; 3) decode each unmatched segment in x with SMT and use the result to replace the corresponding unmatched segment in y^r. Li et al. (2016b) further extend this approach from the sentence level to the phrase level. The advantage of constrained decoding is that it does not require changing the translation model (including the phrase table and parameters) and can be applied in a plug-and-play way. This approach is successful when x is highly similar to x^r; otherwise its performance degrades considerably, because it explicitly isolates TM matching from SMT decoding and decides which segments of y^r to reuse in a deterministic way.
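The three-step procedure can be sketched as follows. Here word_align and smt_translate are assumed callables standing in for a word aligner and a base SMT decoder, and difflib's sequence matcher plays the role of the edit-distance matching in step 1; none of this is the exact implementation of the cited systems.

```python
from difflib import SequenceMatcher

def constrained_decode(x_tokens, xr_tokens, yr_tokens, word_align, smt_translate):
    """Reuse matched segments of y_r and translate only the unmatched parts of x.
    word_align(xr_tokens, yr_tokens) -> dict mapping x_r positions to y_r positions.
    smt_translate(tokens) -> list of translated tokens."""
    align = word_align(xr_tokens, yr_tokens)
    matcher = SequenceMatcher(a=xr_tokens, b=x_tokens)
    output = list(yr_tokens)
    # Step 1: walk over edit operations between x_r and x, from right to left
    # so earlier splices do not shift positions still to be processed.
    for tag, i1, i2, j1, j2 in reversed(matcher.get_opcodes()):
        if tag == "equal":
            continue
        # Step 2: y_r positions aligned to the unmatched x_r segment.
        yr_positions = sorted(align.get(i, -1) for i in range(i1, i2) if align.get(i, -1) >= 0)
        # Step 3: translate the unmatched x segment and splice it into y_r.
        replacement = smt_translate(x_tokens[j1:j2])
        if yr_positions:
            output[yr_positions[0]:yr_positions[-1] + 1] = replacement
        else:
            output.extend(replacement)
    return output
```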
Phrase Table Aggregation with TM There are also notable efforts to augment the phrase table for SMT by extracting translation rules from the retrieved bilingual sentences {⟨x^r, y^r⟩} and then re-tuning the parameters of the SMT model, which thereby makes use of translation knowledge from {⟨x^r, y^r⟩} in an implicit way when translating x. For example, Biçici and Dymetman (2008) and Simard and Isabelle (2009) directly combine the extracted translation rules into the phrase table in a shallow way: they introduce an additional feature indicating whether a translation rule comes from {⟨x^r, y^r⟩} or not, and then train all feature weights with MERT (Och, 2003). One characteristic of these works is that a translation rule extracted from {⟨x^r, y^r⟩} that cannot exactly match any segment in x is useless, even if its target side may contain useful words. To remedy this, Wang et al. (2013, 2014) resort to a deeper way of combining the extracted translation rules: for each rule in the phrase table, they design a generative model that rewards rules similar to those extracted from {⟨x^r, y^r⟩}, and this generative model is then used as a feature in the log-linear SMT model, whose weight is tuned together with the other features by MERT. In addition, Li et al. (2014) employ a similar way of rewarding the rules, but rely on a discriminative model into which potential features from {⟨x^r, y^r⟩} are easy to integrate.

Parameter Tuning with TM Unlike the above two research lines, Liu et al. (2012, 2014) make use of translation memory only when tuning parameters. Specifically, when translating an input sentence x, they first retrieve many similar bilingual sentences {⟨x^r, y^r⟩} and then tune the parameters on the retrieved sentences as well as a given development dataset in a sentence-wise manner, i.e., they perform an independent tuning run for each input sentence. To improve the efficiency of each tuning step, they propose a local update on top of {⟨x^r, y^r⟩} from a baseline model.

Despite the successes of translation memory in SMT, the above three kinds of methods still have limitations. First, all of them employ a fuzzy-match score for retrieval, which depends heavily on word matching and thus cannot recall examples that are similar in word semantics but different in surface form. Second, these methods integrate the retrieved examples into a single module of SMT in ways that cannot make full use of the knowledge in the retrieved examples: the integration in the first two lines (constrained decoding and phrase table aggregation) is heuristic and not optimized towards translation quality, while the parameter tuning method fine-tunes only a few parameters of the log-linear SMT model, which is not enough to preserve sufficient knowledge from the retrieved examples. Third, since SMT operates in a pipeline manner, it is intractable to jointly optimize the retrieval metric and the SMT model. Consequently, all these methods adopt an off-the-shelf metric for retrieval, leading to suboptimal performance.

4.2 Translation Memory in NMT

Translation memory has been widely explored in neural machine translation (NMT). Depending on when retrieval is involved, we can categorize previous works into two classes: 1) the NMT model learns how to cooperate with the retrieval model in the training phase; 2) the NMT model is only aware of the retrieved data in the inference phase.

Inference Phase The key idea of this line of work is to reward some target words based on the words in y^r during inference, so that a decision can be made based on both the distribution of the generation model and the additional reward from the retrieval model. Some previous works reward target words based on the sentence-level similarity between x and x^r and the word alignment between x^r and y^r. Given the input sentence x, Zhang et al. (2018) assign higher rewards to target words in ŷ when they appear in y^r and the aligned source words occur in both x^r and x. He et al. (2019) follow a similar framework and additionally consider the position information of those target words when rewarding. These works reward the target words in an explicit way; in contrast, the one-sentence-one-model approach (Li et al., 2016c; Turchi et al., 2017) rewards target words implicitly: for each test input x, it first fine-tunes the translation model on the retrieved memory {⟨x^r, y^r⟩} and then translates x.
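A simplified version of this explicit rewarding scheme is sketched below: candidate target words that occur in y^r, and whose aligned source words appear in both x^r and x, receive a bonus scaled by the sentence-level similarity between x and x^r, added on top of the model's log-probabilities. The Jaccard similarity and the scaling factor are illustrative choices rather than the exact formulation of Zhang et al. (2018) or He et al. (2019).

```python
def sentence_similarity(x_tokens, xr_tokens):
    """Sentence-level similarity between x and x_r (Jaccard overlap as a stand-in
    for the fuzzy-match score used in the literature)."""
    a, b = set(x_tokens), set(xr_tokens)
    return len(a & b) / max(1, len(a | b))

def rewarded_log_probs(model_log_probs, x_tokens, xr_tokens, yr_tokens, align, beta=1.0):
    """model_log_probs: dict {target_word: log p(word | x, y_<t)} from the NMT model.
    align: dict mapping each y_r word to its aligned x_r word.
    Returns scores in which memory-supported words get an additive reward."""
    sim = sentence_similarity(x_tokens, xr_tokens)
    scores = dict(model_log_probs)
    for w in yr_tokens:
        src = align.get(w)
        if w in scores and src is not None and src in x_tokens and src in xr_tokens:
            scores[w] += beta * sim  # reward words supported by the translation memory
    return scores
```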
Others try to reward target words based on token-level similarity scores. Most works in this line are based on a dense retriever (Khandelwal et al., 2020a), e.g., FAISS. Khandelwal et al. (2020a) build a key-value datastore, where the key h(x^r, y^r [...] a light-weight network to learn the reward score. Since dense retrieval has the potential for cross-lingual retrieval, Zheng et al. (2021b) use a similar approach to achieve unsupervised domain adaptation, where the main change is to create the datastore from synthetic source sentences and real target sentences.

Training Phase Different from the above model-agnostic approaches, works in this line aim to train the generation model to learn how to cooperate with the retrieval model. It is worth noting that most works in this line adopt sentence-level retrieval when integrating the retrieval information in the training process. To this end, Bulte and Tezcan (2019) and Hossain et al. (2020) propose a data augmentation method to integrate the retrieved information, where x is concatenated with y^r before being fed into the model. Following the data augmentation approach, Xu et al. (2020) propose more matching methods to determine which retrieved example should be included in the source.

There are also works that propose new architectures to integrate the retrieval information. Under the RNN-based framework, Cao and Xiong (2018) and Gu et al. (2018) use gating and attention mechanisms to incorporate the retrieved target sentences. As the Transformer (Vaswani et al., 2017) became the backbone of NMT, some works use additional Transformer encoders to encode the retrieved target sentences and integrate them through the attention mechanism (Bapna and Firat, 2019; Cao et al., 2019). Xia et al. (2019) represent the retrieved target sentences in a different data structure, i.e., a graph, and integrate it through the attention mechanism. He et al. (2021) propose a light-weight method to encode the retrieved target sentences and leverage the alignment [...]

[...] primary feature to derive reward scores. However, some information, e.g., the frequencies of words and the context, may also be beneficial for integrating the translation memory. Second, it remains an open question when we should use the retrieved information and when we should not. In the inference phase, existing approaches tend to integrate the translation memory excessively, e.g., at every time step, which not only reduces the translation efficiency but may also dampen the fluency of the generated results.
5 Other Tasks

In addition to dialogue systems and machine translation, retrieval-augmented generation techniques have been shown to be beneficial in many other tasks. In the following, we highlight several key tasks that apply retrieval-augmented generation approaches.[1]

[1] Here, we focus on tasks other than question answering. We refer readers interested in QA to Chen and Yih (2020).

Language Modelling It has been shown that properly leveraging information from a retrieval memory can improve the performance of large pre-trained language models. To build a more accurate language model, Khandelwal et al. (2020b) propose to incorporate a soft memory module into the system: an index is built by caching the hidden states of the training corpus, and the language model then accesses the index via k-NN search, yielding greatly improved performance. As another example, Guu et al. (2020) apply the retrieval-augmented technique to the pre-training of a generative language model. During learning, they train a neural selector that dynamically samples a relevant text to guide the reconstruction of a corrupted input sequence; in this way, the pre-trained model delivers better results by explicitly grounding on the retrieval memory. Lewis et al. (2020a) combine language model pre-training with a paraphrasing approach: an input sequence is first corrupted, a set of multi-lingual texts is retrieved, and the model learns to reconstruct the original input sequence based on the retrieved texts. Recently, Borgeaud et al. (2021) propose RETRO, a large pre-trained language model enhanced with retrieved documents, and obtain performance comparable to GPT-3 with 25× fewer parameters.
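The nearest-neighbour language model idea can be summarized with the sketch below: cached hidden states of the training corpus serve as keys and their next tokens as values; at test time the distances from the current hidden state to its k nearest keys are turned into a distribution that is interpolated with the base LM distribution. The exhaustive search, the temperature and the interpolation weight are illustrative; large-scale systems such as Khandelwal et al. (2020b) use an approximate index (e.g., FAISS).

```python
import numpy as np

def knn_distribution(query_state, keys, values, vocab_size, k=8, temperature=1.0):
    """keys: (n, d) cached hidden states; values: (n,) next-token ids."""
    dists = np.linalg.norm(keys - query_state, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()
    p = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        p[values[idx]] += w
    return p

def interpolate(p_lm, p_knn, lam=0.25):
    """Final next-token distribution of a kNN-augmented LM."""
    return (1 - lam) * p_lm + lam * p_knn
```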
Summarization Text summarization is another research area that benefits from retrieval-augmented text generation. Peng et al. (2019) propose an adaptive decoding framework which first retrieves an exemplar document given the source document; the summary of the source document is then derived through an adaptive generation process based on the retrieved template. Different from Peng et al. (2019), Cao et al. (2018) and Hossain et al. (2020) introduce an intermediate re-ranking stage into the generation pipeline: before generating the document summary, the retrieved documents are first re-ranked based on their similarity scores with respect to the source document, and the summary is then produced by re-writing the selected templates.

Paraphrase Generation To address the lack of quality and diversity in paraphrase generation, Kazemnejad et al. (2020) propose a generation framework that first retrieves a sentence similar to the input sentence; based on the retrieved sentence, a neural editor then produces the resulting paraphrase. Chen et al. (2019) investigate a different aspect of paraphrasing, namely how to control the linguistic syntax of the generated text. To achieve this goal, they propose to first extract a sentential exemplar that serves as the syntax template; a neural model then generates the paraphrase with the desired linguistic syntax following the retrieved exemplar.

Text Style Transfer To improve the quality of generated text, Li et al. (2018) propose a retrieval-augmented framework that first retrieves texts similar to the input based on lexical-level similarity; the retrieved tokens that are irrelevant to the source are then deleted, and the output is derived from the edited template. Xiao et al. (2021) also adopt this framework, incorporating retrieval information from two sources (i.e., sparse and dense memories) and obtaining improved model performance.

Data-to-Text Generation Recently, retrieval-augmented generation has also been adapted to data-to-text generation. To bridge the gap between structured data and natural language text, Su et al. (2021a) propose a novel retrieval-augmented framework: given the source data, a set of candidate texts is first retrieved from a large unlabelled corpus; a neural selector is then applied to measure the similarities between the source data and the candidate texts and to extract a set of more fine-grained prototypes from the candidates; lastly, a generation model takes the prototypes as input to produce the text that describes the given structured data.

While retrieval-augmented generation has been widely explored in the NLP community, we suggest that future research could extend this approach to tasks that involve data from multiple modalities. For instance, with recent advances in image-text retrieval (Jia et al., 2021; Radford et al., 2021), the structural gap between images and texts has been largely bridged. Some early studies (Zhang et al., 2020) have shown that information retrieved from images can improve the performance of neural machine translation models. Naturally, such methods could be extended to other multi-modal tasks, such as image captioning (Karpathy and Li, 2015). A similar idea could also be applied to tasks beyond images, such as speech-to-text transcription (Gales and Young, 2007).
6 Future Directions

Despite the current success of retrieval-augmented text generation, there is still a long way to go, as discussed in the previous sections. We highlight some directions to facilitate future research.

Retrieval Sensitivity The performance of retrieval-augmented text generation is very sensitive to the retrieval quality, i.e., the similarity between the query and the retrieved examples. Current retrieval-augmented models perform well when the retrieved examples are very similar to the query, but they can be even worse than generation models without retrieval when the retrieved examples are less similar. It is therefore important to develop new methods that address this sensitivity.

Retrieval Efficiency Generally, enlarging the retrieval memory makes it more likely that an example very similar to the query can be retrieved. Unfortunately, the downside is that the overall inference of retrieval-augmented generation models becomes less efficient due to the considerable retrieval overhead. It is therefore pressing to develop methods that trade off retrieval memory size against retrieval efficiency, for example, data compression for the retrieval memory.

Local vs. Global Optimization Theoretically, it seems promising to jointly learn retrieval metrics and generation models. In practice, however, there is an essential gap in the retrieval metric between the training and inference phases: during training, the loss is locally back-propagated to only a few retrieved examples, while at inference the metric is applied globally over all examples in the memory. It would be interesting to narrow this gap when learning a better metric for generation tasks.

Multi-Modalities With recent advances in image-text retrieval, directly associating images with relevant text becomes possible. This encourages researchers to investigate retrieval-based text generation in tasks that involve data from different modalities. One typical task is image captioning. Beyond images, other tasks such as speech-to-text transcription could potentially benefit from retrieval-based generation methods as well.

Diverse & Controllable Retrieval Most existing approaches adopt a universal metric for retrieval, such as the lexical similarity of sentences. Future work should explore how to use customized metrics for retrieval. This can be beneficial for more controlled text generation: for example, instances with particular emotions or styles may be more desirable in personalized dialogue generation, and parallel data containing specific terminology is more helpful in machine translation. On the other hand, using a universal metric for retrieval may lead to a lack of diversity in the retrieval results; collecting a diverse set of retrieval results can improve the coverage of useful information. Thus, considering multiple different metrics for retrieval may lead to higher-quality generation in the future.

7 Conclusion

In this paper, we surveyed recent approaches to retrieval-augmented text generation. We reviewed and summarized the development of its different components, including retrieval metrics, retrieval sources, and integration paradigms. We gave in-depth discussions of retrieval-augmented text generation as applied to different tasks, including dialogue response generation, machine translation, and other generation tasks. We also pointed out some future directions for retrieval-augmented text generation.
References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Ankur Bapna and Orhan Firat. 2019. Non-parametric adaptation for neural machine translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1921–1931.

Ergun Biçici and Marc Dymetman. 2008. Dynamic translation memory: Using statistical machine translation to improve translation memory fuzzy matches. In International Conference on Intelligent Text Processing and Computational Linguistics, pages 454–465. Springer.

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, and Laurent Sifre. 2021. Improving language models by retrieving from trillions of tokens. CoRR, abs/2112.04426.

Bram Bulte and Arda Tezcan. 2019. Neural fuzzy repair: Integrating fuzzy matches into neural machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1800–1809.

Deng Cai, Yan Wang, Wei Bi, Zhaopeng Tu, Xiaojiang Liu, Wai Lam, and Shuming Shi. 2019a. Skeleton-to-response: Dialogue generation guided by retrieval memory. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1219–1228.

Deng Cai, Yan Wang, Wei Bi, Zhaopeng Tu, Xiaojiang Liu, and Shuming Shi. 2019b. Retrieval-guided dialogue response generation via a matching-to-generation framework. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1866–1875.

Deng Cai, Yan Wang, Huayang Li, Wai Lam, and Lemao Liu. 2021. Neural machine translation with monolingual translation memory. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7307–7318, Online. Association for Computational Linguistics.

Qian Cao, Shaohui Kuang, and Deyi Xiong. 2019. Learning to reuse translations: Guiding neural machine translation with examples. arXiv preprint arXiv:1911.10732.

Qian Cao and Deyi Xiong. 2018. Encoding gated translation memory into neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3042–3047.

Ziqiang Cao, Wenjie Li, Sujian Li, and Furu Wei. 2018. Retrieve, rerank and rewrite: Soft template based neural summarization. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pages 152–161. Association for Computational Linguistics.

Danqi Chen and Wen-tau Yih. 2020. Open-domain question answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 34–37, Online. Association for Computational Linguistics.

Mingda Chen, Qingming Tang, Sam Wiseman, and Kevin Gimpel. 2019. Controllable paraphrase generation with a syntactic exemplar. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers, pages 5972–5984. Association for Computational Linguistics.

David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228.

Sarah Dillon and Janet Fraser. 2006. Translators and TM: An investigation of translators' perceptions of translation memory adoption. Machine Translation, 20(2):67–79.

Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. 2018. Wizard of Wikipedia: Knowledge-powered conversational agents. arXiv preprint arXiv:1811.01241.

Mark J. F. Gales and Steve J. Young. 2007. The application of hidden Markov models in speech recognition. Found. Trends Signal Process., 1(3):195–304.

Jiatao Gu, Yong Wang, Kyunghyun Cho, and Victor O.K. Li. 2018. Search engine guided neural machine translation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.

Prakhar Gupta, Jeffrey Bigham, Yulia Tsvetkov, and Amy Pavel. 2021. Controlling dialogue generation with semantic exemplars. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3018–3029, Online. Association for Computational Linguistics.
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. REALM: Retrieval-augmented language model pre-training. CoRR, abs/2002.08909.

Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, and Percy S. Liang. 2018. A retrieve-and-edit framework for predicting structured outputs. In Advances in Neural Information Processing Systems, pages 10052–10062.

Qiuxiang He, Guoping Huang, Qu Cui, Li Li, and Lemao Liu. 2021. Fast and accurate neural machine translation with translation memory. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3170–3180.

Qiuxiang He, Guoping Huang, Lemao Liu, and Li Li. 2019. Word position aware translation memory for neural machine translation. In CCF International Conference on Natural Language Processing and Chinese Computing, pages 367–379. Springer.

Nabil Hossain, Marjan Ghazvininejad, and Luke Zettlemoyer. 2020. Simple and effective retrieve-edit-rerank text generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2532–2538.

Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. 2014. Convolutional neural network architectures for matching natural language sentences. In NIPS, pages 2042–2050.

Zongcheng Ji, Zhengdong Lu, and Hang Li. 2014. An information retrieval approach to short text conversation. arXiv preprint arXiv:1408.6988.

Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. 2021. Scaling up visual and vision-language representation learning with noisy text supervision. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 4904–4916. PMLR.

Andrej Karpathy and Fei-Fei Li. 2015. Deep visual-semantic alignments for generating image descriptions. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 3128–3137. IEEE Computer Society.

Amirhossein Kazemnejad, Mohammadreza Salehi, and Mahdieh Soleymani Baghshah. 2020. Paraphrase generation by learning how to edit from samples. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6010–6021, Online. Association for Computational Linguistics.

Urvashi Khandelwal, Angela Fan, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2020a. Nearest neighbor machine translation. arXiv preprint arXiv:2010.00710.

Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2020b. Generalization through memorization: Nearest neighbor language models. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.

Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pages 127–133.

Philipp Koehn and Jean Senellart. 2010. Convergence of translation memory and statistical machine translation. In Proceedings of AMTA Workshop on MT Research and the Translation Industry, pages 21–31.

Mojtaba Komeili, Kurt Shuster, and Jason Weston. 2021. Internet-augmented dialogue generation. arXiv preprint arXiv:2107.07566.

Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. 2019. Latent retrieval for weakly supervised open domain question answering. arXiv preprint arXiv:1906.00300.

Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, and Luke Zettlemoyer. 2020a. Pre-training via paraphrasing. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020b. Retrieval-augmented generation for knowledge-intensive NLP tasks. arXiv preprint arXiv:2005.11401.

Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A diversity-promoting objective function for neural conversation models. In NAACL, pages 110–119.

Juncen Li, Robin Jia, He He, and Percy Liang. 2018. Delete, retrieve, generate: A simple approach to sentiment and style transfer. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), pages 1865–1874. Association for Computational Linguistics.

Liangyou Li, Andy Way, and Qun Liu. 2014. A discriminative framework of integrating translation memory features into SMT. In Proceedings of the 11th Conference of the Association for Machine Translation in the Americas, volume 1, pages 249–260.
Liangyou Li, Andy Way, and Qun Liu. 2016b. Phrase-level combination of SMT and TM using constrained word lattice. Association for Computational Linguistics (ACL).

Xiaoqing Li, Jiajun Zhang, and Chengqing Zong. 2016c. One sentence one model for neural machine translation. arXiv preprint arXiv:1609.06490.

Zekang Li, Cheng Niu, Fandong Meng, Yang Feng, Qian Li, and Jie Zhou. 2019. Incremental transformer with deliberation decoder for document grounded conversations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 12–21.

Rongzhong Lian, Min Xie, Fan Wang, Jinhua Peng, and Hua Wu. 2019. Learning to select knowledge for response generation in dialog systems. arXiv preprint arXiv:1902.04911.

Lemao Liu, Hailong Cao, Taro Watanabe, Tiejun Zhao, Mo Yu, and Conghui Zhu. 2012. Locally training the log-linear model for SMT. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 402–411.

Lemao Liu, Tiejun Zhao, Taro Watanabe, Hailong Cao, and Conghui Zhu. 2014. Discriminative training for log-linear based SMT: Global or local methods. ACM Transactions on Asian Language Information Processing (TALIP), 13(4):1–25.

Yanjun Ma, Yifan He, Andy Way, and Josef van Genabith. 2011. Consistent translation using discriminative learning - a translation memory-inspired approach. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1239–1248.

Yuxian Meng, Xiaoya Li, Xiayu Zheng, Fei Wu, Xiaofei Sun, Tianwei Zhang, and Jiwei Li. 2021. Fast nearest neighbor machine translation. arXiv preprint arXiv:2105.14528.

Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 160–167, Sapporo, Japan. Association for Computational Linguistics.

Gaurav Pandey, Danish Contractor, Vineet Kumar, and Sachindra Joshi. 2018. Exemplar encoder-decoder for neural conversation generation. In ACL, pages 1329–1338.

Ashwin Paranjape, Omar Khattab, Christopher Potts, Matei Zaharia, and Christopher D Manning. 2021. Hindsight: Posterior-guided training of retrievers for improved open-ended generation. arXiv preprint arXiv:2110.07752.

Hao Peng, Ankur P. Parikh, Manaal Faruqui, Bhuwan Dhingra, and Dipanjan Das. 2019. Text generation with exemplar-based adaptive decoding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

Lianhui Qin, Michel Galley, Chris Brockett, Xiaodong Liu, Xiang Gao, William B Dolan, Yejin Choi, and Jianfeng Gao. 2019. Conversing by reading: Contentful neural conversation with on-demand machine reading. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5427–5436.

Minghui Qiu, Feng-Lin Li, Siyu Wang, Xing Gao, Yan Chen, Weipeng Zhao, Haiqing Chen, Jun Huang, and Wei Chu. 2017. AliMe chat: A sequence to sequence and rerank based chatbot engine. In ACL, pages 498–503.

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 8748–8763. PMLR.

Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc.

Lifeng Shang, Zhengdong Lu, and Hang Li. 2015. Neural responding machine for short-text conversation. In ACL, pages 1577–1586.

Michel Simard and Pierre Isabelle. 2009. Phrase-based machine translation in a computer-assisted translation environment. Proceedings of the Twelfth Machine Translation Summit (MT Summit XII), pages 120–127.

James Smith and Stephen Clark. 2009. EBMT for SMT: a new EBMT-SMT hybrid. In Proceedings of the 3rd International Workshop on Example-Based Machine Translation, pages 3–10. Citeseer.

Harold Somers. 2003. Translation memory systems. Benjamins Translation Library, 35:31–48.

Yiping Song, Rui Yan, Xiang Li, Dongyan Zhao, and Ming Zhang. 2016. Two are better than one: An ensemble of retrieval- and generation-based dialog systems. arXiv preprint arXiv:1610.07149.

Yixuan Su, Zaiqiao Meng, Simon Baker, and Nigel Collier. 2021a. Few-shot table-to-text generation with prototype memory. In Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November, 2021, pages 910–917. Association for Computational Linguistics.
Yixuan Su, David Vandyke, Simon Baker, Yan Wang, and Nigel Collier. 2021b. Keep the primary, rewrite the secondary: A two-stage approach for paraphrase generation. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 560–569, Online. Association for Computational Linguistics.

Yixuan Su, Yan Wang, Deng Cai, Simon Baker, Anna Korhonen, and Nigel Collier. 2021c. PROTOTYPE-TO-STYLE: dialogue generation with style-aware editing on retrieval memory. IEEE ACM Trans. Audio Speech Lang. Process., 29:2152–2161.

Marco Turchi, Matteo Negri, M Farajian, and Marcello Federico. 2017. Continuous learning from human post-edits for neural machine translation.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.

Oriol Vinyals and Quoc Le. 2015. A neural conversational model. In ICML (Deep Learning Workshop).

Kun Wang, Chengqing Zong, and Keh-Yih Su. 2013. Integrating translation memory into phrase-based machine translation during decoding. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11–21.

Kun Wang, Chengqing Zong, and Keh-Yih Su. 2014. Dynamically integrating cross-domain translation memory into phrase-based machine translation during decoding. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 398–408.

Jason Weston, Emily Dinan, and Alexander Miller. 2018. Retrieve and refine: Improved sequence generation models for dialogue. In Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI, pages 87–92.

Yu Wu, Furu Wei, Shaohan Huang, Yunli Wang, Zhoujun Li, and Ming Zhou. 2019. Response generation by context-aware prototype editing. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7281–7288.

Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, et al. 2021. A controllable model of grounded response generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 14085–14093.

Mengzhou Xia, Guoping Huang, Lemao Liu, and Shuming Shi. 2019. Graph based translation memory for neural machine translation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7297–7304.

Fei Xiao, Liang Pang, Yanyan Lan, Yan Wang, Huawei Shen, and Xueqi Cheng. 2021. Transductive learning for unsupervised text style transfer. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, pages 2510–2521. Association for Computational Linguistics.

Jitao Xu, Josep M Crego, and Jean Senellart. 2020. Boosting neural machine translation with similar translations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1580–1590.

Liu Yang, Junjie Hu, Minghui Qiu, Chen Qu, Jianfeng Gao, W Bruce Croft, Xiaodong Liu, Yelong Shen, and Jingjing Liu. 2019. A hybrid retrieval-generation neural conversation model. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 1341–1350.

Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, Graham Neubig, and Satoshi Nakamura. 2018. Guiding neural machine translation with retrieved translation pieces. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1325–1335.

Yizhe Zhang, Siqi Sun, Xiang Gao, Yuwei Fang, Chris Brockett, Michel Galley, Jianfeng Gao, and Bill Dolan. 2021. Joint retrieval and generation training for grounded text generation. arXiv preprint arXiv:2105.06597.

Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, and Hai Zhao. 2020. Neural machine translation with universal visual representation. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.

Ventsislav Zhechev and Josef Van Genabith. 2010. Seeding statistical machine translation with translation memory output through tree-based structural alignment. In Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation, pages 43–51.

Xin Zheng, Zhirui Zhang, Junliang Guo, Shujian Huang, Boxing Chen, Weihua Luo, and Jiajun Chen. 2021a. Adaptive nearest neighbor machine translation. arXiv preprint arXiv:2105.13022.

Xin Zheng, Zhirui Zhang, Shujian Huang, Boxing Chen, Jun Xie, Weihua Luo, and Jiajun Chen. 2021b. Non-parametric unsupervised domain adaptation for neural machine translation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4234–4241.
Kangyan Zhou, Shrimai Prabhumoye, and Alan W Black. 2018. A dataset for document grounded conversations. arXiv preprint arXiv:1809.07358.