A Survey on Retrieval-Augmented Text Generation

Huayang Li♥,∗ Yixuan Su♠,∗ Deng Cai♦,∗ Yan Wang♣,∗ Lemao Liu♣,∗
♥ Nara Institute of Science and Technology  ♠ University of Cambridge
♦ The Chinese University of Hong Kong  ♣ Tencent AI Lab
li.huayang.lh6@is.naist.jp, ys484@cam.ac.uk, thisisjcykcd@gmail.com, brandenwang@tencent.com, lemaoliu@gmail.com

arXiv:2202.01110v1 [cs.CL] 2 Feb 2022
Abstract

Recently, retrieval-augmented text generation has attracted increasing attention from the computational linguistics community. Compared with conventional generation models, retrieval-augmented text generation has remarkable advantages and, in particular, has achieved state-of-the-art performance in many NLP tasks. This paper aims to conduct a survey of retrieval-augmented text generation. It first highlights the generic paradigm of retrieval-augmented generation, and then reviews notable approaches according to different tasks, including dialogue response generation, machine translation, and other generation tasks. Finally, it points out some important directions on top of recent methods to facilitate future research.

1 Introduction

Retrieval-augmented text generation, as a new text generation paradigm that fuses emerging deep learning technology and traditional retrieval technology, has achieved state-of-the-art (SOTA) performance in many NLP tasks and attracted the attention of the computational linguistics community (Weston et al., 2018; Dinan et al., 2018; Cai et al., 2021). Compared with its generation-based counterpart, this new paradigm has some remarkable advantages: 1) the knowledge need not be implicitly stored in model parameters, but can be explicitly acquired in a plug-and-play manner, leading to great scalability; 2) instead of generating from scratch, the paradigm generates text from retrieved human-written references, which potentially alleviates the difficulty of text generation.

This paper aims to review many representative approaches for retrieval-augmented text generation tasks, including dialogue response generation (Weston et al., 2018), machine translation (Gu et al., 2018) and others (Hashimoto et al., 2018). We first present the generic paradigm of retrieval-augmented generation as well as three key components under this paradigm, which are retrieval sources, retrieval metrics and generation models.

Then, we introduce notable methods for retrieval-augmented generation, organized with respect to different tasks. Specifically, on the dialogue response generation task, exemplar/template retrieval as an intermediate step has been shown to benefit informative response generation (Weston et al., 2018; Wu et al., 2019; Cai et al., 2019a,b). In addition, there has been growing interest in knowledge-grounded generation exploring different forms of knowledge such as knowledge bases and external documents (Dinan et al., 2018; Zhou et al., 2018; Lian et al., 2019; Li et al., 2019; Qin et al., 2019; Wu et al., 2021; Zhang et al., 2021). On the machine translation task, we summarize the early work on how the retrieved sentences (called translation memory) are used to improve statistical machine translation (SMT) (Koehn et al., 2003) models (Simard and Isabelle, 2009; Koehn and Senellart, 2010), and in particular we highlight several popular methods for integrating translation memory into NMT models (Gu et al., 2018; Zhang et al., 2018; Xu et al., 2020; He et al., 2021). We also review the applications of retrieval-augmented generation in other generation tasks such as abstractive summarization (Peng et al., 2019), code generation (Hashimoto et al., 2018), paraphrasing (Kazemnejad et al., 2020; Su et al., 2021b), and knowledge-intensive generation (Lewis et al., 2020b). Finally, we point out some promising directions for retrieval-augmented generation to push forward future research.

∗ All authors contribute equally.
[Figure 1: The overview of this survey. Retrieval sources (Sec. 2.2: training corpus, external data, unsupervised data) and retrieval metrics (Sec. 2.3: sparse-vector, dense-vector, and task-specific retrieval) produce the retrieval memory from the input; integration models (Sec. 2.4: data augmentation, attention mechanisms, skeletons & templates) combine the memory with the generation model to produce the output; task-specific discussions follow for dialogue generation (Sec. 3), machine translation (Sec. 4) and other tasks (Sec. 5).]

2 Retrieval-augmented Paradigm

2.1 Formulation and Motivation

Most text generation tasks can be formulated as a mapping from an input sequence x to an output sequence y: y = f(x). For example, x and y could be the dialogue history and its response in dialogue generation, or the source-language and target-language sentences in machine translation, and so on.

Recently, some researchers have suggested endowing models with the capability to access external memory via information retrieval techniques, so that they can acquire more information during the generation process (Gu et al., 2018; Weston et al., 2018; Cai et al., 2019b). Retrieval-augmented generation can then be formulated as:

    y = f(x, z)    (1)

where z = {⟨x^r, y^r⟩} is a set of relevant instances retrieved from the original training set or from external datasets. The main idea of this paradigm is that y^r may benefit the generation of the response if x^r (or y^r) is highly relevant to the input x. It is worth noting that x^r = ∅ when unsupervised retrieval sources are used. More details about how to obtain z are discussed in §2.3.
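To make Eq. (1) concrete, the sketch below spells out the retrieval-then-generation interface in Python. The word-overlap retriever and the placeholder generator are illustrative assumptions standing in for the metrics of §2.3 and the models of §2.4, not components of any particular cited system.

```python
from typing import List, Optional, Tuple

# A memory item is a pair <x_r, y_r>; x_r may be None for unsupervised sources.
MemoryItem = Tuple[Optional[str], str]

def retrieve(x: str, corpus: List[MemoryItem], k: int = 3) -> List[MemoryItem]:
    """Return the k memory items most relevant to x.
    Simple word overlap stands in for TF-IDF/BM25/dense retrieval (Sec. 2.3)."""
    def overlap(item: MemoryItem) -> int:
        x_r = item[0] if item[0] is not None else item[1]
        return len(set(x.lower().split()) & set(x_r.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def generate(x: str, z: List[MemoryItem]) -> str:
    """Placeholder for the generation model f(x, z): a real system conditions a
    seq2seq model on both the input and the retrieved memory (Sec. 2.4)."""
    references = " | ".join(y_r for _, y_r in z)
    return f"<response conditioned on '{x}' and references: {references}>"

def retrieval_augmented_generation(x: str, corpus: List[MemoryItem]) -> str:
    z = retrieve(x, corpus)   # z = {<x_r, y_r>}
    return generate(x, z)     # y = f(x, z), Eq. (1)
```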
In this section, we briefly introduce some basic IR techniques. In general, the retrieval memory can come from three kinds of sources: the training corpus, external datasets in the same format as the training corpus, and large-scale unsupervised corpora (§2.2). Metrics that evaluate the relevance between texts also vary; in §2.3 we divide them into three categories: sparse-vector retrieval, dense-vector retrieval, and training-based retrieval. Finally, how to integrate the retrieval memory into the generation model is also crucial; we introduce some popular integration approaches in §2.4.

2.2 Retrieval Sources

Most previous studies search the external memory from the training corpus itself (Song et al., 2016; Gu et al., 2018; Weston et al., 2018). At inference time, retrieved examples with high relevance scores can be regarded as extra references and reduce the model's uncertainty in generation. The main motivation of these works is to store knowledge not only in the model parameters but also in an explicit and accessible form, so that the model can re-access it during inference.

Some researchers also propose to retrieve relevant samples from external datasets (Su et al., 2021c; Xiao et al., 2021). In these studies, the retrieval pool differs from the training corpus and can therefore provide additional information that the training corpus does not contain. This is especially beneficial for applications such as domain adaptation and knowledge update. For example, Khandelwal et al. (2020a) and Zheng et al. (2021a) employ an in-domain dataset as the external memory to achieve fast domain adaptation for machine translation.

One limitation of the previous two sources is that the datasets have to be supervised, consisting of aligned input-output pairs. For machine translation, Cai et al. (2021) propose a cross-lingual retriever that directly retrieves target sentences from an unsupervised monolingual corpus. The main idea is to align source-side sentences and the corresponding target-side translations in a dense vector space, i.e., to align x and y^r when x^r is absent. As a result, the retriever directly connects the dots between the source-side input and target-side translations, enabling monolingual data in the target language to be used alone as memories.
2.3 Retrieval Metrics

Given an input sequence x and a retrieval corpus, the retrieval model aims to retrieve a set of relevant examples z = {⟨x^r, y^r⟩} from the corpus. When a supervised corpus is used, {⟨x^r, y^r⟩} is retrieved by measuring the similarity between x and x^r.

For similarity measurement, sparse-vector retrieval methods such as TF-IDF and BM25 (Robertson and Zaragoza, 2009) are widely used. They match keywords efficiently with an inverted index. However, these methods prefer examples with similar surface forms and may fail to retrieve examples that are only semantically relevant.
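As an illustration of sparse-vector retrieval, the snippet below implements BM25 scoring (Robertson and Zaragoza, 2009) over a toy tokenized corpus with the usual k1 and b defaults. It is a minimal sketch: a real system would score candidates through an inverted index instead of looping over every document.

```python
import math
from collections import Counter
from typing import List

def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every tokenized document in `docs` against the tokenized `query`."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency of each query term.
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    # Smoothed inverse document frequency.
    idf = {t: math.log(1 + (N - n + 0.5) / (n + 0.5)) for t, n in df.items()}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            f = tf.get(t, 0)
            s += idf[t] * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```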
To alleviate this problem, some studies (Cao and Xiong, 2018) attempt to retrieve in a dense-vector space instead of relying on lexical overlap. Recent work (Lee et al., 2019) makes use of pre-trained language models, encoding the text into low-dimensional dense vectors via BERT-based encoders. The retrieval score is then computed as the inner product between vectors.
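Dense-vector retrieval can be sketched as follows. The encode function is only a stand-in for a BERT-based sentence encoder (the hash-based token embeddings and the dimensionality are arbitrary assumptions); what matters is that relevance is the inner product between the query vector and pre-computed corpus vectors, from which the top-k examples are returned.

```python
import hashlib
import numpy as np

def encode(texts, dim=16):
    """Stand-in for a BERT-based sentence encoder: each text is mapped to a
    dense vector by summing deterministic pseudo-random token embeddings."""
    def token_vec(tok):
        seed = int(hashlib.md5(tok.encode()).hexdigest(), 16) % (2 ** 32)
        return np.random.default_rng(seed).standard_normal(dim)
    return np.stack([sum(token_vec(t) for t in text.lower().split()) for text in texts])

def dense_retrieve(query, corpus, k=2):
    """Return the k corpus entries with the highest inner-product score."""
    corpus_vecs = encode(corpus)      # usually pre-computed and indexed (e.g. with FAISS)
    query_vec = encode([query])[0]
    scores = corpus_vecs @ query_vec  # inner product as relevance score
    top = np.argsort(-scores)[:k]
    return [(corpus[i], float(scores[i])) for i in top]
```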
Similarity-based retrieval rests on a simple heuristic: the more x^r resembles x, the more likely x^r and y^r will help the generation. However, the example that is most similar under a universal textual similarity does not necessarily serve the downstream model best. Ideally, the retrieval metric would be learned from data in a task-dependent way: we wish to consider a memory only if it can indeed boost the quality of the final generation. Cai et al. (2021) propose to unify the memory retriever and its downstream NMT model into a single learnable whole, so that memory retrieval is optimized end-to-end for task-specific objectives.
2.4 Integration

There are several ways to integrate the retrieved external memory into generation. One straightforward way is data augmentation, which constructs augmented inputs by concatenating spans from {⟨x^r, y^r⟩} with the original input x. By training on the augmented inputs, a generation model implicitly learns how to integrate the retrieved information. Despite its simplicity, this kind of method works well on many tasks (Song et al., 2016; Weston et al., 2018; Bulte and Tezcan, 2019).
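A minimal sketch of this data-augmentation style of integration is shown below: the retrieved target side y^r is concatenated to the source input x with a separator symbol, and the generator is trained on the augmented pairs. The separator token and the number of appended memories are illustrative choices, not settings prescribed by the cited works.

```python
from typing import List, Tuple

SEP = " <SEP> "  # assumed separator token added to the model vocabulary

def augment_input(x: str, memories: List[Tuple[str, str]], max_mem: int = 1) -> str:
    """Concatenate retrieved target sides y_r to the source input x."""
    mems = [y_r for _, y_r in memories[:max_mem]]
    return x + SEP + SEP.join(mems) if mems else x

def build_training_pairs(data, retriever):
    """Construct augmented (input, output) pairs; the generator trained on these
    pairs implicitly learns when to copy from the memory and when to ignore it."""
    return [(augment_input(x, retriever(x)), y) for x, y in data]
```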
Another integration method is based on the attention mechanism (Bahdanau et al., 2014). The main idea is to adopt additional encoders (of various architectures) to encode the retrieved target sentences and to integrate them through attention (Cao and Xiong, 2018; Gu et al., 2018; Bapna and Firat, 2019). Since the attention mechanism (Bahdanau et al., 2014; Vaswani et al., 2017) has become a key module in many NLP models, integrating retrieved memory through attention is a very natural and efficient choice.
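The attention-based integration can be pictured with the small sketch below: an extra encoder produces states for the retrieved target sentence, a decoder state attends over them, and the resulting memory context is combined with the decoder state before prediction. Single-head dot-product attention and the tiny dimensions are simplifications for illustration only.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def attend(decoder_state, memory_states):
    """Dot-product attention of one decoder state over encoded memory tokens.
    memory_states: (m, d) matrix produced by an extra encoder run on y_r."""
    scores = memory_states @ decoder_state      # (m,)
    weights = softmax(scores)                   # attention over memory tokens
    return weights @ memory_states              # (d,) memory context vector

# Example: fuse the memory context with the decoder state before the output layer.
d = 4
decoder_state = np.random.randn(d)
memory_states = np.random.randn(6, d)           # 6 encoded tokens of a retrieved sentence
memory_context = attend(decoder_state, memory_states)
fused = np.concatenate([decoder_state, memory_context])
```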
In the previous two methods, an NLP model implicitly learns how to filter out irrelevant or even harmful information from the retrieved examples. There are also works that try to explicitly extract the useful information, i.e., to extract a skeleton, from the retrieved memory (Cai et al., 2019a; Wu et al., 2019; Cai et al., 2019b). For example, a skeleton can be a part of a whole utterance with the irrelevant content masked, and the generation model only integrates this skeleton in the generation process.
3 Dialogue Response Generation

Background Dialogue systems can be grouped into two categories: chit-chat systems and task-oriented systems. While task-oriented dialogue systems are designed to accomplish specific user tasks such as booking air tickets, chit-chat dialogue systems aim to give a meaningful and fluent response to any dialogue history in the open domain. Dialogue response generation in chit-chat systems is challenging partly due to the diversity of possible responses to a single dialogue history (i.e., the one-to-many problem). The dialogue history alone cannot determine a meaningful and specific response. Moreover, external knowledge that is not present in the dialogue history is often necessary for avoiding safe but boring responses. We focus on recent efforts that tackle these challenges to develop chit-chat dialogue systems.

Most modern chit-chat dialogue systems fall into two classes, namely retrieval-based models and generation-based models. Retrieval-based models (Ji et al., 2014; Hu et al., 2014) directly copy an existing response from curated dialogue corpora (i.e., the retrieval pool) when receiving a response request. The retrieved responses are often informative and grammatical, as they are collected from real-world conversations and possibly post-edited by humans. However, such systems perform poorly when a given dialogue history is substantially different from those in the retrieval pool. On the other hand, generation-based models (Shang et al., 2015; Vinyals and Le, 2015; Li et al., 2016a) generate a new utterance from scratch and generalize better to unseen dialogue contexts. Nevertheless, the generated utterances are inclined to be dull and non-informative (e.g., "I don't know", "I think so", "Me too", etc.) (Li et al., 2016a).

Shallow Integration As discussed, retrieval-based models may give informative but inappropriate responses, while generation-based models often do the opposite. It is desirable to combine the best of both worlds. Early work (Qiu et al., 2017) attempts to re-rank the outputs of both models. For a deeper integration, Song et al. (2016) and Yang et al. (2019) extend the standard Seq2Seq encoder-decoder model (Bahdanau et al., 2014) with an extra encoder for the retrieval result; the output of the extra encoder, along with the output of the original encoder for the dialogue history, is fed to the decoder. Weston et al. (2018) use a single encoder that takes the concatenation of the original dialogue history and the retrieved response as input. Wu et al. (2019) note that the retrieved information should be used in awareness of the context difference, and further propose to construct an edit vector by explicitly encoding the lexical differences between the input dialogue history and the retrieved dialogue history. Pandey et al. (2018) further propose to weight different training instances by context similarity.

Deep Integration To prevent the inflow of erroneous information, Cai et al. (2019a) propose a general framework that first extracts a skeleton from the retrieved response and then generates the response based on the extracted skeleton. This framework has also been adopted for stylistic response generation (Su et al., 2021c). Gupta et al. (2021) suggest using the semantic structure of an exemplar response, instead of its tokens, to guide generation. Despite their differences, these methods share a common issue: the generation model easily learns to ignore the retrieved response entirely and collapses into a vanilla Seq2Seq model. This happens because of improper training instances: due to the one-to-many nature of dialogue, it frequently happens that a retrieved response (or extracted skeleton) is suitable for responding to the query but inconsistent with the current target response.

Earlier studies (Weston et al., 2018; Wu et al., 2019; Cai et al., 2019a) alleviate this problem by putting hard constraints on the data (e.g., discarding instances where the retrieved response has low similarity with the target response), which, however, greatly reduces the amount of usable data. Cai et al. (2019b) instead employ a random mechanism for generating the skeletons used during training, extracting skeletons from the corresponding responses with some deliberate disturbance. Paranjape et al. (2021) propose to model the retriever after the posterior distribution of retrieval given both the input and the target output, and to train it jointly with the standard retriever and the generator by maximizing the evidence lower bound (ELBo) in expectation over retrieval.

Knowledge-Enhanced Generation The aforementioned work demonstrates that retrieval-based dialogue systems can be used to build better generation-based models. In general, this is done by conditioning the generation on retrieved responses. To infuse the response with external knowledge, the retrieval pool need not be a dialogue corpus. In fact, knowledge-grounded dialogue response generation exploring different forms of knowledge, such as knowledge bases and external documents, has been actively studied (Dinan et al., 2018; Zhou et al., 2018; Lian et al., 2019; Li et al., 2019; Qin et al., 2019; Wu et al., 2021; Zhang et al., 2021; Komeili et al., 2021).

Limitations We note three major limitations in existing work on dialogue response generation. First, current methods only use one retrieved response for generation; combining multiple retrieved responses could be more beneficial, but this is difficult due to the one-to-many nature of dialogue response generation. Second, current methods use a universal relevance score for retrieval; a more customized retrieval metric could be more effective, especially for controlled dialogue response generation (e.g., persona, emotion, etc.). Third, the retrieval pool of existing methods is limited to dialogue corpora (context-response pairs) or documents; it might be useful to enlarge the retrieval pool with corpora from other domains or other modalities. As discussed, plenty of directions remain to be explored in the future.
4 Machine Translation

Retrieval-augmented translation originates from human translation scenarios (Somers, 2003). When translating an input source sentence x into ŷ, a human translator typically uses a search engine to retrieve similar sentences {⟨x^r, y^r⟩} from a bilingual database. Such a technique, called translation memory, helps improve the translation quality and efficiency of human translators (Dillon and Fraser, 2006). With the development of machine translation techniques, there has been a surge of interest in improving machine translation models with translation memory. In the rest of this section, we review translation memory for both statistical machine translation (SMT) and neural machine translation (NMT).

4.1 Translation Memory in SMT

Generally, SMT consists of three key components organized in a pipeline: phrase table extraction, parameter tuning and decoding (Koehn et al., 2003; Chiang, 2007). As a result, many efforts have been made to exploit translation memory (TM) on top of each component.

Constrained Decoding with TM Constrained decoding is the most straightforward way to integrate TM into SMT (Smith and Clark, 2009; Koehn and Senellart, 2010; Zhechev and Van Genabith, 2010; Ma et al., 2011). Its basic idea is to reuse the useful segments in y^r while translating the other segments with SMT. Specifically, the approach consists of three steps: 1) identify the unmatched segments in both x^r and x through the edit-distance algorithm; 2) identify the unmatched segments in y^r, each of which is aligned to one unmatched segment in x^r by a word alignment algorithm; 3) decode each unmatched segment in x with SMT and use the result to replace the corresponding unmatched segment in y^r. Li et al. (2016b) further extend this approach from the sentence level to the phrase level. The advantage of constrained decoding is that it does not require changing the translation model (including the phrase table and parameters) and can be applied in a plug-and-play way. This approach is successful when x is highly similar to x^r; otherwise its performance degrades considerably, because it explicitly isolates TM matching from SMT decoding and decides which segments of y^r to reuse in a deterministic way.
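The three-step procedure can be sketched as follows. Here word_align and smt_translate are assumed callables standing in for a word aligner and a base SMT decoder, and difflib's sequence matcher plays the role of the edit-distance matching in step 1; none of this is the exact implementation of the cited systems.

```python
from difflib import SequenceMatcher

def constrained_decode(x_tokens, xr_tokens, yr_tokens, word_align, smt_translate):
    """Reuse matched segments of y_r and translate only the unmatched parts of x.
    word_align(xr_tokens, yr_tokens) -> dict mapping x_r positions to y_r positions.
    smt_translate(tokens) -> list of translated tokens."""
    align = word_align(xr_tokens, yr_tokens)
    matcher = SequenceMatcher(a=xr_tokens, b=x_tokens)
    output = list(yr_tokens)
    # Step 1: walk over edit operations between x_r and x, from right to left
    # so earlier splices do not shift positions still to be processed.
    for tag, i1, i2, j1, j2 in reversed(matcher.get_opcodes()):
        if tag == "equal":
            continue
        # Step 2: y_r positions aligned to the unmatched x_r segment.
        yr_positions = sorted(align.get(i, -1) for i in range(i1, i2) if align.get(i, -1) >= 0)
        # Step 3: translate the unmatched x segment and splice it into y_r.
        replacement = smt_translate(x_tokens[j1:j2])
        if yr_positions:
            output[yr_positions[0]:yr_positions[-1] + 1] = replacement
        else:
            output.extend(replacement)
    return output
```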
Phrase Table Aggregation with TM There are also notable efforts to augment the phrase table for SMT by extracting translation rules from the retrieved bilingual sentences {⟨x^r, y^r⟩} and then re-tuning the parameters of the SMT model, which thereby makes use of translation knowledge from {⟨x^r, y^r⟩} in an implicit way when translating x. For example, Biçici and Dymetman (2008) and Simard and Isabelle (2009) directly combine the extracted translation rules into the phrase table in a shallow way: they introduce an additional feature indicating whether a translation rule comes from {⟨x^r, y^r⟩} or not, and then train all feature weights with MERT (Och, 2003). One characteristic of these works is that a translation rule extracted from {⟨x^r, y^r⟩} that cannot exactly match any segment in x is useless, even if its target side may contain useful words. To remedy this, Wang et al. (2013, 2014) resort to a deeper way of combining the extracted translation rules: for each rule in the phrase table, they design a generative model that rewards rules similar to those extracted from {⟨x^r, y^r⟩}, and this generative model is then used as a feature in the log-linear SMT model, whose weight is tuned together with the other features by MERT. In addition, Li et al. (2014) employ a similar way of rewarding the rules, but rely on a discriminative model into which potential features from {⟨x^r, y^r⟩} are easy to integrate.

Parameter Tuning with TM Unlike the above two research lines, Liu et al. (2012, 2014) make use of translation memory only when tuning parameters. Specifically, when translating an input sentence x, they first retrieve many similar bilingual sentences {⟨x^r, y^r⟩} and then tune the parameters on the retrieved sentences as well as a given development dataset in a sentence-wise manner, i.e., they perform an independent tuning run for each input sentence. To improve the efficiency of each tuning step, they propose a local update on top of {⟨x^r, y^r⟩} from a baseline model.

Despite the successes of translation memory in SMT, the above three kinds of methods still have limitations. First, all of them employ a fuzzy-match score for retrieval, which depends heavily on word matching and thus cannot recall examples that are similar in word semantics but different in surface form. Second, these methods integrate the retrieved examples into a single module of SMT in ways that cannot make full use of the knowledge in the retrieved examples: the integration in the first two lines (constrained decoding and phrase table aggregation) is heuristic and not optimized towards translation quality, while the parameter tuning method fine-tunes only a few parameters of the log-linear SMT model, which is not enough to preserve sufficient knowledge from the retrieved examples. Third, since SMT operates in a pipeline manner, it is intractable to jointly optimize the retrieval metric and the SMT model. Consequently, all these methods adopt an off-the-shelf metric for retrieval, leading to suboptimal performance.

4.2 Translation Memory in NMT

Translation memory has been widely explored in neural machine translation (NMT). Depending on when retrieval is involved, we can categorize previous works into two classes: 1) the NMT model learns how to cooperate with the retrieval model in the training phase; 2) the NMT model is only aware of the retrieved data in the inference phase.

Inference Phase The key idea of this line of work is to reward some target words based on the words in y^r during inference, so that a decision can be made based on both the distribution of the generation model and the additional reward from the retrieval model. Some previous works reward target words based on the sentence-level similarity between x and x^r and the word alignment between x^r and y^r. Given the input sentence x, Zhang et al. (2018) assign higher rewards to target words in ŷ when they appear in y^r and the aligned source words occur in both x^r and x. He et al. (2019) follow a similar framework and additionally consider the position information of those target words when rewarding. These works reward the target words in an explicit way; in contrast, the one-sentence-one-model approach (Li et al., 2016c; Turchi et al., 2017) rewards target words implicitly: for each test input x, it first fine-tunes the translation model on the retrieved memory {⟨x^r, y^r⟩} and then translates x.
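A simplified version of this explicit rewarding scheme is sketched below: candidate target words that occur in y^r, and whose aligned source words appear in both x^r and x, receive a bonus scaled by the sentence-level similarity between x and x^r, added on top of the model's log-probabilities. The Jaccard similarity and the scaling factor are illustrative choices rather than the exact formulation of Zhang et al. (2018) or He et al. (2019).

```python
def sentence_similarity(x_tokens, xr_tokens):
    """Sentence-level similarity between x and x_r (Jaccard overlap as a stand-in
    for the fuzzy-match score used in the literature)."""
    a, b = set(x_tokens), set(xr_tokens)
    return len(a & b) / max(1, len(a | b))

def rewarded_log_probs(model_log_probs, x_tokens, xr_tokens, yr_tokens, align, beta=1.0):
    """model_log_probs: dict {target_word: log p(word | x, y_<t)} from the NMT model.
    align: dict mapping each y_r word to its aligned x_r word.
    Returns scores in which memory-supported words get an additive reward."""
    sim = sentence_similarity(x_tokens, xr_tokens)
    scores = dict(model_log_probs)
    for w in yr_tokens:
        src = align.get(w)
        if w in scores and src is not None and src in x_tokens and src in xr_tokens:
            scores[w] += beta * sim  # reward words supported by the translation memory
    return scores
```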
Others try to reward target words based on token-level similarity scores. Most works in this line are based on a dense retriever (Khandelwal et al., 2020a), e.g., FAISS. Khandelwal et al. (2020a) build a key-value datastore, where the key h(x^r, y^r [...] a light-weight network to learn the reward score. Since dense retrieval has the potential for cross-lingual retrieval, Zheng et al. (2021b) use a similar approach to achieve unsupervised domain adaptation, where the main change is to create the datastore from synthetic source sentences and real target sentences.

Training Phase Different from the above model-agnostic approaches, works in this line aim to train the generation model to learn how to cooperate with the retrieval model. It is worth noting that most works in this line adopt sentence-level retrieval when integrating the retrieval information in the training process. To this end, Bulte and Tezcan (2019) and Hossain et al. (2020) propose a data augmentation method to integrate the retrieved information, where x is concatenated with y^r before being fed into the model. Following the data augmentation approach, Xu et al. (2020) propose more matching methods to determine which retrieved example should be included in the source.

There are also works that propose new architectures to integrate the retrieval information. Under the RNN-based framework, Cao and Xiong (2018) and Gu et al. (2018) use gating and attention mechanisms to incorporate the retrieved target sentences. As the Transformer (Vaswani et al., 2017) became the backbone of NMT, some works use additional Transformer encoders to encode the retrieved target sentences and integrate them through the attention mechanism (Bapna and Firat, 2019; Cao et al., 2019). Xia et al. (2019) represent the retrieved target sentences in a different data structure, i.e., a graph, and integrate it through the attention mechanism. He et al. (2021) propose a light-weight method to encode the retrieved target sentences and leverage the alignment [...]

[...] primary feature to derive reward scores. However, some information, e.g., the frequencies of words and the context, may also be beneficial for integrating the translation memory. Second, it remains an open question when we should use the retrieved information and when we should not. In the inference phase, existing approaches tend to integrate the translation memory excessively, e.g., at every time step, which not only reduces the translation efficiency but may also dampen the fluency of the generated results.
5 Other Tasks

In addition to dialogue systems and machine translation, retrieval-augmented generation techniques have been shown to be beneficial in many other tasks. In the following, we highlight several key tasks that apply retrieval-augmented generation approaches.[1]

[1] Here, we focus on tasks other than question answering. We refer readers interested in QA to Chen and Yih (2020).

Language Modelling It has been shown that properly leveraging information from a retrieval memory can improve the performance of large pre-trained language models. To build a more accurate language model, Khandelwal et al. (2020b) propose to incorporate a soft memory module into the system: an index is built by caching the hidden states of the training corpus, and the language model then accesses the index via k-NN search, yielding greatly improved performance. As another example, Guu et al. (2020) apply the retrieval-augmented technique to the pre-training of a generative language model. During learning, they train a neural selector that dynamically samples a relevant text to guide the reconstruction of a corrupted input sequence; in this way, the pre-trained model delivers better results by explicitly grounding on the retrieval memory. Lewis et al. (2020a) combine language model pre-training with a paraphrasing approach: an input sequence is first corrupted, a set of multi-lingual texts is retrieved, and the model learns to reconstruct the original input sequence based on the retrieved texts. Recently, Borgeaud et al. (2021) propose RETRO, a large pre-trained language model enhanced with retrieved documents, and obtain performance comparable to GPT-3 with 25× fewer parameters.
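The nearest-neighbour language model idea can be summarized with the sketch below: cached hidden states of the training corpus serve as keys and their next tokens as values; at test time the distances from the current hidden state to its k nearest keys are turned into a distribution that is interpolated with the base LM distribution. The exhaustive search, the temperature and the interpolation weight are illustrative; large-scale systems such as Khandelwal et al. (2020b) use an approximate index (e.g., FAISS).

```python
import numpy as np

def knn_distribution(query_state, keys, values, vocab_size, k=8, temperature=1.0):
    """keys: (n, d) cached hidden states; values: (n,) next-token ids."""
    dists = np.linalg.norm(keys - query_state, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()
    p = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        p[values[idx]] += w
    return p

def interpolate(p_lm, p_knn, lam=0.25):
    """Final next-token distribution of a kNN-augmented LM."""
    return (1 - lam) * p_lm + lam * p_knn
```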
Summarization Text summarization is another research area that benefits from retrieval-augmented text generation. Peng et al. (2019) propose an adaptive decoding framework which first retrieves an exemplar document given the source document; the summary of the source document is then derived through an adaptive generation process based on the retrieved template. Different from Peng et al. (2019), Cao et al. (2018) and Hossain et al. (2020) introduce an intermediate re-ranking stage into the generation pipeline: before generating the document summary, the retrieved documents are first re-ranked based on their similarity scores with respect to the source document, and the summary is then produced by re-writing the selected templates.

Paraphrase Generation To address the lack of quality and diversity in paraphrase generation, Kazemnejad et al. (2020) propose a generation framework that first retrieves a sentence similar to the input sentence; based on the retrieved sentence, a neural editor then produces the resulting paraphrase. Chen et al. (2019) investigate a different aspect of paraphrasing, namely how to control the linguistic syntax of the generated text. To achieve this goal, they propose to first extract a sentential exemplar that serves as the syntax template; a neural model then generates the paraphrase with the desired linguistic syntax following the retrieved exemplar.

Text Style Transfer To improve the quality of generated text, Li et al. (2018) propose a retrieval-augmented framework that first retrieves texts similar to the input based on lexical-level similarity; the retrieved tokens that are irrelevant to the source are then deleted, and the output is derived from the edited template. Xiao et al. (2021) also adopt this framework, incorporating retrieval information from two sources (i.e., sparse and dense memories) and obtaining improved model performance.

Data-to-Text Generation Recently, retrieval-augmented generation has also been adapted to data-to-text generation. To bridge the gap between structured data and natural language text, Su et al. (2021a) propose a novel retrieval-augmented framework: given the source data, a set of candidate texts is first retrieved from a large unlabelled corpus; a neural selector is then applied to measure the similarities between the source data and the candidate texts and to extract a set of more fine-grained prototypes from the candidates; lastly, a generation model takes the prototypes as input to produce the text that describes the given structured data.

While retrieval-augmented generation has been widely explored in the NLP community, we suggest that future research could extend this approach to tasks that involve data from multiple modalities. For instance, with recent advances in image-text retrieval (Jia et al., 2021; Radford et al., 2021), the structural gap between images and texts has been largely bridged. Some early studies (Zhang et al., 2020) have shown that information retrieved from images can improve the performance of neural machine translation models. Naturally, such methods could be extended to other multi-modal tasks, such as image captioning (Karpathy and Li, 2015). A similar idea could also be applied to tasks beyond images, such as speech-to-text transcription (Gales and Young, 2007).
6 Future Directions

Despite the current success of retrieval-augmented text generation, there is still a long way to go, as discussed in the previous sections. We highlight some directions to facilitate future research.

Retrieval Sensitivity The performance of retrieval-augmented text generation is very sensitive to the retrieval quality, i.e., the similarity between the query and the retrieved examples. Current retrieval-augmented models perform well when the retrieved examples are very similar to the query, but they can be even worse than generation models without retrieval when the retrieved examples are less similar. It is therefore important to develop new methods that address this sensitivity.

Retrieval Efficiency Generally, enlarging the retrieval memory makes it more likely that an example very similar to the query can be retrieved. Unfortunately, the downside is that the overall inference of retrieval-augmented generation models becomes less efficient due to the considerable retrieval overhead. It is therefore pressing to develop methods that trade off retrieval memory size against retrieval efficiency, for example, data compression for the retrieval memory.

Local vs. Global Optimization Theoretically, it seems promising to jointly learn retrieval metrics and generation models. In practice, however, there is an essential gap in the retrieval metric between the training and inference phases: during training, the loss is locally back-propagated to only a few retrieved examples, while at inference the metric is applied globally over all examples in the memory. It would be interesting to narrow this gap when learning a better metric for generation tasks.

Multi-Modalities With recent advances in image-text retrieval, directly associating images with relevant text becomes possible. This encourages researchers to investigate retrieval-based text generation in tasks that involve data from different modalities. One typical task is image captioning. Beyond images, other tasks such as speech-to-text transcription could potentially benefit from retrieval-based generation methods as well.

Diverse & Controllable Retrieval Most existing approaches adopt a universal metric for retrieval, such as the lexical similarity of sentences. Future work should explore how to use customized metrics for retrieval. This can be beneficial for more controlled text generation: for example, instances with particular emotions or styles may be more desirable in personalized dialogue generation, and parallel data containing specific terminology is more helpful in machine translation. On the other hand, using a universal metric for retrieval may lead to a lack of diversity in the retrieval results; collecting a diverse set of retrieval results can improve the coverage of useful information. Thus, considering multiple different metrics for retrieval may lead to higher-quality generation in the future.

7 Conclusion

In this paper, we surveyed recent approaches to retrieval-augmented text generation. We reviewed and summarized the development of its different components, including retrieval metrics, retrieval sources, and integration paradigms. We gave in-depth discussions of retrieval-augmented text generation as applied to different tasks, including dialogue response generation, machine translation, and other generation tasks. We also pointed out some future directions for retrieval-augmented text generation.
References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Ankur Bapna and Orhan Firat. 2019. Non-parametric adaptation for neural machine translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1921–1931.

Ergun Biçici and Marc Dymetman. 2008. Dynamic translation memory: Using statistical machine translation to improve translation memory fuzzy matches. In International Conference on Intelligent Text Processing and Computational Linguistics, pages 454–465. Springer.

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, and Laurent Sifre. 2021. Improving language models by retrieving from trillions of tokens. CoRR, abs/2112.04426.

Bram Bulte and Arda Tezcan. 2019. Neural fuzzy repair: Integrating fuzzy matches into neural machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1800–1809.

Deng Cai, Yan Wang, Wei Bi, Zhaopeng Tu, Xiaojiang Liu, Wai Lam, and Shuming Shi. 2019a. Skeleton-to-response: Dialogue generation guided by retrieval memory. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1219–1228.

Deng Cai, Yan Wang, Wei Bi, Zhaopeng Tu, Xiaojiang Liu, and Shuming Shi. 2019b. Retrieval-guided dialogue response generation via a matching-to-generation framework. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1866–1875.

Deng Cai, Yan Wang, Huayang Li, Wai Lam, and Lemao Liu. 2021. Neural machine translation with monolingual translation memory. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7307–7318, Online. Association for Computational Linguistics.

Qian Cao, Shaohui Kuang, and Deyi Xiong. 2019. Learning to reuse translations: Guiding neural machine translation with examples. arXiv preprint arXiv:1911.10732.

Qian Cao and Deyi Xiong. 2018. Encoding gated translation memory into neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3042–3047.

Ziqiang Cao, Wenjie Li, Sujian Li, and Furu Wei. 2018. Retrieve, rerank and rewrite: Soft template based neural summarization. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pages 152–161. Association for Computational Linguistics.

Danqi Chen and Wen-tau Yih. 2020. Open-domain question answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 34–37, Online. Association for Computational Linguistics.

Mingda Chen, Qingming Tang, Sam Wiseman, and Kevin Gimpel. 2019. Controllable paraphrase generation with a syntactic exemplar. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers, pages 5972–5984. Association for Computational Linguistics.

David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228.

Sarah Dillon and Janet Fraser. 2006. Translators and TM: An investigation of translators' perceptions of translation memory adoption. Machine Translation, 20(2):67–79.

Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. 2018. Wizard of Wikipedia: Knowledge-powered conversational agents. arXiv preprint arXiv:1811.01241.

Mark J. F. Gales and Steve J. Young. 2007. The application of hidden Markov models in speech recognition. Found. Trends Signal Process., 1(3):195–304.

Jiatao Gu, Yong Wang, Kyunghyun Cho, and Victor O.K. Li. 2018. Search engine guided neural machine translation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.

Prakhar Gupta, Jeffrey Bigham, Yulia Tsvetkov, and Amy Pavel. 2021. Controlling dialogue generation with semantic exemplars. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3018–3029, Online. Association for Computational Linguistics.
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. REALM: Retrieval-augmented language model pre-training. CoRR, abs/2002.08909.

Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, and Percy S. Liang. 2018. A retrieve-and-edit framework for predicting structured outputs. In Advances in Neural Information Processing Systems, pages 10052–10062.

Qiuxiang He, Guoping Huang, Qu Cui, Li Li, and Lemao Liu. 2021. Fast and accurate neural machine translation with translation memory. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3170–3180.

Qiuxiang He, Guoping Huang, Lemao Liu, and Li Li. 2019. Word position aware translation memory for neural machine translation. In CCF International Conference on Natural Language Processing and Chinese Computing, pages 367–379. Springer.

Nabil Hossain, Marjan Ghazvininejad, and Luke Zettlemoyer. 2020. Simple and effective retrieve-edit-rerank text generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2532–2538.

Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. 2014. Convolutional neural network architectures for matching natural language sentences. In NIPS, pages 2042–2050.

Zongcheng Ji, Zhengdong Lu, and Hang Li. 2014. An information retrieval approach to short text conversation. arXiv preprint arXiv:1408.6988.

Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. 2021. Scaling up visual and vision-language representation learning with noisy text supervision. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 4904–4916. PMLR.

Andrej Karpathy and Fei-Fei Li. 2015. Deep visual-semantic alignments for generating image descriptions. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 3128–3137. IEEE Computer Society.

Amirhossein Kazemnejad, Mohammadreza Salehi, and Mahdieh Soleymani Baghshah. 2020. Paraphrase generation by learning how to edit from samples. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6010–6021, Online. Association for Computational Linguistics.

Urvashi Khandelwal, Angela Fan, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2020a. Nearest neighbor machine translation. arXiv preprint arXiv:2010.00710.

Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2020b. Generalization through memorization: Nearest neighbor language models. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.

Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pages 127–133.

Philipp Koehn and Jean Senellart. 2010. Convergence of translation memory and statistical machine translation. In Proceedings of AMTA Workshop on MT Research and the Translation Industry, pages 21–31.

Mojtaba Komeili, Kurt Shuster, and Jason Weston. 2021. Internet-augmented dialogue generation. arXiv preprint arXiv:2107.07566.

Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. 2019. Latent retrieval for weakly supervised open domain question answering. arXiv preprint arXiv:1906.00300.

Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, and Luke Zettlemoyer. 2020a. Pre-training via paraphrasing. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020b. Retrieval-augmented generation for knowledge-intensive NLP tasks. arXiv preprint arXiv:2005.11401.

Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A diversity-promoting objective function for neural conversation models. In NAACL, pages 110–119.

Juncen Li, Robin Jia, He He, and Percy Liang. 2018. Delete, retrieve, generate: A simple approach to sentiment and style transfer. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), pages 1865–1874. Association for Computational Linguistics.

Liangyou Li, Andy Way, and Qun Liu. 2014. A discriminative framework of integrating translation memory features into SMT. In Proceedings of the 11th Conference of the Association for Machine Translation in the Americas, volume 1, pages 249–260.
Liangyou Li, Andy Way, and Qun Liu. 2016b. Phrase-level combination of SMT and TM using constrained word lattice. Association for Computational Linguistics (ACL).

Xiaoqing Li, Jiajun Zhang, and Chengqing Zong. 2016c. One sentence one model for neural machine translation. arXiv preprint arXiv:1609.06490.

Zekang Li, Cheng Niu, Fandong Meng, Yang Feng, Qian Li, and Jie Zhou. 2019. Incremental transformer with deliberation decoder for document grounded conversations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 12–21.

Rongzhong Lian, Min Xie, Fan Wang, Jinhua Peng, and Hua Wu. 2019. Learning to select knowledge for response generation in dialog systems. arXiv preprint arXiv:1902.04911.

Lemao Liu, Hailong Cao, Taro Watanabe, Tiejun Zhao, Mo Yu, and Conghui Zhu. 2012. Locally training the log-linear model for SMT. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 402–411.

Lemao Liu, Tiejun Zhao, Taro Watanabe, Hailong Cao, and Conghui Zhu. 2014. Discriminative training for log-linear based SMT: Global or local methods. ACM Transactions on Asian Language Information Processing (TALIP), 13(4):1–25.

Yanjun Ma, Yifan He, Andy Way, and Josef van Genabith. 2011. Consistent translation using discriminative learning - a translation memory-inspired approach. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1239–1248.

Yuxian Meng, Xiaoya Li, Xiayu Zheng, Fei Wu, Xiaofei Sun, Tianwei Zhang, and Jiwei Li. 2021. Fast nearest neighbor machine translation. arXiv preprint arXiv:2105.14528.

Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 160–167, Sapporo, Japan. Association for Computational Linguistics.

Gaurav Pandey, Danish Contractor, Vineet Kumar, and Sachindra Joshi. 2018. Exemplar encoder-decoder for neural conversation generation. In ACL, pages 1329–1338.

Ashwin Paranjape, Omar Khattab, Christopher Potts, Matei Zaharia, and Christopher D Manning. 2021. Hindsight: Posterior-guided training of retrievers for improved open-ended generation. arXiv preprint arXiv:2110.07752.

Hao Peng, Ankur P. Parikh, Manaal Faruqui, Bhuwan Dhingra, and Dipanjan Das. 2019. Text generation with exemplar-based adaptive decoding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

Lianhui Qin, Michel Galley, Chris Brockett, Xiaodong Liu, Xiang Gao, William B Dolan, Yejin Choi, and Jianfeng Gao. 2019. Conversing by reading: Contentful neural conversation with on-demand machine reading. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5427–5436.

Minghui Qiu, Feng-Lin Li, Siyu Wang, Xing Gao, Yan Chen, Weipeng Zhao, Haiqing Chen, Jun Huang, and Wei Chu. 2017. AliMe chat: A sequence to sequence and rerank based chatbot engine. In ACL, pages 498–503.

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 8748–8763. PMLR.

Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc.

Lifeng Shang, Zhengdong Lu, and Hang Li. 2015. Neural responding machine for short-text conversation. In ACL, pages 1577–1586.

Michel Simard and Pierre Isabelle. 2009. Phrase-based machine translation in a computer-assisted translation environment. Proceedings of the Twelfth Machine Translation Summit (MT Summit XII), pages 120–127.

James Smith and Stephen Clark. 2009. EBMT for SMT: a new EBMT-SMT hybrid. In Proceedings of the 3rd International Workshop on Example-Based Machine Translation, pages 3–10. Citeseer.

Harold Somers. 2003. Translation memory systems. Benjamins Translation Library, 35:31–48.

Yiping Song, Rui Yan, Xiang Li, Dongyan Zhao, and Ming Zhang. 2016. Two are better than one: An ensemble of retrieval- and generation-based dialog systems. arXiv preprint arXiv:1610.07149.

Yixuan Su, Zaiqiao Meng, Simon Baker, and Nigel Collier. 2021a. Few-shot table-to-text generation with prototype memory. In Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November, 2021, pages 910–917. Association for Computational Linguistics.
Yixuan Su, David Vandyke, Simon Baker, Yan Wang, and Nigel Collier. 2021b. Keep the primary, rewrite the secondary: A two-stage approach for paraphrase generation. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 560–569, Online. Association for Computational Linguistics.

Yixuan Su, Yan Wang, Deng Cai, Simon Baker, Anna Korhonen, and Nigel Collier. 2021c. PROTOTYPE-TO-STYLE: dialogue generation with style-aware editing on retrieval memory. IEEE ACM Trans. Audio Speech Lang. Process., 29:2152–2161.

Marco Turchi, Matteo Negri, M Farajian, and Marcello Federico. 2017. Continuous learning from human post-edits for neural machine translation.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.

Oriol Vinyals and Quoc Le. 2015. A neural conversational model. In ICML (Deep Learning Workshop).

Kun Wang, Chengqing Zong, and Keh-Yih Su. 2013. Integrating translation memory into phrase-based machine translation during decoding. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11–21.

Kun Wang, Chengqing Zong, and Keh-Yih Su. 2014. Dynamically integrating cross-domain translation memory into phrase-based machine translation during decoding. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 398–408.

Jason Weston, Emily Dinan, and Alexander Miller. 2018. Retrieve and refine: Improved sequence generation models for dialogue. In Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI, pages 87–92.

Yu Wu, Furu Wei, Shaohan Huang, Yunli Wang, Zhoujun Li, and Ming Zhou. 2019. Response generation by context-aware prototype editing. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7281–7288.

Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, et al. 2021. A controllable model of grounded response generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 14085–14093.

Mengzhou Xia, Guoping Huang, Lemao Liu, and Shuming Shi. 2019. Graph based translation memory for neural machine translation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7297–7304.

Fei Xiao, Liang Pang, Yanyan Lan, Yan Wang, Huawei Shen, and Xueqi Cheng. 2021. Transductive learning for unsupervised text style transfer. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, pages 2510–2521. Association for Computational Linguistics.

Jitao Xu, Josep M Crego, and Jean Senellart. 2020. Boosting neural machine translation with similar translations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1580–1590.

Liu Yang, Junjie Hu, Minghui Qiu, Chen Qu, Jianfeng Gao, W Bruce Croft, Xiaodong Liu, Yelong Shen, and Jingjing Liu. 2019. A hybrid retrieval-generation neural conversation model. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 1341–1350.

Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, Graham Neubig, and Satoshi Nakamura. 2018. Guiding neural machine translation with retrieved translation pieces. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1325–1335.

Yizhe Zhang, Siqi Sun, Xiang Gao, Yuwei Fang, Chris Brockett, Michel Galley, Jianfeng Gao, and Bill Dolan. 2021. Joint retrieval and generation training for grounded text generation. arXiv preprint arXiv:2105.06597.

Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, and Hai Zhao. 2020. Neural machine translation with universal visual representation. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.

Ventsislav Zhechev and Josef Van Genabith. 2010. Seeding statistical machine translation with translation memory output through tree-based structural alignment. In Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation, pages 43–51.

Xin Zheng, Zhirui Zhang, Junliang Guo, Shujian Huang, Boxing Chen, Weihua Luo, and Jiajun Chen. 2021a. Adaptive nearest neighbor machine translation. arXiv preprint arXiv:2105.13022.

Xin Zheng, Zhirui Zhang, Shujian Huang, Boxing Chen, Jun Xie, Weihua Luo, and Jiajun Chen. 2021b. Non-parametric unsupervised domain adaptation for neural machine translation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4234–4241.
Kangyan Zhou, Shrimai Prabhumoye, and Alan W Black. 2018. A dataset for document grounded conversations. arXiv preprint arXiv:1809.07358.