Logician and Orator: Learning from the Duality between Language and Knowledge in Open Domain

Mingming Sun (1,2), Xu Li (1,2), Ping Li (1)
(1) Big Data Lab (BDL-US), Baidu Research
(2) National Engineering Laboratory of Deep Learning Technology and Application, China
{sunmingming01,lixu13,liping11}@baidu.com

Abstract

We propose the task of Open-Domain Information Narration (OIN) as the reverse task of Open Information Extraction (OIE), to implement the dual structure between language and knowledge in the open domain. We then develop an agent, called Orator, to accomplish the OIN task, and assemble the Orator and the recently proposed OIE agent, Logician (Sun et al., 2018), into a dual system to utilize the duality structure with a reinforcement learning paradigm. Experimental results reveal that the dual structure between the OIE and OIN tasks helps to build better agents for both tasks.

1 Introduction

The duality between language and knowledge is natural to human intelligence. Humans can extract knowledge from natural language to learn or remember, and then narrate the knowledge back into natural language to communicate. Information extraction (IE) is a task that simulates the first part of this duality, and it has long been a hot spot of NLP research. Recently, the task that fulfills the second part of the duality, that is, assembling a set of relation instances/facts or database records into natural language sentences/documents, has also attracted much interest (Wiseman et al., 2017; Chisholm et al., 2017; Agarwal and Dymetman, 2017; Vougiouklis et al., 2017; Yin et al., 2016). In the literature, this task has been referred to as "data-to-document generation" (Wiseman et al., 2017) or "knowledge-to-text" (Chisholm et al., 2017). In this paper, we name the task information narration (IN), to emphasize its reverse relationship to the information extraction (IE) task.

The duality between language and knowledge (and thus between the IE and IN tasks) can be examined in the closed domain or the open domain. For the closed-domain problem, the closed-domain IE (CIE) task is often referred to as "relation extraction" or "relation classification", which identifies instances of a fixed and finite set of relations in a natural language corpus, using supervised methods (Kambhatla, 2004; Zelenko et al., 2003; Miwa and Bansal, 2016; Zheng et al., 2017) or weakly supervised methods (Mintz et al., 2009; Lin et al., 2016). In the meantime, the closed-domain IN (CIN) task (Wiseman et al., 2017; Chisholm et al., 2017; Agarwal and Dymetman, 2017; Vougiouklis et al., 2017; Yin et al., 2016) transforms a set of facts with a pre-defined schema or relation types (such as facts from Freebase (Bollacker et al., 2008), DBpedia (Auer et al., 2007), or database tables) into natural language sentences/documents. Furthermore, the dual structure between the CIE and CIN tasks has been noticed and utilized in (Chisholm et al., 2017).

For the open-domain problem, the open-domain IE (OIE) task investigates how natural language sentences express facts, and then uses the learned knowledge to extract entity- and relation-level intermediate structures from open-domain sentences (Christensen et al., 2011; Etzioni et al., 2011; Schmitz et al., 2012; Pal and Mausam, 2016). Although the OIE task has attracted much interest and found many applications (Christensen et al., 2013, 2014; Mausam, 2016; Stanovsky et al., 2015; Khot et al., 2017; Fader et al., 2014), the OIN task has not yet been stated, nor has the duality between language and knowledge in the open domain.

                 Open-Domain    Closed-Domain
   Extraction        OIE             CIE
   Narration         OIN             CIN

Table 1: Taxonomy: tasks between knowledge and natural language.

The tasks involved in the duality between language and knowledge are shown in Table 1, where the OIN task has not been stated. In this paper, we focus on the OIN task and the duality between the OIE and OIN tasks, for the following reasons: 1) the OIN task is an essential component of the open-domain information processing pipeline. For example, it is helpful for building natural and informative responses in open-domain KBQA systems (Khot et al., 2017; Fader et al., 2014). 2) As the results in this paper will illustrate, the duality between the tasks can be valuable for building better agents for both tasks (Xia et al., 2017, 2016).

A major historical obstacle to investigating the duality between the OIE and OIN tasks is the absence of a parallel corpus between natural language sentences and open-domain facts. Recently, the SAOKE dataset (Sun et al., 2018) was released, which contains more than forty thousand human-labeled open-domain sentence-facts pairs, and thus essentially eliminates this obstacle for our investigation.

The contribution of this paper lies in the following aspects:

• We propose the concept of the OIN task, which is potentially an important component of the open-domain information pipeline, and we develop the Orator agent to fulfill the task;

• We build a multi-agent system with Logician and Orator to exploit the dual structure between language and knowledge in the open domain. Experimental results reveal that the dual information is beneficial for improving the performance of both agents.

The paper is organized as follows: Section 2 discusses the related work. Section 3 explains the Orator agent for the OIN task. Section 4 describes the multi-agent system with Logician and Orator, and its algorithm to learn from the duality between language and knowledge. The experimental results of the fine-tuned agents are shown and discussed in Section 5. We conclude our work and discuss future directions in Section 6.

2 Related Work

2.1 Information Narration

The closed-domain information narration (CIN) task has been studied in (Wiseman et al., 2017; Chisholm et al., 2017; Agarwal and Dymetman, 2017; Vougiouklis et al., 2017). These CIN agents face problems from different domains, from people's biographies to basketball game records, but most of them follow the same sequence-to-sequence pattern. First, the algorithm encodes a sequence of facts into a set of annotations and then decodes the annotations into a natural language text. Mechanisms such as attention (Bahdanau et al., 2014) and copying (Gu et al., 2016) are employed in the decoder to improve the performance. Then, the models are trained on a supervised dataset with back-propagation.

In this work, we adopt a similar sequence-to-sequence architecture to build our baseline Orator agent, but with the following differences: 1) the Orator is proposed to narrate open-domain facts, where the encoder must encode words rather than the entities and relations of a closed domain; 2) the baseline Orator will be fine-tuned using the dual learning algorithm proposed in this paper.

2.2 Dual Learning Systems

For many natural language processing tasks, there exist corresponding reverse/dual tasks. One example of a pair of dual problems is question answering (QA) and question generation (QG). In (Tang et al., 2017), the duality between the QA problem and the QG problem was considered as a constraint that both problems must share the same joint probability. Then, a loss function that implemented this constraint was involved in the supervised learning procedures of both agents. Furthermore, researchers (Tang et al., 2017; Duan et al., 2017; Sachan and Xing, 2018) use both the question-answering agent and the question-generation agent to identify extra high-confidence question-answer pairs, which are further used to fine-tune the pre-trained agents.

Back-and-forth translation (or round-trip translation, see https://en.wikipedia.org/wiki/Round-trip_translation) is another example of duality, in the field of machine translation. It has been employed to evaluate the quality of machine translation systems (Van Zaanen and Zwarts, 2006), or to test the suitability of text for machine translation (Gaspari, 2006; Shigenobu, 2007). Recently, (Xia et al., 2016) implemented this duality in a neural dual learning system, in which the quality of each translation agent was improved on an unlabeled dataset using the rewards provided by its
corresponding dual agent, using the reinforcement learning technique.

Parsing-reconstruction is also a pattern of duality. (Konstas et al., 2017) considered the AMR (Abstract Meaning Representation) (Banarescu et al., 2013) parsing problem (text to AMR) and the AMR generation problem (AMR to text) in one system, in which the AMR parser generated extra text-AMR pair data to fine-tune the AMR generator. The AMR generator, however, does not contribute to the performance improvement of the AMR parser. The CIE agent and the CIN agent in (Chisholm et al., 2017) also follow this pattern, where both agents help each other to improve by sharing weights. Nevertheless, the weight sharing strategy cannot be applied to agents with different architectures, which is a typical situation in practice.

From these practices, it can be seen that the duality can be implemented with two major approaches: 1) by providing additional labeled samples via bootstrapping, and 2) by adding losses or rewards to the training procedure of the agents. In this paper, we follow the second approach. We design a set of rewards, among which some are related to the OIE and OIN tasks respectively, and some are related to the duality of the problems. Then we optimize both agents using the reinforcement learning technique. The learning algorithm is similar to the dual-NMT algorithm described in (Xia et al., 2016), but with adaptations for the OIE and OIN tasks, especially in the task-related rewards. Compared to the approach of applying a regularization that enforces a shared joint probability (Tang et al., 2017), our approach directly optimizes the task objectives by introducing task-related rewards. Furthermore, our approach is more adaptable than the weight sharing approach adopted in (Chisholm et al., 2017).

3 Orator

3.1 SAOKE Dataset

Symbolic Aided Open Knowledge Expression (SAOKE) is proposed in (Sun et al., 2018) as a form to faithfully record the facts that humans can extract from sentences when reading them. SAOKE uses a unified form, an n-ary tuple

    (subject, predicate, object_1, ..., object_N),

to express four categories of facts: 1) Relation: verb/preposition-based n-ary relations between entity mentions; 2) Attribute: nominal attributes of entity mentions; 3) Description: descriptive phrases of entity mentions; 4) Concept: hyponymy and synonymy relations among concepts and instances.

Using this SAOKE format, Sun et al. (2018) manually labeled the SAOKE dataset D_SAOKE by crowdsourcing (available at http://ai.baidu.com/broad), which includes more than forty thousand sentence-facts pairs <S, F>. The labeling procedure is under the supervision of the "Completeness" criterion (Sun et al., 2018), so the facts record the information in the sentence as completely as possible (only auxiliary information and relations between facts are omitted (Sun et al., 2018)). As a result, the SAOKE dataset is a valid open-domain sentence-facts parallel dataset for both the OIE and OIN tasks. Table 2 shows an example from the SAOKE dataset, to ease understanding of the dual relationship between a sentence and its facts.

Sentence (Chinese): 他发明了有两个固定点(水的沸点和冰点)的100摄氏度计温尺,是世界大多数地区通用的摄氏温度计的前身。
Sentence (English): He invented a 100-degree centigrade temperature scale with two fixed points (the boiling point and the freezing point of water), which is the precursor of the Celsius thermometer commonly used in most parts of the world.
Facts (Chinese): (他,发明,100摄氏度计温尺) (100摄氏度计温尺,有,两个固定点) (水的[沸点|冰点],ISA,两个固定点) (100摄氏度计温尺,是X的前身,世界大多数地区通用的摄氏温度计)
Facts (English): (He, invented, the 100-degree Celsius temperature scale) (100-degree Celsius temperature scale, has, two fixed points) (the [boiling point | freezing point] of water, ISA, two fixed points) (100-degree Celsius temperature scale, is the predecessor of X, the Celsius thermometer commonly used in most parts of the world)

Table 2: An example sentence and the corresponding facts in the SAOKE dataset, where "ISA" is a symbol denoting the "is-a" relationship in the SAOKE format.
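To make the n-ary SAOKE form concrete, the following is a minimal sketch of a fact container and a naive parser for tuple strings such as those in Table 2. The Fact class and parse_fact helper are illustrative assumptions, not part of the SAOKE release; among other simplifications, the parser ignores SAOKE's symbol expansions (e.g. the "[a|b]" alternatives).

```python
import re
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    objects: Tuple[str, ...]

def parse_fact(text: str) -> Fact:
    # Accept both ASCII and full-width commas, as in the Chinese examples.
    parts = [p.strip() for p in re.split(r"[,\uFF0C]", text.strip().strip("()"))]
    if len(parts) < 2:
        raise ValueError(f"not a well-formed SAOKE tuple: {text!r}")
    return Fact(subject=parts[0], predicate=parts[1], objects=tuple(parts[2:]))

# Example: parse_fact("(他,发明,100摄氏度计温尺)")
```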

3.2 Model

The Orator is an agent O that assembles a set of open-domain facts F into a sentence S with probability P_O(S|F, Θ_O), where Θ_O is the set of parameters of O:

    Orator O: F → P_O(S|F, Θ_O).

For each pair <S, F> ∈ D_SAOKE, the set of facts F is actually expressed as a sequence of facts, in the order in which the labeler wrote them. So, the deep sequence-to-sequence paradigm is suitable for modeling the Orator. In this work, we build the base Orator model with the attention-based sequence-to-sequence model, together with the copy and coverage mechanisms, in a way similar to the implementation of the Logician in (Sun et al., 2018).

3.2.1 Attention-based Sequence-to-sequence Learning

Attention-based sequence-to-sequence learning (Bahdanau et al., 2014) first encodes the input fact sequence F (actually the sequence of N_e-dimensional word embedding vectors) into N_h-dimensional hidden states H^F = [h^F_1, ..., h^F_{N_S}] using a bi-directional GRU (Gated Recurrent Units) network (Cho et al., 2014). Then, when generating the word w_t of the target sentence, the decoder computes the probability of generating w_t by

    p(w_t | {w_1, ..., w_{t-1}}, c_t) = g(h_{t-1}, s_t, c_t),

where s_t is the hidden state of the GRU decoder, g is the word generation model, and c_t is the dynamic context vector which focuses attention on a specific location l in the input hidden states H^F.

For the Orator, we use the copy mechanism to implement the word generation model g, and the coverage mechanism to compute the dynamic context vector c_t.
3.2.2 Copy Mechanism

In the SAOKE dataset, the words in the set of facts (excluding the external symbols) must appear in the corresponding sentence, so the problem is suitable to be modeled via the copy mechanism (Gu et al., 2016). In the copy mechanism, when the decoder is considering generating a word w_t, the word can either be copied from the source fact sequence F or selected from a vocabulary V:

    p(w_t | w_{t-1}, s_t, c_t) = p_F(w_t | w_{t-1}, s_t, c_t) + p_V(w_t | w_{t-1}, s_t, c_t),

where p_F is the probability of copying w_t from F and p_V is the probability of selecting w_t from V. The details can be found in (Gu et al., 2016).
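To make this factorization concrete, the following is a minimal sketch of how a copy-or-generate output distribution can be formed, assuming unnormalized scores for copying each source word and for generating each vocabulary word have already been computed from the decoder state; the score computation itself, which is where the model of Gu et al. (2016) lives, is omitted here.

```python
# A minimal sketch: combine copy scores over source words with generation
# scores over the vocabulary into one distribution. The score inputs are
# assumed to be precomputed; this is not the full model of Gu et al. (2016).
import numpy as np

def copy_generate_distribution(copy_scores, gen_scores, src_words, vocab):
    """copy_scores: (len(src_words),); gen_scores: (len(vocab),)."""
    # One softmax over both score vectors, so that copying and generating
    # compete for probability mass and the result sums to one.
    z = np.exp(np.concatenate([copy_scores, gen_scores]))
    z /= z.sum()
    p = {w: 0.0 for w in vocab}
    for prob, w in zip(z[: len(src_words)], src_words):
        p[w] = p.get(w, 0.0) + prob   # p_F: mass from copying source word w
    for prob, w in zip(z[len(src_words):], vocab):
        p[w] += prob                  # p_V: mass from generating w
    return p                          # p(w_t) = p_F(w_t) + p_V(w_t)
```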

3.2.3 Coverage Mechanism

To cope with the problem of information loss or redundancy in the generated sentence, the copy histories of previously generated words should be remembered to guide future generation. This can be done through the coverage mechanism (Tu et al., 2016), in which a coverage vector m^t_j is introduced for each word w^F_j in F and updated at each step t as a gated function of h^F_j, α_tj, s_{t-1}, and m^{t-1}_j. By this means, the coverage vectors remember the historical attentions over the source sequence and can be incorporated into the alignment model to generate complete and non-redundant sentences. Detailed formulations can be found in (Tu et al., 2016) and (Sun et al., 2018).
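As a rough illustration of such a gated update, a per-position coverage vector can be maintained as follows. The GRU-like gating and the weight matrices below are illustrative assumptions; the exact parameterization of Tu et al. (2016) differs.

```python
# A rough sketch of a gated coverage update: every source position j keeps
# a coverage vector m_j, refreshed at each decoding step from its
# annotation H[j], the attention weight alpha_t[j], and the previous
# decoder state s_prev. This gating is an illustrative assumption only.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_coverage(m_prev, alpha_t, H, s_prev, W_z, U_z, W_m, U_m):
    # m_prev: (src_len, d) coverage vectors; alpha_t: (src_len,) attention.
    m_new = np.empty_like(m_prev)
    for j in range(m_prev.shape[0]):
        inp = np.concatenate([H[j], s_prev, [alpha_t[j]]])
        z = sigmoid(W_z @ inp + U_z @ m_prev[j])      # update gate
        cand = np.tanh(W_m @ inp + U_m @ m_prev[j])   # candidate coverage
        m_new[j] = (1.0 - z) * m_prev[j] + z * cand   # gated interpolation
    return m_new
```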
4 Learning the Dual Structure between Knowledge and Natural Language

4.1 Dual Structure between Orator and Logician

In (Sun et al., 2018), an agent L, called Logician, was trained to convert a sentence S into a set of facts F with probability P_L(F|S, Θ_L), where Θ_L is the set of parameters of L:

    Logician L: S → P_L(F|S, Θ_L).

Obviously, the Logician and the Orator can cooperate to supervise each other. Given <S, F> ∈ D_SAOKE, the Logician produces a predicted set of facts F* for the sentence S, and the Orator can calculate the probability P_O(S|F*, Θ_O) of reconstructing S from F*. Intuitively, if F* loses major information of S, faithfully reconstructing S from F* would be impossible, and thus the probability P_O(S|F*, Θ_O) would be small. Thus, it is a strong signal for evaluating the quality of F*. Similarly, when the Orator produces a sentence S* for the set of facts F, the probability P_L(F|S*, Θ_L) provided by the Logician is a strong signal for evaluating the quality of S*. These signals are helpful for conquering several problems of the original agents, including information loss, information redundancy, and non-fluency.

Note that the supervision signals P_O(S|F*, Θ_O) and P_L(F|S*, Θ_L) do not rely on any supervised parallel corpus. Thus, similar to the application of the dual learning paradigm to the NMT task (Xia et al., 2016), it is theoretically possible to use unpaired sentences and sets of facts to compute these signals. However, unsupervised collections of fact groups that can be reasonably narrated in a sentence are not naturally available. Currently, the only available collection is the sets of facts provided by the SAOKE dataset, where the supervised information is available. As a result, we implement our dual learning system in a supervised approach, which uses a reinforcement learning algorithm to optimize the Orator and the Logician. The involved rewards are described in the next subsection, and the algorithm is detailed in the last subsection.

4.2 Rewards

Given <S, F> ∈ D_SAOKE, we sample a set of facts F* from the distribution P_L(·|S, Θ_L) and a sentence S* from the distribution P_O(·|F, Θ_O). The following rewards are introduced into the proposed dual learning system; the relationships between them are shown in Figure 1.

[Figure 1: Illustration of the dual learning system of Logician and Orator.]
4.2.1 Reconstruction Rewards

Following the idea described in the above subsection, we design the reconstruction reward for the Orator as

    R_O(S*, F) = log P_L(F|S*, Θ_L),

and that for the Logician as

    R_L(F*, S) = log P_O(S|F*, Θ_O).
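Both rewards reduce to scoring the ground truth under the dual agent. A minimal sketch follows, assuming each agent exposes a hypothetical log_prob(output, given=input) call that scores a target sequence under the model; this interface is an assumption, not part of any released codebase.

```python
# A minimal sketch of the two reconstruction rewards.
def reconstruction_reward_orator(S_star, F, logician):
    # R_O(S*, F) = log P_L(F | S*): can the Logician recover F from S*?
    return logician.log_prob(F, given=S_star)

def reconstruction_reward_logician(F_star, S, orator):
    # R_L(F*, S) = log P_O(S | F*): can the Orator rebuild S from F*?
    return orator.log_prob(S, given=F_star)
```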
4.2.2 Similarity Rewards

Since the SAOKE dataset has label information, the similarities between the predicted results and the ground truths can be used as rewards.

For the Orator, since S* can be viewed as a summarization of S, we use the ROUGE-L measure (Lin, 2004), widely used in the text summarization field, to evaluate the quality of S*:

    S_O(S*, S) = ROUGE-L(S, S*).

For the Logician, we use the following procedure to calculate the similarity between F and F*. First, we compute the similarity between each predicted fact f* ∈ F* and each ground-truth fact f ∈ F with the measure

    SimFact(f*, f) = (Σ_{i=1}^{min(|f*|,|f|)} SimStr(f*_i, f_i)) / max(|f*|, |f|),

where f*_i and f_i denote the i-th elements of the tuples of facts f* and f, SimStr(·, ·) denotes the gestalt pattern matching measure (Ratcliff and Metzener, 1988) for two strings, and |·| is the cardinality function. Then, each predicted fact in F* is aligned to its corresponding ground-truth fact in F by solving a linear assignment problem (Wikipedia, 2017) to maximize the sum of similarities between the aligned facts. Finally, the similarity reward for the Logician is calculated by

    S_L(F*, F) = (Σ SimFact(f*, f)) / max(|F*|, |F|),

where f* ∈ F*, f ∈ F are aligned pairs of facts.
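The following is a minimal sketch of this similarity reward. It assumes facts are tuples of strings; difflib's SequenceMatcher.ratio() is used for the gestalt pattern matching measure, and SciPy's linear_sum_assignment solves the alignment. This is an illustrative implementation, not the authors' code.

```python
# A minimal sketch of the Logician's similarity reward S_L(F*, F).
from difflib import SequenceMatcher
import numpy as np
from scipy.optimize import linear_sum_assignment

def sim_str(a: str, b: str) -> float:
    # Gestalt pattern matching between two strings.
    return SequenceMatcher(None, a, b).ratio()

def sim_fact(pred: tuple, gold: tuple) -> float:
    # SimFact: element-wise similarity, normalized by the longer tuple.
    n = min(len(pred), len(gold))
    total = sum(sim_str(pred[i], gold[i]) for i in range(n))
    return total / max(len(pred), len(gold))

def similarity_reward(pred_facts, gold_facts) -> float:
    # Align predicted facts to ground-truth facts via a linear assignment
    # maximizing summed similarities, then normalize by the larger set.
    sim = np.array([[sim_fact(p, g) for g in gold_facts] for p in pred_facts])
    rows, cols = linear_sum_assignment(-sim)  # negate cost to maximize
    return sim[rows, cols].sum() / max(len(pred_facts), len(gold_facts))
```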
                                                            The validity reward for Logician is defined as:
4.2.3 Validity Rewards

For the Orator, the output is expected to be a valid natural language sentence, so the validity reward can be defined as

    V_O(S*) = LM(S*),

where LM(·) is a language model.

For the Logician, the output should represent a valid collection of facts, which means: 1) the output can be parsed into a collection of facts; 2) there is no duplicated fact (identified by a SimFact value larger than 0.85) in the parsed collection. The validity reward for the Logician is defined as

    V_L(F*) = 0 if F* is valid, and -1 otherwise.
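A minimal sketch of the Logician's validity check, reusing sim_fact from the sketch above; parse_facts stands in for a SAOKE-format parser (returning None on malformed output) and is a hypothetical helper.

```python
# A minimal sketch of V_L(F*): -1 for unparseable output or near-duplicate
# facts (SimFact > 0.85), and 0 otherwise.
from itertools import combinations

def validity_reward_logician(raw_output: str, parse_facts) -> float:
    facts = parse_facts(raw_output)   # hypothetical SAOKE parser
    if facts is None:                 # 1) output must parse into facts
        return -1.0
    for f1, f2 in combinations(facts, 2):
        if sim_fact(f1, f2) > 0.85:   # 2) no duplicated facts allowed
            return -1.0
    return 0.0
```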

Algorithm 1: A simple dual-learning algorithm for fact extraction and expression.

Require: a set of sentence-facts pairs {<S, F>}; an initial Logician L and an initial Orator O; beam size K.

repeat
  Sentence to facts:
  1. Sample a sentence-facts pair <S, F>;
  2. The Logician produces K sets of facts F_1, ..., F_K from S via beam search;
  3. For each set of facts F_i, compute the reward
       r_i^F = α_1 R_L(F_i, S) + α_2 V_L(F_i) + α_3 S_L(F_i, F);
  4. Compute the total reward r = (1/K) Σ_{i=1}^K r_i^F;
  5. Compute the stochastic gradient of Θ_L:
       ∇_{Θ_L} Ê[r] = (1/K) Σ_{i=1}^K r_i^F D_{Θ_L}(F_i, S);
  6. Compute the stochastic gradient of Θ_O:
       ∇_{Θ_O} Ê[r] = (α_1/K) Σ_{i=1}^K D_{Θ_O}(S, F_i);
  7. Model updates:
       Θ_L ← Θ_L + η_L ∇_{Θ_L} Ê[r],  Θ_O ← Θ_O + η_O ∇_{Θ_O} Ê[r].

  Facts to sentence:
  1. Sample a sentence-facts pair <S, F>;
  2. The Orator produces K sentences S_1, ..., S_K from F via beam search;
  3. For each sentence S_i, compute the reward
       r_i^S = β_1 R_O(S_i, F) + β_2 V_O(S_i) + β_3 S_O(S_i, S);
  4. Compute the total reward r = (1/K) Σ_{i=1}^K r_i^S;
  5. Compute the stochastic gradient of Θ_O:
       ∇_{Θ_O} Ê[r] = (1/K) Σ_{i=1}^K r_i^S D_{Θ_O}(S_i, F);
  6. Compute the stochastic gradient of Θ_L:
       ∇_{Θ_L} Ê[r] = (β_1/K) Σ_{i=1}^K D_{Θ_L}(F, S_i);
  7. Model updates:
       Θ_O ← Θ_O + η_O ∇_{Θ_O} Ê[r],  Θ_L ← Θ_L + η_L ∇_{Θ_L} Ê[r].
until convergence

4.3 Algorithm

For each pair <S, F> ∈ D_SAOKE, the following two procedures are performed respectively (details are shown in Algorithm 1).

4.3.1 Learning from Sentence to Facts

We sample F* from the Logician P_L(·|S, Θ_L) and calculate the total reward for F* by

    r_L = α_1 R_L(F*, S) + α_2 V_L(F*) + α_3 S_L(F*, F),

where Σ α_i = 1. The gradients of the expected reward E[r_L] with respect to the parameters of the agents can be computed as follows, according to the policy gradient theorem (Sutton et al., 1999):

    ∇_{Θ_L} E[r_L] = E[r_L D_{Θ_L}(F*, S)],
    ∇_{Θ_O} E[r_L] = E[α_1 D_{Θ_O}(S, F*)],

where D_{Θ_L}(F, S) = ∇_{Θ_L} log P_L(F|S, Θ_L) and D_{Θ_O}(S, F) = ∇_{Θ_O} log P_O(S|F, Θ_O).

4.3.2 Learning from Facts to Sentence

We sample S* from the Orator P_O(·|F, Θ_O) and define the total reward for S* by

    r_O = β_1 R_O(S*, F) + β_2 V_O(S*) + β_3 S_O(S*, S),

where Σ β_i = 1. The gradients can be computed as follows:

    ∇_{Θ_L} E[r_O] = E[β_1 D_{Θ_L}(F, S*)],
    ∇_{Θ_O} E[r_O] = E[r_O D_{Θ_O}(S*, F)].

In practice, we use beam search (Sutskever et al., 2014) to obtain high-quality samples for F* and S*, and estimate the true gradient with the empirical average of the gradients over these samples.
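A minimal PyTorch sketch of the sentence-to-facts half of this update follows (the facts-to-sentence half is symmetric). beam_search and log_prob are hypothetical interfaces on the two seq2seq agents, and the reward helpers stand in for Section 4.2; gradient descent on the negated reward-weighted log-likelihood is equivalent to the gradient ascent of Algorithm 1. This is an illustrative sketch, not the authors' implementation.

```python
import torch

def sentence_to_facts_step(logician, orator, S, F, rewards, K=3,
                           alphas=(0.4, 0.3, 0.3)):
    a1, a2, a3 = alphas
    candidates = logician.beam_search(S, beam_size=K)  # K fact-set samples
    loss_L = loss_O = 0.0
    for F_i in candidates:
        with torch.no_grad():  # rewards are constants w.r.t. the gradient
            r = (a1 * orator.log_prob(S, given=F_i)      # R_L: reconstruction
                 + a2 * rewards.validity(F_i)            # V_L: validity
                 + a3 * rewards.similarity(F_i, F))      # S_L: similarity
        # Policy gradient for the Logician: r * grad log P_L(F_i | S).
        loss_L = loss_L - r * logician.log_prob(F_i, given=S)
        # The Orator is trained through the reconstruction term only.
        loss_O = loss_O - a1 * orator.log_prob(S, given=F_i)
    # A subsequent optimizer.step() performs descent on this loss, i.e.
    # ascent on the averaged reward, matching Algorithm 1.
    ((loss_L + loss_O) / K).backward()
```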

5 Experimental Results

5.1 Experimental Design

First, we evaluate the performance of each agent fine-tuned by the dual learning procedure on the SAOKE dataset. Then we evaluate the Orator on noisy facts, which accords with real OIN application scenarios. Last, we investigate the behavior of the agents in the dual system.

In the experiments, the SAOKE dataset is split into a training set, a validation set, and a testing set with ratios of 80%, 10%, and 10%, respectively. For each algorithm involved in the experiments, we perform a grid search to find the optimal hyper-parameters, and the model with the best performance on the validation set is chosen as the learned model to be evaluated on the testing set.

5.1.1 Evaluation Metric

For the Orator, BLEU-4 and ROUGE-L are used to measure how well the output matches the ground-truth sentence.

For the Logician, based on the fact-equivalence judgment proposed in (Sun et al., 2018), we compute the precision (P), recall (R), and F1-score over the testing set of the SAOKE dataset as the evaluation metric.
5.1.2 Agent Implementation

For the Orator, we build a vocabulary V of size 72,591 by collecting all web pages from the Baidu Baike website (http://baike.baidu.com, a Chinese alternative to Wikipedia) and identifying the words that occur in more than 100 web pages. The dimension of the embedding vectors is set to N_e = 256, and the dimension of the hidden states is set to N_h = 256. We use a three-layer bi-directional GRU with dimension 128 as the encoder. All dimensions of the hidden states in the decoder are set to 256.

For the Logician, we implement the model described in (Sun et al., 2018), including the shallow tag information and the gated dependency attention mechanism.

Furthermore, to provide an intuitive comprehension of the OIN task, we implement a rule-based method for the OIN task. For each sequence of facts in the SAOKE dataset, the method first identifies the subsequences in which the facts share the same subject. Then it preserves the subject of the first fact in each subsequence and removes the subjects of the following facts (by replacing them with empty strings); this is necessary since the SAOKE dataset requires the shared subject to be repeated for completeness of the related facts. Finally, each fact is formatted into a string by filling the objects into the placeholders of the predicate, and these strings are concatenated with commas to form the final sentence.
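A minimal sketch of this rule-based baseline is given below. It assumes each fact is a (subject, predicate, objects) triple whose predicate may contain an "X" placeholder, as in Table 2; this is a simplified reading of the procedure above, not the authors' exact code.

```python
def rule_based_narration(facts):
    pieces, prev_subject = [], None
    for subject, predicate, objects in facts:
        # Keep the subject of the first fact in a same-subject run and
        # drop the repeated subjects of the facts that follow it.
        shown = "" if subject == prev_subject else subject
        prev_subject = subject
        # Fill objects into the predicate's "X" placeholders, in order;
        # leftover objects are appended after the predicate.
        body, rest = predicate, []
        for obj in objects:
            if "X" in body:
                body = body.replace("X", obj, 1)
            else:
                rest.append(obj)
        pieces.append(shown + body + "".join(rest))
    return ",".join(pieces) + "。"  # join clauses with full-width commas
```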
5.1.3 Reward Implementation

For the validity reward of the Orator, the language model is trained using an RNN-based method (Mikolov et al., 2010) with the same vocabulary V and the web pages from the Baidu Baike website.

For the reconstruction reward of the Orator, since the Logician needs the shallow tag and dependency information of S* as inputs, this information is extracted using the LTP tool-set (Che et al., 2010) and then fed to the Logician.

5.1.4 Training

When training the base model of each agent, the batch size is set to 20. When training the two agents in dual learning, the batch size is set to 12, and the beam size is set to 3. Both agents are trained using stochastic gradient descent (SGD) with the RMSPROP strategy (Hinton et al., 2012) and an early-stopping strategy on the validation set. In dual learning, the hyper-parameters, including α_i and β_i, are determined by grid search.

5.2 Evaluation of Agents on the SAOKE Dataset

First, we evaluate the performance of the agents optimized by the dual learning method. To identify the contribution of the dual structure, we train another pair of agents with α_1 = 0 and β_1 = 0 in Algorithm 1 to exclude the dual information. Without the dual information, these two agents are trained independently of each other with reinforcement learning on their own supervised information. We name these two agents R-Logician and R-Orator, where "R" means "Reinforced". In the experimental results of this paper, a superscript symbol indicates that the marked result is significantly different (at p = 0.05) from the corresponding result of the agent bearing that symbol.

      Methods          Precision   Recall    F1
      Logician∗        0.449       0.400     0.423
      R-Logician∓      0.462∗      0.432∗    0.446∗
      Logician@Dual    0.494∗∓     0.426∗    0.457∗∓

Table 3: Performance of the Logicians.

The experimental results for the Logician agents are shown in Table 3, from which we can observe a significant performance improvement
from Logician to R-Logician, and also from R-Logician to Logician@Dual. The experimental results for the Orator agents are shown in Table 4. The neural Orator agents significantly outperform the rule-based agent. For both evaluation metrics, R-Orator and Orator@Dual both significantly outperform the original Orator. Orator@Dual significantly outperforms R-Orator on the BLEU-4 score, but is not significantly different on the ROUGE-L score.

      Methods          BLEU-4      ROUGE-L
      Rule†            0.257       0.434
      Orator∗          0.401†      0.556†
      R-Orator∓        0.405†∗     0.559†∗
      Orator@Dual      0.419†∗∓    0.559†∗

Table 4: Performance of the Orators.

By comparing the performance of the R-agents and the agents@Dual, we can observe that the agents@Dual generally achieve better performance on precision, but may recall less information, resulting in smaller advances in the balanced evaluation metrics (F1 and ROUGE-L). This may imply that the agents tend to provide easy input for each other for higher accuracy, by neglecting some difficult parts of the problem which they currently cannot handle properly. This interesting phenomenon is a subject of our future research.
5.3 Evaluation of the Orator on Noisy Facts

The experiments in the previous subsection show the performance of the Orators when narrating sets of human-labeled facts. In practice, however, the input to the Orator might not be perfect human-labeled facts, but noisy facts automatically extracted by OIE algorithms. In this subsection, we build a collection of sets of noisy facts by feeding the sentences in the testing set of the SAOKE dataset to the base Logician model and collecting the outputs. Then we evaluate the series of Orator models on these noisy facts and report their performance in Table 5, from which we can see the performance improvement from the Orator to Orator@Dual.

      Methods          BLEU-4      ROUGE-L
      Orator∗          0.428       0.565
      R-Orator∓        0.431∗      0.567∗
      Orator@Dual      0.458∗∓     0.572∗∓

Table 5: Performance of the Orators on noisy facts.

5.4 Evaluation of the Dual System

In this section, we investigate the behavior of the agents in the dual system. We first examine the procedure F → (Orator) → S* → (Logician) → F**; that is, for each F in the testing set of the SAOKE dataset, we let the Orator narrate it into a sentence S*, and then let the Logician extract facts F** from S*. The quality of F** is measured by comparing it with F. Then we examine S → (Logician) → F* → (Orator) → S**, which is the reverse procedure. The comparison is made between the family of base agents and that of the dual-trained agents. The results are shown in Table 6, and an instance of each of the two experiments is shown in Tables 7 and 8, respectively. From these results, we can observe large improvements in reconstruction quality in both directions.

      F → (Orator) → S* → (Logician) → F**
      Methods    Precision   Recall    F1
      Base∗      0.574       0.488     0.527
      Dual       0.657∗      0.565∗    0.608∗

      S → (Logician) → F* → (Orator) → S**
      Methods    BLEU-4      ROUGE-L
      Base∗      0.428       0.565
      Dual       0.635∗      0.635∗

Table 6: Reconstruction performance for the Logicians and the Orators.
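A minimal sketch of the F → S* → F** direction of this round-trip evaluation follows (the reverse direction is symmetric). The narrate and extract inference calls are hypothetical interfaces, and score_facts stands in for a scorer implementing the fact-equivalence judgment of Sun et al. (2018).

```python
# A minimal sketch of round-trip evaluation over a test collection.
def roundtrip_fact_scores(fact_sets, orator, logician, score_facts):
    """score_facts(pred, gold) -> (precision, recall, f1)."""
    totals = [0.0, 0.0, 0.0]
    for F in fact_sets:
        S_star = orator.narrate(F)            # facts -> sentence
        F_star2 = logician.extract(S_star)    # sentence -> facts again
        for k, v in enumerate(score_facts(F_star2, F)):
            totals[k] += v
    n = len(fact_sets)
    return tuple(t / n for t in totals)       # macro-averaged P, R, F1
```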
6 Conclusion

In this paper, we investigate the OIN task and its duality with the OIE task. The proposed Orator has shown its ability to fulfill the OIN task, that is, assembling open-domain facts into high-quality sentences. Furthermore, our attempt to utilize the duality between the OIN and OIE tasks for improving the performance of both OIN and OIE agents achieves preliminary success.

Our work suggests at least three future research topics. Firstly, one can enrich the theoretical study of the duality between the OIE and OIN tasks. Secondly, one can investigate how to overcome the barrier of the absence of an extensive collection of reasonable sets of open-domain facts, and incorporate unsupervised information into this Logician-Orator dual learning structure for further improvement. Lastly, one may also be interested in developing task-oriented rewards for adapting the agents to a specific task, for example, the answer generation task for open-domain KBQA systems.
S: 大综货物吞吐量均保持两位数增长,其中铁矿石吞吐量突破4500万吨,木材吞吐量突破600万立方,均创历史新高,成为全国进口木材第一大港。

S in English: The throughput of all integrated cargoes kept a double-digit growth. Among them, the throughput of iron ore exceeded 45 million tons and the throughput of timber exceeded 6 million cubic meters, all of which hit record highs, and it became the country's largest port of timber imports.

Base models:
  S → (Logician) → F*: (大综货物吞吐量, 保持, 两位数增长) (铁矿石吞吐量, 突破, 4500万吨) (木材吞吐量, 突破, 600万立方) (创历史, DESC, 新高) (_, 成为, 全国进口木材第一大港)
  In English: (The throughput of all integrated cargoes, kept, double-digit growth) (Throughput of iron ore, exceeded, 45 million tons) (Throughput of wood, breakthrough, 6 million cubic meters) (Hit historical, DESC, new high) (_, become, the country's largest port of timber imports)
  S → (Logician) → F* → (Orator) → S**: 大综货物吞吐量保持两位数增长,突破4500万吨,突破600万立方,新高,成为全国进口木材第一大港。
  In English: The throughput of all integrated cargoes kept double-digit growth, breaking 45 million tons, breaking 6 million cubic meters, a new high, became the country's largest port of timber imports.

Dual models:
  S → (Logician) → F*: (大综货物吞吐量, 均保持, 两位数增长) (铁矿石吞吐量, 突破, 4500万吨) (木材吞吐量, 突破, 600万立方) (_, 均创, 历史新高) (_, 成为, 全国进口木材第一大港)
  In English: (The throughput of integrated cargoes, all kept, double-digit growth) (Throughput of iron ore, exceeded, 45 million tons) (Throughput of timber, exceeded, 6 million cubic meters) (_, all hit, record highs) (_, become, the country's largest port of timber imports)
  S → (Logician) → F* → (Orator) → S**: 大综货物吞吐量均保持两位数增长,铁矿石吞吐量突破4500万吨,木材吞吐量突破600万立方,均创历史新高,成为全国进口木材第一大港。
  In English: The throughput of all integrated cargoes kept double-digit growth. The throughput of iron ore exceeded 45 million tons and the throughput of timber exceeded 6 million cubic meters, all hit a record high, and it became the country's largest port of timber imports.

Table 7: Examples for sentence reconstruction.

F: (营业法人,具备,中专(高中)以上学历) (营业法人,有,一定的[管理|经营]能力) (营业法人,具备,[较强的服务理念|团队合作精神])

F in English: (Business legal person, is with, technical secondary school (high school) or above) (Business legal person, has, certain [managerial | operational] ability) (Business legal person, is with, [strong philosophy of service | teamwork spirit])

Base models:
  F → (Orator) → S*: 营业法人具备中专(高中)以上学历,有一定的管理和经营能力,具备较强的服务理念和团队合作精神。
  In English: Business legal person is with secondary (high school) or above, has certain managerial and operational capabilities, and is with strong philosophy of service and teamwork spirit.
  F → (Orator) → S* → (Logician) → F**: (营业法人具备中专(高中)以上学历, 有, 一定的[管理|经营]能力) (营业法人具备中专的[管理|经营]能力, 具备, 较强的[服务理念|团队合作精神])
  In English: (Business legal person with technical secondary school (high school) or above, has, certain [managerial | operational] ability) (Business legal person with technical secondary school (high school) or above, is with, [strong philosophy of service | teamwork spirit])

Dual models:
  F → (Orator) → S*: 营业法人具备中专(高中)以上学历,营业法人有一定的管理和经营能力,具备较强的服务理念和团队合作精神。
  In English: Business legal person is with secondary (high school) or above, business legal person has certain managerial and operational capabilities, and is with strong philosophy of service and teamwork spirit.
  F → (Orator) → S* → (Logician) → F**: (营业法人, 具备, 中专(高中)以上学历) (营业法人, 有, [一定的管理|经营能力]) (营业法人, 具备, 较强的[服务理念|团队合作精神])
  In English: (Business legal person, is with, technical secondary school (high school) or above) (Business legal person, has, [certain managerial | operational ability]) (Business legal person, is with, [strong philosophy of service | teamwork spirit])

Table 8: Examples for fact reconstruction.

References

Shubham Agarwal and Marc Dymetman. 2017. A Surprisingly Effective Out-of-the-Box Char2char Model on the E2E NLG Challenge Dataset. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 158-163.

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. The Semantic Web, 4825 LNCS:722-735.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. In International Conference on Learning Representations.

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract Meaning Representation for Sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 178-186.

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247-1250. ACM.

Wanxiang Che, Zhenghua Li, and Ting Liu. 2010. LTP: A Chinese Language Technology Platform. In Proceedings of COLING, pages 13-16.

Andrew Chisholm, Will Radford, and Ben Hachey. 2017. Learning to Generate One-sentence Biographies from Wikidata. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 1, pages 633-642.

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1724-1734.

Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. 2011. An Analysis of Open Information Extraction based on Semantic Role Labeling. In Proceedings of the Sixth International Conference on Knowledge Capture, pages 113-120. ACM Press.

Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. 2013. Towards Coherent Multi-Document Summarization. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1163-1173.

Janara Christensen, Stephen Soderland, and Gagan Bansal. 2014. Hierarchical Summarization: Scaling Up Multi-Document Summarization. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 902-912.

Nan Duan, Duyu Tang, Peng Chen, and Ming Zhou. 2017. Question Generation for Question Answering. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 866-874.

Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. 2011. Open Information Extraction: The Second Generation. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 3-10.

Anthony Fader, Luke S. Zettlemoyer, and Oren Etzioni. 2014. Open Question Answering Over Curated and Extracted Knowledge Bases. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1156-1165.

Federico Gaspari. 2006. Look Who's Translating: Impersonations, Chinese Whispers and Fun with Machine Translation on the Internet. In EAMT-2006: 11th Annual Conference of the European Association for Machine Translation, pages 19-20.

Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorporating Copying Mechanism in Sequence-to-Sequence Learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1631-1640.

Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. 2012. Overview of Mini-batch Gradient Descent. Technical report.

Nanda Kambhatla. 2004. Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Extracting Relations. In Proceedings of the ACL 2004 Interactive Poster and Demonstration Sessions.

Tushar Khot, Ashish Sabharwal, and Peter Clark. 2017. Answering Complex Questions Using Open Information Extraction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 311-316.

Ioannis Konstas, Srinivasan Iyer, Mark Yatskar, Yejin Choi, and Luke Zettlemoyer. 2017. Neural AMR: Sequence-to-Sequence Models for Parsing and Generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 146-157. Association for Computational Linguistics.

Chin-Yew Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of the ACL Workshop on Text Summarization Branches Out, pages 25-26.

Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. 2016. Neural Relation Extraction with Selective Attention over Instances. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 2124-2133.

Mausam. 2016. Open Information Extraction Systems and Downstream Applications. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, pages 4074-4077.

Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. 2010. Recurrent Neural Network based Language Model. In Proceedings of Interspeech, pages 1045-1048.

Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant Supervision for Relation Extraction without Labeled Data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing, volume 2, page 1003. Association for Computational Linguistics.

Makoto Miwa and Mohit Bansal. 2016. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1105-1116. Association for Computational Linguistics.

Harinder Pal and Mausam. 2016. Demonyms and Compound Relational Nouns in Nominal Open IE. In Proceedings of the 5th Workshop on Automated Knowledge Base Construction, pages 35-39.

John W. Ratcliff and David E. Metzener. 1988. Pattern Matching: The Gestalt Approach. Dr. Dobb's Journal, 13(7).

Mrinmaya Sachan and Eric Xing. 2018. Self-Training for Jointly Learning to Ask and Answer Questions. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, volume 1, pages 629-640.

Michael Schmitz, Robert Bart, Stephen Soderland, and Oren Etzioni. 2012. Open Language Learning for Information Extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 523-534.

Tomohiro Shigenobu. 2007. Evaluation and Usability of Back Translation for Intercultural Communication. In International Conference on Usability and Internationalization, pages 259-265. Springer.

Gabriel Stanovsky, Ido Dagan, and Mausam. 2015. Open IE as an Intermediate Structure for Semantic Tasks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 303-308.

Mingming Sun, Xu Li, Xin Wang, Miao Fan, Yue Feng, and Ping Li. 2018. Logician: A Unified End-to-End Neural Approach for Open-Domain Information Extraction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 556-564. ACM Press.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems 27.

Richard S. Sutton, David McAllester, Satinder Singh, and Yishay Mansour. 1999. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In Advances in Neural Information Processing Systems 12, pages 1057-1063.

Duyu Tang, Nan Duan, Tao Qin, Zhao Yan, and Ming Zhou. 2017. Question Answering and Question Generation as Dual Tasks. arXiv preprint arXiv:1706.02027.

Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. 2016. Modeling Coverage for Neural Machine Translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 76-85.

Menno Van Zaanen and Simon Zwarts. 2006. Unsupervised Measurement of Translation Quality using Multi-engine, Bi-directional Translation. In Australasian Joint Conference on Artificial Intelligence, pages 1208-1214. Springer.

Pavlos Vougiouklis, Hady Elsahar, Lucie-Aimée Kaffee, Christoph Gravier, Frederique Laforest, Jonathon Hare, and Elena Simperl. 2017. Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples. Journal of Web Semantics: Science, Services and Agents on the World Wide Web.

Wikipedia. 2017. Assignment Problem. Wikipedia, The Free Encyclopedia.

Sam Wiseman, Stuart M. Shieber, and Alexander M. Rush. 2017. Challenges in Data-to-Document Generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2243-2253.

Yingce Xia, Di He, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying Ma. 2016. Dual Learning for Machine Translation. In Advances in Neural Information Processing Systems 29, pages 1-9.

Yingce Xia, Tao Qin, Wei Chen, Jiang Bian, Nenghai Yu, and Tie-Yan Liu. 2017. Dual Supervised Learning. In Proceedings of the 34th International Conference on Machine Learning.

Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, and Xiaoming Li. 2016. Neural Generative Question Answering. In Proceedings of the 2016 NAACL Human-Computer Question Answering Workshop, pages 36-42.

Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. 2003. Kernel Methods for Relation Extraction. Journal of Machine Learning Research, 3:1083-1106.

Suncong Zheng, Feng Wang, Hongyun Bao, Yuexing Hao, Peng Zhou, and Bo Xu. 2017. Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1227-1236.