Jointly Extracting Explicit and Implicit Relational Triples with Reasoning Pattern Enhanced Binary Pointer Network

Yubo Chen†, Yunqi Zhang†, Changran Hu‡, Yongfeng Huang†
† Department of Electronic Engineering & BNRist, Tsinghua University, Beijing, China
‡ University of California, Berkeley, USA
ybch14@gmail.com  zhang-yq15@outlook.com  changran_hu@berkeley.edu  yfhuang@tsinghua.edu.cn

Abstract

Relational triple extraction is a crucial task for knowledge graph construction. Existing methods mainly focused on explicit relational triples that are directly expressed, but usually suffer from ignoring implicit triples that lack explicit expressions. This will lead to serious incompleteness of the constructed knowledge graphs. Fortunately, other triples in the sentence provide supplementary information for discovering entity pairs that may have implicit relations. Also, the relation types between the implicitly connected entity pairs can be identified with relational reasoning patterns in the real world. In this paper, we propose a unified framework to jointly extract explicit and implicit relational triples. To explore entity pairs that may be implicitly connected by relations, we propose a binary pointer network to extract overlapping relational triples relevant to each word sequentially and retain the information of previously extracted triples in an external memory. To infer the relation types of implicit relational triples, we propose to introduce real-world relational reasoning patterns in our model and capture these patterns with a relation network. We conduct experiments on several benchmark datasets, and the results prove the validity of our method.

[Figure 1: An example of explicit and implicit relational triples, drawn from the sentence "… Mark Spencer, a designer of Digium, a company in Huntsville …": the explicit triples "Work for" (Mark Spencer, Digium) and "Locate in" (Digium, Huntsville) and the implicit triple "Live in" (Mark Spencer, Huntsville); entity types shown are Person, Organization and Location. Italic phrases are key relational expressions corresponding to the explicit relational triples.]

1 Introduction

Relational triple extraction is defined as automatically recognizing semantic relations with triple structures (subject, relation, object) among multiple entities in a sentence. It is a critical task for constructing Knowledge Graphs (KGs) from unlabeled corpus (Dong et al., 2014).

Early work of relational triple extraction applied pipeline methods (Zelenko et al., 2003; Chan and Roth, 2011), which ran entity recognition and relation classification separately. However, such pipeline approaches suffered from error propagation. To address this issue, recent work proposed to jointly extract entities and relations from the text with feature-based methods (Yu and Lam, 2010; Li and Ji, 2014; Ren et al., 2017). Afterward, neural network-based models were proposed to eliminate hand-crafted features (Gupta et al., 2016; Zheng et al., 2017). More recently, several methods were proposed to extract overlapping triples, such as tagging-based (Dai et al., 2019; Wei et al., 2020), graph-based (Wang et al., 2018; Fu et al., 2019), copy-based (Zeng et al., 2018, 2019, 2020) and token pair linking models (Wang et al., 2020).

Existing models achieved considerable success on extracting explicit triples which have direct relational expressions in the sentence. However, there are many implicit relational triples that are not explicitly expressed. For example, in Figure 1, the explicit triples are strongly indicated by the key relational phrases, but the implicit relation "Live in" is not expressed explicitly. Unfortunately, existing methods usually ignored implicit triples (Zhu et al., 2019), which will cause serious incompleteness of the constructed KGs and performance degradation of downstream tasks (Angeli and Manning, 2013; Jia et al., 2020; Jun et al., 2020).

Our work is motivated by several observations. First, other relational triples within a sentence provide supplementary information for discovering entity pairs that may have implicit relational connections. For example, in Figure 1, the explicit triples establish a relational connection between "Mark Spencer" and "Huntsville" through the intermediate entity "Digium".
Second, the relation types of implicit relational triples can be derived through real-world reasoning patterns. For example, in Figure 1, the reasoning pattern "one lives where the company he works for is located" helps identify the type of the implicit triple as "Live in".

In this paper, we propose a unified framework for the joint extraction of explicit and implicit relational triples. We propose a Binary Pointer Network (BPtrNet), which is based on the pointer network (Vinyals et al., 2015), to extract overlapping relational triples relevant to each word sequentially. To discover implicitly connected entity pairs, we preserve the information of previously extracted triples in an external memory and use it to enhance the extraction of later time steps. To infer the relation types between the implicitly connected entity pairs, we propose to augment our model with real-world relational reasoning patterns and capture the relational inference logic with a Relation Network (RN) (Santoro et al., 2017). The RN obtains a pattern-enhanced representation from the memory for each word pair. Then the Reasoning pattern enhanced BPtrNet (R-BPtrNet) uses the word pair representation to compute a binary score for each candidate triple. Finally, triples with positive scores are output as the extraction result.

The main contributions of this paper are:
• We propose a unified framework to jointly extract explicit and implicit relational triples.
• To discover entity pairs that are implicitly connected by relations, we propose a BPtrNet model to extract overlapping relational triples sequentially and utilize an external memory to retain the extracted triples.
• To enhance the relation type inference of implicitly connected entity pairs, we propose to introduce relational reasoning patterns, captured with a RN, to augment our model.
• We conduct experiments on several benchmark datasets and the experimental results demonstrate the validity of our method.

2 Related Work

Early work of relational triple extraction addressed this task in a pipelined manner (Zelenko et al., 2003; Zhou et al., 2005; Chan and Roth, 2011; Gormley et al., 2015). They first ran named entity recognition to identify all entities and then classified relations between all entity pairs. However, these pipelined methods usually suffered from the error propagation problem and failed to capture the interactions between entities and relations.

To overcome these drawbacks, recent research focused on jointly extracting entities and relations, including feature-based models (Yu and Lam, 2010; Li and Ji, 2014; Ren et al., 2017) and neural network-based models (Gupta et al., 2016; Miwa and Bansal, 2016; Zheng et al., 2017). For example, Ren et al. (2017) proposed to jointly embed entities, relations, text features and type labels into two low-dimensional spaces. Miwa and Bansal (2016) proposed a joint model containing two long short-term memories (LSTMs) (Gers et al., 2000) with shared parameters. Zheng et al. (2017) proposed to extract relational triples directly by transforming this task into a sequence tagging problem, whose tags contain the information of entities and the relations they hold. However, they only assigned one label for each word, which means that this method failed to extract overlapping triples. Subsequent work proposed several mechanisms to solve this problem: (1) labeling tagging sequences for words (Dai et al., 2019) or entities (Yu et al., 2019; Wei et al., 2020); (2) transforming the sentence into a graph structure (Wang et al., 2018; Fu et al., 2019); (3) generating triple element sequences with a copy mechanism (Zeng et al., 2018, 2019, 2020; Nayak and Ng, 2020); (4) linking token pairs with a handshake tagging scheme (Wang et al., 2020). However, these methods usually ignored implicit relational triples that are not directly expressed in the sentence (Zhu et al., 2019), which will lead to the incompleteness of the resulting KGs and negatively affect the performance of downstream tasks (Angeli and Manning, 2013; Jia et al., 2020).

Our work is motivated by two observations. First, other triples in the sentence provide supplementary evidence for discovering entity pairs with implicit relational connections. Second, the relation types of the implicit connections need to be identified through real-world reasoning patterns. In this paper, we propose a unified framework for the joint extraction of explicit and implicit relational triples. We propose a binary pointer network to sequentially extract overlapping relational triples and externally keep the information of predicted triples for exploring implicitly connected entity pairs. We also propose to introduce real-world reasoning patterns in our model to help derive the relation type of implicit triples with a relation network. Experimental results on several benchmark datasets demonstrate the effectiveness of our method.
[Figure 2: The overall framework of our approach, showing the BPtrNet encoder, the entity tagger, the BPtrNet decoder with the relation network and the external memory, and the binary pointer outputs for the example sentence "Mark Spencer ... of Digium, a company in Huntsville". Notation: LM embeddings p, word embeddings w, character representations c, word representations x, encoder hidden states h^E, decoder hidden states h^D, global hidden state g, entity tag embeddings e, relation type embeddings r, reasoning representations h^P.]

3 Our Approach

The overall framework of our approach is shown in Figure 2. We introduce the Binary Pointer Network (BPtrNet) and the Relation Network (RN) in Sections 3.1 and 3.2, and the details of training and inference in Section 3.3, respectively.

3.1 Binary Pointer Network

Existing methods usually failed to extract implicit relational triples due to the lack of explicit expressions (Zhu et al., 2019). Fortunately, we observe that other triples in the sentence can help discover entity pairs that may have implicit relational connections. For instance, in the sentence "George is Judy's father and David's grandfather", the relation between Judy and David is not explicitly expressed. In this case, if we first extract the explicit triple (Judy, father, George) and keep its information in our model, we can easily establish an implicit connection between Judy and David through George, because George is explicitly connected with David by the relational keyword "grandfather". Inspired by this observation, our model extracts relational triples relevant to each word sequentially and keeps all previous triples of this sentence to enhance the extraction at future time steps. This word-by-word extraction process can be regarded as transforming a text sequence into a sequence of extraction actions, which leads us to a sequence-to-sequence (seq2seq) model.

Therefore, we propose a Binary Pointer Network (BPtrNet), based on a seq2seq pointer network (Vinyals et al., 2015), to jointly extract explicit and implicit relational triples. Our model first encodes the words of a sentence into vector representations (Section 3.1.1). Then, we use a binary decoder to sequentially transform the vectors into (overlapping) relational triples (Section 3.1.2). We also introduce an external memory to retain previously extracted triples for enhancing future decoding steps (Section 3.1.3).

3.1.1 Encoder

Given a sentence [w_1, ..., w_n], we first capture morphological patterns of entities with a convolutional neural network (CNN) (LeCun et al., 1989) and compute the character representation c_i of the word w_i (i = 1, ..., n): c_i = CNN(w_i; θ) ∈ R^{d_c}. Then we introduce the context-sensitive representations p_{1:n} captured with a pre-trained Language Model (LM) to bring rich semantics and prior knowledge from the large-scale unlabeled corpus. We feed c_i, p_i and the word embedding w_i into a bidirectional LSTM (BiLSTM) to compute the contextualized word representations x_{1:n} and encode the sentence with another BiLSTM:

    x_i = BiLSTM^{in}([w_i; p_i; c_i]) ∈ R^{d_in},
    h^E_i = BiLSTM^{Enc}(x_i) ∈ R^{2 d_E}.    (1)
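To make Equation 1 concrete, the following is a minimal PyTorch sketch of the encoder: a max-pooled character CNN, pre-computed LM embeddings passed in as a tensor, and the two BiLSTMs. The class name, argument names and the kernel size are our own illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class BPtrNetEncoder(nn.Module):
    """Minimal sketch of the Section 3.1.1 encoder (Equation 1)."""

    def __init__(self, char_vocab, word_vocab, d_char=30, d_word=300,
                 d_lm=1024, d_in=300, d_e=200):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, d_char, padding_idx=0)
        self.char_cnn = nn.Conv1d(d_char, d_char, kernel_size=3, padding=1)
        self.word_emb = nn.Embedding(word_vocab, d_word)
        # BiLSTM_in: fuses the word embedding, LM embedding and char representation.
        self.bilstm_in = nn.LSTM(d_word + d_lm + d_char, d_in // 2,
                                 batch_first=True, bidirectional=True)
        # BiLSTM_Enc: sentence encoder producing h^E.
        self.bilstm_enc = nn.LSTM(d_in, d_e, batch_first=True, bidirectional=True)

    def forward(self, word_ids, char_ids, lm_emb):
        # word_ids: (B, n), char_ids: (B, n, L), lm_emb: (B, n, d_lm)
        B, n, L = char_ids.shape
        c = self.char_emb(char_ids).view(B * n, L, -1).transpose(1, 2)
        c = torch.relu(self.char_cnn(c)).max(dim=-1).values.view(B, n, -1)  # c_i
        w = self.word_emb(word_ids)                                          # w_i
        x, _ = self.bilstm_in(torch.cat([w, lm_emb, c], dim=-1))             # x_i
        h_e, _ = self.bilstm_enc(x)                                          # h^E_i, (B, n, 2*d_e)
        return x, h_e
```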
3.1.2 Binary Decoder

First, to capture the interactions between entities and relations, we recognize the entities with a span-based entity tagger (Yu et al., 2019; Wei et al., 2020) and transform the tags into vectors as part of the decoder's input (Figure 2). Specifically, we assign each token a start and end tag to indicate whether the current token corresponds to a start or end position of an entity of a certain type:

    h^T_i = BiLSTM^{Tag}(x_i) ∈ R^{d_T}
    p(y^{s/e}_i) = softmax(W^{s/e} h^T_i + b^{s/e})    (2)
    tag^{s/e}_i = argmax_k p(y^{s/e}_i = k)

where (W^s, b^s) and (W^e, b^e) are parameters of the start and end tag classifiers, respectively. Then we obtain the entity tag embedding e_i ∈ R^{d_e} by averaging the look-up embeddings of the start and end tags. We also capture a global contextual embedding g by max pooling over h^E_{1:n}. Then we adopt an LSTM as the decoder (h^D_0 = h^E_n):

    h^D_i = LSTM^{Dec}([x_i; e_i; g], h^D_{i-1}) ∈ R^{2 d_E}.    (3)

Next, we introduce how to extract relational triples at the i-th time step. We consider the current word as the object entity, select words as subjects that form triples with the object from all the words of the sentence, and predict the relation types between the subjects and the object. For example, in Figure 2, when the current object is "Huntsville", the model selects "Digium" as the subject and classifies the relation as "Locate in". Thus ("Digium", "Locate in", "Huntsville") is extracted as a relational triple. Multi-token entities are represented with their last words and recovered by finding the nearest start tags of the same type from their last positions. However, the original softmax pointer in (Vinyals et al., 2015) only allows an object to point to one subject, thus fails to extract multiple triples with overlapping objects. To address this issue, we propose a binary pointer, which independently computes a binary score for each subject to form a relational triple with the current object under each relation type. Our method naturally solves the overlapping triple problem by producing multiple positive scores at one step (Figure 2). We formulate the score of the triple (w_j, r, w_i) as:

    s^{(r)}_{ji} = σ(r^⊤ ρ(W^{ptr} [h^E_j; h^D_i] + b^{ptr})),    (4)

and extract this candidate triple as a relational triple if s^{(r)}_{ji} is higher than some threshold, such as 0.5 in our model (i, j = 1, ..., n). σ and ρ are the sigmoid and tanh functions, respectively. r ∈ R^{d_R} is the type embedding of the relation r. W^{ptr} and b^{ptr} are parameters of the binary pointer.

3.1.3 External Memory

We introduce an external memory M to keep the previously extracted triples of the sentence. We first initialize M as an empty set. After the decoder's extraction process at the i-th time step, we represent the extracted triple t = (w_{s_t}, r_t, w_i) as:

    h^M_t = [h^E_{s_t}; r_t; h^E_i] ∈ R^{d_M}.    (5)

Then we update the memory with the representations of the output triples t_1, ..., t_{N_i}:

    M ← [M; h^M_{t_1}; ...; h^M_{t_{N_i}}] ∈ R^{N × d_M}    (6)

where N_i is the number of the currently extracted triples and N = Σ_{k=1}^{i} N_k. Note that we set and update the external memory for each sentence independently, and the memory stores only the triple representations of one single sentence. Thus triples of other sentences will not be introduced into the sentence currently being extracted. Finally, the triples in the memory are utilized to obtain the reasoning pattern-enhanced representations for future time steps, as described in Section 3.2.
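The decoding loop of Equations 3-6 can be sketched as below. This is a simplified, unbatched illustration under assumed tensor shapes: the entity tagger of Equation 2 is taken as given (its tag embeddings are passed in), and the relation-network enhancement of Section 3.2 is omitted here, so the raw pointer score of Equation 4 is used.

```python
import torch
import torch.nn as nn

class BinaryPointerDecoder(nn.Module):
    """Sketch of Equations 3-6: one decoding step per word, a binary pointer
    score for every (subject, relation) pair, and an external memory of
    extracted-triple representations."""

    def __init__(self, d_in, d_e, d_tag, d_rel, num_relations, threshold=0.5):
        super().__init__()
        self.cell = nn.LSTMCell(d_in + d_tag + 2 * d_e, 2 * d_e)   # LSTM^Dec
        self.w_ptr = nn.Linear(4 * d_e, d_rel)                     # W^ptr, b^ptr
        self.rel_emb = nn.Embedding(num_relations, d_rel)          # relation embeddings r
        self.threshold = threshold

    def forward(self, x, h_e, tag_emb):
        # x: (n, d_in), h_e: (n, 2*d_e), tag_emb: (n, d_tag)
        n = x.size(0)
        g = h_e.max(dim=0).values                                  # global embedding g
        h, c = h_e[-1], torch.zeros_like(h_e[-1])                  # h^D_0 = h^E_n
        memory, triples = [], []
        for i in range(n):                                         # current object = word i
            h, c = self.cell(torch.cat([x[i], tag_emb[i], g]).unsqueeze(0),
                             (h.unsqueeze(0), c.unsqueeze(0)))
            h, c = h.squeeze(0), c.squeeze(0)                      # Eq. 3
            # Eq. 4: score every (subject j, relation r) pair for object i.
            pair = torch.cat([h_e, h.expand(n, -1)], dim=-1)       # (n, 4*d_e)
            scores = torch.sigmoid(torch.tanh(self.w_ptr(pair))
                                   @ self.rel_emb.weight.t())      # (n, num_relations)
            for j, r in zip(*torch.where(scores > self.threshold)):
                triples.append((j.item(), r.item(), i))
                # Eq. 5-6: store [h^E_subject ; r ; h^E_object] in the memory.
                memory.append(torch.cat([h_e[j], self.rel_emb.weight[r], h_e[i]]))
        return triples, memory
```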
3.2 Relation Network for capturing patterns of relational reasoning

Relation types of implicit relational triples are difficult to infer due to the lack of explicit evidence, thus need to be derived with real-world relational reasoning patterns. For example, in the sentence "George is Judy's father and David's grandfather", the relation type between "Judy" and "David" can be inferred as "father" using the pattern "father's father is called grandfather".

Based on this fact, we propose to enhance our model by introducing real-world relational reasoning patterns. We capture the patterns with a Relation Network (RN) (Santoro et al., 2017), a neural network module specially designed for relational reasoning. A RN is essentially a composite function over a relational triple set T: RN(T) = f_φ({g_θ(t)}_{t∈T}), where f_φ is an aggregation function and g_θ projects a triple into a fixed-size embedding. We set the memory M as the input relational triple set T and utilize the RN to learn a pattern-enhanced representation h^P_{ji} for the word pair (w_j, w_i) at the i-th time step. First, the g_θ reads the triple representations from M and projects them with a fully-connected layer:

    g_θ(t) = ρ(W^θ h^M_t + b^θ) ∈ R^{d_P}.    (7)

Then f_φ selects useful triples with a gating network¹: u^φ_t = σ(g_θ(t) U^φ [h^E_j; h^D_i]) ∈ R, and aggregates the selected triples with the word pair to compute h^P_{ji} using another fully-connected layer:

    h^P_{ji} = f_φ({g_θ(t)}_{t∈M}) = ρ(W^φ [h^E_j; h^D_i; Σ_t u^φ_t g_θ(t)] + b^φ).    (8)

Finally, we modify Equation 4 as s^{(r)}_{ji} = σ(r^⊤ h^P_{ji}) to compute the binary scores of candidate triples. We denote our Reasoning pattern enhanced BPtrNet model as R-BPtrNet. Note that we use quite simple formulas for f_φ and g_θ because our contribution focuses on the effectiveness of introducing relational reasoning patterns for this task rather than the model structure. Exploration for more complex structures will be left for future work.

¹ We don't use the more common attention mechanism (Bahdanau et al., 2015) to select triples because the attention weights are restricted to sum to 1. If all triples in the memory are useless, they will still be assigned a large weight due to the restriction, which will confuse the model.
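A minimal sketch of Equations 7 and 8 under the same assumptions as the earlier snippets (unbatched tensors, illustrative names): `memory` is the stack of triple representations h^M built in Section 3.1.3, and `pair` is the concatenation [h^E_j; h^D_i] for the current word pair.

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Sketch of Equations 7-8: project stored triples with g_theta, gate them
    against the current word pair, and fuse the gated sum with the pair
    representation to produce h^P_ji."""

    def __init__(self, d_mem, d_pair, d_p):
        super().__init__()
        self.g_theta = nn.Linear(d_mem, d_p)              # W^theta, b^theta
        self.gate = nn.Linear(d_p, d_pair, bias=False)    # U^phi
        self.f_phi = nn.Linear(d_pair + d_p, d_p)         # W^phi, b^phi

    def forward(self, memory, pair):
        # memory: (N, d_mem) stacked h^M_t; pair: (d_pair,) = [h^E_j ; h^D_i]
        g = torch.tanh(self.g_theta(memory))                       # Eq. 7, (N, d_p)
        u = torch.sigmoid(self.gate(g) @ pair)                     # gating scores u^phi_t, (N,)
        pooled = (u.unsqueeze(-1) * g).sum(dim=0)                  # sum_t u^phi_t * g_theta(t)
        h_p = torch.tanh(self.f_phi(torch.cat([pair, pooled])))    # Eq. 8, (d_p,)
        return h_p

# The score of a candidate triple (w_j, r, w_i) then becomes
# sigmoid(rel_emb[r] @ h_p), replacing Equation 4.
```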
3.3 Training and Inference

We calculate the triple loss of a sentence as a binary cross entropy over valid candidate triples T_v, whose subject and object are different entities (or the end words of different entities):

    L_t = -(1/|T_v|) Σ_{t∈T_v} [y_t log s_t + (1 - y_t) log(1 - s_t)]    (9)

where s_t is the score of the candidate triple t, y_t = 1 for gold triples and 0 for others. We also train the entity tagger with a cross-entropy loss:

    L_e = -(1/n) Σ_{i=1}^{n} Σ_{*∈{s,e}} log p(y^*_i = ŷ^*_i)    (10)

where ŷ^{s/e}_i are the gold start and end tags of the i-th word, respectively. Finally, we train the R-BPtrNet with the joint loss L = L_t + L_e.

To prevent error propagation, we use the gold entity tags to filter out valid candidate triples and compute the tag embeddings e_{1:n} during training. We also update the memory M with the gold relational triples. During inference, we extract triples from scratch and use the predicted entity tags and relational triples instead of the gold ones.
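The joint objective of Equations 9 and 10 amounts to a few lines; this sketch assumes the valid candidate scores and gold labels have already been arranged into flat tensors (names are illustrative).

```python
import torch
import torch.nn.functional as F

def joint_loss(triple_scores, triple_labels, start_logits, end_logits,
               gold_start_tags, gold_end_tags):
    """L = L_t + L_e (Equations 9-10).

    triple_scores: (|T_v|,) sigmoid scores of valid candidate triples
    triple_labels: (|T_v|,) 1.0 for gold triples, 0.0 otherwise
    start_logits/end_logits: (n, num_tags) outputs of the entity tagger
    gold_start_tags/gold_end_tags: (n,) gold tag indices
    """
    l_t = F.binary_cross_entropy(triple_scores, triple_labels)   # Eq. 9
    l_e = (F.cross_entropy(start_logits, gold_start_tags)        # Eq. 10
           + F.cross_entropy(end_logits, gold_end_tags))
    return l_t + l_e
```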
4 Experiments

4.1 Datasets and Evaluation Metrics

We evaluate our method on two benchmark datasets. NYT (Riedel et al., 2010) consists of sentences from the New York Times corpus and contains 24 relation types. WebNLG (Gardent et al., 2017) was created for the natural language generation task. It contains 171 relation types² and was adopted for relational triple extraction by Zeng et al. (2018). We split the sentences into three categories: Normal, SingleEntityOverlap (SEO) and EntityPairOverlap (EPO), following Zeng et al. (2018). The statistics of the two datasets are shown in Table 1. Following previous work (Zeng et al., 2018; Wei et al., 2020; Wang et al., 2020), an extracted relational triple is regarded as correct only if the relation and the heads of both subject and object are all correct. We report the standard micro precision, recall, and F1-scores on both datasets.

    Dataset    NYT Train   NYT Test   WebNLG Train   WebNLG Test
    Normal     37013       3266       1596           246
    SEO        9782        1297       227            457
    EPO        14735       978        3406           26
    ALL        56195       5000       5019           703

Table 1: Statistics of evaluation datasets.

² As mentioned in (Wang et al., 2020), this number is miswritten as 246 in (Wei et al., 2020) and (Yu et al., 2019). Here we quote the correct number from (Wang et al., 2020).
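As a concrete reading of the matching criterion in Section 4.1, the sketch below computes micro precision, recall and F1 over (subject head, relation, object head) triples. It illustrates the stated criterion only and is not the authors' evaluation script.

```python
def micro_prf(pred_triples, gold_triples):
    """pred_triples / gold_triples: lists (one entry per sentence) of sets of
    (subject_head, relation, object_head) tuples; a predicted triple counts as
    correct only if all three elements match a gold triple of that sentence."""
    correct = sum(len(p & g) for p, g in zip(pred_triples, gold_triples))
    n_pred = sum(len(p) for p in pred_triples)
    n_gold = sum(len(g) for g in gold_triples)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```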
4.2 Experimental Settings

We determine the hyper-parameters on the validation sets. We use the pre-trained GloVe (Pennington et al., 2014) embeddings as w. We adopt a one-layer CNN with d_c = 30 channels to learn c from 30-dimensional randomly-initialized character embeddings. We choose the state-of-the-art RoBERTaLARGE (Liu et al., 2019) model³ as the pre-trained LM. For a fair comparison with previous methods, we also conduct experiments and report the scores with BERTBASE (Devlin et al., 2019). We set d_in (Equation 1) as 300. The hidden dimensions of the encoder d_E and the entity tagger d_T are both 200. The dimensions of entity tag embeddings d_e and relation type embeddings d_R are set as 50 and 200, respectively. The projection dimension d_P of the RN is set as 500.

We add 10% dropout (Srivastava et al., 2014) on the input of all LSTMs for regularization. Following previous work (Zeng et al., 2018; Wei et al., 2020; Wang et al., 2020), we set the max length of input sentences to 100. We use the Adam optimizer (Kingma and Ba, 2014) to fine-tune the LM and train the other parameters with learning rates of 10^-5 and 10^-3, respectively. We train our model for 30/90 epochs with a batch size of 32/8 on NYT/WebNLG. At the beginning of the last 10 epochs, we load the parameters with the best validation performance and divide the learning rates by ten. Finally, we choose the best model on the validation set and output results on the test set.

³ https://github.com/huggingface/transformers
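For reference, the dimensions and schedules reported in Section 4.2 can be collected into a single configuration object; the field names below are our own, not those of the released code.

```python
# Hyper-parameters as reported in Section 4.2 (field names are illustrative).
CONFIG = {
    "d_char": 30, "d_word": 300, "d_in": 300,
    "d_E": 200, "d_T": 200, "d_e": 50, "d_R": 200, "d_P": 500,
    "dropout": 0.1, "max_sentence_length": 100,
    "lr_lm": 1e-5, "lr_other": 1e-3,
    "epochs": {"NYT": 30, "WebNLG": 90},
    "batch_size": {"NYT": 32, "WebNLG": 8},
}
```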

4.3 Performance Evaluation

We present our results on the NYT and WebNLG test sets in Table 2 and compare them with several previous state-of-the-art models:
• NovelTagging (Zheng et al., 2017) transformed this task into a sequence tagging problem but neglected the overlapping triples.
• CopyRE (Zeng et al., 2018) proposed a seq2seq model based on the copy mechanism to generate triple elements as sequences.
• CopyRERL (Zeng et al., 2019) proposed to learn the extraction order of CopyRE with Reinforcement Learning (RL).
• GraphRel (Fu et al., 2019) proposed a graph convolutional network for this task.
• ETL-Span (Yu et al., 2019) proposed a decomposition-based tagging scheme.
• CopyMTL (Zeng et al., 2020) proposed a Multi-Task Learning (MTL) framework based on CopyRE to address multi-token entities.
• WDec (Nayak and Ng, 2020) proposed an encoder-decoder architecture for this task.
• CGTUniLM (Ye et al., 2020) proposed a generative transformer module with a triple contrastive training objective.
• CasRel (Wei et al., 2020) proposed a cascade binary tagging framework.
• TPLinker (Wang et al., 2020) proposed a one-stage token pair linking model with a novel handshaking tagging scheme.

    Method                             NYT Prec.  Rec.   F1      WebNLG Prec.  Rec.   F1
    NovelTagging (Zheng et al., 2017)      62.4   31.7   42.0            52.5  19.3   28.3
    CopyRE (Zeng et al., 2018)             72.8   69.4   71.1            60.9  61.1   61.0
    CopyRERL (Zeng et al., 2019)           77.9   67.2   72.1            63.3  59.9   61.6
    GraphRel (Fu et al., 2019)             63.9   60.0   61.9            44.7  41.1   42.9
    ETL-Span (Yu et al., 2019)             84.9   72.3   78.1            84.0  91.5   87.6
    CopyMTL (Zeng et al., 2020)            75.7   68.7   72.0            58.0  54.9   56.4
    WDec (Nayak and Ng, 2020)              94.5   76.2   84.4             -     -      -
    CGTUniLM (Ye et al., 2020)             94.7   84.2   89.1            92.9  75.6   83.4
    CasRelLSTM (Wei et al., 2020)          84.2   83.0   83.6            86.9  80.6   83.7
    CasRelBERT (Wei et al., 2020)          89.7   89.5   89.6            93.4  90.1   91.7
    TPLinkerLSTM (Wang et al., 2020)       83.8   83.4   83.6            90.8  90.3   90.5
    TPLinkerBERT (Wang et al., 2020)       91.3   92.5   91.9            91.8  92.0   91.9
    R-BPtrNet                              90.9   91.3   91.1            90.7  94.6   92.6
    R-BPtrNetBERT                          92.7   92.5   92.6            93.7  92.8   93.3
    R-BPtrNetRoBERTa                       94.0   92.9   93.5            94.3  93.3   93.8

Table 2: Performance of our method and previous state-of-the-art models on the NYT and WebNLG test sets. The best scores are in bold and the second-best scores are underlined. R-BPtrNet is our model without pre-trained LMs. R-BPtrNetBERT and R-BPtrNetRoBERTa are models using BERTBASE and RoBERTaLARGE, respectively.

From Table 2 we have the following observations: (1) The R-BPtrNet significantly outperforms all previous non-LM methods. It demonstrates the superiority of our seq2seq-based framework to jointly extract explicit and implicit relational triples and improve the performance for this task. Additionally, the R-BPtrNet produces competitive performance to the BERT-based baseline models without using BERT. It shows that the improvements of our model come not primarily from the pre-trained LM representations, but from the introduction of relational reasoning patterns to this task. (2) R-BPtrNetBERT outperforms BERT-based baseline models. It indicates that our method can effectively extract implicit relational triples with the assistance of the triple-retaining external memory and the pattern-capturing RN. (3) R-BPtrNetRoBERTa further outperforms R-BPtrNetBERT and other baseline methods. It indicates that the more powerful LM brings more prior knowledge and real-world relational facts, enhancing the model's ability to learn real-world relational reasoning patterns.
    Method            NYT: Nor.  SEO   EPO   N=1   N=2   N=3   N=4   N≥5    WebNLG: Nor.  SEO   EPO   N=1   N=2   N=3   N=4   N≥5
    CopyRE                 66.0  48.6  55.0  67.1  58.6  52.0  53.6  30.0            59.2  33.0  36.6  59.2  42.5  31.7  24.2  30.0
    CopyRERL               71.2  69.4  72.8  71.7  72.6  72.5  77.9  45.9            65.4  60.1  67.4  63.4  62.2  64.4  57.2  55.7
    GraphRel               69.6  51.2  58.2  71.0  61.5  57.4  55.1  41.1            65.8  38.3  40.6  66.0  48.3  37.0  32.1  32.1
    ETL-Span†              88.5  87.6  60.3  85.5  82.1  74.7  75.6  76.9            87.3  91.5  80.5  82.1  86.5  91.4  89.5  91.1
    CasRelBERT             87.3  91.4  92.0  88.2  90.3  91.9  94.2  83.7            89.4  92.2  94.7  89.3  90.8  94.2  92.4  90.9
    TPLinkerBERT           90.1  93.4  94.0  90.0  92.9  93.1  96.1  90.0            87.9  92.5  95.3  88.0  90.1  94.6  93.3  91.6
    R-BPtrNetBERT          90.4  94.4  95.2  89.5  93.1  93.5  96.7  91.3            89.5  93.9  96.1  88.5  91.4  96.2  94.9  94.2
    R-BPtrNetRoBERTa       91.2  95.3  96.1  90.5  93.6  94.2  97.7  92.1            89.9  94.4  97.4  89.3  91.7  96.5  95.8  94.8

Table 3: F1 scores on sentences with different overlapping patterns and different triple numbers. The best scores are in bold and the second-best scores are underlined. † marks scores reproduced by (Wang et al., 2020).

4.4 Performance on Different Sentence Types

To demonstrate the ability of our model in handling the multiple triples and overlapping triples of a sentence, we split the test sets of the NYT and WebNLG datasets according to the overlapping patterns and the number of triples. We conduct further experiments on these subsets and report the results in Table 3, from which we can observe that: (1) R-BPtrNetRoBERTa and R-BPtrNetBERT both significantly outperform previous models on the SEO and EPO subsets of the NYT and WebNLG datasets. It proves the validity of our method to address the overlapping triple problem. Moreover, we find that implicit relational triples usually overlap with others. Therefore, the improvements on the overlapping subsets also validate the effectiveness of our method for extracting implicit relational triples. (2) R-BPtrNetRoBERTa and R-BPtrNetBERT both bring improvements to sentences with multiple triples compared to baseline models. It indicates that our method can effectively extract multiple relational triples from a sentence. Furthermore, we observe more significant improvements when the number of triples grows. We hypothesize that this is because implicit relational triples are more likely to occur in sentences with more triples. Our model extracts the implicit relational triples more correctly and improves the performance.

4.5 Ablation Study on Implicit Triples

We run an ablation study to investigate the contribution of each component in our model to the implicit relational triples. We manually select 134 sentences with rich implicit triples from the NYT test set⁴.

[Figure 3: An ablation study on a manually selected subset with rich implicit relational triples.]

⁴ We first select sentences that contain at least two overlapping relational triples. For example, if a sentence contains entities A, B and C, and if A→B and B→C exist or A→B and A→C exist, this sentence is selected. Note that A→B and B→A are counted as one triple during this selecting procedure. Then, we manually check all selected sentences and keep the ones with implicit relational triples.
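The first, automatic stage of the selection heuristic in footnote 4 can be expressed as a small filter; the data format and function name are assumptions, and the second, manual check for implicit triples is of course not automated.

```python
def has_overlapping_triples(triples):
    """First-stage filter from footnote 4: keep a sentence if at least two of its
    triples over distinct entity pairs share an entity, counting (A, r, B) and
    (B, r', A) as one connection.

    triples: iterable of (subject, relation, object) tuples for one sentence.
    """
    # Collapse each triple to an unordered entity pair so A->B and B->A count once.
    pairs = {frozenset((s, o)) for s, _, o in triples}
    if len(pairs) < 2:
        return False
    # Two distinct pairs must share an entity (A-B with B-C, or A-B with A-C).
    entities = [e for pair in pairs for e in pair]
    return len(entities) != len(set(entities))

# Example: (Spencer, Work for, Digium) and (Digium, Locate in, Huntsville)
# share "Digium", so the sentence would be selected.
```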
[Figure 4: Examples of sentences with implicit relational triples, and the predictions from the TPLinkerBERT and R-BPtrNetBERT model. Bold and colored texts are entities. We distinguish different entities with different colors. Explicit and implicit relational triples are represented by black and green solid arrows, respectively. Blue dashed arrows indicate explicit relational connections between entities, but they do not appear as relational triples because their relation types don't belong to the pre-defined relations of the dataset. The three example sentences involve (Jinghong, Yunnan, China), (Chad Hurley, YouTube, Google) and (Edmund, Crowley, Edwin Edwards, Louisiana), respectively.]

We conduct experiments on the subset using the following ablation options:
• R-BPtrNetRoBERTa and R-BPtrNetBERT are the full models using RoBERTaLARGE and BERTBASE as LMs, respectively.
• R-BPtrNet removes the pre-trained LM representations from the full model.
• BPtrNet removes the RN from the R-BPtrNet. Under this setting, we feed a gated summation of the memory into the decoder's input of the next time step.
• BPtrNetNoMem removes the external memory from the BPtrNet, which means that the previously extracted triples are not retained.

We compare the performance of these options with the previous BERT-based models. We also analyze the performance on predicting only the entity pairs and the relations, respectively. We illustrate the results in Figure 3, from which we can observe that: (1) BPtrNetNoMem produces comparable results to the baseline models. We speculate that it benefits from the seq2seq structure and the previous triples are embedded into the decoder's hidden states. (2) BPtrNet brings huge improvements over the BPtrNetNoMem to the entity pair and the triple F1 scores. It indicates that the external memory effectively helps discover entity pairs that have implicit relational connections by retaining previously extracted triples. (3) R-BPtrNet brings significant improvements over the BPtrNet to the relation and the triple F1 scores. It indicates that the RN effectively captures the relational reasoning patterns and enhances the relation type inference of implicit relations. (4) The pre-trained LMs only bring minor improvements. It proves that the effectiveness of our model comes primarily from the external memory and the introduction of relational reasoning patterns rather than the pre-trained LMs.

4.6 Case Study

Figure 4 shows the comparison of the best previous model TPLinkerBERT and our R-BPtrNetBERT model on three example sentences from the implicit subset in Section 4.5. The first example contains the transitive pattern of the relation "contains". The second example contains a multi-hop relation path pattern between "Chad Hurley" and "Google" through the intermediate entity "YouTube". The third example contains a composite pattern between the siblings "Crowley" and "Edwin Edwards" with a common ancestor "Edmund". We can observe that the TPLinkerBERT model fails to extract the implicit relational triples. The R-BPtrNetBERT successfully captures various reasoning patterns in the real world and effectively extracts all the implicit relational triples in the examples.
5 Conclusion

In this paper, we propose a unified framework to extract explicit and implicit relational triples jointly. To discover entity pairs that may have implicit relational connections, we propose a binary pointer network to extract relational triples relevant to each word sequentially and introduce an external memory to retain the extracted triples. To derive the relation types of the implicitly connected entity pairs, we propose to introduce real-world relational reasoning patterns to this task and capture the reasoning patterns with a relation network. We conduct experiments on two benchmark datasets, and the results prove the effectiveness of our method.

Acknowledgement

We would like to thank the anonymous reviewers for their constructive comments on this paper. This work was supported by the National Natural Science Foundation of China under Grant numbers U1936208, U1936216, and 61862002.

References

Gabor Angeli and Christopher D Manning. 2013. Philosophers are mortal: Inferring the truth of unseen facts. In CoNLL, pages 133–142.
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
Yee Seng Chan and Dan Roth. 2011. Exploiting syntactico-semantic structures for relation extraction. In ACL-HLT, pages 551–560.
Dai Dai, Xinyan Xiao, Yajuan Lyu, Shan Dou, Qiaoqiao She, and Haifeng Wang. 2019. Joint extraction of entities and overlapping relations using position-attentive sequence labeling. In AAAI, volume 33, pages 6300–6308.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT.
Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In KDD, pages 601–610.
Tsu-Jui Fu, Peng-Hsuan Li, and Wei-Yun Ma. 2019. GraphRel: Modeling text as relational graphs for joint entity and relation extraction. In ACL, pages 1409–1418.
Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. 2017. Creating training corpora for NLG micro-planning. In ACL.
Felix A Gers, Jürgen Schmidhuber, and Fred Cummins. 2000. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451–2471.
Matthew R Gormley, Mo Yu, and Mark Dredze. 2015. Improved relation extraction with feature-rich compositional embedding models. In EMNLP, pages 1774–1784.
Pankaj Gupta, Hinrich Schütze, and Bernt Andrassy. 2016. Table filling multi-task recurrent neural network for joint entity and relation extraction. In COLING, pages 2537–2547.
Ningning Jia, Xiang Cheng, and Sen Su. 2020. Improving knowledge graph embedding using locally and globally attentive relation paths. In European Conference on Information Retrieval, pages 17–32. Springer.
Kuang Jun, Cao Yixin, Zheng Jianbing, He Xiangnan, Gao Ming, and Zhou Aoying. 2020. Improving neural relation extraction with implicit mutual relations. In ICDE, pages 1021–1032.
Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. 1989. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551.
Qi Li and Heng Ji. 2014. Incremental joint extraction of entity mentions and relations. In ACL, pages 402–412.
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Makoto Miwa and Mohit Bansal. 2016. End-to-end relation extraction using LSTMs on sequences and tree structures. In ACL, pages 1105–1116.
Tapas Nayak and Hwee Tou Ng. 2020. Effective modeling of encoder-decoder architecture for joint entity and relation extraction. In AAAI, volume 34, pages 8528–8535.
Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. GloVe: Global vectors for word representation. In EMNLP, pages 1532–1543.
Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R Voss, Heng Ji, Tarek F Abdelzaher, and Jiawei Han. 2017. CoType: Joint extraction of typed entities and relations with knowledge bases. In WWW, pages 1015–1024.
Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 148–163. Springer.
Adam Santoro, David Raposo, David G Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Timothy Lillicrap. 2017. A simple neural network module for relational reasoning. In Advances in Neural Information Processing Systems, pages 4967–4976.
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.
Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In Advances in Neural Information Processing Systems, pages 2692–2700.
Shaolei Wang, Yue Zhang, Wanxiang Che, and Ting Liu. 2018. Joint extraction of entities and relations based on a novel graph scheme. In IJCAI, pages 4461–4467.
Yucheng Wang, Bowen Yu, Yueyang Zhang, Tingwen Liu, Hongsong Zhu, and Limin Sun. 2020. TPLinker: Single-stage joint extraction of entities and relations through token pair linking. arXiv preprint arXiv:2010.13415.
Zhepei Wei, Jianlin Su, Yue Wang, Yuan Tian, and Yi Chang. 2020. A novel cascade binary tagging framework for relational triple extraction. In ACL, pages 1476–1488.
Hongbin Ye, Ningyu Zhang, Shumin Deng, Mosha Chen, Chuanqi Tan, Fei Huang, and Huajun Chen. 2020. Contrastive triple extraction with generative transformer. arXiv preprint arXiv:2009.06207.
Bowen Yu, Zhenyu Zhang, and Jianlin Su. 2019. Joint extraction of entities and relations based on a novel decomposition strategy. arXiv preprint arXiv:1909.04273.
Xiaofeng Yu and Wai Lam. 2010. Jointly identifying entities and extracting relations in encyclopedia text via a graphical model approach. In COLING 2010: Posters, pages 1399–1407.
Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. 2003. Kernel methods for relation extraction. Journal of Machine Learning Research, 3(Feb):1083–1106.
Daojian Zeng, Haoran Zhang, and Qianying Liu. 2020. CopyMTL: Copy mechanism for joint extraction of entities and relations with multi-task learning. In AAAI, pages 9507–9514.
Xiangrong Zeng, Shizhu He, Daojian Zeng, Kang Liu, Shengping Liu, and Jun Zhao. 2019. Learning the extraction order of multiple relational facts in a sentence with reinforcement learning. In EMNLP-IJCNLP, pages 367–377.
Xiangrong Zeng, Daojian Zeng, Shizhu He, Kang Liu, and Jun Zhao. 2018. Extracting relational facts by an end-to-end neural model with copy mechanism. In ACL, pages 506–514.
Suncong Zheng, Feng Wang, Hongyun Bao, Yuexing Hao, Peng Zhou, and Bo Xu. 2017. Joint extraction of entities and relations based on a novel tagging scheme. In ACL, pages 1227–1236.
GuoDong Zhou, Jian Su, Jie Zhang, and Min Zhang. 2005. Exploring various knowledge in relation extraction. In ACL, pages 427–434.
Hao Zhu, Yankai Lin, Zhiyuan Liu, Jie Fu, Tat-Seng Chua, and Maosong Sun. 2019. Graph neural networks with generated parameters for relation extraction. In ACL, pages 1331–1339.