Interpreting Verbal Metaphors by Paraphrasing

Rui Mao (1), Chenghua Lin (2), and Frank Guerin (3)

(1) Department of Computing Science, University of Aberdeen, AB24 3UE, UK. r03rm16@abdn.ac.uk
(2) Department of Computer Science, University of Sheffield, S1 4DP, UK. c.lin@sheffield.ac.uk
(3) Department of Computer Science, University of Surrey, GU2 7XH, UK. f.guerin@surrey.ac.uk

arXiv:2104.03391v1 [cs.CL] 7 Apr 2021

Abstract

Metaphorical expressions are difficult linguistic phenomena, challenging diverse Natural Language Processing tasks. Previous works showed that paraphrasing a metaphor as its literal counterpart can help machines better process metaphors on downstream tasks. In this paper, we interpret metaphors with BERT and WordNet hypernyms and synonyms in an unsupervised manner, showing that our method significantly outperforms the state-of-the-art baseline. We also demonstrate that our method can help a machine translation system improve its accuracy in translating English metaphors into 8 target languages.

1 Introduction

Metaphor is defined as using one or several words to illustrate a meaning different from the basic meaning of those words (Steen et al., 2010). Due to the difficulty of inferring the underlying meanings of metaphors, metaphoric expressions challenge diverse Natural Language Processing (NLP) tasks, e.g., sentiment analysis (Ghosh et al., 2015) and machine translation (MT) (Mao et al., 2018). Recently, BERT has achieved large improvements on diverse downstream tasks (Devlin et al., 2019). However, it has not been examined on metaphor interpretation. In this paper, we interpret verbal metaphors with BERT and WordNet (Fellbaum, 1998). We focus on verbal metaphors because verbs are the most common metaphoric expressions among all PoS categories (Steen et al., 2010), forming the most popular tasks in automatic metaphor identification and interpretation (Shutova, 2015; Leong et al., 2018).

We consider metaphor interpretation as a paraphrasing task, predicting literal counterparts for metaphors. Unlike fine-tuning or feature-based BERT applications, we use BERT as a language modelling (LM) method to predict a word that is semantically similar to the real meaning of a metaphor and has the highest probability of appearing in the given context; this word serves as the literal paraphrase of the metaphor (see § 3 for the reason). The probability is given by BERT-based LM prediction, and the semantically similar word is constrained by WordNet hypernyms and synonyms. Therefore, our approach is fully unsupervised, requiring no labelling of a large dataset for model training.

By running an automatic evaluation on a publicly available dataset (MOH; Mohammad et al., 2016), our model achieves an 8% accuracy gain on verbal metaphor paraphrasing, compared with the state-of-the-art baseline (Mao et al., 2018). On the human-evaluated paraphrasing task, the average gain of our model is 13% on two benchmark datasets, i.e., MOH and VUA (Steen et al., 2010). We also examine the metaphor paraphrases on MT tasks based on Google translator (https://translate.google.com/) and 8 diverse target languages. Evaluation results show that our method can improve the accuracy of translating English metaphors by an average gain of 20.9%, outperforming the baseline by 12.9%.

The contribution of this work can be summarised as follows: (1) we introduce a simple yet effective method, using BERT as a LM for metaphor interpretation, yielding significant improvements against the state-of-the-art baseline; (2) with our model, a MT system can significantly improve its accuracy of translating English metaphors into 8 diverse target languages.

2 Related work

Due to the lack of large annotated metaphor interpretation corpora, previous works addressed metaphor interpretation mainly by unsupervised modelling of co-occurrences of words and their contexts with different knowledge bases and lexical
resources (Shutova, 2010; Shutova et al., 2012; Bollegala and Shutova, 2013). These works focused on interpreting verbal metaphors from word pairs with specific syntactic structures, e.g., verb-subject and verb-direct object, which is inconvenient for real-world applications. Another trend in metaphor interpretation focused on a specific domain, e.g., question answering about Unix (Martin, 1990), event descriptions in economics (Narayanan, 1997) and mental state descriptions (Barnden and Lee, 2002), with hand-coded knowledge and logic rules. Mao et al. (2018) extended the word-pair models, interpreting metaphors from full sentences in open domains. They modelled the co-occurrences of words and their contexts with word2vec CBOW (Mikolov et al., 2013) input and output vectors to generate appropriate paraphrases of metaphors. Compared with context-independent word2vec embeddings, BERT, as a context-dependent word embedding method, has shown remarkable performance on diverse NLP tasks (Devlin et al., 2019). However, to the best of our knowledge, BERT has not been applied to metaphor interpretation.

3 Methodology

Inspired by Mao et al. (2018), we introduce a hypernym- and synonym-constrained missing word prediction method for unsupervised metaphor interpretation. The difference is that our missing word prediction is based on pre-trained BERT (BERT-LARGE-CASED, https://github.com/google-research/bert), which has 24 Transformer layers (Vaswani et al., 2017), whereas Mao et al. (2018) used word2vec input and output vectors.

BERT is a pre-trained Language Modelling method. Its training targets are to predict randomly selected masked WordPieces (Wu et al., 2016) and the next sentence of the current sentence. Since the masked word prediction is bidirectional, combining the surrounding context information, a BERT model pre-trained on open corpora can be naturally applied as a missing word prediction model. In our task, we predict the probability of a missing word (m_i) given its context (w) with pre-trained BERT:

    p(m_i | w_1, ..., w_{i-1}, w_{i+1}, ..., w_n) = BERT([CLS], w_1, ..., [MASK]_i, ..., w_n, [SEP]),

where the position of m_i is replaced with [MASK]_i; [CLS] and [SEP] are special tokens representing the start and end of an input sequence of length n. For a sentence that has multiple metaphors, we mask one metaphoric word at a time.

Next, we predict an appropriate paraphrase for the metaphor. In order to connect a metaphor (the target word) with its possible literal counterparts, we introduce the semantic constraints of WordNet hypernyms and synonyms for developing candidate literal counterparts. It is likely that one of the hypernyms and synonyms has a meaning similar to the underlying meaning of the metaphor; thus, they help the BERT model filter out irrelevant predictions. The hypernyms {h_{i,j}} and synonyms {s_{i,k}} of a metaphoric verb i, together with their inflected verb forms (superscript f), form our candidate word set C_i = {h_{i,j}, h^f_{i,j}, s_{i,k}, s^f_{i,k}}.

The best-fit word (b_i) in C_i is the candidate word with the highest probability in its context:

    b_i = argmax_{m_i ∈ C_i} p(m_i | w_1, ..., w_{i-1}, w_{i+1}, ..., w_n).

b_i is considered literal because, according to relevant statistics from Cameron (2003), Martin (2006), Steen et al. (2010) and Shutova (2016), literal expressions are more common in typical corpora. Thus, a literal word has a higher probability of appearing in a context than a metaphor with a similar meaning. Noticeably, the limitation of paraphrasing a metaphor with a single literal word is that the paraphrase may lose some nuance of the original metaphor. However, such a paraphrasing method can help a machine better translate the real meanings of metaphors into human-comprehensible languages (see § 6).

4 Dataset

MOH. MOH was formed by Mohammad et al. (2016), sourcing from WordNet example sentences. We select 315 sentences containing 315 metaphoric verbs whose metaphoricity was agreed by at least 70% of annotators, forming our automatic evaluation test set (MOH315). The average sentence length is 8.8. We conduct human evaluation with 50 randomly selected sentences (MOH50) from MOH315.

VUA50. We also randomly select 50 metaphoric verbs and their associated sentences from the VU Amsterdam Metaphor Corpus (Steen et al., 2010) for human evaluation, where the sentences originate from the British National Corpus (Burnard, 2000). VUA50 may contain multiple metaphoric verbs in a sentence. Thus, 36 different sentences from all four different genres, e.g., academic text, conversation,
news and fiction, are selected. The average sentence length of VUA50 is 27.7.

5 Baselines

SIM-CBOW I+O (Mao et al., 2018) modelled word co-occurrences with CBOW-based word2vec (Mikolov et al., 2013) input and output vectors. Their paraphrases were also constrained by WordNet hypernyms and synonyms.

BERT-LARGE-CASED-WWM (Cui et al., 2019) masked whole words during the BERT pre-training procedure, rather than WordPieces. We test this method as an alternative to BERT-LARGE-CASED for the best-fit word prediction.

6 Result

The evaluation is conducted in three phases. First, we automatically evaluate model performance on MOH315. Since MOH315 originated from WordNet example sentences, a paraphrasing prediction is correct if the paraphrase belongs to the WordNet sense class which the test sentence was in.

As seen under the first column in Table 1, the accuracy of BERT-LARGE-CASED on the automatic evaluation is 49%, outperforming SIM-CBOW I+O by 8%. This shows that BERT can better paraphrase a target word in a sentence, based on constrained missing word prediction. This is likely because Transformer-based BERT can better model long-term dependencies and word order in sentences, while a shallow neural network based CBOW only represents word co-occurrences within a context window. In CBOW, the position of a target word is not specified in a sentence, while BERT used a positional embedding for each word during its pre-training procedure. Compared with BERT-LARGE-CASED-WWM, there is a gain of 3% in BERT-LARGE-CASED. Masking whole words during pre-training does not outperform the original BERT that masked WordPieces, probably because the stem information learned from WordPieces is more useful for predicting a missing word than the whole-word-masking approach. Thus, the original BERT can better retrieve a missing word.

In the second phase, we test model performance on MOH50 and VUA50 with human evaluation. As seen in Table 1, the accuracy of automatic evaluation on MOH50 given by SIM-CBOW I+O, BERT-LARGE-CASED-WWM and BERT-LARGE-CASED is similar to their performance on MOH315. For human evaluation, we invited 3 native English speakers with master's degrees. A participant was asked, via a questionnaire, whether a lemmatized paraphrase of a target word is acceptable in the given context. A paraphrase is acceptable if it captures the original meaning of the target word in the given context, where the grammatical tense of the lemmatized paraphrase is not a part of the consideration. An example of the questionnaire can be viewed in Figure 1a. The final decision is agreed by at least 2 annotators.

In the human evaluation (Table 1), BERT-LARGE-CASED surpasses SIM-CBOW I+O by 14%, 12% and 13% on MOH50, VUA50 and their combination (Cohen's κ = 0.61), respectively, yielding an average gain of 13%. The average gain of BERT-LARGE-CASED over BERT-LARGE-CASED-WWM is 4%. The accuracy of the three models on MOH50 with human evaluation is higher than with automatic evaluation, because candidate words from other WordNet sense classes are possibly acceptable from a practical perspective. E.g., "read" is an acceptable paraphrase of "scan" in "She scanned the newspaper headlines while waiting for the taxi" for a human, although it is incorrect for automatic evaluation. BERT-LARGE-CASED performs similarly on MOH50 and VUA50, although the average length of VUA sentences (27.7) is longer than that of MOH (8.8).

In the third phase, to evaluate the quality of paraphrases on a downstream task, we examine the MOH50 paraphrases on English MT with 8 target languages, namely German, Russian, Greek, Italian, Dutch, Chinese, Thai and Japanese. We invited 3 participants per target language to annotate the quality of the translation of each target word in an original sentence, a SIM-CBOW I+O paraphrased sentence, and a BERT-LARGE-CASED paraphrased sentence. BERT-LARGE-CASED-WWM is excluded in this phase because, by our observation, many of its predictions are the same as those given by BERT-LARGE-CASED (e.g., B and C in Figure 1a), whereas BERT-LARGE-CASED surpasses BERT-LARGE-CASED-WWM on the paraphrase evaluations. The translations are given by Google translator. The participants are native speakers of the target languages, learnt English in school, and had each spent a minimum of one year living and studying in England as postgraduates. Participants were provided with the WordNet senses of the original metaphors, so that they could fully understand the true meanings of the metaphors.
(a) Paraphrasing task. Sentence: "Am I supposed to swallow that story?" For the target word "swallow" (shown in bold), annotators answer Yes/No for each proposed paraphrase:
    A. "talk" is an acceptable paraphrase (by SIM-CBOW I+O);
    B. "believe" is an acceptable paraphrase (by BERT-LARGE-CASED);
    C. "believe" is an acceptable paraphrase (by BERT-LARGE-CASED-WWM).

(b) Machine Translation task. Sentence: "Am I supposed to swallow that story?", with the sense gloss "swallow: believe or accept without questioning or challenge." Annotators mark each Russian translation Acceptable/Unacceptable:
    A. "Я должен проглотить эту историю?" (translation of the original sentence; "проглотить" = "swallow", literal);
    B. "Я должен рассказать эту историю?" (translation of the SIM-CBOW I+O paraphrase; "рассказать" = "tell");
    C. "Я должен верить этой истории?" (translation of the BERT-LARGE-CASED paraphrase; "верить" = "believe").

Figure 1: Example questions for human evaluation. (a) Paraphrasing task. (b) Machine Translation task.
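In both questionnaires, a label is accepted only when at least 2 of the 3 annotators agree, i.e., a simple majority vote. The sketch below is illustrative only; the function name and threshold parameter are ours, not from the paper.

```python
# Final decision rule used in both human evaluations: a paraphrase or
# translation counts as acceptable if at least 2 of the 3 annotators
# marked it acceptable.

def final_decision(votes, threshold=2):
    """votes: one boolean per annotator; True means 'acceptable'."""
    return sum(votes) >= threshold

print(final_decision([True, True, False]))   # True: 2 of 3 annotators agree
print(final_decision([True, False, False]))  # False: only 1 of 3 agrees
```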

                            Automatic Evaluation    Human Evaluation
    Model                   MOH315    MOH50         MOH50   VUA50   MOH50+VUA50
    SIM-CBOW I+O            0.41      0.42          0.68    0.70    0.69
    BERT-LARGE-CASED-WWM    0.46      0.46          0.76    0.80    0.78
    BERT-LARGE-CASED        0.49      0.50          0.82    0.82    0.82

Table 1: Metaphor interpretation evaluation on paraphrasing tasks, measured by accuracy.

    Model              German  Russian  Greek  Italian  Dutch  Chinese  Thai  Japanese  Avg
    Cohen's kappa (κ)  0.59    0.67     0.60   0.63     0.67   0.69     0.71  0.55      -
    Original           0.48    0.32     0.32   0.50     0.50   0.38     0.30  0.28      0.385
    SIM-CBOW I+O       0.38    0.40     0.36   0.56     0.46   0.60     0.52  0.44      0.465
    BERT-LARGE-CASED   0.60    0.42     0.46   0.72     0.66   0.68     0.70  0.50      0.594

Table 2: Metaphor interpretation evaluation on Machine Translation tasks, measured by accuracy.
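The Cohen's κ agreement values reported above follow the standard two-annotator formula κ = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each annotator's label marginals. A generic sketch of that formula, not the authors' evaluation script:

```python
# Cohen's kappa for two annotators over binary acceptable/unacceptable labels.

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # p_o: fraction of items on which the two annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: chance agreement from the annotators' label marginals.
    categories = set(labels_a) | set(labels_b)
    p_e = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (p_o - p_e) / (1 - p_e)

# Two annotators judging five items (1 = acceptable, 0 = unacceptable).
print(round(cohens_kappa([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]), 3))  # 0.167
```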

The quality of a translation is measured by a binary choice: acceptable or unacceptable. A translation is acceptable if it expresses the true meaning of the target word and matches its context (cohesion) in the annotators' mother language, where errors in context words and tenses are ignored. The final decision was agreed by at least two annotators (κ ranges from 0.55 to 0.71 in Table 2). An example can be viewed in Figure 1b.

As seen in Table 2, the average accuracy of the original metaphor translations is 38.5%. By paraphrasing the metaphors with BERT-LARGE-CASED, the average accuracy is improved to 59.4%, a gain of 20.9%, outperforming SIM-CBOW I+O by 12.9%. The average gain of BERT-LARGE-CASED on the Asian target languages (30.7%) over the original metaphor translations is higher than that on the European languages (15.0%). This is probably because the larger linguistic and cultural differences mean that directly translated English metaphors are unlikely to be understandable to speakers of the Asian languages. After paraphrasing, the incomprehensible phrases become comprehensible. E.g., "swallow" in "Am I supposed to swallow that story?" and "tugged" in "She tugged for years to make a decent living." are badly translated by Google across all the target languages. However, the translations based on BERT-LARGE-CASED paraphrased target words, e.g., "Am I supposed to believe that story?" and "She struggled for years to make a decent living.", are acceptable in each target language. We also noticed that some translation errors are introduced by incorrect paraphrasing. However, the accuracy gain achieved by paraphrasing metaphors substantially surpasses the small impact of these errors, yielding a significant overall improvement (20.9% in accuracy) in the MT task.

7 Conclusion

We propose an unsupervised model for metaphor interpretation based on BERT and WordNet, yielding large gains against the baseline on metaphor paraphrasing tasks. Our evaluation demonstrates that paraphrasing English metaphors into their literal counterparts can help a MT system improve the accuracy of translating the metaphors into more comprehensible target languages. In future work, we will test our model on other downstream tasks.
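The § 3 procedure (mask the metaphoric verb, score each WordNet-derived candidate with the masked LM, take the argmax) can be sketched as follows. This is a minimal illustration rather than the authors' released code: `score_in_context` stands in for the BERT masked-LM probability p(m_i | w_1, ..., w_{i-1}, w_{i+1}, ..., w_n), and the candidate set C_i is hard-coded here, where the paper derives it from WordNet hypernyms, synonyms and their inflections.

```python
# Sketch of the candidate-constrained paraphrasing step (illustrative only).
# In the paper, scoring comes from BERT-LARGE-CASED with the target verb
# replaced by [MASK]; here a toy table stands in for those probabilities.

def paraphrase(tokens, target_index, candidates, score_in_context):
    """Return the best literal paraphrase b_i = argmax over C_i of p(m_i | context)."""
    context = tokens[:target_index] + ["[MASK]"] + tokens[target_index + 1:]
    return max(candidates, key=lambda word: score_in_context(word, context))

# Toy stand-in scorer: a hard-coded table replaces real masked-LM probabilities.
TOY_SCORES = {"believe": 0.62, "accept": 0.25, "ingest": 0.01}

def toy_scorer(word, context):
    return TOY_SCORES.get(word, 0.0)

sentence = "Am I supposed to swallow that story ?".split()
# Candidate set C_i: hard-coded here; the paper builds it from WordNet.
candidates = ["believe", "accept", "ingest"]
print(paraphrase(sentence, 4, candidates, toy_scorer))  # "believe"
```

The stand-in scorer makes the argmax deterministic for the example; swapping in a real masked-LM scorer leaves the surrounding logic unchanged, which is the point of the constraint: the LM ranks, WordNet restricts what may be ranked.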
References

John A Barnden and Mark G Lee. 2002. An artificial intelligence approach to metaphor understanding. Theoria et Historia Scientiarum, 6(1):399–412.

D Bollegala and E Shutova. 2013. Metaphor interpretation using paraphrases extracted from the web. PLoS ONE, 8(9).

Lou Burnard. 2000. Reference guide for the British National Corpus (world edition).

Lynne Cameron. 2003. Metaphor in Educational Discourse. A&C Black.

Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, and Guoping Hu. 2019. Pre-training with whole word masking for Chinese BERT. arXiv preprint arXiv:1906.08101.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional Transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4171–4186.

Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. Bradford Books.

Aniruddha Ghosh, Guofu Li, Tony Veale, Paolo Rosso, Ekaterina Shutova, John Barnden, and Antonio Reyes. 2015. SemEval-2015 task 11: Sentiment analysis of figurative language in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 470–478.

Chee Wee Ben Leong, Beata Beigman Klebanov, and Ekaterina Shutova. 2018. A report on the 2018 VUA metaphor detection shared task. In Proceedings of the Workshop on Figurative Language Processing, pages 56–66.

Rui Mao, Chenghua Lin, and Frank Guerin. 2018. Word embedding and WordNet based metaphor identification and interpretation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, volume 1, pages 1222–1231.

James Martin. 1990. A Computational Model of Metaphor Interpretation. Academic Press Professional.

James H Martin. 2006. A corpus-based analysis of context effects on metaphor comprehension. Technical Report CU-CS-738-94, Boulder: University of Colorado: Computer Science Department.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.

Saif Mohammad, Ekaterina Shutova, and Peter Turney. 2016. Metaphor as a medium for emotion: An empirical study. In Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics, pages 23–33.

Srini Narayanan. 1997. Knowledge-based action representations for metaphor and aspect (KARMA). Computer Science Division, University of California at Berkeley dissertation.

Ekaterina Shutova. 2010. Automatic metaphor interpretation as a paraphrasing task. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 1029–1037. Association for Computational Linguistics.

Ekaterina Shutova. 2015. Design and evaluation of metaphor processing systems. Computational Linguistics, 41(4):579–623.

Ekaterina Shutova. 2016. Design and evaluation of metaphor processing systems. Computational Linguistics.

Ekaterina Shutova, Tim Van de Cruys, and Anna Korhonen. 2012. Unsupervised metaphor paraphrasing using a vector space model. In 24th International Conference on Computational Linguistics, page 1121.

Gerard J Steen, Aletta G Dorst, J Berenike Herrmann, Anna Kaal, Tina Krennmayr, and Trijntje Pasma. 2010. A Method for Linguistic Metaphor Identification: From MIP to MIPVU, volume 14. John Benjamins Publishing.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.