CARMA: Constructional Analyzer using Recursively Multiple AVMs


Ely Edison Matos
ely.matos@ufjf.edu.br
September 6, 2018
FrameNet Brasil Project - UFJF
Table of contents

   1. Introduction

   2. Premises

   3. Computational Processing

   4. Limitations and Outlook

Introduction
Context

  FNBr (FrameNet Brasil) is working on NLU (Natural Language
  Understanding) projects.
  The FNBr approach to NLU comprises three main elements:

    1. Linguistic Knowledge: Lexicon, Constructions, Grammatical
       Functions (GF), Parts of Speech (POS), Roles, Syntax, etc.
    2. World Knowledge: Ontologies and external datasets
    3. Situational Context: Frames and Frame Elements

  NLU processes must use linguistic knowledge, in a cognitively
  motivated way, to build an approximate representation of world
  knowledge in a given situational context.

CARMA

 CARMA is a constructional analyzer: given a raw sentence, it tries to
 identify the constructions in the sentence. It is a new version of the
 analyzer described in [4].
 If these constructions evoke a frame, it helps to identify the Situational
 Context.
Why constructional analysis?

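  Token    o     celular     quebrou     a     tela
  POS      DET   NOUN        VERB        DET   NOUN
  Gloss    The   cellphone   break.PST   the   screen
  Deps     det(celular, o), nsubj(quebrou, celular),
           obj(quebrou, tela), det(tela, a)

  'The screen of the cellphone broke' / Cxn Split_object

  Token    o     menino   quebrou     a     cadeira
  POS      DET   NOUN     VERB        DET   NOUN
  Gloss    The   boy      break.PST   the   chair
  Deps     det(menino, o), nsubj(quebrou, menino),
           obj(quebrou, cadeira), det(cadeira, a)

  'The boy broke the chair' / Cxn Transitive_action

  Both sentences have the same dependency structure, yet they
  instantiate different constructions.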
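
  The point of the two examples can be checked mechanically. The sketch
  below is only an illustration: the parses are transcribed by hand from
  the slide as (head, relation, dependent) triples, and the helper name
  relation_skeleton is hypothetical. It shows that the two sentences are
  indistinguishable once the words are ignored, so a dependency parse
  alone cannot separate Split_object from Transitive_action.

```python
# Hand-written UD parses of the two example sentences, as
# (head, relation, dependent) triples. Transcribed from the slide,
# not produced by a parser.
split_object = [
    ("celular", "det", "o"),
    ("quebrou", "nsubj", "celular"),
    ("quebrou", "obj", "tela"),
    ("tela", "det", "a"),
]
transitive_action = [
    ("menino", "det", "o"),
    ("quebrou", "nsubj", "menino"),
    ("quebrou", "obj", "cadeira"),
    ("cadeira", "det", "a"),
]


def relation_skeleton(parse):
    """Return the sorted relation labels, ignoring the words (hypothetical helper)."""
    return sorted(rel for _head, rel, _dep in parse)


# True when the two parses carry exactly the same set of relations.
same_syntax = relation_skeleton(split_object) == relation_skeleton(transitive_action)
print(same_syntax)
```

  Telling the two apart needs extra knowledge, for example that a screen
  is part of a cellphone while a chair is not part of a boy.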

Resources

   CARMA uses four resources:

    1. FNBr framenet
         • the network of frames and LUs (lexical units), including the
           full lexicon (word forms, lexemes, lemmas, ...)
    2. FNBr constructicon
         • the network of constructions
    3. FNBr ontology
         • a Generative Lexicon-based ontology defining extended qualia
           relations between LUs (based on the SIMPLE ontology [1])
    4. UD parser
         • to get the syntactic structure of the sentence using UD POS
           tags and relations

Premises
AVM (Attribute-Value Matrix)

      Figure 1: AVM structure

AVM

                      Figure 2: Everything as AVM

 Recursive AVMs: the value of an attribute can itself be another AVM.
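
 A minimal sketch of a recursive AVM, assuming a plain Python dict is an
 acceptable stand-in for the structure in the figures. The attribute
 names (type, pos, lemma, head) and the depth helper are illustrative
 assumptions, not CARMA's actual schema.

```python
# An AVM as a plain dict: each attribute maps either to an atomic value
# or to another AVM. Attribute names are illustrative assumptions.
sentence_avm = {
    "type": "cxn",
    "verb": {"type": "word", "pos": "VERB", "lemma": "quebrar"},
    "nsubj": {
        "type": "phrase",
        "head": {"type": "word", "pos": "NOUN", "lemma": "celular"},
    },
}


def depth(avm):
    """Nesting depth: 0 for an atomic value, 1 + the deepest attribute for an AVM."""
    if not isinstance(avm, dict):
        return 0
    return 1 + max((depth(value) for value in avm.values()), default=0)


print(depth(sentence_avm))
```

 The recursion in depth mirrors the recursion in the data: any function
 that walks an AVM has to be prepared to find another AVM as a value.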
Constraints

   CARMA is a constraint-based system: each AVM attribute is restricted
   to a set of acceptable values.
   Constructions are defined by constraining construction elements to
   dependency relations, as proposed by Property Grammar [2].
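
   As a rough illustration of restricting attributes to acceptable
   values, the sketch below checks a flat AVM against a table of allowed
   values. The function name satisfies and the example attributes are
   assumptions made for this example; how CARMA represents and checks
   value restrictions internally is not stated in these slides.

```python
def satisfies(avm, allowed):
    """True if every constrained attribute is present and holds an allowed value."""
    return all(avm.get(attr) in values for attr, values in allowed.items())


# Allowed values per attribute (illustrative only).
allowed = {"pos": {"VERB"}, "rel": {"root", "conj"}}

verb_ok = satisfies({"pos": "VERB", "rel": "root", "form": "quebrou"}, allowed)
noun_ok = satisfies({"pos": "NOUN", "rel": "root"}, allowed)
print(verb_ok, noun_ok)
```

   Attributes that are not mentioned in the table, such as form here,
   are left unconstrained.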

Construction definition
   cxn_split_object:
       type: cxn
       class: cxn_split_object
       region: cxn_split_object
       attributes:
           nsubj:
               features: {optional: false, head: false}
               value: [ud_nsubj]
           verb:
               features: {optional: false, head: true}
               value: [pos_verb]
           obj:
               features: {optional: false, head: false}
               value: [ud_obj]
           x_part:
               type: xe
               value: [rel_is_part_of]
           x_frame:
               type: xe
               value: [frm_undergoing]
       constraints:
           - {arg1: verb, constraint: dominance, arg2: nsubj}
           - {arg1: verb, constraint: dominance, arg2: obj}
           - {arg1: nsubj, constraint: hasword, arg2: x_part}
           - {arg1: obj, constraint: hasword, arg2: x_part}
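
   One possible reading of the dominance constraints in the definition
   above is that arg1 must be an ancestor of arg2 in the dependency
   tree. That reading, the tree encoding, and the names dominates and
   find_by_relation are assumptions made for illustration; the slides do
   not spell out how CARMA evaluates constraints, and hasword is left
   out because its semantics are not given here.

```python
# Dependency tree for "o celular quebrou a tela" as
# dependent -> (head, relation); the root has head None.
# Using word forms as keys only works because no form repeats here.
tree = {
    "o": ("celular", "det"),
    "celular": ("quebrou", "nsubj"),
    "quebrou": (None, "root"),
    "a": ("tela", "det"),
    "tela": ("quebrou", "obj"),
}


def dominates(tree, ancestor, node):
    """True if `ancestor` lies on the path from `node` up to the root."""
    head = tree[node][0]
    while head is not None:
        if head == ancestor:
            return True
        head = tree[head][0]
    return False


def find_by_relation(tree, relation):
    """Words attached to their head by the given UD relation (hypothetical helper)."""
    return [word for word, (_head, rel) in tree.items() if rel == relation]


# Bind construction elements to words in the sentence.
bindings = {
    "verb": "quebrou",
    "nsubj": find_by_relation(tree, "nsubj")[0],
    "obj": find_by_relation(tree, "obj")[0],
}

# The two dominance constraints from the construction definition.
constraints = [
    {"arg1": "verb", "constraint": "dominance", "arg2": "nsubj"},
    {"arg1": "verb", "constraint": "dominance", "arg2": "obj"},
]

results = [
    dominates(tree, bindings[c["arg1"]], bindings[c["arg2"]]) for c in constraints
]
print(results)
```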

Computational Processing
Topology

  CARMA is
    • a recursive hierarchical network
    • an elaborate pattern-matching system
  As such, it is amenable to some Machine Learning techniques.

RCN

  RCN: Recursive Cortical Network[3]

                   Figure 3: Overview of RCN (source: [3])
RCN

  RCN resembles an AVM

                 Figure 4: Detail of RCN (source: [3])
RCN Processing

  RCN can be used for generation and inference (parsing)
  Inference

    • Belief propagation
    • Forward-pass
    • Backward-pass

CARMA processing

  In CARMA we are interested in the parsing process.
  Resources are kept in persistent storage:

    • Lexicon: in the FNBr database (MySQL)
    • Frames, Constructions and Ontology: exported to a Neo4j graph
      database

CARMA processing

   1. The user inputs a sentence.
   2. The sentence is parsed into UD (currently with the UDPipe parser).
   3. The FNBr database is queried for word forms, lexemes and lemmas.
   4. A type network is built from the lexical items.
   5. The graph database is queried to complete the type network.
   6. The type network is traversed to create a token network.
   7. Word nodes are activated, constraints are evaluated and the
      activation spreads through the token network until it reaches a
      root node.
   8. Activated construction nodes correspond to the constructions
      detected in the sentence.
   9. Conflicts (more than one construction activated) are resolved by
      MAP (maximum a posteriori) estimation.
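
   The last three steps can be pictured with a toy token network. This
   is only a sketch under strong assumptions: a node fires when all of
   its required inputs are active, and competing constructions are
   ranked by prior times likelihood using made-up numbers. The node
   names reuse labels from the construction definition, but the network
   layout, the functions spread and map_choice, and every probability
   are hypothetical and do not come from CARMA itself.

```python
# Toy token network: node -> inputs that must all be active for it to fire.
requires = {
    "ud_nsubj": {"celular"},
    "pos_verb": {"quebrou"},
    "ud_obj": {"tela"},
    "rel_is_part_of": {"celular", "tela"},
    "cxn_transitive_action": {"ud_nsubj", "pos_verb", "ud_obj"},
    "cxn_split_object": {"ud_nsubj", "pos_verb", "ud_obj", "rel_is_part_of"},
}


def spread(requires, words):
    """Activate word nodes, then fire nodes whose inputs are all active, until stable."""
    active = set(words)
    changed = True
    while changed:
        changed = False
        for node, inputs in requires.items():
            if node not in active and inputs <= active:
                active.add(node)
                changed = True
    return active


def map_choice(candidates, prior, likelihood):
    """Pick the candidate with the highest unnormalized posterior (prior * likelihood)."""
    return max(candidates, key=lambda c: prior[c] * likelihood[c])


active = spread(requires, ["o", "celular", "quebrou", "a", "tela"])
candidates = sorted(node for node in active if node.startswith("cxn_"))

# Illustrative numbers only; they are not estimated from any data.
prior = {"cxn_transitive_action": 0.7, "cxn_split_object": 0.3}
likelihood = {"cxn_transitive_action": 0.2, "cxn_split_object": 0.9}

best = map_choice(candidates, prior, likelihood)
print(candidates, best)
```

   Both constructions become active for this sentence, which is exactly
   the conflict case; the MAP step then keeps the one with the larger
   product of prior and likelihood.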

CARMA processing

             Figure 5: Partial view of activated network

Limitations and Outlook
Limitations

     • The current version is at a very early stage
     • UD parsing for Brazilian Portuguese is very limited and error-prone
     • Some basic linguistic phenomena are not handled yet (e.g. Null
       Instantiation)
     • and many others...

Outlook

    • How to implement a learning process?
    • How to use the analysis in the context of construction alignment?
    • How many constraint types?
    • and many others...

Thank you!

References

  [1] N. Bel, F. Busa, N. Calzolari, E. Gola, A. Lenci, M. Monachini,
      A. Ogonowski, I. Peters, W. Peters, N. Ruimy, M. Villegas, and
      A. Zampolli.
      SIMPLE: A General Framework for the Development of
      Multilingual Lexicons.
      Proceedings of the 2nd International Conference on Language
      Resources and Evaluation, 2000.

  [2] P. Blache.
      Property Grammars: A Fully Constraint-Based Theory.
      In Christiansen H. et al. (eds.), Constraint Solving and Language
      Processing, Springer-Verlag, Berlin Heidelberg, pages 1–16, 2005.

  [3] D. George, W. Lehrach, K. Kansky, M. Lazaro-Gredilla, C. Laan,
      B. Marthi, X. Lou, Z. Meng, and Y. Liu.
      A generative vision model that trains with high data efficiency
      and breaks text-based CAPTCHAs.
      Science, 358(6368), 2017.

  [4] E. Matos, T. Torrent, V. Almeida, A. Laviola, L. Lage, N. Marção,
      and T. Tavares.
      Constructional Analysis Using Constrained Spreading
      Activation in a FrameNet-Based Structured Connectionist
      Model.
      The AAAI 2017 Spring Symposium on Computational Construction
      Grammar and Natural Language Understanding, Technical Report
      SS-17-02, pages 222–229, 2017.