Knowledge Graph Representation
From Recent Models towards a Theoretical Understanding

Ivana Balažević & Carl Allen
January 27, 2021
School of Informatics, University of Edinburgh
What are Knowledge Graphs?

   [Figure: a small example knowledge graph over entities A, B, C, D, with known edges
   such as (A, father of, B), (A, married to, C) and (C, sibling, D), and queried edges
   such as "mother of?" and "uncle of?".]

   Entities          E = {A, B, C, D}
   Relations         R = {married to, father of, uncle of, ...}
   Knowledge Graph   G = {(A, father of, B), (A, married to, C), ...}
Representing Entities and Relations

   Subject and object entities e_s, e_o are represented by vectors e_s, e_o ∈ R^d
   (embeddings).

   Relations r are represented by transformations f_r, g_r : R^d → R^{d'} that
   transform the entity embeddings.

   A proximity measure, e.g. Euclidean distance or dot product, compares
   the transformed subject and object entities.

   [Figure: e_s and e_o are mapped by f_r and g_r to transformed embeddings
   e_s^(r) and e_o^(r), which are then compared; example triple
   (Edinburgh, capital of, Scotland).]
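A minimal numpy sketch (illustrative names and shapes, not from the slides) of this generic recipe: entity embeddings, a pair of relation transformations f_r, g_r, and a proximity measure comparing the transformed embeddings.

```python
# Sketch of the generic entity/relation representation pattern described above.
import numpy as np

d, d_out = 8, 8
rng = np.random.default_rng(0)

e_s, e_o = rng.normal(size=d), rng.normal(size=d)     # subject / object embeddings

# One simple (assumed) choice of relation transformations: a linear map for the
# subject and the identity for the object; different models make different choices.
W_r = rng.normal(size=(d_out, d))
f_r = lambda e: W_r @ e
g_r = lambda e: e

def dot(x, y):
    return float(x @ y)                               # dot-product proximity

def neg_sq_dist(x, y):
    return -float(np.sum((x - y) ** 2))               # negative squared Euclidean distance

score = dot(f_r(e_s), g_r(e_o))                       # higher score = more plausible triple
```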
Score Function

   A score function φ : E × R × E → R brings together the entity and relation
   representations and the proximity measure to assign a score φ(e_s, r, e_o) to
   each triple, used to predict whether the triple is true or false.

   Representation parameters are optimised to improve prediction accuracy.

   Score functions can be broadly categorised by:
    ▸ relation representation type (additive, multiplicative or both); and
    ▸ proximity measure (e.g. dot product, Euclidean distance).

   Rel. Repr. Type   Example φ(e_s, r, e_o)                            Model
   Multiplicative    e_s^⊤ W_r e_o = ⟨e_s, e_o^(r)⟩                    DistMult (Yang et al., 2015); TuckER (Balažević et al., 2019b)
   Additive          −‖e_s + r − e_o‖²                                 TransE (Bordes et al., 2013)
   Both              −‖e_s^⊤ W_r^s + r − e_o^⊤ W_r^o‖² + b_s + b_o     MuRE (Balažević et al., 2019a)
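As a concrete illustration, here is a small numpy sketch (not the authors' code) of the two basic families in the table: a multiplicative score (DistMult, with W_r diagonal) and an additive score (TransE); the embeddings are random placeholders.

```python
# Sketch of multiplicative vs additive score functions.
import numpy as np

d = 8
rng = np.random.default_rng(1)
e_s, e_o = rng.normal(size=d), rng.normal(size=d)    # entity embeddings
w_r = rng.normal(size=d)                             # relation vector

def score_distmult(e_s, w_r, e_o):
    # Multiplicative: e_s^T W_r e_o with W_r = diag(w_r).
    return float(np.sum(e_s * w_r * e_o))

def score_transe(e_s, w_r, e_o):
    # Additive: -||e_s + r - e_o||^2; higher (less negative) = more plausible.
    return -float(np.sum((e_s + w_r - e_o) ** 2))
```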
TuckER: Tensor Factorization for Knowledge Graph Completion

   [Figure 1: Visualization of the TuckER architecture — the core tensor W is
   contracted with the relation embedding w_r to give a relation matrix W_r,
   which interacts with the entity embeddings e_s and e_o.]

       φ_TuckER(e_s, r, e_o) = ((W ×_1 w_r) ×_2 e_s) ×_3 e_o = e_s^⊤ W_r e_o

   Multi-task learning: rather than learning distinct relation matrices W_r,
   the core tensor W contains a shared pool of "prototype" relation matrices
   that are linearly combined using the parameters of the relation embedding w_r.
                                                         (Balažević et al., 2019b)
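The score above is just two tensor contractions. A minimal numpy sketch follows, assuming the core tensor is stored with the relation mode first, so that W ×_1 w_r yields the relation matrix W_r; real TuckER implementations add batching, dropout and batch normalisation.

```python
# Sketch of the TuckER score via tensor contraction.
import numpy as np

d_e, d_r = 8, 4
rng = np.random.default_rng(2)
W = rng.normal(size=(d_r, d_e, d_e))          # shared core tensor ("prototype" relation matrices)
e_s, e_o = rng.normal(size=d_e), rng.normal(size=d_e)
w_r = rng.normal(size=d_r)

W_r = np.einsum('rij,r->ij', W, w_r)          # W x_1 w_r: relation-specific matrix W_r
phi = float(e_s @ W_r @ e_o)                  # (... x_2 e_s) x_3 e_o = e_s^T W_r e_o
```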
MuRE: Multi-relational Euclidean Graph Embeddings

   [Figure 2: MuRE spheres of influence (3-D illustration).]

       φ_MuRE(e_s, r, e_o) = −d(R e_s, e_o + r)² + b_s + b_o

   where R is a relation-specific (diagonal) matrix, r a relation-specific translation,
   d the Euclidean distance, and b_s, b_o entity biases.

                                               (Balažević et al., 2019a)
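A minimal sketch of this score, assuming the diagonal parameterisation of R noted above; the parameters here are random placeholders.

```python
# Sketch of the MuRE score phi = -d(R e_s, e_o + r)^2 + b_s + b_o.
import numpy as np

d_e = 8
rng = np.random.default_rng(3)
e_s, e_o = rng.normal(size=d_e), rng.normal(size=d_e)   # entity embeddings
R_diag = rng.normal(size=d_e)                            # diagonal of the relation matrix R
r = rng.normal(size=d_e)                                 # relation translation vector
b_s, b_o = 0.1, -0.2                                     # entity biases (illustrative values)

def score_mure(e_s, e_o, R_diag, r, b_s, b_o):
    diff = R_diag * e_s - (e_o + r)                      # R e_s vs. e_o + r
    return -float(diff @ diff) + b_s + b_o
```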
Recap

 ▸ KGs store facts: binary relations between entities (e_s, r, e_o).

 ▸ They enable computational reasoning over KGs,
   e.g. question answering and inferring new facts (link prediction).

 ▸ This requires a representation, typically of:
      • each entity by a vector embedding e ∈ R^d,
      • each relation by a transformation from the subject entity
        embedding to the object entity embedding.
   [Figure: e_s mapped by relation r to e_s^(r), compared with e_o.]

 ▸ Many, many models with gradually increasing success, but no
   principled rationale for why they work, or how to improve them
   (e.g. more accurate prediction, incorporating logic, etc.).
Simplify: consider Word Embeddings
 ▸ Word embeddings, e.g.
      • Word2Vec (W2V, Mikolov et al., 2013)
      • GloVe (Pennington et al., 2014)

   [Figure: target words w_1, ..., w_n and context words c_1, ..., c_n from the
   dictionary E, with embedding matrices W and C.]

 ▸ Observation: semantic relations between words ⇒ geometric relationships
   between embeddings
      • similar words ⇒ close embeddings
      • analogies (often) ⇒ w_woman + w_king − w_man ≈ w_queen

 ▸ Aim: relate the understanding of this to knowledge graph relations.
Understanding word embeddings: the W2V Loss Function

   −ℓ_W2V = Σ_{i,j}  #(w_i, c_j) log σ(w_i^⊤ c_j)  +  (k #(w_i) #(c_j) / D) log σ(−w_i^⊤ c_j)

   ∇_{w_i} ℓ_W2V ∝ Σ_j  d_j^(i) e_j^(i) c_j  =  C diag(d^(i)) e^(i),
       where d_j^(i) = p(w_i, c_j) + k p(w_i) p(c_j)  and  e_j^(i) = σ(S_{i,j}) − σ(w_i^⊤ c_j)

 • ℓ_W2V is minimised when:

     low-rank case:  w_i^⊤ c_j = PMI(w_i, c_j) − log k =: S_{i,j},
                     where PMI(w_i, c_j) = log( p(c_j|w_i) / p(c_j) )          (Levy and Goldberg, 2014)

     general case:   the error vectors diag(d^(i)) e^(i) are orthogonal to the rows of C

   ⇒ Embedding w_i is a (non-linear) projection of row i of the PMI matrix*, a PMI vector p_i.

   (* dropping the k term as an artefact of the W2V algorithm.)
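A toy numerical check (hypothetical counts, not from the slides) of the quoted Levy & Goldberg result: if the model logits w_i^⊤ c_j reproduce the shifted PMI S_{i,j}, the error terms e_j^(i) vanish and the gradient above is zero.

```python
# Verify that logits equal to shifted PMI give zero per-pair error e_j^(i).
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(4)
n, k = 5, 2                                      # vocabulary size, negative-sampling rate

counts = rng.integers(1, 20, size=(n, n)).astype(float)   # toy co-occurrence counts
p_wc = counts / counts.sum()                               # joint p(w, c)
p_w, p_c = p_wc.sum(1, keepdims=True), p_wc.sum(0, keepdims=True)

PMI = np.log(p_wc / (p_w * p_c))
S = PMI - np.log(k)                              # shifted PMI matrix S

logits = S                                       # "embeddings" reproducing S exactly
errors = sigmoid(S) - sigmoid(logits)            # e_j^(i) for every (i, j)
print(np.abs(errors).max())                      # ~0: stationary point of the W2V loss
```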
PMI Vectors

   p_i = [ log( p(c_j|w_i) / p(c_j) ) ]_{c_j ∈ E}  =  log( p(E|w_i) / p(E) )        (E = dictionary of all words)

   [Figure 3: The PMI surface S with example PMI vectors of words (red dots).]
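A small sketch of constructing PMI vectors p_i (rows of the PMI matrix) from a co-occurrence count matrix; the counts and the absence of smoothing are assumptions for illustration.

```python
# Build PMI vectors from toy word/context co-occurrence counts.
import numpy as np

def pmi_vectors(counts: np.ndarray) -> np.ndarray:
    """counts[i, j] = #(w_i, c_j); returns the matrix whose row i is the PMI vector p_i."""
    p_wc = counts / counts.sum()
    p_w = p_wc.sum(axis=1, keepdims=True)        # p(w_i)
    p_c = p_wc.sum(axis=0, keepdims=True)        # p(c_j)
    with np.errstate(divide="ignore"):
        return np.log(p_wc / (p_w * p_c))        # PMI(w_i, c_j) = log p(c_j|w_i)/p(c_j)

counts = np.array([[8., 2., 1.],
                   [2., 9., 3.],
                   [1., 3., 7.]])
P = pmi_vectors(counts)
p_0 = P[0]                                       # PMI vector of word w_0
```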
PMI Vector Interactions = Semantics (Similarity)

   Similarity: similar words, e.g. synonyms, induce similar distributions,
               p(E|w), over context words.

   Identified by subtraction of PMI vectors:

       p_i − p_j = log( p(E|w_i) / p(E|w_j) ) = ρ_{i,j}

   [Figure: the context distributions p(E|hound) and p(E|dog) over words w_1, ..., w_n
   nearly coincide, so the PMI vectors p_dog and p_hound lie close together.]
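A toy illustration (hypothetical counts) of ρ_{i,j} = p_i − p_j: words with near-identical context distributions give a difference vector close to zero, unrelated words do not.

```python
# Similarity as smallness of the PMI-vector difference rho_{i,j}.
import numpy as np

# Rows: hypothetical words "dog", "hound", "physics"; columns: context words.
counts = np.array([[20., 15., 1., 1.],
                   [19., 16., 1., 1.],
                   [ 1.,  1., 18., 20.]])
p_wc = counts / counts.sum()
P = np.log(p_wc / (p_wc.sum(1, keepdims=True) * p_wc.sum(0, keepdims=True)))  # PMI vectors as rows

rho_dog_hound = P[0] - P[1]        # near the zero vector -> similar words
rho_dog_physics = P[0] - P[2]      # far from zero -> dissimilar words
print(np.linalg.norm(rho_dog_hound), np.linalg.norm(rho_dog_physics))
```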
PMI Vector Interactions = Semantics (Paraphrase)

   Paraphrases: word sets with similar aggregate semantic meaning,
                e.g. {man, royal} ≈ king.

   Identified by addition of PMI vectors:

       p_i + p_j = log( p(E|w_i) / p(E) ) + log( p(E|w_j) / p(E) )
                 = p_k + log( p(E|w_i, w_j) / p(E|w_k) )
                       − log( p(w_i, w_j|E) / (p(w_i|E) p(w_j|E)) )
                       + log( p(w_i, w_j) / (p(w_i) p(w_j)) )

       where the first correction term is the paraphrase error ρ_{{i,j},k}, and the last two
       terms (σ_{i,j} and τ_{i,j}) together form the independence error.

   [Figure: the aggregate context distribution p(E|{man, royal}) closely matches
   p(E|king), so p_man + p_royal ≈ p_king.]
PMI Vector Interactions = Semantics (Analogy)

   Analogies: word pairs that share a similar semantic difference,
              e.g. {man, king} and {woman, queen}.

   Identified by a linear combination of PMI vectors:

       p_king − p_man ≈ p_queen − p_woman

   [Figure: the offset from p_man to p_king is approximately parallel to the
   offset from p_woman to p_queen.]

                               (Allen and Hospedales, 2019; Allen et al., 2019)
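A small sketch of the corresponding "vector offset" test on embeddings; the 2-D vectors below are hypothetical and constructed so that the analogy holds.

```python
# Analogy by vector offset: the nearest embedding to w_king - w_man + w_woman
# should be w_queen when the two word pairs share the same offset.
import numpy as np

emb = {                                   # toy embeddings, constructed for illustration
    "man":   np.array([1.0, 0.0]),
    "king":  np.array([1.0, 1.0]),
    "woman": np.array([0.0, 0.1]),
    "queen": np.array([0.0, 1.1]),
    "apple": np.array([5.0, 0.2]),        # distractor word
}

query = emb["king"] - emb["man"] + emb["woman"]
cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cos(emb[w], query))
print(best)                               # -> "queen"
```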
From Analogies to Relations

   [Figure: the analogy offset p_king − p_man ≈ p_queen − p_woman corresponds to the
   relation mapping man → king and woman → queen.]

 ▸ Analogies contain common binary word relations, similar to KGs.
 ▸ For certain analogies ("specialisations"), the associated "vector offset"
   gives a transformation that represents the relation.
 ▸ Not all relations fit this semantic pattern, but we have insight to
   consider geometric aspects (relation conditions) of other relation types.
Categorising Relations: semantics → relation requirements

   [Figure: relationships between PMI vectors for different relation types —
   Similarity, Relatedness, Specialisation, Context-shift, Generalised context-shift.
   blue/green = strong word association (PMI > 0); red = relatedness; black = context sets.]

   Categorisation of WN18RR relations:

   Type   Relation                       Examples (subject entity, object entity)
   R      verb group                     (trim down VB 1, cut VB 35), (hatch VB 1, incubate VB 2)
          derivationally related form    (lodge VB 4, accommodation NN 4), (question NN 1, inquire VB 1)
          also see                       (clean JJ 1, tidy JJ 1), (ram VB 2, screw VB 3)
   S      hypernym                       (land reform NN 1, reform NN 1), (prickle-weed NN 1, herbaceous plant NN 1)
          instance hypernym              (yellowstone river NN 1, river NN 1), (leipzig NN 1, urban center NN 1)
   C      member of domain usage         (colloquialism NN 1, figure VB 5), (plural form NN 1, authority NN 2)
          member of domain region        (rome NN 1, gladiator NN 1), (usa NN 1, multiple voting NN 1)
          member meronym                 (south NN 2, sunshine state NN 1), (genus carya NN 1, pecan tree NN 1)
          has part                       (aircraft NN 1, cabin NN 3), (morocco NN 1, atlas mountains NN 1)
          synset domain topic of         (quark NN 1, physics NN 1), (harmonize VB 3, music NN 4)
Categorical completeness: are all relations covered?

 ▸ View PMI vectors as sets of word features and relation types as set
   operations (see the sketch after this slide):
      • similarity      ⇒ set equality
      • relatedness     ⇒ subset equality (relation-specific)
      • context-shift   ⇒ set difference (relation-specific)

 ▸ For any relation, each feature is either
      • necessarily unchanged (relatedness),
      • necessarily/potentially changed (context shift), or
      • irrelevant.

 ▸ Conjecture: the relation types identified partition the set of semantic
   relations.
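A minimal sketch of this set-operation view, using hypothetical word-feature sets.

```python
# Relation types as set operations on hypothetical word-feature sets.
man = {"person", "male", "adult"}
king = {"person", "male", "adult", "royal"}
monarch = {"person", "adult", "royal"}

# similarity ~ set equality
print(man == {"adult", "male", "person"})                         # True: identical features

# relatedness ~ shared relation-specific subset of features
related_features = {"person", "adult"}
print(related_features <= man and related_features <= monarch)    # True: common subset

# context-shift ~ relation-specific set difference (features added/removed)
print(king - man)                                                 # {'royal'}: specialisation man -> king
```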
Relations as mappings between embeddings

  R: S-relatedness requires both entity embeddings e_s, e_o to share a common
     subspace component V_S.
     ▸ Project onto V_S (multiply by matrix P_r ∈ R^{d×d}) and compare.
     ▸ Dot product:          (P_r e_s)^⊤ (P_r e_o) = e_s^⊤ P_r^⊤ P_r e_o = e_s^⊤ M_r e_o
     ▸ Euclidean distance:   ‖P_r e_s − P_r e_o‖² = ‖P_r e_s‖² − 2 e_s^⊤ M_r e_o + ‖P_r e_o‖²

 S/C: requires S-relatedness and relation-specific component(s) (v_r^s, v_r^o).
     ▸ Project onto a subspace (by P_r ∈ R^{d×d}) corresponding to S, v_r^s and v_r^o
       (i.e. test S-relatedness while preserving relation-specific components);
     ▸ add the relation-specific offset r = v_r^o − v_r^s ∈ R^d to the transformed embeddings.
     ▸ Dot product:          (P_r e_s + r)^⊤ P_r e_o
     ▸ Euclidean distance:   ‖P_r e_s + r − P_r e_o‖²   (cf. MuRE: ‖R e_s + r − e_o‖²; see the sketch below)
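A minimal sketch (random, hypothetical parameters) of these relation mappings: project both embeddings with a relation-specific matrix P_r, add the offset r, and compare by dot product or (negative squared) Euclidean distance.

```python
# Sketch of the projection-plus-offset relation scores described above.
import numpy as np

d = 8
rng = np.random.default_rng(5)
e_s, e_o = rng.normal(size=d), rng.normal(size=d)

# Projection onto a relation-specific subspace V_S: here, keep the first 4 coordinates.
P_r = np.diag([1., 1., 1., 1., 0., 0., 0., 0.])
r = rng.normal(size=d)                        # relation-specific offset r = v_r^o - v_r^s

score_dot = float((P_r @ e_s + r) @ (P_r @ e_o))                 # dot-product form
score_dist = -float(np.sum((P_r @ e_s + r - P_r @ e_o) ** 2))    # Euclidean form (cf. MuRE)
```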
Summary

▸ Theoretic: a derivation of the geometric components of relation
  representations from word co-occurrence statistics.

▸ Interpretability: associates geometric model components with semantic
  aspects of relations.

▸ Empirically supported: justifies the relative link-prediction performance of a
  range of models on real datasets:

      additive & multiplicative (MuRE*; Balažević et al., 2019a)
        >  multiplicative (TuckER, Balažević et al., 2019b; DistMult, Yang et al., 2015)
        or additive (TransE, Bordes et al., 2013).

  *Note: MuRE was inspired by the vector offset of analogies.

                                     Work to appear in ICLR 2021 (Allen et al., 2021).
Thanks!

          Any questions?

References

   Carl Allen and Timothy Hospedales. Analogies Explained: Towards Understanding
     Word Embeddings. In ICML, 2019.
   Carl Allen, Ivana Balažević, and Timothy Hospedales. What the Vec? Towards
     Probabilistically Grounded Embeddings. In NeurIPS, 2019.
   Carl Allen, Ivana Balažević, and Timothy Hospedales. Interpreting Knowledge Graph
     Relation Representation from Word Embeddings. In ICLR, 2021.
   Ivana Balažević, Carl Allen, and Timothy M Hospedales. Multi-relational Poincaré
      Graph Embeddings. In NeurIPS, 2019a.
   Ivana Balažević, Carl Allen, and Timothy M Hospedales. TuckER: Tensor
      Factorization for Knowledge Graph Completion. In EMNLP, 2019b.
   Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana
     Yakhnenko. Translating Embeddings for Modeling Multi-relational Data. In
     NeurIPS, 2013.
   Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix
     factorization. In NeurIPS, 2014.

   Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of
     word representations in vector space. In ICLR Workshop, 2013.
   Jeffrey Pennington, Richard Socher, and Christopher Manning. Glove: Global Vectors
      for Word Representation. In EMNLP, 2014.
   Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding
      Entities and Relations for Learning and Inference in Knowledge Bases. In ICLR,
      2015.
