WP3 : entity-fishing service - Presented by Tanti Kristanti (INRIA - Paris) For the HIRMEOS Final Workshop 2 June 2019 Marseille, France

WP3 : entity-fishing service
     Presented by Tanti Kristanti (INRIA – Paris)
         For the HIRMEOS Final Workshop
                    2 June 2019
                 Marseille, France
entity-fishing (1)
     • An open-source tool, exposed as a set of services, that automates entity
       recognition and disambiguation against Wikidata 1
     • It is not restricted to particular domains, classes of entities or
       usages 2
     • Initially developed within the FP7 CENDARI (Collaborative European Digital
       Archive Infrastructure) project 3
     • Continued to be developed within the H2020 HIRMEOS (High Integration of
       Research Monographs in the European Open Science Infrastructure) project to
       enrich open access digital monographs published on five digital platforms 4
     • Deployed as part of the national infrastructure Huma-Num in France
     • A stable online service within the DARIAH-EU infrastructure, the European digital
       research infrastructure for the arts and humanities
     • Distributed under Apache 2.0 license

1 Science-Miner, Entity disambiguation, http://science-miner.com/entity-disambiguation/, (accessed 6 May 2019)
2 Patrice Lopez, Overview: Motivation, 2019, https://nerd.readthedocs.io/en/latest/overview.html, (accessed 6 May 2019)
3 Patrice Lopez, Alexander Meyer, Laurent Romary. CENDARI Virtual Research Environment & Named Entity Recognition techniques. Grenzen überschreiten – Digitale Geisteswissenschaft heute und morgen, Feb 2014, Berlin, Germany, https://hal.inria.fr/hal-01577975, (accessed 6 May 2019)
4 OAPEN, End user services: Named Entity Recognition and Disambiguation, http://www.oapen.org/content/services-end-user-services, (accessed 6 May 2019)
entity-fishing (2)
• Current version (0.0.3) supports English, French, German, Italian and Spanish
• Based on machine-learning techniques (Gradient Tree Boosting, CRF, word and
  entity embeddings)
   • For English and French, named entity recognition based on CRF (Grobid-NER) is used in
     combination with the disambiguation
• Machine learning relies on the SMILE ML library
• Knowledge base contains
   • 37 million entities and 154 million statements from Wikidata
   • 15 million word and entity embeddings
• Project repository: https://github.com/kermitt2/entity-fishing
• Demo: http://nerd.huma-num.fr/nerd/
• Documentation: https://nerd.readthedocs.io/en/latest/
Examples of Text and PDF File Processing with entity-fishing
How to use entity-fishing services?
       [Figure: query parameters to be sent to the service, and the response of the service]
       • Through REST API
       • Service can be applied to 4 types of input 1:
               •   text
               •   search query
               •   weighted vector of terms
               •   PDF document
       • REST query
               •   POST /disambiguate
               •   POST /language
               •   POST /segmentation
               •   POST /customisations
               •   GET /kb/concept/{id}
               •   GET /kb/term/{term}
               •   GET /language?text={text}
               •   GET /segmentation?text={text}
               •   GET /customisations
               •   GET /customisation/{name}
               •   PUT /customisation/{profile}
               •   DELETE /customisation/{profile}
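
A text disambiguation request could be sketched in Python as follows, using only the standard library. The base URL, the `/service` path prefix, the multipart form field name `query`, and the JSON field names `text` and `language` are assumptions based on the public entity-fishing documentation; they should be checked against the deployed version before use.

```python
import json
import urllib.request
import uuid

# Assumed base URL; replace with the address of your own deployment.
BASE_URL = "http://nerd.huma-num.fr/nerd/service"


def build_query(text, lang="en"):
    """Build the JSON query for POST /disambiguate (field names are assumed)."""
    return {"text": text, "language": {"lang": lang}}


def encode_multipart(fields):
    """Encode a dict of text fields as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def disambiguate(text, lang="en", base_url=BASE_URL):
    """Send the text to the service and return the parsed JSON response."""
    query = json.dumps(build_query(text, lang))
    body, content_type = encode_multipart({"query": query})
    request = urllib.request.Request(
        f"{base_url}/disambiguate",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `disambiguate("Some text to annotate")` requires network access to a running service; `build_query` and `encode_multipart` can be exercised offline.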

1   Patrice Lopez, entity-fishing REST API, 2019, https://nerd.readthedocs.io/en/latest/restAPI.html, (accessed 13 May 2019)
WP3 Works
       • Deployment and integration of entity-fishing services in the
         partners’ open access platforms.
               • The approach : reusability and code sharing
       • Process the following data:
               •   4000 books in English and French from OpenEdition
               •   2000 titles in English and German from OAPEN
               •   162 books in English from Ubiquity Press
               •   765 books (606 in German, 159 in English) from UGOE
       • Results: entity-fishing clients in Java, Python and PHP, released under the Apache 2.0 licence
               •   entity-fishing-client-python: Python client for the entity-fishing service
               •   entity-fishing-client-php-oe: PHP client for the entity-fishing service by OpenEdition
               •   entity-fishing-client-php: PHP client for the entity-fishing service by EKT
               •   entity-fishing-client-oapen: integration scripts with the OAPEN infrastructure by OAPEN
       • For validation purposes:
               • Use a CC-BY gold standard HIRMEOS corpus
                      • Containing thousands of manually corrected Named Entity Recognition and Disambiguation annotations with
                        Wikidata identifiers (which are not present in any existing corpus, e.g. IITB, AQUAINT)
1   High Integration of Research Monographs in the European Open Science Infrastructure (HIRMEOS), WP3 NERD Work Package Validation, (accessed 6 May 2019)
2   Hirmeos Github, https://github.com/Hirmeos
How do partners integrate entity-fishing into their platforms?
The OpenEdition Books publishing
• entity-fishing PHP client is created and
  integrated into Core processes data for
   • Fetch entities for each chapter by querying the entity-fishing API services
   • Entities are classified as PERSON and LOCATION
   • Aggregate the entity results at book level
   • Location and Person entities at book and chapter
     level are stored in the Solr index
   • Two facets, Persons and Locations, are added to
     the front-end interface
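
The aggregation step can be illustrated with a short Python sketch (the OpenEdition client itself is written in PHP, so this is not its actual code). It assumes each chapter's entities are dicts carrying a `type` and a `wikidataId` key; the function name and the input layout are hypothetical.

```python
from collections import defaultdict

# Only these classes are kept, as on the slide.
KEPT_TYPES = {"PERSON", "LOCATION"}


def aggregate_book_entities(chapters):
    """Merge chapter-level entities into one book-level mapping.

    `chapters` maps a chapter id to a list of entity dicts with the
    (assumed) keys "type" and "wikidataId".
    Returns {type: sorted list of distinct Wikidata ids}.
    """
    book = defaultdict(set)
    for entities in chapters.values():
        for entity in entities:
            if entity.get("type") in KEPT_TYPES and entity.get("wikidataId"):
                book[entity["type"]].add(entity["wikidataId"])
    return {etype: sorted(ids) for etype, ids in book.items()}
```

The book-level result has one entry per kept class, which maps naturally onto one facet field per class in a search index.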
           • entity-fishing is integrated into the
             publishing workflow of Göttingen
             University Press (GUP) to enable the
             semi-automatic indexing of its
              • Titles, abstracts and metadata of the
                monographs are processed by the
                entity-fishing API to identify and
                categorize the named entities
              • Named entities are classified as
                PERSON, LOCATION or ORGANIZATION
              • Show how often every singular entity
              • The indexed data are displayed as facets
                which are made available to users as
                "Keywords", allowing users to quickly
                find the monographs by entity
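
One way to derive such keyword facets is to count mentions per entity within each class. The Python sketch below is an illustration only, not the GUP implementation; it assumes entity dicts with `type` and `rawName` keys, and the function name is hypothetical.

```python
from collections import Counter


def keyword_facets(entities):
    """Count how often each entity name occurs, grouped by entity class."""
    facets = {}
    for entity in entities:
        counter = facets.setdefault(entity["type"], Counter())
        counter[entity["rawName"]] += 1
    return facets
```

Each class then holds a `Counter`, so `facets["PERSON"].most_common(10)` would give the ten most frequent persons for display.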
EKT / National Documentation Center
• The current release of OMP does not support any annotation service, so EKT has
  extended OMP with entity-fishing support
• The entity-fishing API service is integrated into the Open Monograph Press (OMP) monograph
  landing page to annotate the abstract
• Two phases of implementation:
   • Create a PHP client that acts as a wrapper around the entity-fishing service, hiding its complexity
       • The complexity of the HTTP protocol is hidden
       • The JSON result of the entity-fishing service is wrapped in high-level class objects
   • Integrate the client into the OMP software.
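
The idea of wrapping the JSON result in high-level objects can be sketched as follows. The EKT client is written in PHP; this Python version is only an illustration, and the class name, function name, and JSON keys (`entities`, `rawName`, `offsetStart`, `offsetEnd`, `wikidataId`) are assumptions rather than the client's actual interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    """One disambiguated mention returned by the service."""
    raw_name: str
    offset_start: int
    offset_end: int
    wikidata_id: Optional[str] = None


def parse_entities(response: dict) -> List[Entity]:
    """Turn the raw JSON response (already decoded to a dict) into objects."""
    return [
        Entity(
            raw_name=item["rawName"],
            offset_start=item["offsetStart"],
            offset_end=item["offsetEnd"],
            wikidata_id=item.get("wikidataId"),
        )
        for item in response.get("entities", [])
    ]
```

Calling code then works with attributes such as `entity.wikidata_id` instead of indexing into nested JSON.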
Ubiquity Press (UB)
• An internal service receives notifications
  from the existing company platform when a
  new article is published, POSTs its content
  to the entity-fishing API to retrieve all
  the entities, and stores them locally.
• The entities are shown to the reader as
  clickable links referring to the Wikipedia
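
Rendering stored entities as links could look like the Python sketch below. It is not the Ubiquity Press code: the entity keys `offsetStart`, `offsetEnd` and `wikipediaExternalRef` (a Wikipedia page id) are assumed, as is the use of the `?curid=` URL form to address a page by its id.

```python
from html import escape


def link_entities(text, entities, lang="en"):
    """Return HTML where each entity mention links to its Wikipedia page."""
    parts = []
    cursor = 0
    for entity in sorted(entities, key=lambda e: e["offsetStart"]):
        start, end = entity["offsetStart"], entity["offsetEnd"]
        page_id = entity.get("wikipediaExternalRef")
        if page_id is None or start < cursor:
            continue  # skip mentions without a page id or overlapping a previous link
        parts.append(escape(text[cursor:start]))
        url = f"https://{lang}.wikipedia.org/?curid={page_id}"
        parts.append(f'<a href="{url}">{escape(text[start:end])}</a>')
        cursor = end
    parts.append(escape(text[cursor:]))
    return "".join(parts)
```

Text outside the mentions is HTML-escaped, so the result can be inserted into a page as is.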

• Create some scripts to:
   • Call the entity-fishing service with 1) path to PDF and 2) API URL as arguments
   • Store the entity-fishing response locally
   • Combine the entity-fishing results with the unique identifier of the book or chapter
     in the OAPEN Library
   • Export the database to CSV
• OAPEN plans to make the data available as a CC0 licensed file, which will be
  published on the OAPEN Library metadata page
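
The combine-and-export steps can be sketched in Python as follows. This is a simplified illustration, not the OAPEN scripts: it skips the intermediate database and writes straight to CSV, and the column names and the response keys (`entities`, `rawName`, `wikidataId`) are assumptions.

```python
import csv
import io

FIELDNAMES = ["book_id", "raw_name", "wikidata_id"]


def rows_for_book(book_id, response):
    """Yield one CSV row per entity, keyed by the book or chapter identifier."""
    for entity in response.get("entities", []):
        yield {
            "book_id": book_id,
            "raw_name": entity.get("rawName", ""),
            "wikidata_id": entity.get("wikidataId", ""),
        }


def export_csv(results, stream):
    """Write all entities from `results` ({book_id: response}) to `stream`."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    for book_id, response in results.items():
        writer.writerows(rows_for_book(book_id, response))
```

For a quick check, `export_csv(results, io.StringIO())` writes to memory; passing a file opened with `newline=""` writes to disk.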