Data-Centric Systems and Applications

                                                 Series Editors
                                                    M.J. Carey
                                                        S. Ceri

                                                Editorial Board
                                                    P. Bernstein
                                                        U. Dayal
                                                   C. Faloutsos
                                                    J.C. Freytag
                                                    G. Gardarin
                                                      W. Jonker
                                              V. Krishnamurthy
                                                  M.-A. Neimat
                                                    P. Valduriez
                                                     G. Weikum
                                                  K.-Y. Whang
                                                       J. Widom

For further volumes:
http://www.springer.com/series/5258
Roberto De Virgilio      Francesco Guerra
Yannis Velegrakis
Editors

Semantic Search
over the Web

Editors
Roberto De Virgilio
Department of Informatics and Automation
University Roma Tre
Rome
Italy

Francesco Guerra
University of Modena and Reggio Emilia
Modena
Italy

Yannis Velegrakis
University of Trento
Trento
Italy

ISBN 978-3-642-25007-1          ISBN 978-3-642-25008-8 (eBook)
DOI 10.1007/978-3-642-25008-8
Springer Heidelberg New York Dordrecht London

Library of Congress Control Number: 2012943692

ACM Computing Classification: H.3, I.2

© Springer-Verlag Berlin Heidelberg 2012
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection
with reviews or scholarly analysis or material supplied specifically for the purpose of being entered
and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of
this publication or parts thereof is permitted only under the provisions of the Copyright Law of the
Publisher’s location, in its current version, and permission for use must always be obtained from Springer.
Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations
are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)
Introduction

The Web has become the world’s largest database, with search being the main
tool that enables organizations and individuals to exploit the huge amount of
information it freely offers. Thus, having a successful mechanism for finding
and retrieving the information most relevant to the task at hand is of major
importance. Traditionally, Web search has been based on textual and structural
similarity. Given the set of keywords that comprise a query, the goal is to identify
the documents containing all these keywords (or as many as possible). Additional
sources of evidence, such as logs, references from authorities, popularity,
and personalization, have been extensively used to further improve accuracy.
However, one of the dimensions that has not been captured to its full extent is that
of semantics, that is, fully understanding the meaning of the words in a query and in
a document. Combining search and semantics gives birth to the idea of semantic
search. In a sentence, semantic search can be described as the effort to improve
the accuracy of the search process by understanding the context and limiting
ambiguity.
   The idea of the semantic Web is based on this goal: it aims at making the
semantics of Web content machine-understandable. To do so, a number of
technologies have been introduced that allow for richer modeling of Web resources,
alongside annotations describing their semantics. Furthermore, the
semantic Web went on to create associations between different representations of the
same real-world entity. These associations are either explicitly specified or derived
off-line and then remain static. They allow data from many different sources to
be interlinked, giving birth to the so-called linked open data cloud. Nevertheless,
semantics have yet to fully penetrate existing data management solutions and
become an integral part of information retrieval, analysis, integration, and data
exchange techniques.
   Unfortunately, the generic idea of semantic search is still in its infancy.
Existing solutions are either search engines that simply index semantic Web data,
such as Sindice, or traditional search engines enhanced with some basic form of
synonym exploitation, such as Google and Bing. Semantic search is about
using the semantics of the query terms instead of the terms themselves. This means
using synonyms and related terms, providing additional material in the answer that
may be related to elements already in the result, searching not only the content
but also the semantic annotations of the data, exploiting ontological knowledge
through advanced reasoning techniques, treating the query as a natural language
expression, clustering the results, offering faceted browsing, etc.
   All of the above means that there are currently numerous opportunities to exploit
in the area of semantic search on the Web. In this book, we try to give a general
overview of the work that has been done in the field and in related areas.
However, the book should definitely not be considered a survey. It is simply
intended to give the reader a taste of the many different aspects of the
problem and to go into depth on some specific technologies and solutions.
   The book is divided into three parts. The first part introduces the notion of the
Web of Data. It describes the different types of data that exist, their topology, and
techniques for storing and indexing them. It also shows how semantic links between the
data can be automatically derived.
   The second part is dedicated specifically to Web search. It presents different
kinds of search, such as exploratory and path-oriented search, alongside methods
for implementing them efficiently. It addresses the problem of interactive query
construction as well as the understanding of keyword query semantics.
Other topics include the use of uncertainty in query answering and the exploitation of
ontologies. The second part concludes with a discussion of mashup technologies
and the way they are affected by semantics.
   The theme of the third part of the book is Linked Data and, more specifically,
how ideas from recommender systems can be applied to linked data management,
alongside techniques for efficient query answering.

Rome, Italy                                                     Roberto De Virgilio
Modena, Italy                                                    Francesco Guerra
Trento, Italy                                                     Yannis Velegrakis
Contents

Part I     Introduction to Web of Data

1   Topology of the Web of Data
    Christian Bizer, Pablo N. Mendes, and Anja Jentzsch
2   Storing and Indexing Massive RDF Datasets
    Yongming Luo, François Picalausa, George H.L. Fletcher,
    Jan Hidders, and Stijn Vansummeren
3   Designing Exploratory Search Applications upon Web
    Data Sources
    Marco Brambilla and Stefano Ceri

Part II      Search over the Web

4   Path-Oriented Keyword Search Query over RDF
    Roberto De Virgilio, Paolo Cappellari, Antonio Maccioni,
    and Riccardo Torlone
5   Interactive Query Construction for Keyword Search on
    the Semantic Web
    Gideon Zenz, Xuan Zhou, Enrico Minack, Wolf Siberski,
    and Wolfgang Nejdl
6   Understanding the Semantics of Keyword Queries on
    Relational Data Without Accessing the Instance
    Sonia Bergamaschi, Elton Domnori, Francesco Guerra, Silvia
    Rota, Raquel Trillo Lado, and Yannis Velegrakis
7   Keyword-Based Search over Semantic Data
    Klara Weiand, Andreas Hartl, Steffen Hausmann,
    Tim Furche, and François Bry
8   Semantic Link Discovery over Relational Data
    Oktie Hassanzadeh, Anastasios Kementsietsidis,
    Lipyeow Lim, Renée J. Miller, and Min Wang
9   Embracing Uncertainty in Entity Linking
    Ekaterini Ioannou, Wolfgang Nejdl, Claudia Niederée,
    and Yannis Velegrakis
10  The Return of the Entity-Relationship Model: Ontological
    Query Answering
    Andrea Calì, Georg Gottlob, and Andreas Pieris
11  Linked Data Services and Semantics-Enabled Mashup
    Devis Bianchini and Valeria De Antonellis

Part III     Linked Data Search Engines

12  A Recommender System for Linked Data
    Roberto Mirizzi, Azzurra Ragone, Tommaso Di Noia,
    and Eugenio Di Sciascio
13  Flint: From Web Pages to Probabilistic Semantic Data
    Lorenzo Blanco, Mirko Bronzi, Valter Crescenzi,
    Paolo Merialdo, and Paolo Papotti
14  Searching and Browsing Linked Data with SWSE
    Andreas Harth, Aidan Hogan, Jürgen Umbrich,
    Sheila Kinsella, Axel Polleres, and Stefan Decker

Index
Contributors

Sonia Bergamaschi Dipartimento di Ingegneria dell’Informazione, Università di
Modena e Reggio Emilia, Modena, Italy
Devis Bianchini Department of Electronics for Automation, University of Brescia,
Brescia, Italy
Christian Bizer Web-based Systems Group, Freie Universität Berlin, Berlin,
Germany
Lorenzo Blanco Dipartimento di Informatica e Automazione, Università degli
Studi Roma Tre, Rome, Italy
Marco Brambilla Dipartimento di Elettronica e Informazione, Politecnico di
Milano, Milano, Italy
Mirko Bronzi Dipartimento di Informatica e Automazione, Università degli Studi
Roma Tre, Rome, Italy
François Bry Institute for Informatics, University of Munich, München, Germany
Andrea Calì Department of Computer Science and Information Systems, Birkbeck
University of London, London, UK
Paolo Cappellari Interoperable System Group, Dublin City University, Dublin,
Ireland
Stefano Ceri Dipartimento di Elettronica e Informazione, Politecnico di Milano,
Milano, Italy
Valter Crescenzi Dipartimento di Informatica e Automazione, Università degli
Studi Roma Tre, Rome, Italy
Valeria De Antonellis Department of Electronics for Automation, University of
Brescia, Brescia, Italy
Roberto De Virgilio University Roma Tre, Rome, Italy
Tommaso Di Noia Dipartimento di Elettrotecnica ed Elettronica, Politecnico di
Bari, Bari, Italy
Eugenio Di Sciascio Dipartimento di Elettrotecnica ed Elettronica, Politecnico di
Bari, Bari, Italy
Stefan Decker Digital Enterprise Research Institute, National University of
Ireland, Galway, Ireland
Elton Domnori Dipartimento di Ingegneria dell’Informazione, Università di
Modena e Reggio Emilia, Modena, Italy
George H.L. Fletcher Department of Mathematics and Computer Science,
Eindhoven University of Technology, Eindhoven, The Netherlands
Tim Furche Department of Computer Science and Institute for the Future of
Computing, Oxford University, Oxford, UK
Georg Gottlob Computing Laboratory, University of Oxford, Oxford, UK
Oxford-Man Institute of Quantitative Finance, University of Oxford, Oxford, UK
Francesco Guerra Dipartimento di Economia Aziendale, Università di Modena e
Reggio Emilia, Modena, Italy
Andreas Harth Karlsruhe Institute of Technology, Institute AIFB, Karlsruhe,
Germany
Andreas Hartl Institute for Informatics, University of Munich, München,
Germany
Oktie Hassanzadeh University of Toronto, Toronto, Ontario, Canada
Steffen Hausmann Institute for Informatics, University of Munich, München,
Germany
Jan Hidders Faculty of Electrical Engineering, Mathematics and Computer
Science, Delft University of Technology, Delft, The Netherlands
Aidan Hogan Digital Enterprise Research Institute, National University of Ireland,
Galway, Ireland
Ekaterini Ioannou University Campus – Kounoupidiana, Technical University of
Crete, Chania, Greece
Anja Jentzsch Web-based Systems Group, Freie Universität Berlin, Berlin,
Germany
Anastasios Kementsietsidis IBM T.J. Watson Research Center, Hawthorne, NY,
USA
Sheila Kinsella Digital Enterprise Research Institute, National University of
Ireland, Galway, Ireland
Lipyeow Lim University of Hawaii at Manoa, Honolulu, HI, USA
Yongming Luo Department of Mathematics and Computer Science, Eindhoven
University of Technology, Eindhoven, The Netherlands
Antonio Maccioni University Roma Tre, Rome, Italy
Pablo N. Mendes Web-based Systems Group, Freie Universität Berlin, Berlin,
Germany
Paolo Merialdo Dipartimento di Informatica e Automazione, Università degli
Studi Roma Tre, Rome, Italy
Renée J. Miller University of Toronto, Toronto, Ontario, Canada
Enrico Minack L3S Research Center, Hannover, Germany
Roberto Mirizzi Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari,
Bari, Italy
Wolfgang Nejdl L3S Research Center, Hannover, Germany
Claudia Niederée L3S Research Center, Hannover, Germany
Paolo Papotti Dipartimento di Informatica e Automazione, Università degli Studi
Roma Tre, Rome, Italy
François Picalausa Université Libre de Bruxelles, Brussels, Belgium
Andreas Pieris Department of Computer Science, University of Oxford,
Oxford, UK
Axel Polleres Siemens AG Österreich, Vienna, Austria
Digital Enterprise Research Institute, National University of Ireland, Galway,
Ireland
Azzurra Ragone Dipartimento di Elettrotecnica ed Elettronica, Politecnico di
Bari, Bari, Italy
Exprivia S.p.A., Molfetta, BA, Italy
Silvia Rota Dipartimento di Ingegneria dell’Informazione, Università di Modena
e Reggio Emilia, Modena, Italy
Wolf Siberski L3S Research Center, Hannover, Germany
Riccardo Torlone University Roma Tre, Rome, Italy
Raquel Trillo Informatica e Ing. Sistemas, Zaragoza, Spain
Jürgen Umbrich Digital Enterprise Research Institute, National University of
Ireland, Galway, Ireland
Stijn Vansummeren Université Libre de Bruxelles, Brussels, Belgium
Yannis Velegrakis University of Trento, Trento, Italy
Min Wang HP Labs China, Beijing, China
Klara Weiand Institute for Informatics, University of Munich, München,
Germany
Gideon Zenz L3S Research Center, Hannover, Germany
Xuan Zhou Renmin University of China, Beijing, China