Enabling the Geospatial Semantic Web with Parliament and GeoSPARQL

Page created by Ricardo Hansen
 
CONTINUE READING
Semantic Web 0 (0) 1                                                                                                                  1
IOS Press

Enabling the Geospatial Semantic Web with
Parliament and GeoSPARQL
Editor(s): Krzysztof Janowicz, University of California, Santa Barbara, US
Solicited review(s): Sven Schade, European Commission – Joint Research Centre; Jens Lehmann, University of Leipzig, Germany; Boyan
Brodaric, Geological Survey of Canada, Canada

Robert Battle, Dave Kolas
Knowledge Engineering Group, Raytheon BBN Technologies
1300 N 17th Street, Suite 400, Arlington, VA 22209,
USA
E-mail: {rbattle,dkolas}@bbn.com

Abstract. As the amount of Linked Open Data on the web increases, so does the amount of data with an inherent spatial context.
Without spatial reasoning, however, the value of this spatial context is limited. Over the past decade there have been several
vocabularies and query languages that attempt to exploit this knowledge and enable spatial reasoning. These attempts provide
varying levels of support for fundamental geospatial concepts. GeoSPARQL, a forthcoming OGC standard, attempts to unify
data access for the geospatial Semantic Web. As authors of the Parliament triple store and contributors to the GeoSPARQL spec-
ification, we are particularly interested in the issues of geospatial data access and indexing. In this paper, we look at the overall
state of geospatial data in the Semantic Web, with a focus on GeoSPARQL. We first describe the motivation for GeoSPARQL,
then the current state of the art in industry and research, followed by an example use case, and finally our implementation of
GeoSPARQL in the Parliament triple store.

Keywords: GeoSPARQL, SPARQL, RDF, Parliament, geospatial, geospatial data, query language, geospatial query language,
geospatial index

1. Introduction                                                         ingful query, like "What parks are within 3km of the
                                                                        Washington Monument?", depends on how the data is
   Geospatial data is increasingly being made avail-                    represented, whether all of the resources are related to
able on the Web in the form of datasets described us-                   the Washington Monument, and if that relationship is
ing the Resource Description Framework (RDF). The                       explicit.
principles of Linked Open Data, detailed in [5], en-                       In this paper, we discuss an emerging standard, Geo-
courage a set of best practices for publishing and con-                 SPARQL [24] from the Open Geospatial Consortium
necting structured data on the Web. Linked Open Data                    (OGC)1 . This standard aims to address the issues of
promotes the use of the SPARQL Protocol and RDF                         geospatial data representation and access. It provides
Query Language (SPARQL) and RDF to query and                            a common representation of geospatial data described
model data. While this is useful for querying for re-                   using RDF, and the ability to query and filter on the
lationships that are explicitly represented in data, im-                relationships between geospatial entities. First, we in-
plicit relationships, such as geospatial relationships,
cannot easily be queried. For instance, datasets may                      1 Dave Kolas is a co-chair for the GeoSPARQL Standards Work-
exist that describe monuments and parks, but being                      ing Group, and Robert Battle has worked on the Parliament imple-
able to link these datasets based on their undeclared                   mentation and provided feedback to the development of the stan-
relationships is difficult. The ability to answer a mean-               dard.

1570-0844/0-1900/$27.50 c 0 – IOS Press and the authors. All rights reserved
2

troduce some geospatial concepts that are critical to        2.2. Coordinate Reference Systems
understanding some of the design choices for Geo-
SPARQL. We then describe topological relationships              An important part of the metadata associated with a
that are important to understand when designing a lan-       geometry is its coordinate reference system (CRS) (al-
guage for querying between spatial entities. Next, we        ternatively known as its spatial reference system). The
describe the motivation for GeoSPARQL, the current           elements of a coordinate reference system provide con-
state of the art in modeling and querying geospatial         text for the coordinates that define a geometry in order
data in the Semantic Web, and we introduce Geo-              to accurately describe their position and establish rela-
SPARQL. As authors of the Parliament2 triple store,          tionships between sets of coordinates. There are four
we are particularly interested in the capabilities that      parts that make up a CRS: a coordinate system, an el-
GeoSPARQL provides. We describe an implementa-               lipsoid, a datum, and a projection.
tion of a GeoSPARQL spatial index using Parliament              A coordinate system describes a location relative to
and a use-case that illustrates a simple example of data     some center. A geocentric coordinate system places the
integration with GeoSPARQL.                                  center at the center of the Earth and uses standard X,
                                                             Y, Z ordinates. A geographic (or geodetic) coordinate
                                                             system uses a spherical surface to determine locations.
2. Geospatial Concepts                                       In such a system, a point is defined by angles measured
                                                             from the center of the Earth to a point on the surface.
                                                             These are also known as latitudes (horizontal) and lon-
   Some basic understanding of geospatial concepts is
                                                             gitudes (vertical). A Cartesian coordinate system is a
required for discussion of GeoSPARQL. The follow-
                                                             flat coordinate system on the surface. It enables quick
ing sections define some of the terms used throughout
                                                             and accurate measurements over small distances and is
the rest of this paper.
                                                             useful for applications such as surveying.
                                                                An ellipsoid defines an approximation for the center
2.1. Features and Geometries
                                                             and shape of the Earth. A datum defines the position
                                                             of an ellipsoid relative to the center of the earth. This
   Features and geometries are two fundamental con-          provides a frame of reference for measuring locations
cepts of geospatial science. A feature is simply any en-     and, for local datums, allows for accurate locations to
tity in the real world with some spatial location. This      be defined for the valid area of the datum. WGS84 is a
could be a park, an airport, a monument, a restaurant,       datum that is widely used by GPS devices that approxi-
etc. A feature can have a spatial location that cannot       mates the entire world. Geographic coordinate systems
be precisely defined, such as a swamp or a mountain          use an Earth based datum that transforms an ellipsoid
range. A geometry is any geometric shape, such as a          into a representation of the Earth.
point, polygon, or line, and is used as a representation        In order to create a map of the Earth, it must be pro-
of a feature’s spatial location. For instance, Reagan        jected from a curved surface onto the plane. This pro-
National Airport is a geospatial feature because it is an    jection will distort the surface in some fashion which
entity that has a specific location in the world. It has a   will mean that the coordinates for some locations are
geometry that is a point with coordinates 38.852222, -       more accurate than those in another. Some projections
77.037778 (in the WGS843 datum). Geometries can be           will preserve area, so the size of all objects is relative,
measured at varying resolutions, from a simple point         while others preserve angles, and others try to do both.
in the center of a feature to a complex, precise mea-        A coordinate system projected onto a plane enables
surement of a feature’s entire border. Spatial data typi-    faster performance, as Cartesian calculations require
cally separates features from geometries, although that      fewer resources than Spherical calculations. Computa-
is not always the case.                                      tions across the plane, however, are inaccurate when
                                                             they deal with large areas as the curvature of the Earth
    2 http://parliament.semwebcentral.org                    is not taken into account.
  3 http://en.wikipedia.org/wiki/World_                         The combination of these elements defines a CRS.
Geodetic_System                                              One common source of well defined coordinate refer-
3

                                                                                       Table 1
ence systems is the European Petroleum Survey Group
                                                              Simple Features, Egenhofer and RCC8 relations equivalence
(EPSG)4 .
                                                               Simple Features    Egenhofer             RCC8
2.3. Topological Relationships                                 equals             equal                 EQ
                                                               disjoint           disjoint              DC
   All spatial entities are inherently related to some         intersects         ¬disjoint             ¬DC
other spatial entity. Whether two entities intersect           touches            meet                  EC
somehow or are thousands of miles apart, the relation-         within             inside + coveredBy    NTPP + TPP
ship that they share can be described and evaluated.           contains           contains + covers     NTPPi + TPPi
   In [28], Randell et al. describe an interval logic for      overlaps           overlap               PO
reasoning about space using a simple ontology that de-
fines functions and relations for expressing and rea-       geospatial Semantic Web data is represented in a con-
soning over spatial regions. This logic is referred to as   sistent logical way. With the input and acceptance of
Region Connection Calculus (RCC). A subset of RCC,          all of both knowledge base vendors and data produc-
RCC8, defines eight mutually exhaustive pairwise dis-       ers and consumers, GeoSPARQL has the potential to
joint relations which can be used to imply the rest of      unify geospatial RDF data access.
the relations in RCC. These eight base relations are:          Geospatial reasoning is critical in a large number
  1. DC(x, y) (x is disconnected from y)                    of application domains (emergency response, trans-
  2. x = y (x is identical with y)                          portation planning, hydrology, land use, etc.). Users in
  3. PO(x,y) (x partially overlaps y)                       these domains have long utilized relational databases
  4. EC(x,y) (x is externally connected with y)             with spatial extensions [9]. These spatially extended
  5. TPP(x,y) (x is a tangential proper part of y)          databases have given the combination of efficient, sta-
  6. NTPP(x,y) (x is a non-tangential proper part of        ble storage and retrieval of data with geospatial calcu-
     y)                                                     lation and indexing. This allows questions like "Which
  7. TPPi(x,y) (y is a tangential proper part of x)         students live within 2km of the school they attend?" to
  8. NTPPi(x,y) (y is a non-tangential proper part of       be answered efficiently.
     x)                                                        Within the last decade, RDF storage solutions have
                                                            become increasingly popular. These knowledge bases,
   The same set of eight geospatial topological rela-       sometimes called triple stores, are capable of better
tions is described with different names by Egenhofer in     handling several types of problems at which relational
[8], and includes the capacity to describe the relation-    databases struggle or are not intended to perform:
ship between different dimensioned geometries. This         queries with many joins across entities [32], queries
model was later generalized in the Nine Intersection        with variable properties [32], and ontological inference
Model [11]. The Open Geospatial Consortium Sim-             on datasets. These features lend themselves towards
ple Feature Access Common Architecture specifica-           problems that involve data exploration, linkage across
tion [25] uses the Nine Intersection Model introduced       datasets, and abstraction from low level data.
by Egenhofer to describe spatial relations for use in ge-      Because of RDF stores’ ability to do inference and
ographic access systems. Table 1 illustrates the equiv-     easily link data sets, they have been of growing inter-
alence between all of these spatial relations.              est to the geospatial data community. Often geospa-
                                                            tial domains have complicated type hierarchies which
                                                            cannot be fully expressed in current geospatial infor-
3. Motivation for GeoSPARQL                                 mation systems. For instance, a river is both a water-
                                                            way and a transportation route. Also, geospatial do-
   The Open Geospatial Consortium is a non-profit           main problems often require marrying multiple data
standards organization focused on geospatial data. The      sources together to solve a particular problem. In emer-
OGC is composed of members of industry, academic            gency response scenarios, population data, transporta-
institutions, and government organizations. By stan-        tion data, and even realtime police and fire data must
dardizing GeoSPARQL within the OGC, we seek to              be combined to deliver a timely result. Combining data
leverage the experience of its members to ensure that       sources on the web is useful to consumers as well;
                                                            geospatial data about points of interest combined with
  4 http://www.epsg-registry.org                            hotel information and travel route information could
4

lead to significantly more sophisticated travel planning    for features where the geometries are either unknown
than currently exists.                                      or cannot be made concrete [13]. For example, if there
   As such, it was inevitable that those people inter-      are assertions that a monument is inside a park, and
ested in expressing geospatial data on the Semantic         the park is inside a city, a qualitative reasoning sys-
Web would want to combine the spatial indexing and          tem should be able to infer that the monument is in
calculation of spatial databases with the inferential       the city through transitivity. Hybrid versions of these
power and data linkage of RDF triple stores [16]. This      systems are also possible, where some features have
has been done by many groups in varying ways for            concrete geometries and others have abstract geome-
varying purposes, as will be discussed later.               tries. By sharing a set of terms for topological rela-
   To provide geospatial reasoning and querying in          tions, GeoSPARQL allows conclusions from quantita-
a triple store, the implementors must define both an        tive applications to be used by qualitative systems and
ontology for representing spatial objects and query         a single query language for both types of reasoning.
predicates for retrieving these spatial objects. How-
ever, each organization that has attempted this task
has approached it in a slightly different way. As a re-     4. Geospatial RDF: State of the Art and Related
sult, spatial RDF data that would be properly indexed          Work
and queryable in one implementation would simply be
treated as plain RDF data in another implementation.           Over the past decade, there have been many dif-
   This is the primary problem which the standard-          ferent attempts to create a geospatial RDF standard.
ization of GeoSPARQL attempts to solve. The Geo-            Several different organizations, including the W3C, re-
SPARQL language defines both a small ontology for           search groups, and triple store vendors have created
representing features and geometries and a number of        their own ontologies and strategies for representing
SPARQL query predicates and functions. All of these         and querying geospatial data.
are derived from other OGC standards so that they are          In 2003, a W3C Semantic Web Interest Group cre-
well grounded and understood. Using the new standard        ated the Basic Geo Vocabulary [31] which provided
should ensure two things: (1) if a data provider uses       a way to represent WGS84 points in RDF. This work
the spatial ontology in combination with an ontology        was extended from 2005 through 2007 by the W3C
of their domain, that data can be properly indexed and      Geospatial Incubator Group [21] to follow the GeoRSS
queried in spatial RDF stores; and (2) compliant RDF        [23] feature model to allow for the description of
triple stores should be able to properly process the ma-    points, lines, rectangles, and polygon geometries and
jority of spatial RDF data.                                 their associated features. This group produced the
   Aside from providing the ability to perform spa-         GeoOWL ontology5 which provides a detailed and
tial queries, the small ontology portion of the Geo-        flexible model for representing geospatial concepts.
SPARQL specification is intended to provide an in-          These ontologies were produced as products from their
terchange format for geospatial data in a wide variety      respective groups within the W3C as the result of col-
of use cases. The ontology is meant to be attached to       laborations across university and industry partners.
other ontologies for various domains, providing only           There are published datasets that use the W3C vo-
the bare spatial aspects. It is intended to be simple       cabularies [3] and a variety of triple stores that support
enough to cover the most light-weight uses, and scale       data represented by these ontologies [17,22]. However,
in complexity for complex use cases. GeoSPARQL              the associated working groups never moved beyond
goes a long way to solving one of the research issues       the incubator state and the respective ontologies never
posed by Egenhofer in [10], namely that "we need a          became official W3C recommendations. These ontolo-
plausible canonical form in which to pose geospatial        gies also only work with data in the WGS84 datum.
data queries".                                              In order to be valid, data in any other CRS must be
   GeoSPARQL is intended to inter-operate with both         projected which can lead to inaccuracy in the data.
quantitative and qualitative spatial reasoning systems.        Support for spatial data in triple stores is mixed.
A quantitative spatial reasoning system involves con-       Several vendors support spatial data, but not all ven-
crete geometries for features. With these concrete ge-
ometries present, distances and topological relations         5 http://www.w3.org/2005/Incubator/geo/
can be explicitly calculated. Qualitative geospatial rea-   XGR-geo-20071023/W3C_XGR_Geo_files/geo_2007.
soning systems allow RCC type topological inferences        owl
5

dors support the same representation of data or share       fined in the OGC Simple Features Access [25]. Lit-
the same support for relational queries. Some triple        eral datatypes are introduced for Well-Known Text
stores use the aforementioned W3C ontologies, while         (WKT), Well Known-Binary (WKB), and their com-
others have invented their own.                             pressed forms while the spatial relations that PostGIS
   Parliament, the high performance [18] triple store       supports are implemented as SPARQL filter functions.
from Raytheon BBN Technologies, provided the first             In addition to vendor supported options, there has
geospatial index for semantic web data in 2007 [16,         been significant community and research interest in-
17]. This index supports data in the GeoOWL ontology        representing and querying geospatial data in the last
and introduces ontologies for querying spatial data via     few years. Perry proposed an extension to SPARQL,
RDF properties that correspond to the RCC spatial re-       SPARQL-ST in [26]. This introduces a modified
lations and OGC Simple Features relations. However,         SPARQL syntax for posing spatial queries to data that
data in the W3C Basic Geo vocabulary is unsupported.        is modeled in an upper ontology based on GeoRSS. A
   Ontotext’s OWLIM-SE6 triple store can index point        focus on describing data and metadata such as the CRS
data represented in the W3C Basic Geo Vocabulary            for geometries allows SPARQL-ST to operate with
[22]. The only spatial relationship that can be queried     data of different systems which is something that is
is whether a point is contained within a circle or poly-    lacking from many vocabularies such as GeoOWL and
gon. The ability to query for relationships between         the Basic Geo vocabulary. This increased flexibility
higher order geometries such as lines and polygons is       with data is hindered, however, by the proposed query
not supported.                                              syntax which deviates from the standard SPARQL lan-
   OpenLink Virtuoso7 also has support for the W3C          guage. Any data that is in the SPARQL-ST format can
Basic Geo Vocabulary. A SPARQL function is pro-             only be accessed by a SPARQL-ST system.
vided to convert a pair of latitude and longitude prop-        Taking a simpler approach, Zhai et al., in [34] and
erty values into a point geometry. A special literal        [33], discuss the need for adding topological predicates
datatype, virtrdf:Geometry, is also provided for            to SPARQL. The OGC Simple Features relations and a
indexing point literals. Support for testing intersection   subset of geometries are used as the basis for their on-
and containment relationships is provided via property      tologies. Unfortunately, the relations have to be specif-
functions. Through a combination of these relations         ically encoded in RDF and the data cannot support
and their negations, most of the Egenhofer relations        multiple coordinate reference systems.
                                                               Another approach described by Koubarakis and
can be tested. However, some relations, such as testing
                                                            Kyzirakos in [19] is based on research from the con-
for overlap, cannot be supported in Virtuoso.
                                                            straint database community. Constraint databases are
   Other triple stores take a completely different ap-
                                                            a promising technology for integrating spatial data
proach. As described in [12], Franz AllegroGraph8 de-
                                                            [4]. The authors propose to enrich the Semantic Web
fines a custom datatype to represent geospatial data
                                                            with spatial and temporal data by extending RDF and
and map it to a "strip" of space in the index that con-
                                                            SPARQL with constraints. The proposed extension to
tains the data. A modified SPARQL syntax provides a
                                                            RDF, stRDF, uses typed literals to describe a semi-
new GEO operator where the geospatial aspect of the
                                                            linear point set. The query language, stSPARQL, ex-
query is defined.
                                                            tends SPARQL to includes additional operators for
   OpenSahara9 provides a service for adding exter-
                                                            querying RCC relationships and introduces a new syn-
nal indexing and geospatial querying capabilities to
                                                            tax for specifying spatial variables. Support for mul-
any triple store with a Sesame10 Sail layer. The im-        tiple coordinate reference systems is not discussed.
plentation is a wrapper for the PostgreSQL11 database       Compatibility with existing data is also a problem as
with PostGIS spatial extensions12 and as such, Open-        not all data is represented as a semi-linear point set.
Sahara supports all of the geometries and relations de-     stSPARQL is intentionally not compatible with OGC
                                                            standards as the authors believe that the semi-linear
  6 http://www.ontotext.com/owlim/editions                  point set can be used to describe different geometries
  7 http://www.openlinksw.com
                                                            without forcing a hierarchy of datatypes on users. In
  8 http://www.franz.com/agraph/allegrograph
  9 http://www.opensahara.com
                                                            [20], stSPARQL and stRDF are described with an ad-
  10 http://www.openrdf.org                                 ditional context of GIS applications. The authors con-
  11 http://www.postgresql.org                              trast stRDF and stSPARQL with prior work (such as
  12 http://postgis.refractions.net                         GeoOWL) and note that it is hard to find information
6

about geometries when you do not know what type of           the triple store is able to efficiently store and query
geometries will be in the answer set and what prop-          data. Once again, the OGC Simple Features relations
erties to ask about. This is a problem with represen-        are used as the basis for posing queries for the spatial
tations like GeoOWL where different geometries have          relation. These relations are mapped to SPARQL filter
different properties. The semi-linear point set avoids       functions which allow for direct comparison between
this by encapsulating the representation into a single       spatial literals. This implementation is similar to the
datatype.                                                    approach taken by Parliament and OpenSahara in the
   The NeoGeo Vocabulary [29] is a vocabulary that           way that the SPARQL language itself does not need to
arose from the NeoGeo community13 , the Linked Data          be modified in order to query for the spatial relations
community, and several universities. They recognized         between entities.
the need for a well formed standard representation
for geospatial data and provided one that is based on
                                                             5. Introduction to GeoSPARQL
the Geography Markup Language (GML) Simple Fea-
tures Profile14 . To describe spatial relations, an ontol-
                                                                As illustrated above, many groups have created on-
ogy based on RCC8 is also provided. A limitation of
                                                             tologies and query predicates to make indexing and
the NeoGeo approach is the need to represent each co-        query of geospatial data possible. However, since there
ordinate as a resource. In particular, polygons and lines    are many of these and they all differ slightly, data that
are represented with an RDF collection of Basic Geo          can be spatially queried in one knowledge base may
points. While this does allow points to be shared across     not be able to be spatially queried in another. Geo-
geometries, it significantly increases the verbosity of      SPARQL provides a standard for geospatial RDF data
the data. Unfortunately, this extra verbosity does not       insertion and query, which covers the use cases of the
result in particular gains in expressive power, since        other previous approaches.
each point in a polygon provides little value in isola-         The GeoSPARQL specification attempts to enable a
tion and RDF list contents are difficult to query for in     wide range of geospatial query use cases, from sim-
SPARQL 1.0. Moreover this prevents geometry liter-           ple points of interest knowledge bases to detailed au-
als from being compared easily in SPARQL filter func-        thoritative geospatial data sources for transportation.
tions. The ability to use content negotiation to retrieve    Moreover, use of GeoSPARQL for both of these types
other formats for geometries mitigates these issues for      of data should enable the data sets to be easily used
some applications. A nice thing about this approach is       together. In order to achieve this goal, different con-
that it does not require any complex literals to parse,      formance classes are provided. This means that a sim-
and thus may be attractive to Semantic Web practition-       ple knowledge base implementation intended for sim-
ers not familiar with geospatial data formats.               ple use cases need not implement all of the more ad-
   In [15], Hu and Du compose a three level hierar-          vanced reasoning capabilities of GeoSPARQL, such as
chical spatiotemporal model: a meta level for abstract       quantitative reasoning or query rewriting.
space-time knowledge, a schema level for well-known             There are also several sets of terminology for the
models in spatial and temporal reasoning (e.g., RCC,         topological relationships between geometries. Rather
Allen time[1]), and an instantiations level that pro-        than mandate that all implementations use the same
vides mappings and formal descriptions of the various        set of terminology, each implementation can choose
ground spatiotemporal statements in the Linked Data          which sets of terms to support. This is discussed fur-
clouds. This meta approach provides a convenient way         ther in the section on GeoSPARQL relationships.
to abstract out spatial knowledge from its underlying           GeoSPARQL attempts to address the problems with
representation. Mappings, however, have to be defined        the disparate and incompatible implementations for
for each dataset at the instantiations level.                representing and querying spatial data. It achieves this
   A complete implementation of integrating spatial          by defining an ontology that closely follows the exist-
data and queries into an RDF triple store is described       ing standards work from the OGC with regard to spa-
by Brodt et al. in [7]. By typing WKT string repre-          tial indexing in relational databases.
sentations of geometry literals with a spatial datatype,        The GeoSPARQL specification contains three main
                                                             components:
    13 http://sites.google.com/site/neogswvocs/                1. The definition of a vocabulary to represent fea-
    14 http://www.ogcnetwork.net/gml-sf                           tures, geometries, and their relationships.
7

   2. A set of domain-specific, spatial functions for                  such as point, polygon, curve, arc, and multicurve. The
      use in SPARQL queries.                                           geo:asWKT and geo:asGML properties link the ge-
   3. A set of query transformation rules.                             ometry entities to the geometry literal representations.
                                                                       Values for these properties use the sf:wktLiteral
5.1. GeoSPARQL Ontology                                                and gml:gmlLiteral data types respectively.

   The ontology for representing features and geome-                   5.2. GeoSPARQL Relationships
tries is fundamental to being able to build and query
spatial data. The ontology is based on the OGC’s Sim-                     GeoSPARQL also includes a standard way to ask for
ple Features model, with some adaptations for RDF.                     topological relationships, such as overlaps, between
Note that prefix definitions are omitted from exam-                    spatial entities. These come in the form of binary prop-
ples for clarity; a definition of all of the prefixes used             erties between the entities and geospatial filter func-
is at the end of the paper in listing 1515 . The ontol-                tions.
ogy includes a class geo:SpatialObject, with                              The topological binary properties can be used in
two primary subclasses, geo:Feature and geo:                           SPARQL query triple patterns like a normal prop-
Geometry. These classes are meant to be connected                      erty. Primarily they are used between objects of the
to an ontology representing a domain of interest. Fea-                 geo:Geometry type. However, they can also be
tures can connect to their geometries via the geo:                     used between geo:Features, or between geo:
hasGeometry property.                                                  Features and geo:Geometrys, if GeoSPARQL’s
   For example, an airport is a geo:Feature. It is                     query rewrite rules are supported (discussed in the
a conceptual thing that exists in the real world in a                  next section). The properties can be expressed using
particular place. In the real world, it has a location                 three distinct vocabularies: the OGC’s Simple Fea-
that corresponds to the coordinates of all of the infi-                tures, Egenhofer’s 9-intersection model, and RCC8.
nite points along the border of the airport area. This                 Which of these vocabularies is supported can be de-
real-world location has to be measured and estimated                   pendent on the triple store implementation, though it is
in some way. It is possible to do this measurement at                  likely that implementations will support all three. The
various resolutions, each of which may serve well for                  Simple Features topological relations include equals,
different purposes. A representation of the real world                 disjoint, intersects, touches, within, contains, overlaps,
                                                                       and crosses.
location which has been measured becomes a geo:
                                                                          The filter functions provide two different types
Geometry. Thus the airport may have several geo:
                                                                       of functionality. First, there are operator functions
Geometrys, ranging from a single point in the cen-
                                                                       which take multiple geometries as predicates and pro-
ter of the airport to an extremely detailed polygon that
                                                                       duce either a new geometry or another datatype as
closely follows the airport’s outside border. A geome-
                                                                       a result. An example of this is the function ogcf:
try that will function for most purposes within a dataset
                                                                       intersection. This function takes two geometries
can be specified as the geo:defaultGeometry.
                                                                       and returns a geometry that is their spatial intersec-
   GeoSPARQL includes two different ways to repre-
                                                                       tion. Other functions like ogcf:distance produce
sent geometry literals and their associated type hier-
                                                                       an xsd:double as a result. The second type of func-
archies: WKT and GML. An implementor of a spa-
                                                                       tionality is boolean topological tests of geometries.
tial triple store may choose to support either or both
                                                                       These come in the same three vocabulary sets as the
of these representations. GeoSPARQL provides differ-
                                                                       topological binary properties: simple features topo-
ent OWL classes for the geometry hierarchies associ-
                                                                       logical relations, Egenhofer relations, and RCC8 re-
ated with both of these geometry representations. This                 lations. These functions are partially redundant with
provides classes for many different geometry types                     the topological binary properties; however, the topol-
                                                                       ogy functions take the geometry literals as parameters,
   15 In order to be more relevent to the forthcoming finalized Geo-   while the binary properties relate geo:Geometry
SPARQL standard, this paper makes use of an updated OGC internal       and geo:Feature entities. This means that quanti-
draft of GeoSPARQL. While the functionality is the same as the pub-
                                                                       tative and qualitative applications can both make use
lic draft, the uniform resource identifiers (URIs) for GeoSPARQL
predicates and classes have been updated. We have chosen to use the    of the binary properties, but only quantitative applica-
newer URIs in this document in order to be more consistent with the    tions can make use of the topology functions. Also,
standard once it is released.                                          comparisons to concrete geometries provided in the
8

query can only be made via the functions. An example
of the topological functions is ogcf:intersects,                          Listing 2: Example Ontology
which returns true if two geometries intersect.
                                                             ex:Restaurant a owl:Class;
5.3. Query Transformation Rules                                 rdfs:subClassOf ex:Service .
                                                             ex:Park a owl:Class;
                                                                rdfs:subClassOf ex:Attraction .
   The query rewrite rules allow for an additional layer
                                                             ex:Museum a owl:Class;
of abstraction in SPARQL queries. While only con-
                                                                rdfs:subClassOf ex:Attraction .
crete geometry entities can be quantitatively com-
                                                             ex:Monument a owl:Class;
pared, it nonetheless sometimes makes sense to discuss
                                                                rdfs:subClassOf ex:Attraction .
whether two features have a particular topological re-
                                                             ex:Service a owl:Class;
lationship. This is represented in the natural language
                                                                rdfs:subClassOf ex:
question, "Is Reagan National Airport within Washing-
                                                                   PointOfInterest .
ton, DC?". Although Reagan National is referred to as
                                                             ex:Attraction a owl:Class;
a Washington DC airport, it is actually across the Po-
                                                                rdfs:subClassOf ex:
tomac river in Virginia. In GeoSPARQL, feature to fea-             PointOfInterest .
ture and feature to geometry topological relations are       ex:PointOfInterest a owl:Class;
achieved by the combination of the use of the geo:              rdfs:subClassOf geo:Feature .
defaultGeometry property and the query rewrite
rules. If a geo:Feature is used as the subject or ob-
ject of a topological relation, the query is automatically
rewritten to compare the geo:Geometry linked as a
default, thus removing the abstraction for processing.                 Listing 3: Washington Monument
Listing 1 shows a query with a relationship between          ex:WashingtonMonument a ex:Monument;
geo:Feature objects before and after rewrite.                   rdfs:label "Washington Monument";
                                                                geo:hasGeometry ex:WMPoint .
          Listing 1: Query Rewrite Example                   ex:WMPoint a geo:Point;
                                                               geo:asWKT "POINT(-77.03524
#Before                                                           38.889468)"^^sf:wktLiteral.
ASK {
  ex:DCA a geo:Feature;
      geo:sfWithin ex:WashingtonDC .                         6. Using GeoSPARQL
  ex:WashingtonDC a geo:Feature .
}                                                               Consider an example using a points of interest ontol-
                                                             ogy in listing 2. We seek to represent points of interest
#After                                                       of various types (Monuments, Parks, Restaurants, Mu-
ASK {                                                        seums, etc.). These types of landmarks are represented
  ex:DCA a geo:Feature;                                      in a class hierarchy with a ex:PointOfInterest
      geo:defaultGeometry ?g1 .                              class at the top. These classes of course may include
  ex:WashingtonDC a geo:Feature ;                            many non-spatial attributes, but only a label is included
      geo:defaultGeometry ?g2 .                              here. All that is required to link this ontology with
  ?g1 geo:sfWithin ?g2 .                                     GeoSPARQL, and thus give its classes a geospatial ref-
}                                                            erence, is to make ex:PointOfInterest a sub-
                                                             class of geo:Feature. This may of course result in
   The goal of this feature is to provide a more intu-       the class having two parent classes, but this is compat-
itive approach to geospatial querying for use cases that     ible with RDFS and OWL reasoning.
do not require many different geometries, while still           If compliance with WKT is chosen, and latitudes
maintaining a concrete definition of this intuitive un-      and longitudes are expressed in WGS84 datum in a
derstanding. Compliant GeoSPARQL triple stores are           longitude latitude order (CRS:84), a record for the
not required to implement this feature.                      Washington Monument would look like listing 3. If a
9

coordinate reference system other than CRS:84 is de-
sired, that can be included within the literal. Listing 4                 Listing 5: Example Query 1
expresses the same point in WGS84 with latitude lon-
gitude order.                                                SELECT ?m ?p
                                                             WHERE {
                                                               ?m a ex:Monument ;
          Listing 4: Point in WGS84 datum                          geo:hasGeometry ?mgeo .
" POINT(38.889468                                   geo:hasGeometry ?pgeo .
   -77.03524)^^"sf:wktLiteral                                  ?mgeo geo:within ?pgeo .
                                                             }
While this representation may seem verbose, and the
literal string is no longer standard WKT, it has the ad-
vantage of encoding the CRS information directly into
the literal. This is all of the data needed to define a
geo:Geometry; without the CRS, another property                           Listing 6: Example Query 2
would need to be added onto the geo:Geometry in-             SELECT ?m ?p
stance, and that property would need to be read and          WHERE {
passed into filter functions as well. While an argument        ?p a ex:Park .
can be made that it produces a "cleaner" model if the          ?m a ex:Monument ;
CRS was associated with a geometry via a separate                 geo:within ?p .
predicate, GeoSPARQL focused on producing a com-             }
pact representations that could contain the entire de-
scription of a geometry within a single literal. For the
case of WKT, this necessitates adding the CRS specifi-
cation to the WKT literal string. For the GML confor-
mance class, the CRS information is already encoded                       Listing 7: Example Query 3
in the GML string so no changes are required.
                                                             SELECT ?a
   Now we will look at a few example GeoSPARQL
                                                             WHERE {
queries using this data. One potential query over this
                                                               ?a a ex:Attraction;
dataset would be, "Which monuments are contained
                                                                   geo:hasGeometry ?ageo .
within a park?" This query requires a topological com-
                                                               FILTER(geof:within(?ageo,
parison between the geometries of the monuments and
                                                                   "POLYGON((
the geometries of parks. We show it in listing 5 using
                                                             -77.089005 38.913574,
the binary topology property geo:within. The two
                                                             -77.029953 38.913574,
entities have type statements, geo:hasGeometry
                                                             -77.029953 38.886321,
properties to tie them to their geometries, and then the
                                                             -77.089005 38.886321,
geo:within function to tie them together.
                                                             -77.089005 38.913574
   If the knowledge base being used supported the
                                                             ))"^^sf:wktLiteral))
query rewriting rules, and the data set included de-
                                                             }
fault geometries, the first query could be rewritten even
more simply using a feature-to-feature topological re-
lationship. This method is demonstrated in listing 6.
   Spatial user interfaces often need to look for entities   tion entity and its attached geometry, and the geometry
of a particular type that fall within an explicit bounding   is compared with the filter function geof:within.
box. Consider the query, "What attractions are within        This query is shown in listing 7. Note that the bound-
the bounding box defined by (-77.089005, 38.913574)          ing box is expressed as a polygon.
and (-77.029953, 38.886321)?" Because we need to                Queries looking for entities within a particular dis-
specify an explicit geometry in the query, we need to        tance of either other entities or a current location are
compare to it using the topological filter functions as      extremely useful as well. "Which parks are within
opposed to the binary properties. We have the attrac-        3km of the Washington Monument?" can be easily ex-
10

                                                            lows GeoSPARQL data to be indexed and provides a
             Listing 8: Example Query 4                     query engine that supports GeoSPARQL queries.
                                                               Parliament has a modular architecture that enables
SELECT ?p                                                   indexes to be built on top of the storage engine. One of
WHERE {                                                     these indexes is a spatial index based on a standard R-
 ?p a ex:Park ;                                             tree implementation [14]. In a similar approach to [7],
  geo:hasGeometry ?pgeo .                                   the spatial index is integrated in a way that provides na-
 ?pgeo geo:asWKT ?pwkt .                                    tive support for spatial relational queries and efficient
                                                            storage of the data. The general goal for this index is to
    ex:WashingtonMonument                                   split SPARQL queries with geospatial information into
     geo:hasGeometry ?wgeo .                                multiple parts, allowing for an optimized query plan
    ?wgeo geo:asWKT ?wwkt .                                 between the spatial components of the query and the
                                                            components with non-spatial triples to be executed.
    FILTER(geof:distance(?pwkt, ?wwkt,                         Before the emergence of GeoSPARQL, Parliament
       units:m) < 3000)                                     indexed data represented in GeoOWL and allowed
}                                                           RCC8 and OGC Simple Features relations to be
                                                            queried [16,17]. We are currently implementing Geo-
                                                            SPARQL based on the public candidate draft standard,
pressed in GeoSPARQL. We assume the same URI for            and we describe this implementation in the following
the Washington Monument in the data example above.          section.
We need to retrieve the two geometries and use the
function geof:distance to calculate the distance            6.2.1. Index Specification
between them. A standard SPARQL less than func-                The index interfaces in the Parliament API include
tion is then applied. This query is shown in listing 8.     methods for building a record from data, adding and
These example queries require relatively little in terms    removing records, finding a record by URI, and finding
of non-spatial constraints, but serve to illustrate some    records by value. Existing indexes include the afore-
basic functionality with GeoSPARQL. With a more             mentioned GeoOWL spatial index, a temporal index
complex ontology, queries could include more compli-        (for indexing OWL Time16 ), and a basic numeric in-
cated thematic elements as well.                            dex (for optimizing range queries on numeric property
                                                            values).
6.1. Exploiting Geospatial Data                                The Parliament triple store is built with support for
                                                            Jena’s RDF Application Programming Interface (API)
    Storing geospatial data just as RDF triples does not    and ARQ SPARQL query engine17 . An implementa-
allow for the spatial exploitation of that data. In order   tion of Jena’s graph interface18 provides access to the
to be able to efficiently query for the relationships be-   base graph store with support for adding, removing,
tween spatial entities, the data must be indexed. This      and finding triples. By attaching a listener19 to the
allows only those resources that match the spatial com-     graph, the addition and removal of triples can be de-
ponent of a query to be retrieved, rather than spatially    tected and forwarded to any associated indexes. Parlia-
filtering all bindings that match a given query. The rel-   ment’s GeoSPARQL index listens for triples that con-
ative performance advantages of using a spatial index       tain the geo:asWKT or geo:asGML predicates. Any
versus spatially filtering a result set is discussed by     triple that is added to the graph is checked to see if it
Brodt et al. in [7].                                        matches. At this point, the index can create a record
                                                            for the geometry that is represented in the object of the
6.2. Parliament and GeoSPARQL                               triple and insert it into the index. For instance, when
                                                            adding the triples in listing 3 to Parliament, the index
   In order to take advantage of data represented in
                                                              16 http://www.w3.org/TR/owl-time/
GeoSPARQL, a SPARQL endpoint needs to under-
                                                              17 http://www.openjena.org
stand the GeoSPARQL ontology and provide support              18 http://openjena.org/javadoc/com/hp/hpl/
for one (or more) of the conformance classes. The Par-      jena/graph/Graph.html
liament triple store provides one such implementation         19 http://openjena.org/javadoc/com/hp/hpl/

(based on a draft version of the specification) that al-    jena/graph/GraphListener.html
11

         Listing 9: Property Function Query                      Listing 10: Optimization Example Query
SELECT ?x                                                  SELECT ?m
WHERE {                                                    WHERE {
  ?x apf:concat( "Hello", " ", "                             ?m a ex:Monument ;
     World") .                                                   geo:hasGeometry ?mgeo .
}                                                            ?mgeo geo:within ex:
                                                                NationalMallGeometry .
                                                           }
will generate a single record that contains a reference
to the resource ex:WMPoint and its WKT value.
   In order to query for data in an index, we have            In this example, there are two sub patterns: the in-
extended the ARQ query engine to support property          dex property function, and the basic graph pattern for
functions that can access indexes. When ARQ parses         ?m. Each sub pattern estimates how many results that
a SPARQL query, it detects the different operators in      it will be able to provide. For operations within the
the query. A particularly useful feature of ARQ is the     spatial index, a bounding box query can be performed
support for property functions. Instead of matching a      to estimate how many results will be returned. In this
triple in a graph, property functions can execute cus-     example, the index will look up the bounding box for
tom code in the context of the SPARQL query. Con-          ex:NationalMallGeometry and calculate how
sider the query in listing 9. It will yield a result set   many items it contains. For basic graph patterns, Par-
with a single binding for ?x by concatenating all of the   liament keeps statistics on the triples it contains and
                                                           can quickly estimate how many matches are in the
arguments together to form the literal "Hello World"
                                                           store for a given triple pattern. Sub patterns containing
instead of attempting to match the triple pattern. Par-
                                                           basic graph patterns use these statistics to estimate how
liament implements the GeoSPARQL spatial relations,
                                                           many triples will be bound by the pattern. After each
such as geo:intersects, as property functions.
                                                           sub pattern has an estimate, they are ordered in ascend-
                                                           ing order executed accordingly. In this case, if there
6.3. Optimizing Query Execution
                                                           were 500 monuments with geometries, but only 100
                                                           geometries within the bounding box, the index prop-
    By using an index, a query can be optimized such       erty function would be executed first. If, however, there
that the most selective part is executed first. Con-       were 500 geometries within the bounding box, but only
sider the query in listing 10. This query is asking for    10 monuments exist in the triple store, the basic graph
all monuments within the National Mall. Parliament’s       pattern would execute first and the spatial relationship
query optimizer splits the query into blocks for execu-    would be tested for each of the 10 results.
tion based on how the variables in the query are used         In addition to optimizing pattern order, optimiz-
and what special operations occur in the query. In this    ing spatial operations can provide significant per-
instance, since the predicate, geo:within, is an in-       formance benefits. Consider the query in listing 11.
dex property function, the triple containing the predi-    This query is deceptive in that it appears to be ask-
cate is considered as one query block. The rest of the     ing a simple geospatial question: "What are all the
query is a simple basic graph pattern containing two       geometries within 10km of the feature described by
triples describing ?m. The basic graph pattern is ana-     ?".
lyzed and partitioned so that no variable crosses par-     However, there is no way for Parliament’s query op-
titions. For this query, this generates two partitions.    timizer to know a priori what the distances between
When the query is executed, the Parliament query op-       geometries are. While some implementations of Geo-
timizer has two choices: (1) it can execute the spatial    SPARQL may be optimized to handle cases like this, in
part first and then match to the graph pattern, or (2)     Parliament this query would not use the spatial index
it can execute the graph pattern first, then execute the   at all; the geometries for all parks would be found be-
spatial operation. The optimizer will decide what to       fore checking their distance. Table 2 shows that there
do based on which path estimates it will provide the       are nine features matching this query. Listing 12 shows
fewest result bindings.                                    a more version of this query that is Parliament can op-
12

            Listing 11: Example Query 5                        Listing 12: Example Query 5 - Optimized
SELECT ?x                                                SELECT ?x
WHERE {                                                  WHERE {
 GRAPH  {                         GRAPH  {
                           
      geo:hasGeometry ?geo1 .                                    geo:hasGeometry ?geo1 .
  ?geo1 geo:asWKT ?wkt1 .                                    ?geo1 geo:asWKT ?wkt1 .
  ?x geo:hasGeometry ?geo2 .                                 BIND (geof:buffer(?wkt1, 10000,
  ?geo2 geo:asWKT ?wkt2 .                                       units:m) as ?buff) .
                                                             ?x geo:hasGeometry ?geo2 .
     BIND (geof:distance(?wkt1, ?wkt2,                       [ a geo:Geometry ;
        units:m) as ?distance) .                               geo:asWKT ?buff ] geo:sfContains
     FILTER (?distance < 10000)                                    ?geo2 .
    }                                                      }
}                                                        }

                        Table 2                          tial context attached. However, the vast majority of this
                 Example Query 5 Results                 geospatial context cannot be utilized for spatial queries
             x                                           because the hosting SPARQL endpoints cannot per-
             http://sws.geonames.org/4199542/            form them. In the following section, we discuss four
             http://sws.geonames.org/7242246/            datasets that are representative of the disparate types
             http://sws.geonames.org/4183291/            of data that can be integrated together via there spatial
             http://sws.geonames.org/4201877/            context. We then demonstrate this integration on two
             http://sws.geonames.org/4224307/            datasets using Parliament.
             http://sws.geonames.org/4192596/            6.4.1. Geospatial Data Sets
             http://sws.geonames.org/4212826/               GeoNames20 provides information for over eight
             http://sws.geonames.org/4192776/            million geospatial features. The data is exposed via an
             http://sws.geonames.org/4194405/            RDF webservice21 that exposes information on a per
                                                         resource basis. The geospatial aspect is represented us-
timize. This version buffers the geometry for the for    ing the W3C Basic Geo vocabulary. There is no abil-
 by                    ity, however, to perform any type of SPARQL query to
10000 meters and then uses that buffer to test the       retrieve data.
geo:sfContains relationship. Parliament can take            Another significant source of geospatial data is DB-
this query and run the spatial component against the     pedia22 . As described in [6], data from Wikipedia23
spatial index in order to reduce the amount of re-       is extracted into RDF. Many of these extracted enti-
sults that need to match the rest of the query. Run-     ties are geospatial in nature (cities, counties, countries,
ning the un-optimized query on our Parliament Geo-       landmarks, etc. . . ) and many of these entities already
SPARQL endpoint (described in the next section) takes    contain some geospatial location information. The data
6.967 seconds. The optimized version, however, runs      also contains owl:sameAs links between the DBpe-
in 0.047 seconds. A future version of the Parliament     dia and GeoNames resources. However, without any
query optimizer will be able to optimize queries like    spatial computation predicates, this geospatial infor-
listing 11 automatically.                                mation can only be retrieved "as is." A query like
6.4. GeoSPARQL and Linked Data                             20 http://www.geonames.org
                                                           21 http://www.geonames.org/ontology/
   As the Semantic Web materializes on the internet in   documentation.html
the form of Linked Data, there is an increasing amount     22 http://www.dbpedia.org

of structured data available with some sort of geospa-     23 http://www.wikipedia.org
13

"Show all cities within 50 miles of Arlington, VA with       can share data while providing access to geospatial se-
a population of at least 100,000 people in which at          mantics.
least one famous person was born" is not possible, even         For the following examples, we have processed a
though the data exists to support it.                        subset of the GeoNames RDF dataset26 and the USGS
   The linked data community has released the Linked-        GeoSPARQL data for Atlanta, GA. This data is acces-
GeoData data set [2]. This data set is a spatial knowl-      sible via a Parliament SPARQL endpoint with a Geo-
edge base, derived from Open Street Map24 and is             SPARQL spatial index27 . The queries in listings 11,
linked to DBpedia and GeoNames resources. It con-            12, and 14 can be executed against this data. The in-
tains over 200 million triples describing the nodes and      dex supports indexing data conforming to WKT and
paths from OpenStreetMap. The data set is accessible         GML serialization and GeoSPARQL queries including
via SPARQL endpoints running on the OpenLink Vir-            those with Simple Features, Egenhofer, and RCC8 re-
tuoso platform. A REST interface to LinkedGeoData            lations, the non-topological query functions and com-
is also provided.                                            mon topological query functions. Query rewriting is
   Another source of geospatial data is the United           not supported at this time.
States Geological Survey (USGS). The USGS has re-               As so much data is represented as individual lat-
leased a SPARQL endpoint25 for triple data derived           itude and longitude data using the W3C Basic Geo
from The National Map[30], a collaborative effort to         vocabulary, including that provided by GeoNames, it
deliver usable topographic information for the United        is desirable to be able to convert it easily into Geo-
States. This dataset is much more specific and special-      SPARQL. It is trivial to provide functions that take
ized than the data that is provided by DBpedia and           a latitude and longitude pair and convert them into
GeoNames. It includes geographic names, hydrogra-            a GeoSPARQL point. Parliament provides SPARQL
phy, boundaries, transportation, structures, and land        property functions, spatial:toWKTPoint and
cover. The group has attempted to follow the forthcom-       spatial:toGMLPoint which take a latitude, lon-
ing GeoSPARQL specification, though some aspects             gitude, and optional spatial reference system identi-
of GeoSPARQL have changed slightly since the data            fier as arguments and return a sf:wktLiteral or
has been published. Due to a lack of available Geo-          gml:gmlLiteral representation of a point. This
SPARQL triple stores, the published dataset includes         makes it possible to load and index existing spatial data
a pre-computation of all of the topological relation-        sets without having to regenerate existing RDF. Af-
ships between entities. A query such as "Show all rail       ter loading the GeoNames data into Parliament, it was
lines that cross rivers" is in fact possible to answer by    aligned with GeoSPARQL using the query in listing
looking at the current precomputed data. However, this       13. This query explicitly assigns GeoNames features to
means that if a new entity is added, the knowledge base      be GeoSPARQL features, creates a new geo:Point
needs to compute everything that the entity is related       resource containing the point information, and links
to and update those entities as well. If this data was not   the feature to the geometry.
precomputed, the only way to answer the query would             The USGS provides their RDF data in a format
be via indexing the data and querying the relations with     that is similar to GeoSPARQL. The geospatial data
a relationship predicate.                                    from The National Map contains polygon, polyline,
                                                             and point data. The data conforms to an earlier revision
6.4.2. Integration in Parliament via GeoSPARQL               of the GeoSPARQL standard and contains features
   GeoSPARQL provides the means to link geospa-              and geometries, but lacks typed literals for the geo:
tial datasets together, resulting in the possibility of      asWKT and geo:asGML property values. For this
new, meaningful entailments. As it is simply not pos-        data, the constructor functions for the GeoSPARQL lit-
sible to pre-compute all of the relations between all of     eral datatypes can be used to create correctly typed val-
the available geospatial datasets, enriching the existing    ues at query time. The existing spatial relations state-
datasets with GeoSPARQL representations, and creat-          ments in the dataset were ignored when processing the
ing indexes for the data is one way that data providers      data.

  24 http://www.openstreetmap.org                              26 http://download.geonames.org/
  25 http://usgs-ybother.srv.mst.edu:8890/                   all-geonames-rdf.zip
sparql                                                         27 http://geosparql.bbn.com
14

       Listing 13: GeoNames Conversion Query
CONSTRUCT {
                                                                   Listing 14: Atlanta Schools Near Rail Lines Query
  ?feature a geo:Feature ;
   geo:hasGeometry [                                              SELECT DISTINCT ?school ?distance
    a geo:Point ;                                                 WHERE {
    geo:asWKT ?wkt                                                 GRAPH  {
  ] .                                                               # approximate rectangle of Atlanta
}                                                                   BIND ("POLYGON((-84.445 33.7991,
WHERE {                                                                -84.445 33.7069,-84.331
  ?feature a gn:Feature ;                                              33.7069,-84.331
   wgs84_pos:lat ?lat ;                                                33.7991,-84.445 33.7991))"^^sf
   wgs84_pos:long ?long .                                              :wktLiteral AS ?place) .
   BIND (spatial:toWKTPoint(?lat, ?
      long) as ?wkt) .                                              ?school a gn:Feature ;
}                                                                    geo:hasGeometry ?school_geo ;
                                                                     gn:featureCode gn:S.SCH .

6.4.3. GeoSPARQL Data Query                                         ?school_geo geo:sfWithin [ a geo:
   Once GeoSPARQL data exists for both GeoNames                        Geometry; geo:asWKT ?place ] ;
and the USGS and is loaded into a GeoSPARQL ca-                       geo:asWKT ?school_wkt .
pable knowledge base, such as Parliament, interest-
ing geospatial questions can be posed. Consider the                 # buffer schools 100m
following query: "What are all the schools near At-                 BIND (geof:buffer(?school_wkt,
lanta, GA that are within 100 meters of a railway?"                    100, units:m) AS ?s_buff) .
GeoNames provides point data for buildings, includ-
ing schools, while the USGS data contains polyline                  # find rail links within buffer
data for rail lines as well as polygons for differ-                 ?rail a trans:railFeature ;
ent regions. Listing 14 shows the GeoSPARQL for-                     geo:hasGeometry ?rail_geo .
mulation for this question. This query takes advan-
tage of several features of the language. First, the                # only get railroads within
geo:sfWithin predicate is used to determine what                       Atlanta
schools exist within a boundary for Atlanta (as defined             ?rail_geo geo:sfWithin [ a geo:
by a polygon in the query). Each school geometry is                    Geometry; geo:asWKT ?place ] ;
then buffered by 100 meters using the geof:buffer                     geo:asWKT ?rail_wkt_s .
function. The rail features are retrieved and checked to
see if they fall within this buffer. Finally, for all rail line     # convert string to WKT literal
segments that intersect the buffer, the actual distance to          BIND (sf:wktLiteral(?rail_wkt_s)
the school is calculated using the geof:distance                       AS ?rail_wkt) .
function. Table 3 displays the school resources and the             FILTER (geof:sfIntersects(?
distance to the nearest rail line segment. This query,                 rail_wkt,?s_buff)) .
however, could be made more effective if there was an               BIND (geof:distance(?school_wkt,?
equivalent to the ST_DWithin from OGC Simple Fea-                      rail_wkt,units:m) AS ?distance
tures Specification [25].                                              ) .
   While the sample query assumes everything is con-                }
tained in a single graph, it is not unusual for the data to       }
exist in several different graphs or endpoints. In fact, it       ORDER BY ASC(?distance)
would be ideal to not have to replicate data and to be
able to query it remotely through the dataset provider.
Until the means for query federation, such as the mech-
15

                           Table 3
                                                            ized. This has allowed us to identify and resolve addi-
            Atlanta Schools Near Rail Lines Results
                                                            tional issues with the specification earlier.
   school                              distance                We would also like to thank the OGC and the other
   http://sws.geonames.org/4183400/    44.09248276546987    members of the GeoSPARQL working group for their
   http://sws.geonames.org/7242795/    80.33873752510539    continued efforts to bring this marriage of the geospa-
   http://sws.geonames.org/7242287/    91.9176767544314     tial community and the Semantic Web to life.

anism discussed in [27], are widely supported, query-
                                                                          Listing 15: RDF Prefixes
ing across remote linked datasets will be difficult. Geo-
SPARQL queries should be compatible with query fed-         apf: 
plications. In lieu of query federation, querying across    ex: 
amples is actually contained in two different graphs. A     gn: 
ever, enables a shorter and simpler query.                  gu: 
                                                            geo: 
                                                            geof: 
of previous work on combining RDF and OWL with              sf: 
geospatial data. Its creation means that geospatial data    gml: 
with an expectation of efficient geospatial queries.        os: 
nally utilize the significant amount of geospatial con-     ose: 
   Many triple stores, though they do not yet support       osg: 
thereof. This is an indicator of how important geospa-      owl: 
standard is released, the relevant vendors will move to
                                                            rdfs: 
tently exchange and process geospatial data.
                                                            spatial: 
truly realize the geospatial Semantic Web, technolo-
                                                            time: 
gies such as GeoSPARQL and implementations like
                                                            trans: 
it useful both for working on geospatial Semantic Web
applications and understanding the specification.           units: 
                                                            wgs84_pos: 
8. Acknowledgments
                                                            xsd: 
   We would like to thank the people at the Cen-
                                                            virtrdf: 
SPARQL even before the specification has been final-
You can also read