Understanding the European Data Portal - High level presentation of the architecture

Page created by Mildred Davis
 
CONTINUE READING
Understanding the
         European Data Portal
High level presentation of the architecture
        data.europa.eu/europeandataportal
countries involved in the EU’s              Access to the Portal
What we do                             neighbourhood policy.
                                       But how does the Portal                     The access to the Portal
More and more volumes of               function? This factsheet                    is provided in two ways: a
data are published every               provides a summary of the                   machine-readable API and
day. The amount of data                architecture of the European                a human readable web
across the world is increasing         Data Portal.                                site (GUI). The API enables
exponentially. A substantial                                                       its users to search, create,
amount of this data is                                                             modify and delete metadata
collected by the public sector.       The architecture                             on the portal.
But for the data to be re-used,                                                    The GUI is basically built on
it needs to be accessible.             For a better understanding                  two components: CKAN and
                                       of the integration of the                   DRUPAL. CKAN manages
The first official version of          components into the                         and provides metadata
the European Data Portal               overall architecture, each                  content (datasets) in a central
is available since February            component’s functionality                   repository. DRUPAL provides
2016. The Portal harvests              as well as interactions from                the Portal’s Home Page with
the metadata available on              different perspectives (user/               editorial content (e.g. Portal’s
public data and geospatial             system) are described and                   objectives, articles, news,
portals across European                published on GitLab.                        events, tweets, etc.) and links
countries. Portals can be              The figure at the bottom of                 to an Adapt Framework
national, regional, local              this page provides a high                   based training platform. In
or domain specific. They               level overview diagram of                   addition it offers extended
cover the EU Member                    the European Data Portal                    functionalities to registered
States, EFTA countries and             architecture.                               users via user login by ECAS.

                            Third party
                                                                          User/Expert
                          portals / Experts

                                               API

                                                         Proxy
                                                                    Graphical                                API/Portal
                                                     Usage, access Visualisation    Graphical
                                                                                                         Portal
   Portal   API/Portal    Portal              Portal  search etc       tool            pre-
                                                      statics and                   processor   API/Portal
                                                        caching

              Help        map.                                                                       Licensing
  MQA         Desk        apps      FME         Drupal    SOLR     CKAN       Gazetter Harvester Assistant
                                                                                       (Transformer)
             (JIRA)      backend                                    Sync

                                                     SPARQL                   DCAt-AP           Virtuoso
                                                     Manager
Both systems are used in a          Searching the Portal               to datasets can be visualised
side-by-side architecture.                                             in tabular (tables) and
A proxy is responsible for          The portal uses the SOLR           graphical (charts) form using a
delivering the web pages            search engine in order to          D3.js library.
requested by the user. Both         separately search for editorial
systems are equally themed          content in DRUPAL and
with the same Look & Feel           for datasets in the CKAN
so that the user is not aware       repository. The GUI includes
on which system he/she is           a Licensing Assistant
currently browsing.                 component that supports                The Portal architecture
                                    the user by providing legal           includes three additional
Multilingualism                     information on the usage of           components to enhance
                                    a specific dataset in terms                  the quality
The Portal GUI will support         of licenses that apply to the
all 24 official EU languages        dataset.
for main editorial and main
metadata content (using MT@         The SPARQL Manager
EC). Training content will be       component allows the user to
available in English and French     enter and run SPARQL queries
only. Additional material will be   on the Virtuoso linked data        Harvesting Data
made available in English or in     repository. It also allows the     On the Harvesting side, the
the source language.                logged-in user to store and        portal follows a two-fold
                                    re-run SPARQL queries and          architecture too. CKAN is
                                    notifies the user when a query     used as the central metadata
                                    has finished running.              repository for storing,
                                    Geo spatial data                   browsing and searching
                                                                       datasets in a POSTGRES
                                    Using the map.apps backend         relational database.
                                    application, Geo spatial data
                                    is visualised on geographical      In order to also support a
                                    maps. The application              linked data functionality the
                                    is a proprietary solution          CKAN metadata is replicated
                                    that comes with different          into a Virtuoso quad store
                                    tooling and thematic focus,        repository via a CKAN
                                    a graphical configuration          synchronisation extension,
                                    interface, supports                in order to ensure that both
                                    responsive web-design and          repositories have the same set
                                    internationalisation files.        of metadata.

                                    The application also               The Harvester is a separate
                                    implements the OSGI                component that is able to
                                    specification on the client side   harvest data from multiple
                 OPEN DATA          (in JavaScript) allowing sharing   data sources with different
                                    and re-usage of the bundled        formats and APIs. The
                                    application logic as well as a     harvester is acting as a
                                    straightforward maintenance.       single point of entry for all
                                    Statistical data that is linked    metadata that gets harvested,
transformed into the CKAN        of all spatial file/database       metadata and creates
JSON schema and pushed           formats and that is used for       tickets for the Helpdesk in
into the CKAN repository.        harvesting the sources for         case of harvesting issues.
                                 geographical names.
The Gazetteer component                                             The third component is
is used by the Harvester         Enhancing quality                  the monitoring component
to enhance the metadata                                             based on PIWIK and
                                 The Portal architecture            located at the Proxy in
with geo-spatial data            includes three additional
and information (geo-                                               the architecture. In the full
                                 components to enhance              respect of data privacy,
coordinates, names, places,      the quality of the metadata
etc.). The Gazetteer is                                             it records requests and
                                 and the portal. A Helpdesk         user interactions on the
mainly used to improve the       handles user support
search functionality. It uses                                       portal in order to generate
                                 requests and feedback.             anonymised user traffic
the FME component as a
universal spatial ETL tool       The Metadata Quality               statistics that will help
(Extract-Transform-Load)         Assistant (MQA) periodically       enhancing the usage of the
that supports the accessing,     generates reports on the           Portal.
processing and outputting        quality of the harvested

 Component          Description

API Access         Machine-readable (SOAP / REST) API

GUI Access         Portal website graphical user interface

CKAN		             Portal’s central metadata (dataset) repository

DRUPAL             Portal’s Home Page managing editorial content

ECAS		             European Commission Authentication System used for user registration
		                 and login in order to provide extended functionalities of the Portal

Adapt Framework Online platform used for Portal Training Modules (available in EN + FR)

Proxy		            Routing of (HTTP(S)) user requests to corresponding components

SOLR Portal        Search engine used for searching portal editorial content
Search
(editorial data)
SOLR Dataset      Search engine used for searching and filtering datasets in the CKAN
Search (metadata) metadata repository

Licensing          Component to provide legal information on (re-)usage of specific datasets
Assistant

SPARQL             SPARQL query editor allowing to run SPARQL queries on
Manager		          linked data in the Virtuoso repository
Component          Description

Virtuoso          Linked data quadruple store that is synchronized with the CKAN repository
map.apps:
Geo-spatial        Proprietary application to visualize geo-spatial data and information using
Data Visualisation geo-maps. It comes with different tooling and thematic focus, a graphical
		                 configuration interface, supports responsive web-design, i18n
                   internationalization files, client side implementation of the OSGI
		specification (JavaScript)

Graphical Data    Recline.js/D3.js JavaScript libraries to visualize (statistical) data in
Visualisation     tables and graphical charts

Pre-processor RESTFUL web API, running on a Node.js server which analyzes and
        		    transforms XSL/XLSX files into CSV format in order to be used from the
		Visualisation tool

Harvester         Single entry point component for harvesting data from multiple data
		                sources in different formats and from different APIs

Gazetteer         Component providing geo-spatial data and information

FME		             Component used by the Gazetteer as a universal spatial ETL tool (Extract-
		                Transform-Load) that supports the accessing, processing and outputting
                  of, all spatial file / database formats and that is used for harvesting the
		                sources for geographical names

smart.finder      Component used by the Gazetteer and simplifying searches for spatial
		                data, services and documents. It enables fast and structured access to
		                extensive, distributed and heterogeneous data stores
Helpdesk          Portal offers a user request/feedback form via the JIRA-API and generates
		                JIRA tickets for follow-up by helpdesk

Metadata Quality Component to report on the quality of the harvested metadata and to alert
Assistant (MQA) helpdesk in case of issues

Monitoring        PIWIK component that provides Traffic Analytics of portal usage

Multilingual      Web pages + core editorial content + dataset descriptions available in all
Support		         24 official EU languages

MT@EC		           Machine Translation Services of the European Commission used for
		                translation of the metadata into all of the supported languages by the portal
Understanding the European Data Portal
High level presentation of the architecture

Last update: August 2016

For more information, please visit the European Data Portal or contact us via email.

data.europa.eu/europeandataportal | info@europeandataportal.eu

The European Data Portal initial content has been collected by harvesting national public data
and geospatial portals. Progressively, the portal will harvest additional metadata collected
from regional, local and domain specific portals. Do you want your portal or website to be
harvested by the European Data Portal? Read the requirements on the website.

Share your story about how you make use of Open Data. Are you an entrepreneur? A non-
governmental organisation? A civil servant responsible for publishing data? A local authority?
Tell us your story! The purpose of the collection of use cases is to assemble interesting
European stories about the benefits and efficiency gains that result from the use of Open
Data. Contact us via the website and fill out the form.

Consortium:

www.capgemini-consulting.com       www.opendatainstitute.org         www.intrasoft-intl.com

www.timelex.eu                     www.sogeti.com                    www.southampton.ac.uk

www.conterra.de                    www.fraunhofer.de
You can also read