Deliverable D2.2 Social Data Collection and Processing Pipeline - (DEMO) - PlasticTwist

Ref. Ares(2018)4435267 - 29/08/2018

                           Deliverable D2.2
               Social Data Collection and
                       Processing Pipeline
                                         (DEMO)

This project has received funding from the European Union’s Horizon 2020 Research and Innovation
                           Programme under Grant Agreement No. 780121
PTwist – GA No. 780121
                                                    D2.2 Social Data Collection and Processing Pipeline
 H2020 ICT-11-2017

Copyright
© Copyright 2018 The PTwist Consortium

Consisting of:

       ARISTOTELIO PANEPISTIMIO THESSALONIKIS
       FACHHOCHSCHULE ZENTRALSCHWEIZ - HOCHSCHULE LUZERN
       NUROGAMES GMBH
       BETTER FUTURE FACTORY BV
       ALMERYS
       EOLAS S.L.
       DIKTYO MESOGEIOS SOS
       STICHTING BLUECITY
       TEKNOLOJI ARASTIRMA GELISTIRME ENDUSTRIYEL URUNLER BILISIM TEKNOLOJILERI SANAYI VE
        TICARET ANONIM TICARET

This document may not be copied, reproduced, or modified in whole or in part for any purpose without
written permission from the PTwist Consortium. In addition, an acknowledgement of the authors of the
document and all applicable portions of the copyright notice must be clearly referenced.

All rights reserved.

This document may change without notice.


 Document Classification

 Title                      Social Data Collection and Processing Pipeline
 Deliverable                D2.2
 Type                       DEM: Demonstrator
 Work Package               WP2 – Pilots Requirements and Data Modelling
 Partners                   AUTH
 Authors                    Dimitriadis Ilias
 Dissemination Level        PU (Public)

 Abstract

This document presents a demonstration of the component that implements the heterogeneous social
media content collection and processing pipeline. This crowdsourcing component consists of two main
modules:

   1. The Plastics Topics observatory
   2. The Open Designs repository

This document provides a detailed description of how to use the modules developed so far, a report on the
technologies and frameworks used to develop this component and, finally, a short video demonstrating some
possible use cases.

 Version Control

 Version    Description                                           Name                        Date
 1.0        Initial draft                                         Dimitriadis Ilias           1 Aug 2018
 2.0        Revised                                               Dimitriadis Ilias           8 Aug 2018
 3.0        Added Video Demo                                      Dimitriadis Ilias           18 Aug 2018
 4.0        Final Version                                         Dimitriadis Ilias           21 Aug 2018


Table of Contents
1. Introduction
2. Data Collection and Processing Modules
   2.1. Twitter
   2.2. Facebook
   2.3. Thingiverse and Flickr
3. Demonstration of the Crowdsourcing Platform
   3.1. Welcome Page
   3.2. Locations
   3.3. Wordclouds
   3.4. Influencers
   3.5. Tweets
       3.5.1. Top Tweets
       3.5.2. Top URLs
       3.5.3. Topic modelling
   3.6. Repository
4. Future Work
5. Video Demonstration
6. References


Table of Figures
Figure 1 Topic observatory – Design repository
Figure 2 Social Media Sources
Figure 3 Python and Twitter
Figure 4 Data Life Cycle
Figure 5 Find page of interest Facebook
Figure 6 Find all posts in page
Figure 7 Get information for specific post
Figure 8 Get all comments for this post
Figure 9 Landing Page
Figure 10 Locations
Figure 11 Locations heatmap
Figure 12 Wordcloud
Figure 13 Influencers
Figure 14 top tweets
Figure 15 top URLs
Figure 16 topic modelling
Figure 17 repository
Figure 18 Plastic Twist Thingiverse Group
Figure 19 future work


1. Introduction

In the PTwist project, we are developing an open platform for plastic lifecycle awareness, monetization and
sustainable innovation. This Deliverable focuses on the crowdsourcing platform developed so far. The final
version of the crowdsourcing part of the platform will be publicly available after the end of month 10 of the
project (October 2018). We present the component that implements the heterogeneous social media content
collection and processing pipeline, along with the technologies we have developed to facilitate data
collection and analysis.

The Plastic Twist platform will deploy bottom-up and trustworthy applications and tools that support the
plastics-as-an-asset principle through cutting-edge, mostly open-source technologies, such as:

       •  Social media data threads that are monitored and analysed to capture the wisdom of the crowds
          and systematise a plastics topic observatory
       •  Open data, such as plastic machine designs, 3D printer designs, images of plastic reuse ideas, etc.

The plastics crowdsourcing topic observatory and the open data & plastic designs repository are built on the
intelligence extracted from data collected from social media and open data sources. However, very few of
the popular sources allow developers to access their data, even when the posts are public. Therefore, the
crowdsourcing tool will offer information and knowledge collected by analysing data from the following
sources:

       •  Facebook (Facebook, n.d.)
       •  Twitter (Twitter, n.d.)
       •  Flickr (Flickr, n.d.)
       •  Thingiverse (Thingiverse, n.d.)

                                   Figure 1 Topic observatory – Design repository


Facebook and Twitter will be responsible for providing the topics and other textual content, while
Thingiverse and Flickr will be the main sources of open designs and plastic reuse ideas. At this point, we
should stress that Facebook has recently updated the usage terms of the Graph API and currently does not
allow access to any kind of data, not even public posts on public pages. For this reason, Facebook was used
as a data source only for a short period of two months.

Social networks approved:

Twitter:
          •   Open API
          •   Ability to search for a term or user
          •   Multiple features
          •   Can extract past information
          •   Rate limit in Search API
          •   Hard to extract user network
          •   Noisy data (short texts)
          •   Not very popular in all countries

Facebook:
          •   Tons of information
          •   High user engagement worldwide
          •   Graph API with strict rate limits
          •   No ability to search by term (not even the public posts containing this
              term); access only to pages and personal information
          •   Rate limit of 200 calls per user

Media sharing communities:

Thingiverse:
          •   A 3D printer designs repository
          •   Find users and groups
          •   Open API
          •   300 calls per 5-minute window
          •   Returns the license as well (can be used)

Flickr:
          •   Can be used as a data source for plastic reuse ideas
          •   Open API (3,600 calls per hour)
          •   Search by tags (max. 4,000 newest photos)
          •   Search groups by text

                                            Figure 2 Social Media Sources


2. Data Collection and Processing Modules

AUTH has developed a pipeline of modules responsible for collecting and processing publicly available
social media content and open data. We focus mostly on the process of collecting data from Twitter and
Thingiverse, since the Flickr data crawler is still under development and, as mentioned earlier, Facebook
has restricted access to public posts on public pages.

2.1. Twitter
Twitter allows users to search for specific keywords or for posts from certain user accounts. Since PTwist
focuses on plastic, the pilot partners have provided AUTH with a list of specific keywords and of user
accounts considered to be experts in the field of plastic reuse. Each pilot has contributed to the creation of
this list. The list contains the keywords in all the pilots’ native languages, in order to collect data that
reflect trends and interesting topics in each pilot’s country. As the following tables show, four languages
are covered: English, Dutch, German and Greek.

English                    Dutch                  German                          Greek                        Groups

plastic                    plastic                Plastik                         πλαστικό                     General terms

single use plastic                                Einwegplastik                   πλαστικό μιας χρήσης         General terms

reuse                      hergebruik             Wiederverwendung                επαναχρησιμοποίηση           General terms

reduce                     verminderen            Reduktion                       μείωση                       General terms

recycle                    recycleren             rezyklieren                     ανακύκλωση                   General terms

upcycling                  upcycling              upcyclen                        upcycling                    General terms

downcycling                                                                       downcycling                  General terms

waste                      afval                  Abfall                          απόβλητα                     General terms

litter                     zwerfafval             Abfall                          σκουπίδια / απορρίμματα      General terms

plastic soup               plastic soep           Plastiksuppe                    πλαστική σούπα               General terms

zero-waste                 zero-waste             Null-Abfall                     μηδενικά απόβλητα            General terms

no-waste                   afvalvrij                                              χωρίς απόβλητα               General terms

plastic free               plastic vrij           frei von Plastik                χωρίς πλαστικό               General terms

virgin plastics            virgin plastics                                        καθαρό/ καινούριο/           General terms
                                                                                  πρωτογενές πλαστικό

deposit fee                statiegeld             ohne Inhaltsstoffe              τέλος ταφής                  General terms

recycling fee                                                                     τέλος ανακύκλωσης            General terms


deposit return                                                                 σύστημα επιστροφής,         General terms
system                                                                         συλλογής / σύστημα
                                                                               εγγυοδοσίας

pollution                 vervuiling           Umweltverschmutzung             ρύπανση                     General terms

packaging                                                                      συσκευασία                  General terms

Eco-design                                     Ökodesign                                                   General terms

End-of-waste                                                                                               General terms

Microplastic                                   Mikroplastik                                                General terms

"Unrecyclable"                                 Nichtrezyklierbar                                           General terms

Jetsam                                         Strandgut                                                   General terms

single-use products                            Einwegprodukt                                               General terms

Bag                                            Tüte                                                        General terms

Plastic tax                                    Plastiksteuer                                               General terms

Bio-based packaging                            Biobasierte Verpackung                                      General terms

Recyclability                                  Rezyklierbarkeit                                            General terms

Waste recovery                                                                                             General terms

Anthropogenic Litter                                                                                       General terms

Incineration                                                                                               General terms

Trash                                          Müll                            σκουπίδια                   General terms
                                                              Table 1

         English               Dutch             German                       Greek                    Groups

         Straws                Rietje            Strohhalme                   Καλαμάκια                Product

         plastic cup           Plastic beker     Plastiktassen                Κύπελλα / Ποτήρια        Product

         plastic bottle        plastic fles      Plastikflasche               Μπουκάλι                 Product

         plastic cap           plastic dop       Plastikdeckel                Καπάκι                   Product

         Wrapping              Verpakking        Verpackung                   Περιτύλιγμα              Product

         Foil                  Folie             Folie                        Αλουμινόχαρτο            Product

         Filament              Filament          Filament                     Νήμα                     Product


plastic bag        plastieken zak     Plastiktüte                    πλαστικές σακούλες       Product

Sachet             Zakje              Beutel                         Φακελάκι                 Product
                                                   Table 2

English           Dutch               German                         Greek                    Groups

Shredder          Vermaler            Reisswolf                      Τεμαχιστής               Machines

Extruder                              Extruder                       Extruder                 Machines

Ultimaker                             Ultimaker                      Ultimaker                Machines

3D printer                            3D-Drucker                     Τρισδιάστατος εκτυπωτής  Machines

Container         Container           Container                      Περιέκτης                Machines
                                                   Table 3

English            Dutch              German                         Greek                    Groups

Granulation        Granuleren         Granulation                    Κοκκοποίηση              Process

Molding            Spuitgieten        Formen                         Καλούπι                  Process

Injection          Injectie           Injektion                      Έγχυση                   Process
                                                   Table 4

English        Dutch                German                   Greek                   Groups

Compostable    Composteerbaar       compostierbar            Κομποστοποιήσιμο        Innovations

Biodegradable  biologisch           biologisch               Βιοαποικοδομήσιμο       Innovations
               afbreekbaar          abbaubar

Coating        Coating              Beschichtung             Επικάλυψη               Innovations

Bioplastics    Bioplastic           biologischer             Βιοπλαστικά             Innovations
                                    Kunststoff

Biobased       Biobased             biobasiert               Biobased                Innovations

Sea-weed       Zeewierverpakking    Seegrasverpackung        Συσκευασία από          Innovations
packaging                                                    φύκια

               Meelmotlarwe                                                          Innovations

               Pyrolyse                                                              Innovations
                                                   Table 5


 English              Dutch                German                         Greek                 Groups

 Plastic pollution    Plastic pollution    Plastic pollution              Plastic pollution     Plastic
                                                                                                pollution
                                                     Table 6

These keywords are grouped using a taxonomy, also designed by the pilot partners, which provides an initial
set of topics. The main categories are the following:

       •  General terms
       •  Products
       •  Machines
       •  Processes
       •  Innovations
       •  Plastic pollution (global)
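
As an illustration of how a tweet could be mapped onto this taxonomy, the following is a minimal Python sketch, assuming a simple case-insensitive whole-phrase match. `KEYWORDS`, `PATTERNS` and `classify` are hypothetical names, and only a handful of entries from Tables 1 to 6 are included; it is not the filtering code used in the pipeline.

```python
import re

# Hypothetical, illustrative subset of the pilot keyword tables (Tables 1-6):
# keyword -> (category, language code). The real lists are much longer.
KEYWORDS = {
    "single use plastic": ("General terms", "en"),
    "ανακύκλωση": ("General terms", "el"),
    "plastic bottle": ("Products", "en"),
    "plastic fles": ("Products", "nl"),
    "Plastikflasche": ("Products", "de"),
    "shredder": ("Machines", "en"),
    "injection": ("Processes", "en"),
    "compostable": ("Innovations", "en"),
    "plastic pollution": ("Plastic pollution", "en"),
}

# One case-insensitive pattern per keyword; the lookarounds require the keyword
# to stand on its own, so "shredder" does not match inside "shredders".
PATTERNS = [
    (re.compile(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)", re.IGNORECASE), label)
    for keyword, label in KEYWORDS.items()
]


def classify(text):
    """Return the set of (category, language) pairs whose keyword occurs in text."""
    return {label for pattern, label in PATTERNS if pattern.search(text)}


print(sorted(classify("Our new shredder versus plastic pollution")))
# [('Machines', 'en'), ('Plastic pollution', 'en')]
```

A tweet can match several categories and languages at once (for example a machine and a pollution term), which is why the sketch returns a set rather than a single label.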

The module responsible for collecting data from Twitter has been built in the Python programming language
(Python, n.d.). AUTH has developed a Twitter listener that connects to Twitter’s Streaming API (twitter api,
n.d.) and collects posts containing any of the keywords above. Posts from specific users are collected
through Twitter’s Search API. The listener uses Tweepy (Tweepy, n.d.), a Python wrapper for Twitter’s API.
The data needed for all the analysis tasks are stored in a MongoDB database (MongoDB, n.d.). More
information on the database-related tasks can be found in Deliverable D2.1.
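
The keyword lists have to be turned into the filter the listener sends to the Streaming API. The Python sketch below shows one way to prepare them, assuming the limits documented for Twitter's v1.1 `statuses/filter` endpoint at the time (at most 400 `track` phrases, each between 1 and 60 bytes). The function name `build_track_batches` is hypothetical; with Tweepy, a resulting batch (a list of phrases) would be passed to the stream's `filter(track=...)` call.

```python
def build_track_batches(keywords, max_phrases=400, max_bytes=60):
    """Clean a keyword list and split it into batches usable as `track` values.

    Hypothetical helper. Duplicates (ignoring case) are removed, phrases that
    contain a comma are skipped because the comma separates track phrases, and
    phrases longer than `max_bytes` in UTF-8 are skipped because they exceed the
    documented limit (Greek letters take two bytes each).
    """
    seen = set()
    phrases = []
    for keyword in keywords:
        phrase = keyword.strip()
        key = phrase.casefold()
        if not phrase or key in seen or "," in phrase:
            continue
        if len(phrase.encode("utf-8")) > max_bytes:
            continue
        seen.add(key)
        phrases.append(phrase)
    # Each batch holds at most `max_phrases` phrases
    return [phrases[i:i + max_phrases] for i in range(0, len(phrases), max_phrases)]
```

Note that in a `track` phrase, spaces act as a logical AND, so "single use plastic" matches tweets containing all three words in any order; exact-phrase matching would have to be applied afterwards, during filtering.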

                                            Figure 3 Python and Twitter


Following the filtering process, we end up with a set of collections for each keyword category and language,
along with the results of the data analysis process.
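
One way to picture this step is that each filtered tweet is copied into every collection whose category and language it matched. The Python sketch below assumes a hypothetical naming scheme (`general_terms_en`, `products_de`, and so on) and takes the keyword matcher as a parameter; it only builds the groups in memory. With PyMongo, each group could then be written with `db[name].insert_many(docs)`.

```python
from collections import defaultdict


def collection_name(category, lang):
    """Hypothetical naming scheme: ("General terms", "en") -> "general_terms_en"."""
    return f"{category.lower().replace(' ', '_')}_{lang}"


def route(tweets, classify):
    """Group tweets by target collection (hypothetical helper).

    `classify` maps a tweet's text to a set of (category, language) pairs.
    A tweet matching several pairs is placed in several groups; a tweet
    matching none is dropped, which is the filtering step.
    """
    groups = defaultdict(list)
    for tweet in tweets:
        for category, lang in sorted(classify(tweet["text"])):
            groups[collection_name(category, lang)].append(tweet)
    return dict(groups)
```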

                                             Figure 4 Data Life Cycle

Once the collected data have been filtered and processed, we proceed to the data analysis phase, which
comprises the following modules:

       •  Create wordclouds (most frequent words and hashtags)
       •  Identify the locations with the highest social media presence regarding plastic
       •  Discover the most propagated tweets and URLs
       •  Find the most influential users
       •  Discover popular topics
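
To make the first and third modules more concrete, the following Python sketch counts word and hashtag frequencies (the input to a wordcloud) and the most shared URLs. It is only an approximation, assuming tweets in Twitter's v1.1 JSON format with `text` and `entities.urls[].expanded_url` fields; it uses a tiny hard-coded stop-word list where the pipeline relies on nltk, and the function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); the pipeline itself filters with nltk.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "rt"}
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")


def word_frequencies(tweets, n=10):
    """Most frequent words and hashtags, ignoring URLs, @mentions and stop words."""
    counts = Counter()
    for tweet in tweets:
        text = URL_RE.sub(" ", tweet["text"].lower())
        for token in TOKEN_RE.findall(text):
            if token.startswith("@") or token in STOPWORDS:
                continue
            counts[token] += 1
    return counts.most_common(n)


def top_urls(tweets, n=10):
    """Most shared links, read from the expanded URLs in each tweet's entities."""
    counts = Counter(
        url["expanded_url"]
        for tweet in tweets
        for url in tweet.get("entities", {}).get("urls", [])
        if url.get("expanded_url")
    )
    return counts.most_common(n)
```

Hashtags are kept with their `#` so that "plastic" and "#plastic" appear as separate entries in the wordcloud; merging them is a design choice the real module may handle differently.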

The developed Python modules communicate with the database using the PyMongo library (Pymongo, n.d.).
Each Twitter post contains the date and time it was published, and the module converts this information
into a datetime object using Python’s datetime library. As we will see later, the plastics topics observatory
includes modules responsible for these analysis tasks; these modules have been described extensively in
Deliverable D2.1. The Python libraries used in these modules are the following:

       •  nltk (wordclouds, filtering) (Nltk, n.d.)
       •  networkx (influencers) (NetworkX, n.d.)
       •  gensim (topic modelling) (Gensim, n.d.)
       •  ArcGIS (reverse geocoding API for locations) (ArcGis, n.d.)
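
The date conversion mentioned above needs only the standard library. A minimal sketch, assuming the v1.1 `created_at` layout (for example `Wed Aug 29 14:05:00 +0000 2018`); `parse_created_at` is a hypothetical helper name, and because `%a` and `%b` are locale-dependent, an English (C) locale is assumed.

```python
from datetime import datetime, timezone

# Layout of the created_at field in Twitter v1.1 JSON,
# e.g. "Wed Aug 29 14:05:00 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Convert a created_at string into a timezone-aware datetime in UTC (hypothetical helper)."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT).astimezone(timezone.utc)


stamp = parse_created_at("Wed Aug 29 14:05:00 +0000 2018")
print(stamp.isoformat())  # 2018-08-29T14:05:00+00:00
```

Storing the value as a datetime lets PyMongo save it as a BSON date, so a date-range filter such as `{"created_at": {"$gte": start, "$lt": end}}` can be evaluated by MongoDB itself rather than in Python.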

The results of this phase are presented in a web application developed by AUTH. The web application
retrieves all the data it needs through calls to an API designed for this purpose. This API has also been
developed by AUTH and will be open to the public after month 10 of the project. The API has been developed
using Flask, a lightweight web framework for Python (Flask, n.d.).


The front end of the web application uses cutting-edge technologies. Most of them are JavaScript
components, used mainly to enhance the experience of platform users. The components used are listed
below:

       •  Webpack (asset bundling) (webpack, n.d.)
       •  Vue.js (front-end framework) (VueJS, n.d.)
       •  Bootstrap v4 (CSS framework) (Bootstrap, n.d.)
       •  jQuery (Bootstrap dependency) (JQuery, n.d.)
       •  Leaflet.js (interactive OpenStreetMap maps) (leaflet, n.d.)
       •  Axios (JavaScript client for HTTP requests) (Axios, n.d.)
       •  Moment.js (date and time parsing) (momentJs, n.d.)
       •  DateRangePicker (date range selection) (dateRangePicker, n.d.)
       •  Wordcloud2.js (wordcloud rendering) (WordCloud, n.d.)
       •  pyLDAvis (topic modelling visualization; a Python library that generates the interactive view)
          (pyLDA, n.d.)

2.2. Facebook
Similar to Twitter, Facebook offers an API, the Graph API (Graph API, n.d.), which allowed developers to
collect data from public posts on public Facebook Pages. The pilot partners provided AUTH with a list of
Facebook Pages related to plastic reuse, reduction, pollution, etc. Most of these pages had a large number
of followers, which made them a valuable data source. In the first months of the project, AUTH started
collecting data using the Python SDK for Facebook’s Graph API (Facebook SDK, n.d.). The process consisted
of the following steps:

       •  Find a page of interest
       •  Find its posts
       •  Get information for each post
       •  Get the comments for each post
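
These steps amount to walking two Graph API edges, `/{page-id}/posts` and `/{post-id}/comments`, and following the `paging.next` link in each response. The Python sketch below illustrates this under stated assumptions: it builds request URLs directly instead of going through the Facebook SDK, the API version (`v3.0`) and the default post fields are assumptions, `fetch_json` stands for any function that performs the HTTP GET and decodes the JSON body, and all function names are hypothetical. The page ID would come from the partners' list. Under the updated Graph API terms, the process it describes is no longer permitted.

```python
from urllib.parse import urlencode

GRAPH_URL = "https://graph.facebook.com/v3.0"  # API version is an assumption


def edge_url(node_id, edge, access_token, **params):
    """URL of a Graph API edge, e.g. edge_url("123", "posts", token, limit=100)."""
    query = urlencode({"access_token": access_token, **params})
    return f"{GRAPH_URL}/{node_id}/{edge}?{query}"


def iterate_edge(url, fetch_json):
    """Yield every item of an edge, following paging.next until no page is left."""
    while url:
        body = fetch_json(url)
        yield from body.get("data", [])
        url = body.get("paging", {}).get("next")


def collect_page(page_id, access_token, fetch_json, fields="id,message,created_time"):
    """Steps 2-4: yield each post of a page (requested fields only) with its comments."""
    posts_url = edge_url(page_id, "posts", access_token, fields=fields)
    for post in iterate_edge(posts_url, fetch_json):
        comments_url = edge_url(post["id"], "comments", access_token)
        post["comments"] = list(iterate_edge(comments_url, fetch_json))
        yield post
```

Injecting `fetch_json` keeps the paging logic independent of the HTTP client, so it can be checked against canned responses.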

                                       Figure 5 Find page of interest Facebook


                                          Figure 6 Find all posts in page

                                     Figure 7 Get information for specific post

                                      Figure 8 Get all comments for this post

However, as already mentioned, Facebook has updated the terms of usage of the Graph API, and this process is
no longer permitted. As a result, only a small amount of data has been collected, and it will be used only as
a showcase in a subsequent phase of the crowdsourcing platform.


2.3. Thingiverse and Flickr
Thingiverse is a thriving design community for discovering, making, and sharing 3D printable things. It offers
a well-documented open API (Thingiverse API, n.d.) that allows developers to collect all publicly available
data. Using a Python wrapper for the Thingiverse API, AUTH collects and updates all the designs and related
information hosted on the Thingiverse platform. Moreover, a PlasticTwist group has been created in the
Thingiverse groups section, and a module that will allow Plastic Twist users to upload their own designs to
the PlasticTwist Thingiverse group directly from the Plastic Twist platform is being developed.
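"Collects and updates" implies an incremental refresh: new designs are inserted and already stored ones are overwritten with fresh values. The sketch below shows that upsert logic on an in-memory dict keyed by thing id. It is an illustration, not AUTH's code; the storage layer and the field names (`name`, `like_count`) are assumptions. With MongoDB and PyMongo, the same per-document effect would come from `collection.update_one({"id": t["id"]}, {"$set": t}, upsert=True)`.

```python
from typing import Dict, Iterable, Tuple


def upsert_things(store: Dict[int, dict], things: Iterable[dict]) -> Tuple[int, int]:
    """Insert new Thingiverse designs and refresh existing ones, keyed by thing id.

    Returns (inserted, updated). Fields present in the new record overwrite the
    stored ones and all other stored fields are kept, mirroring MongoDB's $set.
    """
    inserted = updated = 0
    for thing in things:
        key = thing["id"]
        old = store.get(key)
        new = {**(old or {}), **thing}
        if old is None:
            inserted += 1
        elif new != old:
            updated += 1  # only count records whose content actually changed
        store[key] = new
    return inserted, updated
```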

Flickr also offers an open API (flickr API, n.d.) that allows developers to search for images using certain
keywords. Using a Python wrapper for Flickr's API, AUTH will develop a module that crawls popular Flickr
images of plastic reuse ideas. This module is also still under development and will be available by the end
of October.
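A keyword search of this kind maps onto Flickr's `flickr.photos.search` method. The sketch below calls the REST endpoint directly rather than through any particular Python wrapper: it builds the request URL and turns a JSON response into static image URLs. Treating `sort=interestingness-desc` as "popular", the default page size and the image size suffix `z` are assumptions; the HTTP request itself is left out so the sketch runs offline.

```python
from typing import List
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"


def build_search_url(api_key: str, keywords: List[str], per_page: int = 50, page: int = 1) -> str:
    """Build a flickr.photos.search request for popular photos matching the keywords."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "text": " ".join(keywords),       # free-text match on title, description and tags
        "sort": "interestingness-desc",  # assumed proxy for "popular"
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,             # plain JSON instead of a JSONP wrapper
    }
    return FLICKR_REST + "?" + urlencode(params)


def photo_urls(response: dict, size: str = "z") -> List[str]:
    """Turn a photos.search JSON response into static image URLs."""
    if response.get("stat") != "ok":
        raise ValueError(response.get("message", "Flickr API error"))
    return [
        f"https://live.staticflickr.com/{p['server']}/{p['id']}_{p['secret']}_{size}.jpg"
        for p in response["photos"]["photo"]
    ]
```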

3. Demonstration of the Crowdsourcing Platform

The crowdsourcing platform is currently hosted on AUTH's premises and, since it is still under development,
is open only to the pilot partners. The pilot partners send feedback to help improve the platform as much
as possible.

3.1. Welcome Page
The landing page of the crowdsourcing tool features the Plastic Twist logo, a navigation bar and a link to
the documentation of the PTwist API (Figure 9 Landing Page). The documentation is not yet available, since
the final version of the crowdsourcing platform will be announced later.

                                              Figure 9 Landing Page

At the top of the homepage, the user can see the navigation bar, which gives access to all the
functionalities that are currently available. Each functionality is described in the following subsections.

3.2. Locations
As we can see in Figure 10 Locations, the user can select a certain date range, a specific keyword category
and a specific language.

                                               Figure 10 Locations

In this example, we have chosen to view the locations of all English-language tweets that refer to plastic
innovations, posted from July 23rd to August 2nd. The user can zoom in or out of the map using the mouse
scroll wheel, the + and - buttons in the top left corner of the map, or by clicking on the numbers on the
map. Each number shows how many tweets related to plastic innovation were posted at that location during
the selected period. The heatmap option offers an alternative visualization of the counts displayed in the
cluster option (see Figure 11 Locations heatmap).

                         Figure 11 Locations heatmap
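The numbers on the map amount to counting filtered tweets per area. The sketch below shows one simple way to produce such counts on the server side: filter by date range, keyword category and language, then snap each geotagged tweet to a fixed grid cell. This is an illustration only; the map itself clusters points client-side, and the grid approach, the `cell_deg` size and the `category` field (the platform's keyword-category label) are assumptions. `created_at` and `coordinates` follow Twitter's v1.1 tweet JSON, where coordinates are in GeoJSON order (longitude, latitude).

```python
import math
from collections import Counter
from datetime import date, datetime
from typing import Iterable

TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"  # e.g. "Wed Jul 25 10:00:00 +0000 2018"


def location_counts(tweets: Iterable[dict], start: date, end: date, category: str,
                    lang: str, cell_deg: float = 1.0) -> Counter:
    """Count matching geotagged tweets per grid cell of cell_deg x cell_deg degrees."""
    counts: Counter = Counter()
    for t in tweets:
        day = datetime.strptime(t["created_at"], TWITTER_TIME).date()
        if not (start <= day <= end) or t["category"] != category or t["lang"] != lang:
            continue
        if not t.get("coordinates"):
            continue  # only geotagged tweets can be placed on the map
        lon, lat = t["coordinates"]["coordinates"]  # GeoJSON order: longitude, latitude
        # Snap the point to the south-west corner of its grid cell.
        cell = (math.floor(lat / cell_deg) * cell_deg, math.floor(lon / cell_deg) * cell_deg)
        counts[cell] += 1
    return counts
```

The same per-cell counts can feed either view: as labelled markers for the cluster option or as intensities for the heatmap.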

3.3. Wordclouds
Wordclouds visualize the most frequent words and hashtags used in the posts we have collected: the higher
the frequency, the larger the font size. In this section, the user can also select a date range and then
choose the keyword category and the language.

                                               Figure 12 Wordcloud

In this example (see Figure 12 Wordcloud) we have chosen to search for the most popular hashtags for the
dates between July 23rd and August 2nd, for the plastic pollution keyword category. As we can see, the
hashtags #plasticpollution, #beatplasticpollution and #adidasparley are the most prevalent.
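The wordcloud boils down to a frequency count followed by a mapping from frequency to font size. The sketch below counts hashtags case-insensitively and scales font sizes linearly between a minimum and maximum pixel size, producing `[word, weight]`-style pairs of the kind wordcloud2.js takes as its `list` option. The regular expression, the linear scaling and the pixel bounds are assumptions, not necessarily what the platform uses.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

HASHTAG = re.compile(r"#\w+")


def top_hashtags(texts: Iterable[str], n: int = 50) -> List[Tuple[str, int]]:
    """Return the n most frequent hashtags, lower-cased so #PlasticPollution == #plasticpollution."""
    counts = Counter(tag.lower() for text in texts for tag in HASHTAG.findall(text))
    return counts.most_common(n)


def font_sizes(freqs: List[Tuple[str, int]], min_px: int = 12, max_px: int = 64) -> Dict[str, int]:
    """Scale font size linearly with frequency: the higher the frequency, the larger the font."""
    if not freqs:
        return {}
    lo = min(c for _, c in freqs)
    hi = max(c for _, c in freqs)
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {w: round(min_px + (c - lo) * (max_px - min_px) / span) for w, c in freqs}
```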

3.4. Influencers
A social media influencer is a user who has established credibility on a certain topic. Influencers usually have
access to a large audience and can persuade others by virtue of their authenticity and reach (pixlee, n.d.). To
discover these users, we built a social graph from the retweets, mentions and replies between all the users
whose posts have been collected. By applying specific algorithms to this graph, we identified the top-100
influencers. More information regarding these algorithms can be found in D2.1. At the moment, influencer
detection is not time-dependent; the final version of the crowdsourcing platform will include this feature
as well.

                                               Figure 13 Influencers

To view the top-100 influencers (see Figure 13 Influencers), the user should click on the “influencers” link
on the navigation bar. There, the user will find the Twitter accounts of the experts identified by the
influencer detection algorithm (e.g. UNEnvironment, NatGeo). Moreover, the user can click on the
“provided by the pilots” tab to see some of the accounts that the pilots recommended as experts.
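The algorithms actually used are described in D2.1. Purely to illustrate the idea, the sketch below builds the directed interaction graph described in Section 3.4 (an edge from a user to each account he/she retweets, mentions or replies to, weighted by how often) and ranks accounts with weighted PageRank, one common choice for this task, swapped in here as an example. It is written in plain Python so it runs without NetworkX. The `(user, target)` pair format, the damping factor and the iteration count are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, Dict[str, int]]


def build_graph(interactions: Iterable[Tuple[str, str]]) -> Graph:
    """Edge u -> v, weighted by how often u retweeted, mentioned or replied to v."""
    out: Graph = defaultdict(dict)
    for src, dst in interactions:
        if src != dst:  # ignore self-retweets and self-mentions
            out[src][dst] = out[src].get(dst, 0) + 1
    return out


def pagerank(out: Graph, damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Weighted PageRank by power iteration; the scores sum to 1."""
    nodes = set(out) | {v for targets in out.values() for v in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Users with no outgoing edges spread their score uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in out.items():
            total = sum(targets.values())
            for dst, weight in targets.items():
                new[dst] += damping * rank[src] * weight / total
        rank = new
    return rank


def top_influencers(interactions: Iterable[Tuple[str, str]], k: int = 100) -> List[str]:
    """Return the k highest-ranked accounts (ties broken alphabetically)."""
    ranks = pagerank(build_graph(interactions))
    return sorted(ranks, key=lambda user: (-ranks[user], user))[:k]
```

Directing edges towards the retweeted or mentioned account means that attention received, not activity produced, raises a user's score.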

3.5. Tweets
The user can also view the most widely shared tweets by visiting the “Tweets” tab in the navigation bar of
the homepage. Hovering over the “Tweets” tab reveals three options:

3.5.1. Top Tweets
Top Tweets is a list of the tweets with the highest number of retweets: the first entry is the tweet that has
been retweeted more times than any other post in our collection during the selected period. As in the
Wordcloud section, the user can select a date range, a keyword category and a language. In this example
(see Figure 14 top tweets), we show the top tweets regarding plastic pollution for a period of 15 days.

                                               Figure 14 top tweets
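One way to rank tweets "by number of retweets in our collection" is to count, within the selected period, the collected retweets that point to each original tweet; in Twitter's v1.1 JSON, a retweet carries the original under `retweeted_status`. The sketch below does this in memory. It is an illustration, not the platform's query: the `category` field and pre-parsed `created_at` datetimes are assumptions, and a database-backed version would express the same filter and grouping as a query.

```python
from collections import Counter
from datetime import date, datetime
from typing import Dict, Iterable, List, Tuple


def _in_range(ts: datetime, start: date, end: date) -> bool:
    return start <= ts.date() <= end


def top_tweets(tweets: Iterable[dict], start: date, end: date, category: str,
               lang: str, n: int = 10) -> List[Tuple[dict, int]]:
    """Rank original tweets by how many times they were retweeted inside the collection."""
    counts: Counter = Counter()
    originals: Dict[int, dict] = {}
    for t in tweets:
        if t["category"] != category or t["lang"] != lang:
            continue
        if not _in_range(t["created_at"], start, end):
            continue
        original = t.get("retweeted_status")
        if original is None:
            continue  # only retweets add to an original tweet's count
        counts[original["id"]] += 1
        originals[original["id"]] = original
    return [(originals[tweet_id], c) for tweet_id, c in counts.most_common(n)]
```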

3.5.2. Top URLs
If the user clicks on the top URLs tab, he/she is redirected to a list of the most propagated URLs. Twitter
users often include URLs in their posts, and this page (see Figure 15 top URLs) presents the most frequent
URLs in our collection. For now, the application only offers the overall top URLs; the final version of the
crowdsourcing tool will also include date range, keyword category and language filtering. The user can
visit any link of interest by clicking on the corresponding URL.

                                              Figure 15 top URLs
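A list of the most frequent URLs can be derived from the `entities.urls` array that Twitter's v1.1 JSON attaches to each tweet, where `expanded_url` holds the full link behind the shortened t.co address. The sketch below counts each distinct URL at most once per tweet, so a tweet that repeats a link does not inflate its count. It is an illustrative sketch; the per-tweet de-duplication and the fallback to `url` are design assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_urls(tweets: Iterable[dict], n: int = 20) -> List[Tuple[str, int]]:
    """Return the n most frequent expanded URLs across all tweets."""
    counts: Counter = Counter()
    for t in tweets:
        # A set, so each URL is counted at most once per tweet.
        urls = {u.get("expanded_url") or u["url"] for u in t.get("entities", {}).get("urls", [])}
        counts.update(urls)
    return counts.most_common(n)
```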

3.5.3. Topic modelling
Finally, the topic modelling option redirects the user to a page that lists the top 10 topics along with the
30 most relevant terms for each topic. For example, the terms of topic number 1 are shown on the right of
Figure 16 topic modelling; as we can see, this topic focuses on plastic pollution and its impact on the oceans
and marine life. This section is still under development, but it gives the user a small glimpse of what will
be offered.

                                             Figure 16 topic modelling
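pyLDAvis (pyLDA, n.d.) orders a topic's terms by "relevance" (Sievert and Shirley), which blends how probable a term is within the topic with how distinctive it is compared with the whole corpus: relevance(w, k | λ) = λ·log P(w|k) + (1 − λ)·log(P(w|k) / P(w)). The sketch below reproduces that ranking in plain Python for one topic, given its topic-term probabilities (e.g. one row of an LDA model's topic-term matrix) and the corpus-wide term probabilities. The default λ = 0.6 is an assumption; with λ = 1 the ranking reduces to plain topic-term probability.

```python
import math
from typing import Dict, List


def relevance(phi_kw: float, p_w: float, lam: float) -> float:
    """lam * log P(w|k) + (1 - lam) * log(P(w|k) / P(w))."""
    return lam * math.log(phi_kw) + (1.0 - lam) * math.log(phi_kw / p_w)


def top_relevant_terms(topic_term: Dict[str, float], term_prob: Dict[str, float],
                       r: int = 30, lam: float = 0.6) -> List[str]:
    """Return the r most relevant terms of one topic.

    topic_term: P(w | k) for the topic.
    term_prob:  P(w), the term's marginal probability in the whole corpus.
    """
    scored = [(relevance(p, term_prob[w], lam), w) for w, p in topic_term.items() if p > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest relevance first, ties by term
    return [w for _, w in scored[:r]]
```

Lowering λ pushes topic-specific words (such as marine-life terms in an oceans topic) above words that are frequent everywhere in the corpus.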

3.6. Repository
As mentioned at the beginning of this demonstration, apart from the plastic topic observatory, PTwist will
also offer a plastic designs repository. This repository will feature 3D printer designs obtained through the
Thingiverse open API, as well as images of plastic reuse ideas posted on Flickr. So far, we have developed a
repository interlinked with Thingiverse, while the Flickr interlink is still under development. The user can
visit the interlinked repository by clicking on the “repository” tab at the top. There, the user can find
three (3) categories of 3D printer designs as classified by Thingiverse (see Figure 17 repository): Popular,
Featured and New. In the future, another section containing 3D printer designs made by Ptwisters and
hosted in the Plastic Twist Thingiverse Group will be integrated. This group has already been created but
remains inactive for the moment (see Figure 18 Plastic Twist Thingiverse Group).

                                               Figure 17 repository

                         Figure 18 Plastic Twist Thingiverse Group
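The three repository tabs can be backed by one request per Thingiverse category. The sketch below only builds the HTTP request with the standard library, so it runs offline; sending it with `urllib.request.urlopen` and decoding the JSON body is left to the caller. The endpoint paths `/popular`, `/featured` and `/newest`, the Bearer-token header and the `page`/`per_page` parameters are assumptions that should be checked against the Thingiverse API documentation (Thingiverse API, n.d.) before use.

```python
from urllib.request import Request

THINGIVERSE_API = "https://api.thingiverse.com"

# Assumed mapping from the repository's three tabs to Thingiverse REST paths.
CATEGORY_PATHS = {"Popular": "/popular", "Featured": "/featured", "New": "/newest"}


def category_request(category: str, token: str, page: int = 1, per_page: int = 30) -> Request:
    """Build (but do not send) the request for one repository category."""
    try:
        path = CATEGORY_PATHS[category]
    except KeyError:
        raise ValueError(
            f"unknown category {category!r}; expected one of {sorted(CATEGORY_PATHS)}"
        ) from None
    # Assumed paging parameters.
    url = f"{THINGIVERSE_API}{path}?page={page}&per_page={per_page}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})
```

Keeping the tab-to-endpoint mapping in one dictionary means a future tab for the Plastic Twist Thingiverse Group would be a single new entry.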

4. Future Work

Since the final version of the crowdsourcing component of the PTwist platform will not be delivered before
the end of October, several modules remain to be implemented. These include the following (see Figure 19
future work):

 1. Topic modelling per location
 2. Identification of topics
 3. Integrate alternative options for identifying influencers
 4. If possible, integrate data collected from Facebook
 5. Add date, keyword and language filtering to top URLs
 6. Interlink the repository with Flickr and the Plastic Twist Thingiverse Group

                                            Figure 19 future work

5. Video Demonstration

   To view the video in full resolution, click on the play icon on the image above, then right-click and
   choose 'Full Screen Multimedia'.

6. References

ArcGis. (n.d.). Retrieved from https://www.arcgis.com/index.html
Axios. (n.d.). Retrieved from https://github.com/axios/axios
Bootstrap. (n.d.). Retrieved from https://getbootstrap.com/
dateRangePicker. (n.d.). Retrieved from http://www.daterangepicker.com/
Facebook. (n.d.). Retrieved from https://www.facebook.com
Facebook SDK. (n.d.). Retrieved from https://facebook-sdk.readthedocs.io/en/latest/
Flask. (n.d.). Retrieved from http://flask.pocoo.org/docs/1.0/license/
Flickr. (n.d.). Retrieved from https://www.flickr.com
flickr API. (n.d.). Retrieved from https://www.flickr.com/services/api/
Gensim. (n.d.). Retrieved from https://radimrehurek.com/gensim/
Graph API. (n.d.). Retrieved from https://developers.facebook.com/docs/graph-api/
JQuery. (n.d.). Retrieved from https://jquery.com/
leaflet. (n.d.). Retrieved from https://leafletjs.com/
momentJs. (n.d.). Retrieved from https://momentjs.com/
MongoDB. (n.d.). Retrieved from https://www.mongodb.com/
NetworkX. (n.d.). Retrieved from https://networkx.github.io/
Nltk. (n.d.). Retrieved from https://www.nltk.org/
pixlee. (n.d.). Retrieved from https://www.pixlee.com/definitions/definition-social-media-influencer
pyLDA. (n.d.). Retrieved from https://github.com/bmabey/pyLDAvis
Pymongo. (n.d.). Retrieved from https://api.mongodb.com/python/current/
Python. (n.d.). Retrieved from https://www.python.org/doc/
Thingiverse. (n.d.). Retrieved from https://www.thingiverse.com/about/
Thingiverse API. (n.d.). Retrieved from https://www.thingiverse.com/developers
Tweepy. (n.d.). Retrieved from http://www.tweepy.org/
Twitter. (n.d.). Retrieved from https://www.twitter.com
twitter api. (n.d.). Retrieved from https://developer.twitter.com/en/docs
VueJS. (n.d.). Retrieved from https://vuejs.org/
webpack. (n.d.). Retrieved from https://webpack.js.org/
WordCloud. (n.d.). Retrieved from https://github.com/timdream/wordcloud2.js/
