e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering Technology and Science
Volume:02/Issue:06/June -2020 www.irjmets.com

SENTIMENT ANALYZER USING MACHINE LEARNING
Rakesh Suryawanshi*1, Akshay Rajput*2, Parikshit Kokale*3, Prof. Subodh S. Karve*4
Dept. of Computer Engineering, Datta Meghe College of Engineering, University of Mumbai Airoli, Navi Mumbai.
ABSTRACT
Nowadays almost everyone uses social media such as Facebook, Instagram and Twitter. These platforms allow everyone to share
their own points of view, opinions, feelings and emotions about anything. Twitter is a popular social media platform where
people share their opinions about many public and private subjects in the form of short texts called tweets. These tweets are a
rich source for organizations to find out what customers think about their products. This information is very useful for
organizations to make changes in the services they provide, improve customer satisfaction with their products, plan better
strategies and help people in decision making. This paper focuses on the design of a sentiment analysis system. Sentiment
analysis measures the emotions and feelings of people from text. It is done by extracting a large number of tweets, or people's
opinions, from a social media platform. We use the Natural Language Toolkit (NLTK) and related techniques to determine
whether the sentiment of a tweet is positive, neutral or negative. Each tweet is then categorised according to its sentiment.
The classified results are presented in charts such as pie charts, bar charts and line charts on HTML pages using the Flask
framework for Python.
Keywords: Twitter, sentiment, social media, natural language processing, NLTK

I. INTRODUCTION
Nowadays everyone uses social media to share their opinions, emotions, what they think about particular things and their
private daily activities. Social media platforms like Facebook, Instagram and Twitter allow users to communicate with the whole
world. People write about anything, such as social activities or comments on products. They share their own opinions about
products, share their moments, and even influence politics and companies. Social media gives businesses a platform to connect
with their customers, to advertise, and to speak directly to customers in order to understand the customer's perspective on
products and services. Almost every company has an account on Twitter to learn about customer feedback on its services or
products. Thanks to social media like Twitter, organizations have an easier and faster way to find out what customers think
about their products and services. Organizations can act at the right time and plan better strategies to provide better
products, services and customer satisfaction.
To perform sentiment analysis we require a large amount of data on people's opinions, which is available on Twitter in large
quantity. For sentiment analysis we need to extract data from Twitter. There are methods that help to extract tweet data
directly from Twitter. These tweets are in unstructured form. To perform sentiment analysis we need data in structured form.
This is done by processing and cleaning the tweets, which converts the data into structured form. Techniques are then applied
to analyse the sentiment of each tweet and categorize the tweets as positive, negative or neutral. To realise this we use the
Natural Language Toolkit (NLTK), Python 3 with its libraries, and the Python Twitter API library Tweepy, which helps to extract
tweets from Twitter. We use VADER for sentiment analysis. VADER stands for Valence Aware Dictionary and sEntiment Reasoner; it
is a rule-based and lexicon-based sentiment analysis tool. This tool provides the sentiment of tweets. The analysed sentiments
are then presented in different ways, such as pie charts, bar charts and line charts.
We also support the analysis of a single tweet. We provide one section to analyse a single tweet or sentence, a latest news
item, a product review, or a comment about any topic in the form of text. In this section the user only has to type or paste
the text whose sentiment is to be analysed. The user is also able to analyse the sentiment of a large number of tweets at once.
Once processing is done, the user gets the total number of tweets, the sentiment of each single tweet, and the percentages of
positive, negative and neutral tweets.
II. METHODOLOGY
A. Natural Language Processing (NLP)
NLP techniques are based on machine learning. NLP is used to enable machines to understand human language and to translate
human languages into other languages. This is done with the help of learned rules. NLP comprises algorithms and databases of
words that contain the meanings of words, full forms of words and spelling corrections. NLP makes sentiment analysis easier and
helps to analyse the sentiment of tweets or any other input given by a human.

B. Lexicon-based Approach
The most important sentiment indicators are sentiment words. These are words that are commonly used to express positive or
negative sentiments. For example, good, beautiful, amazing and nice are words of positive sentiment, and bad, angry, sad and
terrible are words of negative sentiment. A word that carries neither positive nor negative meaning is considered a word of
neutral sentiment. Such words are collected into lists by sentiment, and such a list is called a sentiment lexicon.
Lexicon-based methods use this predefined list. Each word in the list has a meaning and a sentiment stored in the database of
words. In the lexicon-based approach, each word in the text is matched against the database, and the stored sentiment is used
as the predicted sentiment of the word.
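The lexicon lookup described above can be sketched in a few lines of Python. This is only an illustration: the lexicon holds just the example words from the text, and the +1/-1 weights and the function names are assumptions, not values or code from this work.

```python
# Toy sentiment lexicon built from the example words in the text.
# The +1 / -1 weights are illustrative assumptions.
SENTIMENT_LEXICON = {
    "good": 1, "beautiful": 1, "amazing": 1, "nice": 1,
    "bad": -1, "angry": -1, "sad": -1, "terrible": -1,
}


def lexicon_score(text):
    """Sum the lexicon values of the words in text; unknown words count as 0."""
    words = text.lower().split()
    return sum(SENTIMENT_LEXICON.get(word, 0) for word in words)


def lexicon_label(text):
    """Map the summed score to positive, negative or neutral."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A text with no lexicon words, or with equally many positive and negative words, is labelled neutral under this scheme.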
C. NLTK (Natural Language Toolkit)
We use natural language processing techniques and methods. For sentiment analysis we use a text classification process powered
by NLTK 2.0.4. Comments and tweets that users type on social media contain acronyms, punctuation marks and emoticons to express
their sentiments.
Twitter limits the length of a single tweet (originally 140 characters, 280 since 2017), so a tweet usually consists of one or
two sentences. Thus our task is a simple breakdown of the tweet to extract the sentiment from it. The data extracted from tweets
downloaded from Twitter is in JSON format. We consider tweets and re-tweets to capture the sentiments of people and businesses.
Our main focus is on parameters like username, location, positive tweet, negative tweet, neutral tweet, subject of tweet and
date. The extracted readable data is saved to CSV files.
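As a rough sketch of the JSON-to-CSV step, the following Python uses only the standard library. It assumes one JSON object per line with the field names `text`, `created_at`, `user.screen_name` and `user.location`; the function name and the sample tweet are hypothetical, and the actual column set used in this work may differ.

```python
import csv
import io
import json


def tweets_json_to_csv(json_lines, out_file):
    """Write username, location, date and text of each tweet to out_file as CSV."""
    writer = csv.writer(out_file)
    writer.writerow(["username", "location", "date", "text"])
    for line in json_lines:
        tweet = json.loads(line)
        user = tweet.get("user", {})
        writer.writerow([
            user.get("screen_name", ""),   # assumed field name
            user.get("location", ""),      # assumed field name
            tweet.get("created_at", ""),
            tweet.get("text", ""),
        ])


# Example with one made-up tweet, written to an in-memory buffer.
sample = [
    '{"created_at": "Mon Jun 01 10:00:00 +0000 2020", "text": "Great flight", '
    '"user": {"screen_name": "alice", "location": "Mumbai"}}'
]
buffer = io.StringIO()
tweets_json_to_csv(sample, buffer)
rows = list(csv.reader(buffer.getvalue().splitlines()))
```

Passing an open file object instead of the in-memory buffer would write the same rows to disk.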
D. Data Collection
To extract tweets from Twitter we use Tweepy, a Python library for the Twitter API, which lets us find a large number of tweets
on a single topic. We can search for and extract tweets on any topic, such as KFC, a political party or a product. Twitter
allows only the latest 4000 tweets to be extracted at a time. The data is downloaded in JSON format. This JSON data is processed
to extract readable tweets and user information. The data is then cleaned and sentiment analysis is performed. At the end we get
sentiments: positive, neutral and negative. We can also make use of an existing dataset; we use the "US Airlines tweets"
dataset, which has about 14000 tweets in total.
E. Application Programming Interface (API)
Tweepy, a Python library for the Twitter Application Programming Interface (API), is used to extract tweets from Twitter.
Through the Twitter API, Tweepy allows us to extract tweets on any topic, tweets from any location, state or country, tweets of
specific Twitter users, and tweets in any language. Twitter allows us to extract only up to 3000 tweets at a time. There are
other APIs that help to search for and extract more than 3000 tweets at a time.
F. Dataset
We use a dataset of 3000 tweets extracted from Twitter. We also use the "US airline dataset", which contains tweets about US
airlines. This dataset has a total of 14486 tweets. These tweets are in unstructured format and contain content like hashtags,
web addresses (URLs), special characters and symbols, extra spaces, etc. We first perform processing and cleaning on the dataset
for sentiment analysis. Once this is finished, sentiment classification is performed to categorize the tweets as positive,
neutral or negative. After analysis we find 6313 positive tweets, 5243 negative tweets and 2930 neutral tweets.
III. IMPLEMENTATION
A. Pre-processing
The data extracted from Twitter is highly unstructured; it has to be cleaned and prepared before it can be analysed.
Pre-processing the data involves many tasks. In our case we are interested in the text only.
The tweets extracted from Twitter, and the data whose sentiment we analyse, are unstructured. They contain content like links,
hashtags, and special symbols and characters that carry no meaning a machine can understand. None of this content has meaning
for sentiment analysis. To analyse sentiment we require only the text of the tweets. This is handled in pre-processing, where
all links, hashtags, special symbols and characters are removed from the tweets and the remaining text is used to analyse
sentiment. Tweets also contain capitalized words and repeated words: repeated words are removed in pre-processing and
capitalized words are converted to lower case. Tweets also contain spelling mistakes, which are corrected, for example "goood"
to "good". Some words are written in short forms whose meaning only humans understand, for example "Gm", which stands for
"Good morning"; these are expanded. This makes the sentiment of tweets easier to determine. Once pre-processing is done, the
processed tweets are used to analyse the sentiment of the user's opinion.
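A minimal sketch of such a pre-processing step, using only Python's `re` module, might look as follows. The regular expressions and the function name `clean_tweet` are assumptions chosen for illustration; in particular, collapsing runs of three or more identical characters to two is a simple heuristic that handles "goood", not a general spelling corrector. Because punctuation is stripped here, any emoticon handling would have to run before this step.

```python
import re


def clean_tweet(text):
    """Lower-case a tweet and strip links, mentions, hashtags, digits and symbols."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # links
    text = re.sub(r"[@#]\w+", " ", text)                # user mentions and hashtags
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)          # "goood" -> "good" (heuristic)
    text = re.sub(r"[^a-z\s]", " ", text)               # numbers, punctuation, symbols
    return re.sub(r"\s+", " ", text).strip()            # collapse extra spaces
```

For instance, `clean_tweet("Goood service by @AirIndia!!! #happy https://t.co/abc 123")` returns `"good service by"`.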

B. Cleaning Tweets
As the tweet text is highly unstructured, it has to be cleaned and prepared before it can be analysed. Cleaning the data
involves many tasks. In our case we are interested in the text only. We extract the text from the tweets and convert it to a
data frame, remove URLs, remove stop words (the, a, to, ...), usernames and accounts, remove numbers and unnecessary spaces,
remove punctuation, and convert the encoding (emojis) from latin1 to ASCII. After cleaning the text and removing unnecessary
symbols, the analysis is performed.
1. Tweet Crawling
A user initially inputs a topic so that tweets matching the topic can be found. The topic is passed through the Twitter API to
the server, and tweets that contain words matching the topic are returned.
2. Tokenizing
At this stage, the tweets obtained are divided into separate words. Extra spaces in the tweets are removed and any capitalized
word is converted to lower case.
3. Slang removal
This stage aims to replace slang words with standard words according to language rules. This is done by keeping a list of slang
words and their standard synonyms in the database. The words of the tweet are then matched against the list of slang words in
the database. If a word matches, it is replaced with its synonym. If the word does not match, it is skipped and left as it is.
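The slang lookup can be sketched with a plain dictionary. The entries below are illustrative assumptions (only "gm" comes from the text), and this sketch assumes that words not found in the list are kept unchanged.

```python
# Hypothetical slang list; only "gm" is taken from the text.
SLANG = {"gm": "good morning", "gn": "good night", "thx": "thanks"}


def replace_slang(tokens, slang=SLANG):
    """Replace each slang token with its standard form; keep other tokens as they are."""
    result = []
    for token in tokens:
        # A replacement may expand to several words, so split it into tokens.
        result.extend(slang.get(token, token).split())
    return result
```

With this list, `replace_slang(["gm", "everyone"])` gives `["good", "morning", "everyone"]`.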
4. Stop Word removal
A sentence contains several words that contribute little to its meaning. Stop words are such fixed, common words of a language,
like (a, the, an, ...), and they are removed from tweets. The Twitter API has a list of stop words; at this stage each word is
matched against the list and, if it exists there, removed from the tweet.
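Stop word removal reduces to a set membership test. The short stop word set below is an assumption for illustration; a real system would load a full list from its library or database of choice.

```python
# Small illustrative stop word set; a complete list would be much longer.
STOP_WORDS = {"a", "an", "the", "to", "is", "of", "and"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not in the stop word set, preserving order."""
    return [token for token in tokens if token not in stop_words]
```

For example, `remove_stop_words(["the", "flight", "is", "late"])` keeps only `["flight", "late"]`.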
5. Stemming
This stage aims to get the root word of each word in a tweet. The root word is obtained using the Porter stemmer available in a
Python library. This is done as data reduction, so that the tweets contain minimal data for processing. For example, for the
word "running" it trims the "ing" and replaces the word with the root word "run".
6. Numbers removal
This stage removes numbers from tweets, as numbers have no value or meaning for sentiment analysis.
7. Special character and symbols removal
At this stage special symbols and characters like (#, *, /, $, ¢, https://, www) are removed, because they do not carry any
meaning and we require only text for analysis.
8. Calculate emoticons sentiment score
Emoticons are small pictures that represent human emotions or feelings without any text. Emoticons make communication easier.
People use emoticons to show their opinions in an effective way. For example, ☺ indicates a happy state of mind. At this stage
the sentiment scores of all emoticons are calculated. For this we created a dictionary that contains these emoticons and
symbolic emojis like :) ;) :( . We assign a positive value to positive emoticons, a negative value to negative emoticons, and 0
to all others. At this stage emoticons are first identified and matched against the dictionary of emoticons; the value of each
emoticon is then divided by 10, and the result is the emoticon score.
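The emoticon scoring can be sketched as a dictionary lookup followed by the division by 10. The specific values in the dictionary are assumptions, since the text gives only their signs, and summing the values when a tweet contains several emoticons is also an assumption of this sketch.

```python
# Hypothetical emoticon values; the text specifies only the sign convention.
EMOTICON_VALUES = {":)": 5, ";)": 3, ":D": 7, ":(": -5, ":'(": -7, ":|": 0}


def emoticon_score(text):
    """Sum the values of all emoticons found in text and divide the sum by 10."""
    total = 0
    for token in text.split():
        total += EMOTICON_VALUES.get(token, 0)  # non-emoticon tokens contribute 0
    return total / 10
```

This version expects emoticons to be separated from words by whitespace; an emoticon glued to a word (such as "great:)") would need a regular-expression based matcher instead.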
C. Sentiment score
After processing and cleaning we have only the text of the tweets. The words of each tweet are taken one by one and, using the
VADER lexicon tool of the NLTK library for Python, matched with WordNet words. WordNet is a database of words with the meaning
and sentiment value of each word. The VADER lexicon tool provides the value of each word from WordNet. The values of the words
are combined and denoted as the sentiment score of the tweet. Once we have the sentiment score, we classify each tweet as
positive, neutral or negative using machine learning classification.
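The final labelling step can be illustrated with a simple threshold rule on the tweet's score. The sketch below assumes a normalised score in the range -1 to 1 and a cut-off of 0.05 on either side of zero, which is the convention commonly used with VADER's compound score; the text does not state the thresholds actually used, so both the cut-off and the function name are assumptions.

```python
def classify_sentiment(score, threshold=0.05):
    """Label a normalised sentiment score (-1 to 1) as positive, negative or neutral."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```

With NLTK installed and its VADER lexicon downloaded, such a score can be obtained as `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from `nltk.sentiment.vader`.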

IV. RESULTS AND ANALYSIS REPORTS
Table 1

                        Positive    Neutral    Negative    Total
Total no. of tweets     6313        2930       5243        14486
Percentage of tweets    44%         20%        36%         100%

Fig. 1: Bar chart of analysed tweets.
Fig. 2: Pie chart representation of tweets.
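The percentages in Table 1 follow directly from the tweet counts. The short Python check below recomputes them, rounding to whole percentages; the function name is hypothetical.

```python
# Tweet counts taken from Table 1.
counts = {"positive": 6313, "neutral": 2930, "negative": 5243}


def sentiment_percentages(counts):
    """Return each class's share of the total, rounded to a whole percentage."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


percentages = sentiment_percentages(counts)
```

The counts sum to 14486, and the rounded shares are 44% positive, 20% neutral and 36% negative, matching the table.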

V. CONCLUSION
Twitter sentiment analysis was developed to analyse customer sentiment and to help decision making. Sentiment analysis is a
tool that helps people take decisions on the basis of others' sentiment towards a product. It helps organizations find out what
their customers think about their products and also helps to plan better strategies to fulfil customers' needs and provide
better services. We use natural language processing to analyse the sentiment of Twitter data and classify it into positive,
negative and neutral sentiment. After classification of the tweets by sentiment, the whole output is presented in different
charts such as bar charts, line charts and pie charts. We used the Flask framework for Python to present the sentiment analysis
on a web portal.
