Twitter Sentiment Analysis of the Indian Union Budget 2020

Page created by Fernando Holmes
 
CONTINUE READING
Twitter Sentiment Analysis of the Indian Union Budget 2020
International Journal of Advanced Science and Technology
                                                                                    Vol. 29, No. 4s, (2020), pp. 2282 - 2288

Twitter Sentiment Analysis of the Indian Union Budget 2020
            Rupinder Kaur1, Rajvir Kaur2, Manpreet Singh3, Dr. Sandeep Ranjan4*
                            1,2,3
                              M.Tech Scholar, GNA University, Phagwara (India)
                           4
                            Assistant Professor, GNA University, Phagwara (India)

                                                       Abstract
The presented study conducts a real-time sentiment analysis of the public reaction towards the announcement of the
Indian Union Budget 2020. On social media platforms, the general public vents their opinions regarding popular
issues. A total of 6000 tweets were mined to gauge the quick response of the public sentiment about the union budget
which was presented in the Indian Parliament on February 1, 2020, at 11 am. The instantaneous reaction of the
general public in the form of tweets was mined during the budget announcement hours. A sentiment analysis based
model was used to analyze the dataset. Subjectivity and polarity values obtained for the dataset were used to
calculate the sentiment for individual tweets. The experiment produced an overall positive score of +149.3387
based on the Twitter user’s collective opinions.

Keywords: Instant reaction, Sentiment Analysis, TextBlob, Twitter, Union budget.

I. INTRODUCTION
         Events like general elections, movie releases, annual budget announcement, and sports tournaments attract
the attention of the general public. With the advancement in Internet and telecommunication technologies, social
media has emerged as a popular platform for conversations [1]. Twitter is one such platform where users express
their opinions about topics and issues of concern. India is the second-most populous country in the world with a
population of about 1375 million [2] and is the 8th leading country based on the number of active Twitter users. The
number of active Twitter users in India is approximately 34 million [3]. Twitter can be exploited to analyze the
instant reaction of the general public about important events. The present experiment studies a dataset of tweets
related to the Indian Union Budget 2020 presented by the Honorable finance minister of India, Ms. Nirmala
Sitharaman on 1st February 2020, at 11 AM IST. The general public is very curious about the budget as it strongly
affects every aspect of life including the financial sector, health, education, and travel, etc. This paper contains
instantaneous reactions of the public towards the budget as expressed on Twitter.

          In this experiment, a total of 6000 tweets posted during the announcement hours of the union budget were
mined from Twitter. For this purpose, the Twitter Application Program Interface (API) was used for fetching tweets
using hashtag “Budget2020”. To remove noise and other anomalies from data, text processing was performed which
includes removing non-relevant words and unwanted blank spaces. Thereafter, sentiment analysis was done on the
classification of these tweets. Sentiment analysis techniques enable us to make sense of data present in social media
for understanding social or political events, movie releasing or product marketing and to make more informed
decisions [4]. To illustrate the sentiment analysis, two parameters are used; subjectivity and polarity [5]. As a result
of which it can be concluded that whether the response is positive, neutral or negative. The summation of the
sentiment analysis score for the entire dataset represents the collective opinion of Twitter users about the union
budget.

II. LITERATURE REVIEW
         In the past few years, a lot of work has been done on sentiment analysis. Sentiment analysis of tweets
related to the Indian Union Budget 2016 was performed [6]. Lexicon-based sentiment analysis along with a
dictionary of positive and negative words was used. The researchers worked on the co-occurrence of words in the
dataset. Taxation related terms had the highest frequency which reflects the opinions of the general public. It was
concluded that youth and the rural population had a little contribution to the conversations on Twitter.
         Public opinion on large hydro projects was studied using sentiment analysis [7]. A lexicon-based sentiment
analysis model was proposed by the researchers. To show the efficiency of the model, the Three Gorges Project (a

      ISSN: 2005-4238 IJAST                                                                                          2282
      Copyright ⓒ 2020 SERSC
International Journal of Advanced Science and Technology
                                                                                   Vol. 29, No. 4s, (2020), pp. 2282 - 2288

large hydro project) was used as a case study that assisted leaders in decision making related to the project. With the
help of the model, users' opinions were mined from social network sites and the sentiment analysis showed that
about 50% of the responses contained negative sentiment towards the project and the remaining messages were
either neutral or positive.
          The public response to the demonetization policy of the Indian Government to ban INR 500 and 1000
currency notes with effect from 8th Nov 2016 was studied by analyzing tweets [8]. The experiment concluded that
people from 21 out of 30 states and union territories of India considered in the study showed support to this decision
of the union government. Social media user feedback can be used by governments in public policymaking [9]. The
feedback of the general public was seen against the government public policy using sentiment analysis. The research
studied Indonesian language tweets about public policy criticism.
          The European Union (EU) Cohesion Policy in the mass media was analyzed by applying Computational
Text Analysis using three media sources: user-generated content, news media and social media [10]. Sentiment
Analysis of news media revealed that international media publishes a higher negative sentiment compared to the EU
media. User-generated comments from the UK showed a much higher bias towards negative sentiment while in
Spain it is mainly neutral or positive. Sentiment analysis of social media concluded that the majority of tweets and
posts mostly were expressed as objective statements and are largely neutral.
          Sentiment analysis was performed to study international negotiations (Brexit negotiations) to contribute
towards government decision making [11],[12]. The results of the studies provided real-time input for sensitive
policymaking. The European Union constituent country’s policies about the common bonds triggered market
sentiments. These sentiments were analyzed in the context of the Brexit referendum of 2016.
          Government-citizen interactions on important issues were studied for 5 Latin American countries [13]. The
results were validated for work, health, environment, education and social development sectors. A social opinion
gold standard for the Malta Government Budget 2018 has been formulated by researchers [14]. Data from various
social media platforms like Facebook and Twitter; newswires like MaltaToday, Times of Malta was fetched. Malta
has about 90% population on social media platforms and 80% of the population reads online news. The research
claims to be significant for the government in building intelligent tools for the economy.

III. METHODOLOGY

                                      Figure 1. Research model flow Chart

     ISSN: 2005-4238 IJAST                                                                                          2283
     Copyright ⓒ 2020 SERSC
International Journal of Advanced Science and Technology
                                                                                  Vol. 29, No. 4s, (2020), pp. 2282 - 2288

         Figure 1 shows the flow chart of the research model implemented in this study. The initial steps shown in
the flow chart include fetching tweets from Twitter using Tweepy, the Twitter API. These steps form part of the data
mining process. Often tweets contain uniform resource locators (URLs), emojis, hashtags (#) and spelling mistakes.
The text preprocessing step removes noise from the dataset and rectifies spelling and punctuation errors. Sentiment
analysis is performed on the processed dataset to calculate the subjectivity and polarity of individual tweets.
         The weight of individual tweets is calculated using the formulae of the model and summation of the
weights of all the tweets gives a final score of the dataset. This score helps in ascertaining whether the instant
reactions of the general public were for the budget announcements or against them.

A. Data Collection
        Twitter API Tweepy was used to retrieve the Indian Union budget 2020 related tweets. A total of 6000
tweets were fetched for the hashtag “Budget2020” starting at 11:00 AM IST 1st February 2020, the day the union
budget was presented by the Honorable Finance Minister of India, Ms. Nirmala Sitharaman in the Indian Parliament.
The tweets were exported to a Comma Separated Values (csv) file. Figure 2 shows a snapshot of the mined tweets.

                                            Figure 2. Mined Tweets
B. Text Preprocessing
         Text pre-processing is the stage, where the raw data containing noise and impurities is transformed into a
refined form that a classifier can read [15]. The tweets downloaded from Twitter generally result in the noisy and
obscure dataset because of the casual nature of people’s usage of social media. This dataset contains special
characters such as hashtags, user mentions, Uniform Resource Locators (URLs), etc which need to be removed
before doing any analysis.
         A number of preprocessing steps are applied for cleaning noise and other anomalies from data. Some basic
steps such as removing numbers, punctuations, stopwords such as “of”, “in”, “and”, “the”, etc are applied to
standardize the data. It includes removing unwanted symbols such as (“@”, “|”, “#”, “/”), etc. and replace them with
blank space. URLs are not significant for sentiment analysis models as they lead to inadequate features and
inaccurate classification [16]. URLs in tweets are replaced with white space. Figure 3 shows a snapshot of tweets
after preprocessing extracted from the preprocessed column of the csv file. Finally, 4732 tweets were obtained after
the preprocessing phase.

     ISSN: 2005-4238 IJAST                                                                                         2284
     Copyright ⓒ 2020 SERSC
International Journal of Advanced Science and Technology
                                                                                    Vol. 29, No. 4s, (2020), pp. 2282 - 2288

                                          Figure 3. Preprocessed Tweets

C. Sentiment Analysis
         Sentiment analysis provides valuable insights by detecting emotions or opinions from a large volume of
data [17], [18]. Sentiment analysis can be treated as a branch of computational linguistics, machine learning, natural
language processing, data mining and borrows elements from psychology and sociology [19],[20]. In the presented
study, polarity and subjectivity have been computed for each tweet of the dataset using the Python TextBlob library.
Polarity ranges from -1 to +1 and subjectivity ranges from 0 to 1. Subjectivity value close to 0 indicates a highly
objective phrase while the value close to 1 indicates a highly subjective phrase. Polarity value -1 indicates a negative
statement, 0 indicates neutral and +1 indicates a positive statement. The sentiment is calculated as the product of
subjectivity and polarity for individual tweets as shown in table 1.

                                         Table 1. Sentiment computation

                                                                    Polarity    Subjectivity     Sentiment
                               Cleaned tweet                          (p)           (s)            (p*s)

             Today Budget India going very important
             economy moves according to govt already done             -0.525                 1        -0.525

             Insurance Bank Deposits raised from lakh rupees         0.6786                  0              0

             Here India gets money spends                                 0.7                0              0

             allocation approved budget Mission expected
             provide piped water                                          0.5              0.4            0.2

             budget speech political rally speech                       0.45               0.1         0.045

             Airports                                                       0                0              0

             Setting large solar panel capacity alongside
             railway tracks land owned railways                             0          0.4286               0

      ISSN: 2005-4238 IJAST                                                                                          2285
      Copyright ⓒ 2020 SERSC
International Journal of Advanced Science and Technology
                                                                                 Vol. 29, No. 4s, (2020), pp. 2282 - 2288

        Words having a high occurrence frequency depict their significance in the topic of discussion. Table 2
shows the most common words in the dataset with their frequency of appearance.

                                              Table 2 Word Frequency

                                   S.No.            Words           Frequency

                                     1       Budget                    2606

                                     2       Speech                     536

                                     3       Finance                    423

                                     4       Government                 377

                                     5       Minister                   375

                                     6       Jobs                       332

                                     7       India                      317

                                     8       Youth                      311

                                     9       Income                     277

                                    10       Union                      180

        Wordcloud representation is an effective visualization technique to understand the dataset in view of the
most popular words. Figure 4 shows the word cloud of high-frequency words in the dataset.

                                           Figure 4. Word Cloud

     ISSN: 2005-4238 IJAST                                                                                        2286
     Copyright ⓒ 2020 SERSC
International Journal of Advanced Science and Technology
                                                                                  Vol. 29, No. 4s, (2020), pp. 2282 - 2288

IV. RESULTS
         A sentiment analysis calculation was done for each of the 4732 tweets. Considering only the polarity
values, the number of negative tweets was 506, positive was 1690 and neutral was 2537. The mean subjectivity of
the dataset was 0.28 with 2277 tweets having 0 subjectivity and 3344 tweets had subjectivity less than 0.5. The final
sentiment score obtained has been calculated as the summation of sentiment scores of all the tweets.

Final sentiment score of the dataset = +149.3387

V. CONCLUSIONS
          The study focused on the sentiment analysis of the instant reaction of Twitter users to the Union Budget of
India for the year 2020. A total of 6000 tweets were mined for the hashtag “Budget2020”. After preprocessing, a
total of 4732 tweets are analyzed for sentiment analysis. The final sentiment score of the dataset is +149.3387 which
implies that the overall sentiment of the general public towards the Indian Union budget of 2020 is positive. The
results of the study are limited due to a large number of neutral sentiment tweets that can be improved by training
the model with large and labeled datasets and building a custom dictionary of words for better sentiment analysis.

REFERENCES

[1] S. Ranjan and S. Sood, Exploring Twitter for Large Data Analysis, International Journal of Advanced Research
     in Computer Science and Software Engineering, 6(7), 2016, 325–330.
[2] India Population (2020) - Worldometer. [Online]. Available: https://www.worldometers.info/world-
     population/india-population/. [Accessed: 18-Feb-2020].
[3] Twiter users in India 2019 | Statista. [Online]. Available: https://www.statista.com/statistics/381832/twitter-
     users-india/. [Accessed: 18-Feb-2020].
[4] C. A. Iglesias and A. Moreno, Sentiment analysis for social media, Applied Sciences (Switzerland), 9(23), 2019,
     1-4.
[5] S. Ranjan and S. Sood, Investor community sentiment analysis for predicting stock price trends, International
     Journal of Management, Technology and Engineering, 9(5), 2019, 6012-6020.
[6] M. Shakeel and V. Karwal, Lexicon-based sentiment analysis of Indian Union Budget 2016-17, Proc. 2016
     International Conference on Signal Processing and Communication, 2016, 299-302.
[7] H. Jiang, P. Lin, and M. Qiang, Public-opinion sentiment analysis for large hydro projects, Journal of
     Construction Engineering and Management, 142(2), 2016, 1-12.
[8] P. Singh, R. S. Sawhney, and K. S. Kahlon, Sentiment analysis of demonetization of 500 & 1000 rupee
     banknotes by Indian government, ICT Express, 4(3), 2018, 124-129.
[9] Y. Watequlis Syaifudin and D. Puspitasari, Twitter Data Mining for Sentiment Analysis on Peoples Feedback
     Against Government Public Policy, MATTER: International Journal of Science and Technology, 3(1), 2017,
     110-122.
[10] J. M. Carrascosa, C. Mendez, and V. Triga, EU Cohesion policy in the media : A computational text analysis of
     online news, user comments and social media, Cyprus University of Technology, University of Strathclyde,
     European Policies Research Centre. [online]. Available from:
     http://www.cohesify.eu/wpcontent/uploads/2018/04/CTA_Report_ReseasrchPaper12_CUTfinal.pdf             693127,
     2020.
[11] E. Georgiadou, S. Angelopoulos, and H. Drake, Big data analytics and international negotiations: Sentiment
     analysis of Brexit negotiating outcomes, International Journal of Information Management, October, 2019,
     102048.
[12] P. Schwendner, M. Schüle, and M. Hillebrand, Sentiment Analysis of European Bonds 2016-2018, Frontiers in
     Artificial Intelligence, 2, 2019, 20.
[13] R. B. Hubert, E. Estevez, A. Maguitman, and T. Janowski, Examining government-citizen interactions on
     twitter using visual and sentiment analysis, Proc. ACM International Conference Proceeding Series, 2018, 1-
     10.
[14] K. Cortis and B. Davis, A Social Opinion Gold Standard for the Malta Government Budget 2018, Proc. 5th
     Workshop on Noisy User-generated Text (W-NUT 2019), 2019, 364-369.

     ISSN: 2005-4238 IJAST                                                                                         2287
     Copyright ⓒ 2020 SERSC
International Journal of Advanced Science and Technology
                                                                                Vol. 29, No. 4s, (2020), pp. 2282 - 2288

[15] J. P. Pinto and V. Murari, Real Time Sentiment Analysis of Political Twitter Data Using Machine Learning
     Approach, International Research Journal of Engineering and Technology(IRJET), 6(4), 2019, 4124-4129.
[16] S. Joshi and D. Deshpande, Twitter Sentiment Analysis System, International Journal of Computer Application,
     180(47), 2018, 35-39.
[17] A. Hasan, S. Moin, A. Karim, and S. Shamshirband, Machine Learning-Based Sentiment Analysis for Twitter
     Accounts, Mathematical and Computational Application, 23(1), 2018, 11.
[18] U. Yaqub, N. Sharma, R. Pabreja, S. A. Chun, V. Atluri, and J. Vaidya, Analysis and visualization of
     subjectivity and polarity of twitter location data, Proc. ACM International Conference, 2018, 1-10.
[19] L. Yue, W. Chen, X. Li, W. Zuo, and M. Yin, A survey of sentiment analysis in social media, Knowledge and
     Information Systems, 60(2), 2019, 617-663.
[20] A. Kumar and A. Jaiswal, Systematic literature review of sentiment analysis on Twitter using soft computing
     techniques, Concurrency Computation, 32(1), 2020, 1-29.

     ISSN: 2005-4238 IJAST                                                                                       2288
     Copyright ⓒ 2020 SERSC
You can also read