Datasets for Aspect-Based Sentiment Analysis in Bangla and Its Baseline Evaluation - MDPI

Page created by Tiffany Reese
 
CONTINUE READING
data
Data Descriptor
Datasets for Aspect-Based Sentiment Analysis in
Bangla and Its Baseline Evaluation
Md. Atikur Rahman * and Emon Kumar Dey *
 Institute of Information Technology, University of Dhaka, Dhaka 1000, Bangladesh
 * Correspondence: bsse0521@iit.du.ac.bd (M.A.R.); emonkd@iit.du.ac.bd (E.K.D.)
                                                                                                     
 Received: 20 March 2018; Accepted: 2 May 2018; Published: 4 May 2018                                

 Abstract: With the extensive growth of user interactions through prominent advances of the Web,
 sentiment analysis has obtained more focus from an academic and a commercial point of view.
 Recently, sentiment analysis in the Bangla language is progressively being considered as an important
 task, for which previous approaches have attempted to detect the overall polarity of a Bangla
 document. To the best of our knowledge, there is no research on the aspect-based sentiment analysis
 (ABSA) of Bangla text. This can be described as being due to the lack of available datasets for ABSA.
 In this paper, we provide two publicly available datasets to perform the ABSA task in Bangla. One of
 the datasets consists of human-annotated user comments on cricket, and the other dataset consists
 of customer reviews of restaurants. We also describe a baseline approach for the subtask of aspect
 category extraction to evaluate our datasets.

 Dataset: https://github.com/AtikRahman/Bangla_ABSA_Datasets

 Dataset License: CC0

 Keywords: ABSA dataset; Bangla ABSA; aspect extraction from Bangla

1. Summary
      People trust human opinion more so than traditional advertising. For example, consumers are
used to seeking advice and recommendation from others before making decisions regarding important
purchases. Word of mouth (WOM) has always been salient for consumers when making a decision.
Such referrals have a strong impact on both customer decision-making and new customer acquisition
for the purchasing of a company’s product or service [1]. On the other hand, organizations are
eager to mine all the activities and interactions of people to understand what their weaknesses and
strengths are. This understanding would help them to develop their organizational strategy in this
competitive world.
      Sentiment analysis (or opinion mining) is a process to determine the viewpoint of a person on
a certain topic. It classifies the polarity of a document (i.e., review, tweet, blog, or news), that is,
whether the communicated opinion is positive, negative, or neutral. There are three levels at which
sentiment is analyzed [2]: the document level, sentence level, and aspect level. The document level
considers that a document has an opinion on an entity, and the task is to classify whether an entire
document expresses a positive or negative sentiment. The task at the sentence level regards sentences
and determining whether each sentence expresses a positive, negative, or neutral opinion. Neither
the document level nor the sentence level analysis discover exactly what people liked and did not
like. The aspect level (or aspect-based sentiment analysis—ABSA) performs a finer-grained analysis
that identifies the aspects of a given document or sentence and the sentiment expressed towards

Data 2018, 3, 15; doi:10.3390/data3020015                                           www.mdpi.com/journal/data
Data 2018, 3, 15                                                                                   2 of 10

each aspect. This level of analysis is the most detailed version that is capable of discovering complex
opinions from reviews.
     There are two major tasks when performing ABSA. The first is to extract the specific areas or
aspects mentioned in the opinioned review. The second is to identify the polarity (either positive,
negative, or neutral) for every aspect. For example, the following review of a restaurant reveals
two aspects: service and food. Both aspects have a positive polarity.
     “The service was excellent and the food was delicious.”
     As one can see, the name of the aspect categories are explicitly mentioned in this review. A review
might also contain implicit categories; for example, “The staff makes you feel at home and the chicken is
great.” Here, the same aspects, “service” and “food”, are contained without being directly mentioned.
     Semantic Evaluation (SemEval), a reputed workshop in the NLP domain, introduced a complete
dataset [3] in English for the ABSA task. Later this was expanded to the ABSA task by adding
multi-lingual datasets in which eight languages over seven domains were incorporated. To perform
ABSA, datasets of several languages, such as Arabic [4], Czech [5], and French [6], were created.
There is no dataset for Bangla in the field of ABSA. Consequently, no work is being done to extract
aspects and to identify corresponding polarities for Bangla reviews. We are currently working on a
project to extract the aspects from a Bangla review or comments for a particular product of a company,
as online shopping is very popular nowadays in Bangladesh and is growing rapidly. People like to
buy products online after reading the comments of others.
     In this paper, we have created two new datasets that serve as a benchmark for the ABSA domain
in Bangla texts. We present two datasets named “Cricket” and “Restaurant”. The first dataset
contains 2900 comments on cricket over 5 aspect categories, and the second dataset contains 2600
restaurant reviews.
     Because there is no work in Bangla for the ABSA task, we have introduced ABSA by extracting
aspect categories from Bangla texts in order to evaluate our datasets. We performed the task
with different training approaches and found a satisfactory outcome compared to evaluations of
other languages.
     There are some related works from which we founded the idea of this topic. The restaurant
review dataset, provided by Ganu et al. [7], was used to improve rating predictions. Their annotations
included six aspect categories and overall sentence polarities. They had not prepared a complete ABSA
dataset, as the aspect category was present but the corresponding polarity of that aspect was absent.
The SemEval 2014 evaluation campaign [3] extended their dataset by adding three more fields with the
aspect category. They published their dataset with four fields being contained for each review, that is,
with the aspect term occurring in the sentences, the aspect term’s polarity, the aspect category, and the
aspect category’s polarity. They also provided a laptop-review dataset and manually annotated with
similar entities as for the restaurant dataset. These are the benchmark datasets that [8–11] researches
have used for performing the ABSA task.
     The task was repeated in SemEval 2015 [12], for which aspect categories were the combination of
the entity type and an attribute type. Multilingual datasets were released in the SemEval 2016
workshop [13] on the seven domains (restaurant, laptop, mobile phone, digital camera, hotel,
and museum) and in eight languages (English, Arabic, French, Chinese, Turkish, Spanish, Dutch,
and Russian).
     A book-review dataset in the Arabic language was provided by [4]. They annotated book reviews
into 14 categories and 4 types of polarities, including “Conflict”. In [5], the author created an IT
product-review dataset for the ABSA task, in which a total 2200 reviews were contained.
     The contribution of this paper is as follows:

•     We have collected and presented two Bangla datasets for ABSA and have made them
      publicly available.
•     We performed statistical linguistic analysis on the datasets.
Data 2018, 3, 15                                                                                                 3 of 10

•      We implemented state-of-the-art machine learning approaches for the collected datasets and
 Data 2018, 3, x                                                                           3 of 11
       found satisfactory accuracies.

2.2.Data
    DataDescription
         Description
       “DigitalBangladesh”
      “Digital    Bangladesh”is is thethe  integral
                                       integral  partpart    ofBangladesh
                                                        of the  the Bangladesh     government’s
                                                                             government’s   VisionVision   2021.
                                                                                                   2021. The      The
                                                                                                              Internet
 Internet   is growing    very   fast over  the  country,    and   people  are using  different  online
is growing very fast over the country, and people are using different online platforms in every aspect  platforms   in
ofevery
   theiraspect    of their
         lives. This       lives. This
                       encouraged     usencouraged
                                         to constructus     to construct
                                                         datasets         datasets
                                                                    to analyze      to analyze
                                                                               the Bengali      the Bengali
                                                                                            people’s         people’s
                                                                                                      opinions  and to
 opinions    and  to  extract their  sentiments    in  different  aspects. In this paper, we  have  constructed
extract their sentiments in different aspects. In this paper, we have constructed two different datasets,         two
 different   datasets,   namely,    the  Cricket    dataset   and   Restaurant   dataset,
namely, the Cricket dataset and Restaurant dataset, to evaluate the people’s opinions.    to  evaluate  the  people’s
 opinions.
2.1. Cricket Dataset
 2.1. Cricket Dataset
      The Cricket dataset consists of 2900 different comments from different online sources with
       The Cricket
five different         dataset
                   aspect       consists of 2900
                             categories.     Mostdifferent
                                                      of the comments
                                                                commentsfrom are different
                                                                                  collectedonline
                                                                                             from sources
                                                                                                    Facebookwithpages
                                                                                                                  five
 different     aspect   categories.    Most    of     the   comments      are   collected
(https://www.facebook.com/BBCBengaliService/; https://www.facebook.com/DailyProthomAlo/).  from    Facebook    pages
 (https://www.facebook.com/BBCBengaliService/;
Some   comments are collected from two popular Bengali           https://www.facebook.com/DailyProthomAlo/).
                                                                      Websites, BBC Bangla (http://www.bbc.com/
 Some      comments        are    collected     from      two
bengali), and the Daily Prothom Alo (http://www.prothomalo.com).  popular     Bengali    Websites,
                                                                                    This dataset       BBC Bangla
                                                                                                 was collected  by the
 (http://www.bbc.com/bengali),          and   the   Daily    Prothom     Alo  (http://www.prothomalo.com).
authors of this paper. The comments are of different lengths and each review contains approximately              This
 dataset   was   collected  by  the  authors  of  this  paper.   The  comments    are of different
3–100 Bangla words. The reasons behind choosing these Websites for collecting data are given below:lengths  and  each
 review contains approximately 3–100 Bangla words. The reasons behind choosing these Websites for
•collecting   data areand
      BBC Bangla        given
                            thebelow:
                                  Daily Prothom Alo are very popular online news sites for the Bengali
      community
        BBC Banglaall    around
                       and         the world.
                            the Daily           They
                                          Prothom    Aloare popular
                                                          are         for publishing
                                                              very popular              trustworthy
                                                                                online news    sites forand
                                                                                                          theauthentic
                                                                                                              Bengali
       news.
        community all around the world. They are popular for publishing trustworthy and share
               Bengali   people   frequently   read  the  news   and  sometimes     make    comments     to        their
                                                                                                            authentic
       opinion.
        news. Bengali people frequently read the news and sometimes make comments to share theirof
                  Although   people    write their comments     or opinions    in both Bangla   and  English,  most
       the time, they
        opinion.        choose
                  Although       Bangla.
                              people       Wetheir
                                       write   studied   different
                                                   comments         articles and
                                                                or opinions        found
                                                                              in both      thatand
                                                                                       Bangla   in almost
                                                                                                    English,90%
                                                                                                              mostof the
                                                                                                                     of
       cases, people   expressed    their opinion  in Bangla.
        the time, they choose Bangla. We studied different articles and found that in almost 90% of the
•      The  Facebook
        cases,           page of Prothom
               people expressed               Alo has
                                     their opinion       over 13 million followers, and BBC Bangla has over
                                                    in Bangla.
      11 million.
        The         These
             Facebook      twoofpages
                         page            provide
                                   Prothom   Alo enormous
                                                  has over 13text  postsfollowers,
                                                                million    as well as and
                                                                                      a large
                                                                                            BBCnumber
                                                                                                 Banglaof   comments.
                                                                                                          has over 11
•       million.isThese
       Cricket     one oftwo
                           thepages
                                mostprovide
                                       popularenormous      text postsfor
                                                 games nowadays         as Bengali
                                                                           well as apeople.
                                                                                     large number
                                                                                              We foundof comments.
                                                                                                           that people
       Cricket
       are  moreisinterested
                    one of the in
                                most   popular
                                    making       games nowadays
                                              comments                for Bengalinews
                                                            on cricket-related      people.
                                                                                          thanWeonfound
                                                                                                     any that  people
                                                                                                           other  topic.
        are more   interested  in making    comments
       Thus, we chose this category for our experiment.  on cricket-related   news  than  on  any  other  topic. Thus,
        we chose this category for our experiment.
       Table 1 shows an example of comments collected from the Facebook pages.
        Table 1 shows an example of comments collected from the Facebook pages.
         Table 1. Example of cricket-related comments on Prothom Alo and BBC Bangla Facebook pages.
          Table 1. Example of cricket-related comments on Prothom Alo and BBC Bangla Facebook pages.

                                        Comments                                                   Source
     মাশরািফ এক জাদুকারী নাম । য নামটা নেলই মন ভের যায়। আমােদর
                                                                                         Prothom Alo Facebook page
     জিহর,জনসন, পালক, টিল নাই তেব এক জন মাশরািফ আেছ
      বালারেদরও দাষ নই, দাষটা 100% ম ােনজেমে র। পস বালারেদর জন উইেকট তির তা
                                                                                         Prothom Alo Facebook page
     তারাই কের না। দাষটা ম ােনজেম না মানেল আিম নব।
     আশা কির তাসিকন অিত ত    ু দেল িফরেব আর িনয়িমত খলেব এবং চর আউট করেব।                 Prothom Alo Facebook page
     এখন দশকেদর কােছ জনি য় হে 20--- ট /ওয়ানেড মানুষ এখন দখেতই চায়না..                    Prothom Alo Facebook page
     হারেলও বাংলােদশ জতেলও বাংলােদশ।আগামীেত আবার আমরাই জতেবা। ইশ! আমােদর যিদ
                                                                                         BBC Bangla Facebook page
     িবরাট কাহিলর মেতা একটা ব াট ান থাকেতা
     আমার পরামশ হেলা      েকটারেদর কেপােরট জগত থেক দুের রাখেত হেব।                       BBC Bangla Facebook page

      Peopleusually
     People   usually comment
                      comment inin Bangla
                                   Bangla about
                                           about the
                                                 the news.
                                                      news. We
                                                            We also
                                                                also found
                                                                      foundthat
                                                                            that5–10%
                                                                                 5–10%of
                                                                                       ofthe
                                                                                          thetime,
                                                                                              time,they
                                                                                                    they
 commented in English and also wrote Bangla sentences written in the English alphabet. We did not
commented in English and also wrote Bangla sentences written in the English alphabet. We did not
 consider these opinions in our dataset. In addition, some comments had only emoticons and no other
consider these opinions in our dataset. In addition, some comments had only emoticons and no
 text or words. We also omitted these for our dataset. All of the processes were done manually by the
other text or words. We also omitted these for our dataset. All of the processes were done manually
 authors. The following section describes the annotation process of the collected corpus of cricket-
by the authors. The following section describes the annotation process of the collected corpus of
 related comments.
cricket-related comments.
 2.1.1. Annotation of Cricket Dataset
     The Bangla text on cricket was annotated jointly by the authors, a group of second-year students
 of BSSE, and two employees from the Institute of Information Technology, University of Dhaka,
Data 2018, 3, 15                                                                                                        4 of 10

2.1.1. Annotation of Cricket Dataset
     The Bangla text on cricket was annotated jointly by the authors, a group of second-year students
of BSSE, and two employees from the Institute of Information Technology, University of Dhaka,
Bangladesh. All participants agreed to categorize the whole dataset into five different aspect categories.
   Data
These   2018,
        were3,3,xbowling,
   Data 2018,    x          batting, team, team management, and other. Given a comment,4 the                   of 114task
                                                                                                                      of 12 of

the annotators was to recommend the aspect category and polarity labels for each. Three types of
   Bangladesh.
   Bangladesh.      All participants
                         participantsagreed
                                        agreedtotocategorize
                                                       categorize the   whole   dataset into
                                                                                           five five  different    aspect
polarities  were All
                   considered,   that is, positive,    negative,theandwhole  dataset
                                                                       neutral. Tableinto
                                                                                       2 shows   different  aspect
                                                                                                  the information      about
   categories. These
                  These were bowling, batting, team, team management, and other. Given a comment, the
thecategories.
    participants. were bowling, batting, team, team management, and other. Given a comment, the
   task of the
            theannotators
                 annotatorswas
                             wastotorecommend
                                     recommend     thethe aspect
                                                        aspect    category
                                                               category andand  polarity
                                                                            polarity     labels
                                                                                     labels      for each.
                                                                                            for each. ThreeThree
                                                                                                             typestypes
   of polarities   wereconsidered,
      polarities were    considered,that
                                       thatis,is, positive,
                                               positive,    negative,
                                                          negative, andand  neutral.
                                                                        neutral.     Table
                                                                                 Table      2 shows
                                                                                       2 shows         the information
                                                                                                  the information
                            Table 2. Information about the participants in data collection.
   about the
           the participants.
                participants.
  Participant ID       Gender               Profession                                           Task
                           Table
                            Table2.2.Information
                                       Informationabout  thethe
                                                      about  participants in data
                                                                participants        collection.
                                                                               in data   collection.
         P1             Male          MS student/author                   Data collection (Cricket) and annotation
          Participant
          ParticipantID   Gender
                      IDMale
                           Gender         Profession
                                            Profession                               TaskTask
         P2                              Faculty/author                   Data collection     (Cricket) and annotation
                P1
                 P1        Male       MS  student/author         Data  collection  (Cricket)  and annotation
         P3             MaleMale        MS  student/author
                                        Graduate    student          Data collection
                                                                    Annotation          (Cricket)
                                                                                    (Cricket)   andand  annotation
                                                                                                     translation (Restaurant)
                P2
                 P2        Male
                            Male        Faculty/author
                                          Faculty/author         Data  collection
                                                                     Data          (Cricket)
                                                                           collection         and annotation
                                                                                        (Cricket)  and  annotation
         P4            Female           Graduate    student         Annotation      (Cricket)   and  translation (Restaurant)
                P3         Male       Graduate student Annotation (Cricket) and translation (Restaurant)
         P5      P3    Female
                            Male        Graduate
                                        Graduate student            Annotation
                                                    student Annotation              (Cricket)
                                                                             (Cricket)          and translation
                                                                                          and translation        (Restaurant)
                                                                                                           (Restaurant)
                P4        Female      Graduate student Annotation (Cricket) and translation (Restaurant)
         P6      P4     Male
                           Female Graduate
                                        Graduate student            Annotation
                                                    student Annotation              (Cricket)
                                                                             (Cricket)          and translation
                                                                                          and translation        (Restaurant)
                                                                                                           (Restaurant)
                P5        Female      Graduate student Annotation (Cricket) and translation (Restaurant)
         P7      P5     Male
                           Female Graduate
                                        Graduate student            Annotation
                                                    student Annotation              (Cricket)
                                                                             (Cricket)          and translation
                                                                                          and translation        (Restaurant)
                                                                                                           (Restaurant)
                P6         Male       Graduate student Annotation (Cricket) and translation (Restaurant)
         P8      P6     MaleMale        Graduate
                                        Graduate student            Annotation
                                                    student Annotation              (Cricket)
                                                                             (Cricket)          and translation
                                                                                          and translation        (Restaurant)
                                                                                                           (Restaurant)
                P7         Male       Graduate student Annotation (Cricket) and translation (Restaurant)
         P9      P7    Female
                            Male           Accountant
                                        Graduate    student    Annotation    (Cricket)    andAnnotation
                                                                                               translation (Restaurant)
                P8         Male       Graduate student Annotation (Cricket) and translation (Restaurant)
         P10 P8         MaleMale              Officer
                                        Graduate    student Annotation (Cricket)          andAnnotation
                                                                                               translation (Restaurant)
                P9        Female         Accountant                              Annotation
                 P9
                P10        Female
                           Male             Accountant
                                            Officer                                   Annotation
                                                                                 Annotation
                P10         Male              Officer                                 Annotation
     Each participant categorized every comment of the dataset. We applied the majority voting
        Eachtoparticipant
technique                 categorized
               make the final         every
                               decision     comment
                                         about        of thecategory
                                                the aspect    dataset. and
                                                                       We applied   the majority
                                                                           the polarity           voting As an
                                                                                          of a sentence.
        Each to
   technique  participant  categorized
                make the final decision every comment
                                        about the        of the dataset.
                                                  aspect category and theWe    applied
                                                                           polarity of a the majority
                                                                                         sentence.     voting
                                                                                                   As an
example,  we have taken the following comment:
   technique to make the final decision about the aspect category and the polarity of a sentence. As an
   example, we have taken the following comment:
                            এই following
   example, we have taken“the  িপেচ রান করা  টাফ, বািলং িনঃসে েহ ভােলা হেয়েছ ”
                                           comment:
       The voting result we  “এই
                            foundিপেচ
                                  for  রান
                                      the  করা টাফ,is is
                                          comment
     The voting result we found for the comment       বািলং
                                                      given িনঃসে
                                                            in Table
                                                         given       েহ
                                                                     3. ভােলা
                                                               in Table  3. হেয়েছ ”
       The voting result we found for the comment is given in Table 3.
                              Table 3. Voting example to define the category and polarity.
                             Table 3. Voting example to define the category and polarity.
                              Table 3. এই
                            Comment:   Voting
                                          িপেচ example  to define
                                               রান করা টাফ,  বািলংthe category
                                                                   িনঃসে       andহেয়েছ
                                                                           েহ ভােলা polarity.
                           Participant
                           Comment: এই Voting
                                           িপেচ রানforকরা
                                                       Category
                                                          টাফ, বািলংVoting
                                                                     িনঃসে forেহPolarity
                                                                                 ভােলা হেয়েছ
                          Comment:
                               P1                 Bowling                 Positive
                            Participant      Voting for Category        Voting for Polarity
                         Participant
                               P2               Voting  for Category
                                                  Bowling                    Voting for Polarity
                                                                          Positive
                                 P1                 Bowling                   Positive
                               P3
                              P1P2                Batting
                                                       Bowling            Negative Positive
                                                    Bowling                   Positive
                               P4
                              P2                  Batting
                                                       Bowling            Negative Positive
                                 P3                 Batting                   Negative
                               P5                  Other                   Neutral
                              P3                       Batting                     Negative
                               P6P4                 Batting
                                                  Bowling                     Negative
                                                                          Positive
                              P4P5                     Batting
                                                      Other                        Negative
                                                                               Neutral
                               P7                  Other                   Neutral
                              P5                        Other                      Neutral
                               P8P6                 Bowling
                                                  Bowling                     Positive
                                                                          Positive
                              P6                       Bowling                     Positive
                               P9P7                   Other
                                                  Bowling                      Neutral
                                                                          Positive
                              P7                        Other                      Neutral
                                 P8
                               P10                  Bowling
                                                  Batting                     Positive
                                                                          Negative
                              P8                       Bowling                     Positive
                                 P9                 Bowling                   Positive
                              P9                       Bowling                     Positive
       From Table 3, we can      P10
                               see that the commentsBatting                   Negative
                                                         had three votes for Batting  with a negative polarity,
                             P10                       Batting                     Negative
  two votes for Other with a neutral polarity, and four votes for Bowling with a positive polarity. Thus,
       From Table
  our method         3, we can
               determined    thissee that theascomments
                                   comment                  had
                                                 being in the    three votes
                                                              Bowling         for Batting
                                                                        category          with a negative
                                                                                  with a positive  polarity. polarity,
                                                                                                             We
    From
  two      Table
  also votes      3,
             forfor
       had ties      we
                 Other    can
                    somewith   see  that
                               a neutral
                           comments.     the  comments
                                       Inpolarity,          had
                                                     and four
                                           this situation,       three
                                                               votesboth
                                                           we took      votes  for
                                                                      for Bowling  Batting  with
                                                                                    with with
                                                                          the categories       their negative
                                                                                                  a
                                                                                          a positive  polarity.inpolarity,
                                                                                                     polarity    Thus,
two votes
  our      forTable
      method
      dataset.  Other    withthis
                determined
                      4 shows   aanneutral
                                    comment
                                    example  polarity,
                                                asthis
                                              for  being and  four
                                                          in the
                                                       scenario.    votes category
                                                                 Bowling    for Bowlingwith with  a positive
                                                                                            a positive  polarity.polarity.
                                                                                                                   We
Thus,
   alsoour
        hadmethod
             ties for determined this comment
                      some comments.              as beingwe
                                       In this situation,  intook
                                                              the Bowling  category with
                                                                  both the categories withatheir
                                                                                            positive  polarity.
                                                                                                 polarity in
Weour
    alsodataset.
         had tiesTable  4 shows
                    for some    an example
                             comments.    In for
                                             thisthis scenario.
                                                  situation, we took both the categories with their polarity
in our dataset. Table 4 shows an example for this scenario.
Data 2018, 3, 15                                                                                                              5 of 10
Data 2018, 3, x                                                                                                               5 of 11

                                           Table 4. Voting category identification.
       Data 2018, 3, x                  Table 4. Voting category identification.                                        5 of 12

                                  Comment: ওরা4.200
                                       Table
                                  Comment:          কেরেছ,
                                                 Voting     তামরাidentification.
                                                        category  100 করেত পারেব না?
                              Participant
                              Participant   Voting
                                      Comment: ওরা
                                                      forfor
                                                          Category
                                                   200 কেরেছ,
                                                Voting                  Voting for Polarity
                                                              তামরা 100 করেতVoting
                                                             Category        পারেব না?
                                                                                    for Polarity
                                  P1
                                 P1
                                 Participant          Batting
                                                 Voting  Batting
                                                        for Category         VotingNegative
                                                                                    for Negative
                                                                                        Polarity
                                  P2P1
                                 P2                    Team
                                                          Team
                                                       Batting                     Negative
                                                                                        Negative
                                                                                 Negative
                                  P3P2
                                 P3                   Batting
                                                         Batting
                                                        Team                       Negative
                                                                                        Negative
                                                                                 Negative
                                 P4
                                  P4P3                   Batting
                                                      Batting
                                                       Batting                          Negative
                                                                                   Negative
                                                                                 Negative
                                 P5                      Batting                        Negative
                                  P5P4                Batting
                                                       Batting                     Negative
                                                                                 Negative
                                 P6                       Team                          Negative
                                  P6P5
                                 P7
                                                       Batting
                                                       Team
                                                          Team
                                                                                 Negative
                                                                                   Negative
                                                                                        Negative
                                  P7P6
                                 P8                     Team
                                                       Team
                                                         Batting                 Negative
                                                                                   Negative
                                                                                        Negative
                                  P8P7
                                 P9                     Team
                                                          Team
                                                      Batting                    Negative
                                                                                        Negative
                                                                                   Negative
                                 P10 P8                Batting
                                                          Team                   Negative
                                                                                        Negative
                                  P9                   Team                        Negative
                                     P9                 Team                     Negative
                                  P10P10               Team
                                                        Team
                                                                                   Negative
                                                                                 Negative
      We can see from Table 4 that 50% of the evaluators voted for Batting, and 50% of the votes were
      We
for the   can
           We see
        Team    can from
                category,
                    see fromTable
                             both   4 that
                                   with
                                Table       50%
                                              50%ofofthe
                                         a negative
                                       4 that             evaluators
                                                       polarity.
                                                       the               voted
                                                                   As they
                                                           evaluators        had
                                                                         voted forfor Batting,
                                                                                   tied, ourand
                                                                                    Batting,     and 50%
                                                                                              algorithm
                                                                                                  50%        of votes
                                                                                                            took
                                                                                                        of the  the  votes
                                                                                                                   both      were
                                                                                                                         of these
                                                                                                                       were
for the
      forTeam
categories       category,
          theinTeam
                the  labeled
                      category,both  with
                                dataset
                                  both      a anegative
                                          with
                                        with     anegative
                                                   negativepolarity.
                                                               polarity.
                                                            polarity.   Asthey
                                                                        As  theyhadhadtied,
                                                                                        tied,
                                                                                            ourour  algorithm
                                                                                                 algorithm    tooktook
                                                                                                                    bothboth
                                                                                                                          of of
theseWecategories
      these
          also      in the
            categories
                faced         labeled
                          in the
                         another       dataset
                                 labeled
                                    kind  dataset with
                                                    withaato
                                           of problem     negative
                                                           negative
                                                              constructpolarity.
                                                                       polarity.
                                                                           the dataset. After calculating the category,
we found   We
      We also   also
                 facedfaced   another
                         another
             dissimilarities           kind
                                 forkind
                                     some  ofof  problemto
                                              problem
                                             comments      toamong
                                                               construct
                                                              constructthethe
                                                                           thedataset.
                                                                               dataset.
                                                                            participants After calculating
                                                                                           After  calculating
                                                                                            regarding     thethepolarity
                                                                                                                 category,
                                                                                                                  the category,
                                                                                                                           of the
      we
comment.  found
we found For      dissimilarities
             dissimilarities
                  example, when     for some
                                 for some       comments
                                             comments
                                      we considered          among
                                                         theamong      the participants
                                                                       the comment,
                                                               following    participants  regarding
                                                                                         weregarding  the
                                                                                             found the the polarity
                                                                                                          voting     of the
                                                                                                                polarity
                                                                                                                    result of  the
                                                                                                                            given
      comment.
comment.
in Table  5. For For    example,
                   example,        when
                                when    weweconsidered
                                                considered thethe following
                                                                   following comment,
                                                                                 comment,   wewefound
                                                                                                   foundthe the
                                                                                                             voting   resultresult
                                                                                                                  voting
      given in Table 5.
given in Table 5.
                                    Table 5. Problem related to polarity determination.
                                    Table 5. Problem related to polarity determination.
                                 Table 5. Problem related to polarity determination.
                 Comment: রা াক বুেড়া হেয় গেছ, খলা পােরনা। তাহেল আজ িকভােব িক করেলা?
                 Comment:
              Comment:   রা াক বুেড়া হেয়
                    Participant           গেছ,
                                     Voting      খলা পােরনা। তাহেলVoting
                                            for Category           আজ িকভােব      িক করেলা?
                                                                         for Polarity
                     Participant        Voting for Category        Voting for Polarity
                Participant
                        P1         Voting Bowling
                                           for Category            Voting   for Polarity
                                                                       Positive
                          P1                 Bowling                     Positive
                    P1 P2                  Bowling
                                         Bowling                       Positive
                                                                          Positive
                          P2                 Bowling                     Positive
                    P2 P3 P3
                                            Team
                                         Bowling
                                               Team
                                                                      Negative
                                                                          Positive
                                                                        Negative
                    P3 P4 P4
                                           Bowling
                                           Team
                                             Bowling
                                                                      Negative
                                                                         Negative
                                                                        Negative
                        P5                  Other                      Positive
                    P4 P5                      Other
                                         Bowling                         Positive
                                                                         Negative
                        P6P6                Team
                                               Team                   Negative
                                                                        Negative
                    P5                     Other                          Positive
                        P7P7                Other
                                               Other                  Negative
                                                                        Negative
                    P6 P8 P8
                                           Team
                                            Team
                                               Team
                                                                         Negative
                                                                      Negative
                                                                        Negative
                    P7 P9 P9               Other
                                             Bowling
                                           Bowling                       Negative
                                                                         Positive
                                                                       Positive
                    P8 P10
                         P10               TeamTeam
                                            Team                         Negative
                                                                        Negative
                                                                      Negative
                         P9                          Bowling                                   Positive
           From Table
                    P10 5, we can see that, for the comment, both the Bowling andNegative
                                                                                      Team categories had four
     From Table 5,     we can see that, for theTeamcomment, both the Bowling and Team categories had four
      votes. Thus, we keep both of these categories in our annotated dataset. In polarity, there were three
votes. Thus, we keep both of these categories in our annotated dataset. In polarity, there were three
      positive
     From      votes5,and
             Table     weone
                           cannegative
                               see that,vote for bowling. Thus,  we the
                                                                     considered it as having a positive polarity four
positive
      andvotes
          addedand
                 it toone
                       our negative   votefor
                            dataset. Table
                                              the
                                           6for
                                                   comment,
                                                bowling.
                                             shows
                                                              both
                                                          Thus,
                                                     a sample
                                                                        Bowling
                                                                  welabeled
                                                              of the  considered
                                                                            Cricket
                                                                                   and
                                                                                   it as Team
                                                                                         havingcategories
                                                                                      dataset.  a positivehad
                                                                                                            polarity
votes.
and     Thus,itwe
     added      to ourkeep    both ofTable
                           dataset.       these6 categories
                                                       shows a sample  in ourofannotated   dataset.
                                                                                 the labeled  Cricket Indataset.
                                                                                                         polarity, there were three
positive  votes    and   one   negative        vote     for  bowling.       Thus,
      In Table 7, the summary of the complete Cricket dataset is presented. We    we  considered   it as having   a positive
                                                                                                            can see          polarity
                                                                                                                     that there are a
and  added   it to    our  dataset.     Table     6    shows      a  sample   of  the labeled  Cricket
total of 3034 different comments with five different categories, that is, Batting, Bowling, Team, Team   dataset.
Management, and Other.Table         Each 6.   ofAthe     categories
                                                      part               contains
                                                            of the Cricket          threein
                                                                                dataset   different  polarities: positive, negative,
                                                                                            xlsx format.
and neutral. For example, the Batting category contains a total of 583 comments, for which 138 are of
positive, 389 are ofবয্াপার    না। eটা and
                          negative,       ধু মাt56ঘর্টare
                                                       না ছাড়া  িকছু না। polarity. The Bowling category
                                                           of neutral                                   other contains 154neutral
                                                                                                                             positive,
145 negative, and 33 neutral      ভকামনা     টাiগারেদরcomments.
                                        polarity          জনয্।           The Team, Team Management,    team and Other categories
                                                                                                                           positive
contain totals of   বাংলােদশ  eখেনাand
                       774, 332,      তািমেমর
                                           1013 েযাগয্   oেপনার েপেলাrespectively.
                                                     comments,           না।                           batting             negative
                                  বাংলােদশ হারেব আজ ।                                                   team              negative
           সাতজন িsনার িনেয় মােঠ েবাতল টানােনা যায় ময্াচ েজতা যায় না।                                 bowling             negative
             টািনর্ং িপচ বািনেয় ঔষধ খু েজ লাভ িক বাuিn িপচ বানােলi হয়।                         team management             negative
                          eটা পু রাদেম িফিkং eকটা েখলা হiেস।                                            team              negative
                                  জয় ধু সমেয়র aেপkা।                                                    other              positive
                                    েযi জেয়র সমান!!                                                     other               neutral
Data 2018, 3, 15                                                                                                      6 of 10
      Data 2018, 3, x                                                                                          6 of 12

                                           6. A
                                     Table 6.
                                     Table    A part
                                                part of
                                                     of the
                                                        the Cricket
                                                            Cricket dataset
                                                                    dataset in
                                                                            in xlsx
                                                                               xlsx format.
                                                                                    format.

                             বয্াপার না। eটা ধু মাt ঘর্টনা ছাড়া িকছু না।                    other           neutral
                                        ভকামনা টাiগারেদর জনয্।                              team           positive
                         বাংলােদশ eখেনা তািমেমর েযাগয্ oেপনার েপেলা না।                    batting         negative
                                         বাংলােদশ হারেব আজ ।                                team           negative
                   সাতজন িsনার িনেয় মােঠ েবাতল টানােনা যায় ময্াচ েজতা যায় না।             bowling          negative
                    টািনর্ং িপচ বািনেয় ঔষধ খু েজ লাভ িক বাuিn িপচ বানােলi হয়।         team management      negative
                                 eটা পু রাদেম িফিkং eকটা েখলা হiেস।                         team           negative
                                         জয় ধু সমেয়র aেপkা।                                 other          positive
                                       েযi    জেয়র সমান!!                                   other           neutral
             েবালাররা েয পিরমােন শটর্ বল িদেc- তােত রান কতেবিশ হয় েসটাi েদখার িবষয়!       bowling          negative
                              তােক েটs আর oিডআi দেল িনয়িমত চাi।                            team            positive
               বাংলােদশ িkেকট আেরা eিগেয় যােব, oেপনারেদর eকটু ভােলা করেত হেব।              team            positive
               বাংলােদশ িkেকট আেরা eিগেয় যােব, oেপনারেদর eকটু ভােলা করেত হেব।             batting          negative
                                   িফেরi চমক েদখােলন রাjাক                                bowling          positive
                                  নবীনেদর সু েযাগ েদয়া দরকার.                              other            neutral
           েবািলং িপচ তেব আমােদর বয্াটসময্ানেদর আuট েলা আtহতয্া ছাড়া আর িকছু i নয়।        batting          negative
           েবািলং িপচ তেব আমােদর বয্াটসময্ানেদর আuট েলা আtহতয্া ছাড়া আর িকছু i নয়।        bowling           neutral
                                    দািয়tjান হীনতার aভাব?                                  other           negative

           In Table 7, the summaryTableof7.the
                                            Thecomplete
                                                 completeCricket  dataset
                                                           statistics      is presented.
                                                                      of Cricket dataset. We can see that there are
      a total of 3034 different comments with five different categories, that is, Batting, Bowling, Team, Team
      Management, and Other. Each of the categories contains Polaritythree different polarities: positive, negative,
      and neutral. ForCategory
                         example, the Batting category contains a total of 583 comments, for    Total
                                                                                                   which 138 are of
                                             Positive        Negative          Neutral
      positive, 389 are of negative, and 56 are of neutral polarity. The Bowling category contains 154
                         Batting
      positive, 145 negative,   and 33 neutral138                389
                                                 polarity comments.    The Team,56Team Management,
                                                                                                 583
                                                                                                         and Other
                        Bowling                154               145              33             332
      categories containTeam
                           totals of 774, 332, and
                                               166 1013 comments,502 respectively.66             774
                    Team Management         24                293              15                    332
                         Other    Table 7. The
                                            89 complete statistics
                                                              828 of Cricket dataset.
                                                                               96                   1013
                                                     Total Comments
                                                           Polarity                     3034
                                      Category                                   Total
                                               Positive Negative Neutral
2.1.2. Analysis of Proposed Cricket
                               Batting Dataset 138           389         56       583
                              Bowling            154         145         33       332
      We used Zipf’s law [14] for our proposed Cricket dataset. Zipf’s law is a statement-based
                                Team             166         502         66       774
observation that states that
                         TeaminManagement
                                 a collection of data,
                                                  24 the frequency
                                                             293       of15a given332
                                                                                   word should be inversely
proportional to its rank in the Other
                                corpus. The word  89that is the
                                                             828most frequent
                                                                         96     and
                                                                                 1013scores rank 1 in a dataset
should occur approximately twice as the       second
                                           Total       most frequent word, three
                                                 Comments                        3034times as the third most
frequent, and so on. Figure 1 shows the diagram in which we plotted the words of our Cricket dataset.
     2.1.2.follows
The plot    Analysisthe
                      of trend
                         Proposed Cricketlaw.
                               of Zipf’s  Dataset
                                              We also calculated the reliability of the annotation process.
The valueWe of the intraclass  correlation (ICC)  was 0.71.
                used Zipf’s law [14] for our proposed   Cricket dataset. Zipf’s law is a statement-based
      observation that states that in a collection of data, the frequency of a given word should be inversely
2.2. Restaurant Dataset
      proportional to its rank in the corpus. The word that is the most frequent and scores rank 1 in a dataset
     should
     To     occur
        create  theapproximately    twice asdataset,
                    Bangla Restaurant        the second
                                                     we most
                                                         tookfrequent word, three
                                                              help directly   from times   as the third
                                                                                      the English       most
                                                                                                    benchmark’s
     frequent, and so on. Figure 1 shows the diagram in which we plotted the words of our Cricket dataset.
Restaurant dataset [3]. All comments were abstractly translated into Bangla with their exact annotation.
     The plot follows the trend of Zipf’s law. We also calculated the reliability of the annotation process.
The original English dataset contains a total of 2800 different comments. Participants from the same
     The value of the intraclass correlation (ICC) was 0.71.
group involved in the Cricket dataset’s creation were involved in the translation process of the
Restaurant dataset, except participants P9 and P10. We divided the original dataset equally into eight
parts and distributed these to the participants. They translated their assigned parts of the original
English dataset abstractly. Finally, participants P1 and P2 merged the separate sections and performed
an extensive proofread.
Data 2018, 3, 15                                                                                                                         7 of 10

                        Figure
                     Figure 1. 1. Distributionof
                               Distribution   of word
                                                 word fre que ncie s of
                                                      frequencies    ofCricke t datase
                                                                        Cricket        t using
                                                                                 dataset  usingZipf’s law.
                                                                                                 Zipf’s law.

Annotation Schema for Restaurant
     The Restaurant reviews [3] dataset used in this paper was abstractly translated into Bangla.
There were five types of aspect categories, that is, Food, Price, Service, Ambiance, and Miscellaneous.
As the objective was to identify the aspect category and corresponding polarity, the participants did
not add aspect terms or their polarities. In terms of the polarity of an aspect category, we considered
only three polarity labels, that is, positive, negative and neutral. The original dataset consisted of four
different polarity labels: positive, negative, neutral, and conflict. In our translated Bangla dataset,
we omitted the conflict category and assumed it to be the same as the neutral category. The annotators
were asked to assign each translated Bangla restaurant review into their categories and their polarities
for the original dataset. Table 8 shows a sample of the translated Restaurant dataset.
          Data 2018, 3, x                                                                                                      8 of 12

                                    Table8.8.AApart
                                   Table        partof
                                                     ofthe
                                                        theRestaurant datasetin
                                                           Restaurant dataset inxlsx
                                                                                 xlsxformat.
                                                                                      format.

                       খু ব সীিমত আসন আেছ eবং খাদয্ পাoয়ার জনয্ যেথ aেপkা করেত হেব।                      ambience           negative
                     খু ব সীিমত আসন আেছ eবং খাদয্ পাoয়ার জনয্ যেথ aেপkা করেত হেব।                          service          negative
                          Figure 2. Wordদাম       freতুque
                                                        লনামূncy   of Bangla
                                                               লকভােব কম।      Re staurant datase t according to Zipf’s law.
                                                                                                            price            positive
                                                        াi িছল মজাদার                                       food             positive
      3
                                       যিদo খাবারিট চমৎকার িছল, eিট সsা িছল না।                             food             positive
                                       যিদo খাবারিট চমৎকার িছল, eিট সsা িছল না।                             price           negative
                                                          খু ব ভাল!                                    miscellaneous         positive
                                             আচােরর সংেযাজন খু ব ভাল িছল ।                                  food             positive
                         ধু মাt রাnাi েয েসরা তা নয় , েসবা সবসময় মেনােযাগী eবং ভাল হেয়েছ।                   food             positive
                      ধু মাt রাnাi েয েসরা তা নয় , েসবা সবসময় মেনােযাগী eবং ভাল হেয়েছ।                     service           positive
                                     সবর্দা eকিট সু nর িভড়, িকn েকান েকালাহল েনi।                        ambience            positive
                               সjা alsl eবং পির ার - িব াn বা pশংসা করা িকছু i েনi                       ambience            neutral
                                    আিম িনি ত েয আমােক বারবর িফের েযেত হেব, !!!                        miscellaneous         positive
                  সmাবত eিট eকিট েছাট আরামদায়ক েরsু েরn,ভাল সjার সে েরামািnক aনু ভূিত।                   ambience            positive
                                          যিদo খাদয্ ভাল িছল পিরেবষনা িছল িব ।                              food             positive
                                          যিদo খাদয্ ভাল িছল পিরেবষনা িছল িব ।                             service          negative
      Da ta 2018, 3, x; doi:                  কম রা মেনােযাগী eবং বnু tপূ ণ।র্                                  www.mdpi.c om/journal/data
                                                                                                           service           positive
                                                      খাবার ভাল িছল।                                        food             positive

                Table 9 shows the complete statistics of the Bangla Restaurant dataset. We can see from the table
      Table
         that9five
               shows    the complete
                   different categories, statistics  of the
                                         that is, Food,      Bangla
                                                         Price,       Restaurant
                                                                Service, Ambiance,dataset.    We can see
                                                                                    and Miscellaneous,      from the table
                                                                                                          contained
that five713,
          different   categories,
              178, 336,            thatreviews,
                         234, and 613    is, Food,   Price, Service,
                                                  respectively,       Ambiance,
                                                                 with three         and
                                                                            different    Miscellaneous,
                                                                                      polarities.           contained
                                                                                                  For example,   the    713,
178, 336,category
           234, andFood
                      613contained
                           reviews,500    positive, 126with
                                      respectively,       negative,
                                                              threeand   87 neutral
                                                                     different      sentiment
                                                                                polarities.  Forlabels. The Service
                                                                                                  example,    the category
         category contained
Food contained                 186 positive,
                    500 positive,            118 negative,
                                     126 negative,       andand87 32 neutralsentiment
                                                                  neutral    sentiments.labels.
                                                                                         We alsoThe found  that this
                                                                                                         Service   category
          Restaurant dataset also followed Zipf’s law, which is shown in Figure 2.

                                       Table 9. Complete statistics of Bangla Restaurant dataset.

                                                                       Polarity
                                        Category                                                   Total
                                                          Positive     Negative       Neutral
                                           Food             500          126            87          713
Data 2018, 3, 15                                                                                                8 of 10

contained 186 positive, 118 negative, and 32 neutral sentiments. We also found that this Restaurant
dataset also followed Zipf’s law, which is shown in Figure 2.

                              Table 9. Complete statistics of Bangla Restaurant dataset.

                                                             Polarity
                   Category                                                                             Total
                                         Positive            Negative             Neutral
                   Food                    500                 126                   87                   713
                   Price                   102                  60                   16                   178
                  Service                  186                 118                   32                   336
                Ambiance                   138                  53                   43                   234
                     Figure 1. Distribution of word fre que ncie s of Cricke t datase t using Zipf’s law.
               Miscellaneous               300                 120                  193                   613

                     Figure2.2.Word
                   Figure       Wordfrequency
                                     fre que ncyof
                                                of Bangla
                                                   Bangla Re staurant datase
                                                          Restaurant  datasett according
                                                                               accordingtotoZipf’s
                                                                                             Zipf’slaw.
                                                                                                     law.

     3
3. Baseline Evaluation
     Our objective is to provide benchmark datasets for Bangla ABSA. Our datasets are designed for
two major tasks of ABSA. These are aspect category extraction and the identification of polarity for
each aspect category. In this paper, we experimented with the first subtask, that is, the extraction of
the aspect category. We applied three major steps to extract the aspect category. Firstly, preprocessing
was performed on the dataset. After this, we extracted features from the data and finally performed
classification using some popular classification models.

3.1. Preprocessing and Feature Extraction
    In the preprocessing phase, each Bangla document was represented as a “bag of words”.
We applied     traditional preprocessing steps for the evaluation. Firstly, punctuations
     Da ta 2018, 3, x; doi:
                                                                                           and stop words
                                                                               www.mdpi.c om/journal/data
were removed from each of the comments. After this, we removed the digits from our dataset, because
we found that digits were not necessary for the aspect category. Finally, we tokenized each Bangla
word from our dataset.
    Thus, a vocabulary of Bangla words was prepared after preprocessing. We created a feature
matrix for which each review was represented by a vector of that vocabulary. Term frequency–inverse
document frequency (TF–IDF) was used for calculating the features.

3.2. Results
     In the training phase, extracted feature sets were trained by the popular supervised machine
learning algorithms. Because this was a multi-label classification problem, we trained our models
Data 2018, 3, 15                                                                                                                       9 of 10

by setting up multi-label output. We used linear SVC in the support vector machine (SVM)
implementation. The following machine learning algorithms were used:

I.        Support Vector Machine (SVM)
II.       Random forest (RF)
III.      K-nearest neighbor (KNN)

     After the training was completed, our proposed Bangla test dataset was executed on the trained
model. The result is shown in the following table and figure.
     Table 10 shows the results for the task of aspect category extraction of the datasets we have
presented in this paper. We can see that using the SVM, we obtained the highest precision rate for
both of the datasets. Both datasets showed a low recall and F1-score. Figure 3 shows the overall
accuracy of the models using our datasets. The inherent nature of the datasets is the reason behind the
lower performance of the models for both datasets. People share their opinion with their individual
judgment. Therefore, the variety of opinions in the datasets is much larger. On the other hand, aspect
extraction is a multi-label classification problem. One’s opinion might have multiple aspect categories.
Conventional classifiers miss some of these aspect categories.

                                                 Table 10. Performance of proposed datasets.

                                     Dataset           Model                 Precision          Recall           F1-Score
                                                        SVM                    0.71                 0.22           0.34
                                     Cricket             RF                    0.60                 0.27           0.37
                                                        KNN                    0.45                 0.21           0.25
                                                        SVM                    0.77                 0.30           0.38
                                 Restaurant              RF                    0.64                 0.26           0.33
                                                        KNN                    0.54                 0.34           0.42
                   Data 2018, 3, x                                                                                          10 of 11

                                                                        Accuracy
                                 0.9
                                 0.8
                                 0.7
                                 0.6
                                 0.5
                                 0.4
                                 0.3
                                 0.2
                                 0.1
                                   0
                                           SVM           RF             KNN             SVM            RF        KNN
                                                       Cricket                                      Restaurant

                                                                 Precision     Recall    F1-score

                                                   Figure 3. The result of three models of our datasets.
                                          Figure 3. The result of three models of our datasets.
                   4. Conclusions and Future Work
                  Two datasets are provided for the ABSA of Bangla text. These datasets have been designed to
    These results     can be improved if we process and train the datasets in a more sophisticated
             perform two tasks covering aspect category extraction and the identification of polarity for that aspect
way. In this work,
             category. Wehave
                     we           taken
                           also report     all ofresults
                                       baseline    the vocabulary         as features
                                                         to evaluate the task             for theextraction.
                                                                              of aspect category   evaluation after removing
                  As future plans, we  aim to enhance  our  work by  including further
punctuation, stop words, and digits. Some state-of-the-art techniques for information   domains  such as cars, mobiles, gain can be
             and laptops. We are working on more advanced methods for the ABSA of Bangla text using our
applied to the dataset before classification and after the preprocessing steps to attain better results.
                   datasets to achieve better performance.

4. Conclusions  and
            Author   Future Work
                   Contributions: All authors contributed equally to this work, and have read and approved the final
                   manuscript.
     Two datasets    are provided for the ABSA of Bangla text. These datasets have been designed to
            Conflicts of Interest: The authors declare no conflict of interest.
perform two tasks covering aspect category extraction and the identification of polarity for that aspect
category. WeReferences
             also report baseline results to evaluate the task of aspect category extraction.
                   1.    Trusov, M.; Bucklin, R.E.; Pauwels, K. Effects of word-of-mouth versus traditional marketing:
                         Findings from an internet social networking site. J. Mark. 2009, 73, 90–102.
                   2.    Jeyapriya, A.; Selvi, C.K. Extracting Aspects and Mining Opinions in Product Reviews Using
                         Supervised Learning Algorithm. In Proceedings of the 2015 2nd International Conference on
                         Electronics and Communication Systems (ICECS), Coimbatore, India, 26–27 February 2015.
                   3.    Pontiki, M.; Galanis, D.; Pavlopoulos, J.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S.
                         SemEval-2014 Task 4: Aspect Based Sentiment Analysis. Available Online:
Data 2018, 3, 15                                                                                          10 of 10

    As future plans, we aim to enhance our work by including further domains such as cars, mobiles,
and laptops. We are working on more advanced methods for the ABSA of Bangla text using our
datasets to achieve better performance.

Author Contributions: All authors contributed equally to this work, and have read and approved the
final manuscript.
Conflicts of Interest: The authors declare no conflict of interest.

References
1.    Trusov, M.; Bucklin, R.E.; Pauwels, K. Effects of word-of-mouth versus traditional marketing: Findings from
      an internet social networking site. J. Mark. 2009, 73, 90–102. [CrossRef]
2.    Jeyapriya, A.; Selvi, C.K. Extracting Aspects and Mining Opinions in Product Reviews Using Supervised
      Learning Algorithm. In Proceedings of the 2015 2nd International Conference on Electronics and
      Communication Systems (ICECS), Coimbatore, India, 26–27 February 2015.
3.    Pontiki, M.; Galanis, D.; Pavlopoulos, J.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S. SemEval-2014
      Task 4: Aspect Based Sentiment Analysis. Available online: http://www.aclweb.org/anthology/S14-2004
      (accessed on 3 May 2018).
4.    Al-Smadi, M.; Qawasmeh, O.; Talafha, B.; Quwaider, M. Human Annotated Arabic Dataset of Book Reviews
      for Aspect Based Sentiment Analysis. In Proceedings of the 2015 3rd International Conference on Future
      Internet of Things and Cloud (FiCloud), Rome, Italy, 24–26 August 2015.
5.    Tamchyna, A.; Fiala, O.; Veselovská, K. Czech Aspect-Based Sentiment Analysis: A New Dataset and Preliminary
      Results. Available online: https://pdfs.semanticscholar.org/cbd8/7f4201c427db33783b1890bca65f5bf99d2c.pdf
      (accessed on 3 May 2018).
6.    Apidianaki, M.; Tannier, X.; Richart, C. Datasets for Aspect-Based Sentiment Analysis in French. Available
      online: http://www.lrec-conf.org/proceedings/lrec2016/pdf/61_Paper.pdf (accessed on 3 May 2018).
7.    Gayatree, G.; Elhadad, N.; Marian, A. Beyond the Stars: Improving Rating Predictions Using Review Text
      Content. Available online: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.150.140&rep=rep1&
      type=pdf (accessed on 3 May 2018).
8.    Kiritchenko, S.; Zhu, X.; Cherry, C.; Mohammad, S. NRC-Canada-2014: Detecting Aspects and Sentiment
      in Customer Reviews. Available online: http://www.aclweb.org/anthology/S14-2076 (accessed on
      3 May 2018).
9.    Kiritchenko, S.; Zhu, X.; Cherry, C.; Mohammad, S. Supervised and Unsupervised Aspect Category Detection
      for Sentiment Analysis with Co-cccurrence Data. In IEEE Transactions on Cybernetics; IEEE: Piscataway, NJ,
      USA, 2017.
10.   Soujanya, P.; Cambria, E.; Gelbukh, A. Aspect extraction for opinion mining with a deep convolutional
      neural network. Knowl.-Based Syst. 2016, 108, 42–49.
11.   Pengfei, L.; Joty, S.; Meng, H. Fine-Grained Opinion Mining with Recurrent Neural Networks and Word
      Embeddings. Available online: http://www.aclweb.org/anthology/D15-1168 (accessed on 3 May 2018).
12.   Pontiki, M.; Galanis, D.; Papageorgiou, H.; Manandhar, S.; Androutsopoulos, I. Semeval-2015 Task 12: Aspect
      Based Sentiment Analysis. Available online: http://www.aclweb.org/anthology/S15-2082 (accessed on
      3 May 2018).
13.   Pontiki, M.; Galanis, D.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S.; AL-Smadi, M.;
      Al-Ayyoub, M.; Zhao, Y.; Qin, B.; De Clercq, O.; et al. SemEval-2016 Task 5: Aspect Based Sentiment
      Analysis. Available online: http://www.aclweb.org/anthology/S16-1002 (accessed on 3 May 2018).
14.   Pak, A.; Paroubek, P. Twitter as A Corpus for Sentiment Analysis and Opinion Mining. Available online:
      http://crowdsourcing-class.org/assignments/downloads/pak-paroubek.pdf (accessed on 3 May 2018).

                        © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
                        article distributed under the terms and conditions of the Creative Commons Attribution
                        (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
You can also read