Predicting the Academy Awards' result by analyzing tweets
Global Journal of Science, Engineering and Technology (ISSN : 2322-2441)
   Issue 8, 2013 , pp. 39-47
   © GJSET Publishing, 2013.
   http://www.gjset.org

      Predicting the Academy Awards’ result by analyzing tweets

                                    Abouzar Abbaspour Ghomi
                             MSc Student of Software Engineering
                    University of Tehran International Kish Campus, Kish, Iran
                                    Abbaspourghomi@ut.ac.ir

                                          Ehsan Shirzadi
                             MSc Student of Software Engineering
                    University of Tehran International Kish Campus, Kish, Iran
                        Ehsan.Shirzadi@ut.ac.ir, www.EhsanShirzadi.com

                                        Afagh Movassaghi
                             MSc Student of Software Engineering
                    University of Tehran International Kish Campus, Kish, Iran
                                      a.movassaghi@ut.ac.ir

Abstract: - The innovation of the Internet has made many important things possible, but one of the main goals of the Internet was bringing people together. Because the Internet provides an infrastructure for different people to connect with each other, their diversity comes into play and produces notable cases to study. Some studies show “wonders of aggregation”, meaning that the decision of a group of people can be more accurate than that of individuals. In this paper we examine a theory that says a crowd of people can predict an event, for example which nominee will win an award, and that such a prediction can be correct most of the time. This result can come from the social interaction between friends and other people.

Key-Words: - Twitter, Forecasting, Wisdom of crowds, Crowdsourcing, Social networks

1. INTRODUCTION
Suppose you know someone, but not well; in fact, you know only his or her name. Where would you go to find out what he or she likes? The simple answer is social networks. Now suppose you knew what all the people of a small city like, what they want their city to have, and what they studied. What could you do with that knowledge? By studying such a society, that is, by studying its patterns of thought and analyzing the text its members share each day on different social networks, you could discover the next big thing. Twitter, as one of these social networks, is a preferred place for researchers to study diverse cases of human interactions.

Many scheduled events take place during a year, for example award shows with multiple nominees. People talk about their favorite nominees by posting tweets on Twitter. Because different people use Twitter in different situations, by studying these tweets we can find out what is happening in a given location, and we may be able to predict events that could happen as different people's ideas translate into actions. The events in Tahrir Square, for example, could be related to the strong activity of Egyptian youth on Twitter, which was a major channel for protesters to get the latest news and to learn the location and time of the next protest.

In this article we gathered a large amount of


tweets in the week before the 2013 Academy Awards. We analyzed this data, namely the text tweeted by users, predicted which nominee would win each Oscar, and compared our predictions with the real results. In Section 2 we introduce a typology of election forecasting models. In Section 3 we discuss how crowdsourcing and the wisdom of crowds can help us forecast an election. In Section 4 we present the analyses we performed on our dataset, and finally we state our conclusions.

 a. Academy awards
The Academy Awards, now officially known as the Oscars, are a set of awards given annually for excellence in cinematic achievement. The Oscar statuette is officially named the Academy Award of Merit and is one of nine types of Academy Awards. Organized and overseen by the Academy of Motion Picture Arts and Sciences (AMPAS), the awards are presented each year at a formal ceremony. The awards were first given in 1929, at a ceremony created for the awards at the Hotel Roosevelt in Hollywood. Over the years the categories presented have changed; currently Oscars are given in more than a dozen categories and include films of various types.

More than 100 countries broadcast the Academy Awards live, and in recent years discussion of the Academy Awards on social networks has become a hot topic, with many real-time tweet parties and status updates during the broadcast. Since 2004, Academy Award nominations have been announced to the public in late January; before that, they were announced in early February.

AMPAS, a professional honorary organization, maintains a voting membership of 5,783 (as of 2012). Actors constitute the largest voting bloc, numbering 1,311 members (22 percent). Some winners critical of the Academy Awards have boycotted the ceremonies and refused to accept their Oscars. For example, Marlon Brando refused his award (Best Actor in 1972 for The Godfather), citing the film industry's discrimination against and mistreatment of Native Americans. Tim Dirks, editor of AMC's filmsite.org, has written of the Academy Awards: “Unfortunately, the critical worth, artistic vision, cultural influence, and innovative qualities of many films are not given the same voting weight. Especially since the 1980s, moneymaking ‘formula-made’ blockbusters with glossy production values have often been crowd-pleasing titans (and Best Picture winners), but they haven't necessarily been great films with depth or critical acclaim by any measure.”

 2. A typology of election forecasting models
There are many election forecasting models [11][12]. The models differ mainly in what they forecast, which predictors they use, and who does the forecasting. With regard to the forecasting target, some models forecast vote intentions using monthly poll data [12][13], some forecast vote and seat shares [14][15][16][17], whereas others forecast whether incumbents will be re-elected [18]. The predictors also vary, though the models usually include one or more of three types: lagged values of the dependent variable, economic variables, and political variables. For example, [19] forecasts the Conservative vote share with lagged values only, whereas [18] uses the inflation rate, government popularity, and terms in office to forecast the vote and seat shares of the incumbent party.

Another type of model relies on citizens’ expectations of the election outcome [21][22][23]. Most individual British voters correctly forecast the winner of the general election [23]. Further,


voters seem to be able to forecast the national vote share of the winning party. The present research extends this line of research to the constituency level: it aggregates citizens’ local expectations about the election outcome in each constituency. This aggregation yields a very accurate forecast because of the “wisdom of crowds” effect.

     3. “Wisdom of crowds” and forecasting election outcomes
The “wisdom of crowds” refers to the phenomenon that aggregated judgments are (often) more precise than the judgment of the smartest person in the group (Hogarth, 1978; Hastie and Kameda, 2005; Larrick and Soll, 2006). Numerous studies in different areas report this effect (for an overview see Surowiecki, 2004). Galton (1907), for example, reports that at an exhibition the average of all estimates of an ox’s weight was much more precise than any individual estimate. Recent research on public opinion mirrors these findings. Page and Shapiro (1992) maintain that although many individuals seem to have non-attitudes (Converse, 1964), public opinion in aggregate responds to political events as one would expect. The reason for this effect is that a response is the sum of the true value and an error term; when several responses are summed, these errors are likely to cancel each other out and the true opinions emerge clearly. There are, however, those who have expressed suspicion of the “miracle of aggregation” (Converse, 1990; Surowiecki, 2004).

Critics have suggested that the group may cloud the judgment of individuals. Janis (1982) and ’t Hart (1994) both speculate about the danger of “groupthink” for those making judgments, and Solomon Asch’s experiments on social conformity show the negative consequences groups may have on an individual’s judgments (Asch, 1955). Moreover, models of social learning have sometimes identified the negative consequences of “herd behavior” (Chamley, 2004). People behave as a herd if they first observe the actions of other members and then behave in the same way, dismissing any relevant personal information they might have. Models of “herd behavior” highlight the importance of sequence: if the first person makes the wrong decision, he or she leads the whole group astray. Condorcet’s jury theorem and its extensions highlight the conditions under which groups outperform individuals. Condorcet’s jury theorem implies that if the probability that a group member votes for the “correct” alternative is greater than 0.5, the probability of a correct majority vote will increase rapidly towards unity as the group size increases to infinity (Condorcet, 1785, 1994; Miller, 1986). By the same token, if the probability that a group member votes for the “incorrect” alternative is smaller than 0.5, the probability of an incorrect majority vote will decrease rapidly towards zero as the group size increases to infinity. Condorcet derived this jury theorem under specific assumptions (the same probability of being correct for each member, independence of members, and a binary decision); however, recent efforts have generalized it. A jury theorem holds even if not every member has the same probability of choosing the correct outcome, as long as the mean probability of being correct is greater than 50 percent (Grofman et al., 1983; Boland, 1989). The same authors also show that a jury theorem still holds under specific forms of correlated votes between members (see also Ladha, 1992; Estlund, 1994). Finally, List and Goodin (2001) generalize Condorcet’s jury theorem to more than two alternatives. The theorem, accordingly, seems applicable to the present case, where voters may talk with each other or share the same information, have different probabilities of making the correct forecast, and where more than two parties


compete. Surowiecki (2004) claims that two other conditions (together with aggregation) increase the likelihood of the “wisdom of crowds”: diversity and decentralization. Diversity refers to information and not to socio-demographics (Surowiecki, 2004). It is important that the members of a group have different pieces of information, so that combining their responses enriches the prediction. Decentralization highlights the value of tacit knowledge (von Hayek, 1945): people who live in a constituency have local knowledge they can bring to the problem of predicting which party will win their constituency. Meehl (1957) makes a similar point with his “broken leg analogy”. One may have a plausible statistical model that predicts whether professors watch a movie in the cinema based on two variables: their degree of extroversion and their area of specialization. This model may be right on average, but it lacks common sense: it may predict that a particular professor goes to the cinema even though he or she has a broken leg. The model lacks a “broken leg” variable.

Similarly, centralized election forecasting models may miss variables that are important for forecasting particular constituencies. Statistical models cannot incorporate all relevant information because they lack the degrees of freedom to do so. In contrast, citizens are able to incorporate unforeseen events in their decision making.

Diversity and decentralization match the situation in forecasting elections. Firstly, a random sample of people within a constituency ensures a diverse selection of interviewees. Secondly, letting citizens forecast collectively draws on their decentralized, local knowledge.

 a. Crowdsourcing
InnoCentive was an initiative launched by a pharmaceutical firm that faced a problem in designing a product. Despite its large R&D department, the firm had no idea how to solve a specific problem in developing a new product. A new matchmaking system was devised to link outside experts to unsolved R&D problems, and a monetary reward was offered to whoever could solve the problem. Scientists from all over the world competed for the prize; the winner received it, and the company obtained the solution at a comparatively low price [6][7]. Another example is Procter & Gamble, which was using less than 10% of its internal innovation in its new products. The company therefore changed the way it innovated and changed its policy on intellectual property (IP): it opens a patent to any outsider if the idea has not been applied in the last three years [8]. The IP collaboration loop went further with the emergence of marketplaces such as yet2.com, where ideas (under IP rights) are for sale [9].

     4. The present research
In this research we first explore whether groups are better than individuals at forecasting elections, and in the next subsection we discuss why we think groups can forecast. We then describe how we collected data, how we aggregated specific results from this dataset, and how we predicted the results of an award show and compared them with the real results. The question of how accurate such predictions can be is addressed in the conclusion section.
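Condorcet’s jury theorem, discussed in Section 3, gives one formal reason to expect a group to forecast better than its members. The following is a minimal Python sketch. It assumes exactly the classical conditions: every voter is independently correct with the same probability p, there are two alternatives, and the group size n is odd so that ties cannot occur. Under those conditions, the probability of a correct majority is a binomial tail sum. The function name is ours, not the paper’s.

```python
from math import comb


def majority_correct_probability(n: int, p: float) -> float:
    """Probability that a strict majority of n voters picks the correct alternative.

    Classical Condorcet setting (an assumption of this sketch): each voter is
    independently correct with the same probability p, there are two
    alternatives, and n is odd so a tie is impossible.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    needed = n // 2 + 1  # smallest strict majority
    # Binomial tail: sum P(exactly k correct votes) for k = needed..n.
    # Note: comb(n, k) is converted to float, so n far beyond ~1,000 can overflow.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


if __name__ == "__main__":
    for size in (1, 11, 101, 1001):
        print(size, round(majority_correct_probability(size, 0.55), 4))
```

With p = 0.55 the printed values start at 0.55 for a single voter and climb towards 1 as n grows. Calling the function with p = 0.45 instead shows the values falling towards 0, which is the mirror case the theorem also describes. For very large groups, a log-space version of the same sum would avoid float overflow.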
 a. Are groups better than individuals at forecasting?
Who forecasts better, individuals or groups? This question compares the percentage of


correct constituency forecasts made by individuals and by groups [10]. Forecasts are more impressive the earlier they are made: it seems relatively easy to predict the election outcome one day before the election, whereas forecasting it before the campaign has started seems more difficult.

Plurality voting is perhaps the simplest aggregation method. It looks only at the percentage of people who think that a given party or group will win (the group forecast score), and the party or group with the highest group forecast score is predicted to win the constituency. Although plurality voting is widely used as a standard voting procedure, it discards a lot of available information. For instance, plurality voting looks only at the first-ranked party and omits information about which party came second or third in the respondent’s forecast. Similarly, plurality voting discards how certain the respondent was in making the forecast.[] To sum up, groups are on average better forecasters than individuals. This is a robust finding, independent of whether plurality voting or another method (e.g. range voting) is used to yield group forecasts. However, this is not to say that groups are always better than individuals: in some constituencies the group gets it wrong, whereas some individuals correctly predict the winner.

 b. Why can groups forecast?
The previous subsection has shown that most groups forecast correctly. We can also map out that some regions forecast better than others. This raises two questions: what explains a correct forecast, and why do some regions outperform others? We can answer these questions by exploring task difficulty and group characteristics. To begin with task difficulty, elections vary in how easy they are to forecast: the easier the election, the higher the chance of getting it right. What makes an election easy or hard to forecast? Several objective measures of task difficulty can be proposed; they fall into two broad categories: the characteristics of the electorate and the degree of party competition.

Group characteristics, defined as the group size and the variation in the responses, also matter for getting the prediction right [1].

 c. Oscar buzz
In the past, friends gathered in front of the TV to watch their favorite shows and told each other their opinions and predictions. Today the same thing happens, but in a different way: people still watch their favorite shows, but they express themselves by tweeting their comments and sharing them with hundreds of followers and millions of other people. This happens not only on social networks such as Twitter but also through podcasts, which have become a popular service in recent years. As watching an award show through Twitter has become an enjoyable hobby, the shared tweets are very helpful data on which researchers can base their studies.

 d. Collecting data from twitter
In recent years, Twitter has been a great resource for both developers and researchers. By introducing its new API, Twitter made it easy to search through millions of users and billions of tweets. Twitter assigns OAuth credentials to each registered application, and these credentials are used to request the available services from Twitter's servers. These services range from a user’s account information to the tweets he or she has shared. Although the number of requests per hour is limited, we managed to gather about


200,000 tweets in 5 days, together with information about the users who tweeted them.

We decided to collect our dataset from Twitter. The most important factor in making our dataset accurate is to collect tweets during the Oscar buzz, when many people are talking about the event. We used the Twitter APIs to collect these tweets based on words related to the Oscars, i.e. movie names, directors, and nominations, and stored them in a database.
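The paper does not list its exact query terms or show its collection code. The following is a minimal sketch of the filtering and storage step only. It assumes tweets have already been fetched through the rate-limited, OAuth-authenticated API as (user, text) pairs and makes no Twitter API calls itself. It keeps the tweets that mention a hypothetical list of Oscar-related keywords and stores them in an in-memory SQLite table, which stands in for the paper's unnamed database.

```python
import re
import sqlite3

# Hypothetical keyword list: the paper tracked movie names, directors and
# nominations but does not publish its exact terms.
OSCAR_KEYWORDS = ["oscar", "oscars", "academy awards", "argo",
                  "daniel day-lewis", "bradley cooper", "christoph waltz"]

# One case-insensitive pattern; \b stops "argo" from matching inside "cargo".
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in OSCAR_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def is_oscar_related(text: str) -> bool:
    """True if the tweet text mentions at least one tracked keyword."""
    return KEYWORD_PATTERN.search(text) is not None


def store_matching(tweets, conn: sqlite3.Connection) -> int:
    """Store the (username, text) pairs that mention a keyword; return how many were kept."""
    conn.execute("CREATE TABLE IF NOT EXISTS tweets (username TEXT, text TEXT)")
    kept = [(user, text) for user, text in tweets if is_oscar_related(text)]
    conn.executemany("INSERT INTO tweets (username, text) VALUES (?, ?)", kept)
    conn.commit()
    return len(kept)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    sample = [("u1", "Argo for best picture! #Oscars"),
              ("u2", "Loading cargo today"),
              ("u3", "Daniel Day-Lewis deserves it")]
    print(store_matching(sample, connection))  # prints 2
```

Filtering before storage keeps the database limited to tweets from the Oscar buzz. Swapping the in-memory connection for a file path would persist the data between runs.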
 e. Analyzing the collected data
We analyzed the collected data and extracted several results. First, we created a dataset of important words with a positive meaning and a dataset of words with a negative meaning. Analyzing the tweets with these datasets, we found that about 49% of the tweets contained positive words, only 4% contained negative words, and the remaining 47% contained neither. Fig1 and Fig2 show this as diagrams.

Fig1. Percentage ratio of positive and negative tweets.
Fig2. Percentage of negative and positive tweets.

Studying these tweets, we concluded that in a large group of people most talk about things that excite them and about situations they like and enjoy. Fig3 breaks down the 49% of tweets that use positive adjectives about the different nominees by positive word. For instance, users used the word “best” about 24,000 times about different nominees in their tweets, and used the word “win” to express their opinions about the Academy Award winners.
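The lexicon step can be sketched as follows. The positive and negative word lists are small hypothetical stand-ins, since the paper's lists are not given. The sketch also assumes that a tweet containing words from both lists counts as positive. The paper does not say how such overlaps were handled; it only reports three shares that sum to 100%.

```python
import re
from collections import Counter

# Hypothetical lexicons: the paper built positive and negative word lists
# but does not publish them.
POSITIVE = {"best", "win", "great", "amazing", "deserve"}
NEGATIVE = {"worst", "bad", "boring", "overrated"}
WORD = re.compile(r"[a-z']+")


def classify(text: str) -> str:
    """Label a tweet 'positive', 'negative' or 'neither' by lexicon lookup.

    Assumption: a tweet with words from both lists is counted as positive,
    so the three labels are mutually exclusive and their shares sum to 100%.
    """
    words = set(WORD.findall(text.lower()))
    if words & POSITIVE:
        return "positive"
    if words & NEGATIVE:
        return "negative"
    return "neither"


def label_shares(tweets) -> dict:
    """Percentage of tweets carrying each label, rounded to one decimal place."""
    if not tweets:
        raise ValueError("need at least one tweet")
    counts = Counter(classify(t) for t in tweets)
    total = len(tweets)
    return {label: round(100 * counts[label] / total, 1)
            for label in ("positive", "negative", "neither")}
```

Run over the full collection, `label_shares` produces the kind of three-way split reported above. The exact figures depend on the word lists used.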

Fig3. Positive adjectives ordered by number of tweets.
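Fig3 ranks positive adjectives by the number of tweets that use them. The following is a minimal sketch of that count, again with a hypothetical word list. Each word is counted at most once per tweet, so the result is a number of tweets rather than a number of occurrences.

```python
import re
from collections import Counter

# Hypothetical positive-word list (the paper's list is not published).
POSITIVE = {"best", "win", "great", "amazing", "deserve"}
WORD = re.compile(r"[a-z']+")


def tweets_per_positive_word(tweets):
    """Return (word, number_of_tweets) pairs for positive words, most frequent first."""
    counts = Counter()
    for text in tweets:
        # A set, so a word repeated inside one tweet counts once for that tweet.
        counts.update(set(WORD.findall(text.lower())) & POSITIVE)
    return counts.most_common()
```

The same function applied to a negative-word list would give the ordering shown in Fig4.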

Fig4 shows our analysis of the tweets based on negative adjectives. As we concluded


before, we can say that people tend to say positive things about a particular subject.

Fig4. Negative adjectives ordered by number of tweets.

Fig5. Best leading actor prediction based on tweets.

We stored the collected tweets in a database and analyzed them by running specific queries to obtain accurate predictions. These predictions are based on the real tweets shared by each user. Fig5 shows the Best Leading Actor category: our analysis of the tweets predicted that Bradley Cooper would win the Academy Award for Best Actor. In this category our prediction did not come true. Daniel Day-Lewis won the Academy Award for Best Actor, although in Fig5 he came third, trailing by about 700 tweets.

Fig6 shows the Best Supporting Actor category. Here the tweets predicted that Christoph Waltz would win the Oscar, and this did come true.

The remaining figures show that most of our predictions came true. Where our predictions were wrong, the actual winner came very close to winning in our predictions. In some cases, although our prediction did not match the real result, it may reflect what people actually thought. For example, the movie Argo caused a huge wave of criticism toward the Academy Awards, and the Academy was accused of being politicized.

To predict the winner, we searched for the name of each nominee together with the positive adjectives and assigned each such match a score. We also searched for each nominee in all of the tweets and scored each of these mentions at ¼ of the score for a mention with a positive adjective. We then summed these scores.

Fig6. Best supporting actor prediction based on tweets.
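The scoring rule described above can be sketched in Python under several assumptions that the paper leaves open:

- A mention counts as being “with a positive adjective” when the nominee's name and a word from a hypothetical positive list appear in the same tweet.
- Such a mention is worth 1 point.
- Every tweet that mentions the nominee, positive or not, adds ¼ point.
- The nominee with the highest total is predicted to win, much as the highest group forecast score wins under plurality voting.

Only the ¼ ratio comes from the text.

```python
import re

# Assumptions (not specified in the paper): the positive-word list, a base weight
# of 1 for a mention with a positive adjective, and same-tweet co-occurrence as
# the meaning of "mentioned with a positive adjective". Only the 1/4 ratio is from the text.
POSITIVE = {"best", "win", "great", "deserve"}
POSITIVE_WEIGHT = 1.0
MENTION_WEIGHT = 0.25 * POSITIVE_WEIGHT
WORD = re.compile(r"[a-z']+")


def score_nominees(tweets, nominees):
    """Sum each nominee's score: full weight with a positive word, 1/4 weight for any mention."""
    scores = {name: 0.0 for name in nominees}
    for text in tweets:
        lowered = text.lower()
        has_positive = bool(set(WORD.findall(lowered)) & POSITIVE)
        for name in nominees:
            if name.lower() in lowered:  # simple substring match on the full name
                scores[name] += MENTION_WEIGHT      # every tweet mentioning the nominee
                if has_positive:
                    scores[name] += POSITIVE_WEIGHT  # mention together with a positive word
    return scores


def predict_winner(tweets, nominees):
    """Predict the nominee with the highest total score (ties go to the first listed)."""
    scores = score_nominees(tweets, nominees)
    return max(nominees, key=lambda name: scores[name])
```

Matching on full names misses nicknames and surname-only mentions. A real run would need the alias lists behind the paper's searches, which the paper does not give.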


Fig7. Best leading actress prediction based on tweets.
Fig8. Best supporting actress prediction based on tweets.
Fig9. Best Picture prediction based on tweets.
Fig10. Best directing prediction based on tweets.
                                                                        5. Conclusion
The studies we have carried out on tweets show that most of the time we can forecast events if we have a sufficient number of users producing different opinions. The present research applies users’ forecasting ideas to an award show that has been televised in


most countries all over the world and is not tied to specific locations. We could instead limit data gathering to a region and predict local events from local expectations. For instance, for a law passing through congress, we could estimate the expectations of each province by studying its day-to-day tweets, and we could measure how the law affects people's everyday lives. Although user forecasting was in general very accurate, it sometimes got the result wrong.

We used Twitter and based our predictions on tweets, and we cleaned our data during the data collection stages. We think that with more users participating and providing more information in a more timely manner, our predictions will improve and can outperform other predictions. We can also analyze different events and their consequences for people's daily lives. Social network predictions of this kind allow us to conduct field experiments to study social interactions and to measure the social distance between participants using network data.

                 REFERENCES

[1] Murr, A.E. “Wisdom of crowds”? A decentralized election forecasting model that uses citizens’ local expectations.
[2] Albors, J., Ramos, J.C., Hervas, J.L. New learning network paradigms: Communities of objectives, crowdsourcing, wikis and open source.
[3] Bücheler, T., Sieg, J.H. Understanding Science 2.0: Crowdsourcing and Open Innovation in the Scientific Method.
[4] Howe, J., 2010. Crowdsourcing: Why the Power of the Crowd is Driving the Future of Business. http://www.crowdsourcing.com/. Accessed 20 Feb. 2011.
[5] Chesbrough, H.W., 2003. Open Innovation: The New Imperative for Creating and Profiting from Technology. Harvard Business School Press, Boston, Mass.
[6] Allio, R.J., 2004. CEO interview: The InnoCentive model of open innovation. Strategy & Leadership 32(4), 4–9.
[7] Steve, R., 2006. Who’s ready to crowdsource? Advertising Age 77(43), 35.
[8] Tapscott, D., Williams, A.D., 2006. Wikinomics: How Mass Collaboration Changes Everything. Penguin Books, London.
[9] Lichtenthaler, U., 2007. Trading intellectual property in the new economy. International Journal of Intellectual Property Management 1(3), 241–252.
[10] Smith, W.D., 2000. Range voting. Available at: http://math.temple.edu/~wds/homepage/rangevote.pdf.
[11] Campbell, J.E., Garand, J.C. (Eds.), 2000. Before the Vote: Forecasting American National Elections. Sage, Thousand Oaks.
[12] Sanders, D., 1991. Government popularity and the next general election. The Political Quarterly 62, 235–261.
[13] Whiteley, P.F., 1979. Electoral forecasting from poll data: the British case. British Journal of Political Science 9, 219–236.
[14] Mughan, A., 1987. General election forecasting in Britain: a comparison of three simple models. Electoral Studies 6, 195–207.
[15] Bélanger, É., Lewis-Beck, M.S., Nadeau, R., 2005. A political economy forecast for the 2005 British general election. British Journal of Politics & International Relations 7, 191–198.
[16] Whiteley, P.F., 2008. Evaluating rival forecasting models of the 2005 general election in Britain: an encompassing experiment. Electoral Studies 27, 581–588.
[17] Whiteley, P.F., Sanders, D., Stewart, M., Clarke, H., 2011. Aggregate level forecasting of the 2010 general election in Britain: The seats-votes model. Electoral Studies 30, 278–283.
[18] Borisyuk, R., Galina, B., Rallings, C., Thrasher, M., 2005. Forecasting the 2005 general election: a neural network approach. The British Journal of Politics & International Relations 7, 199–209.
[19] Norpoth, H., 2004. Forecasting British elections: a dynamic perspective. Electoral Studies 23, 297–305.
[20] Lewis-Beck, M.S., Skalaban, A., 1989. Citizen forecasting: can voters see into the future? British Journal of Political Science 19, 146–153.
[21] Lewis-Beck, M.S., Tien, C., 1999. Voters as forecasters: a micromodel of election prediction. International Journal of Forecasting 15, 175–184.
[22] Lewis-Beck, M.S., 2005. Election forecasting: principles and practice. British Journal of Politics & International Relations 7, 145–164.
[23] Lewis-Beck, M.S., Stegmaier, M., 2011. Citizen forecasting: Can UK voters see the future? Electoral Studies 30, 264–268.
