Do data from twitter improve predictions of Academy Award winners? - JKU ePUB

In the diploma programme Wirtschaftswissenschaften (Economics and Business Administration):

 Diploma thesis submitted in fulfilment of the requirements for the academic degree Mag.rer.soc.oec.

Do data from twitter improve predictions of Academy Award winners?

 Klemens Stutzenstein

 Supervised by

 PD René Böheim, PhD

 Johannes Kepler Universität Linz

 Institut für Volkswirtschaftslehre

 Altenberger Straße 69, A-4040 Linz-Auhof, Austria

 Sankt Georgen an der Gusen, January 2021
Statutory Declaration

I declare under oath that I have written this diploma thesis independently and without outside help, that I have used no sources or aids other than those indicated, and that I have marked all passages taken verbatim or in substance from other sources as such.

This diploma thesis is identical to the electronically submitted text document.

Sankt Georgen an der Gusen, 20.01.2021    Signature

Abstract

Users of social media produce an enormous amount of data each day. Before the internet became part of everyday life, people had to be surveyed to learn their opinions. With the rise of social media, it became common to post opinions on social media platforms. When people communicate their opinions before they are asked, this increases efficiency: an automated index could save time and resources that can be better used in other ways. I analyse if and how data from Twitter can be used to enhance predictions based on box office data, data from Google Trends, and data from Wikipedia. I cannot use the Academy Award winners directly, as I need more than one point in time to compare the Academy Award winner with my other data sources. Therefore, I substitute the Academy Award winner with data from a prediction market that focuses on movies, the Hollywood Stock Exchange. After estimating my model, I conclude that the variables I generated from Twitter data are not significant and do not add value for the prediction of Academy Award winners.

Content

 Abstract ........................................................................................................................... 3

List of Figures ..................................................................................................................... 8
1. Introduction ................................................................................................................. 9
2. Literature Review....................................................................................................... 10
 2.1. Prediction Markets ............................................................................................... 11

 2.1.1.1. Hollywood Stock Exchange (HSX) ............................................................ 12

 2.1.1.2. Are prediction markets good at predicting future outcomes? ....................... 13

 2.2. Social Media........................................................................................................ 14

 2.2.1. Twitter ............................................................................................................. 17

3. Econometric Model and Variables .............................................................................. 17
 3.1. Underlying Hypotheses ........................................................................................ 19

 3.1.1.1. Hypothesis 1: The HSX price correlates with a movie’s success at the
 Academy Awards. ...................................................................................................... 19

 3.1.1.2. Hypothesis 2 and 3: The index and volume of tweets about a certain
 movie/actor/actress/director correlate with success at the Academy Awards. ................ 20

 3.1.1.3. Hypothesis 4: Commercial Success leads to Artistic Success....................... 21

 3.1.1.4. Hypothesis 5: If more people navigate to a movie’s Wikipedia page it is more
 likely that it wins one or more Academy Awards......................................................... 22

 3.1.1.5. Hypothesis 6: If more people search for a movie on Google it is more likely
 that it wins one or more Academy Awards. ................................................................. 23

 3.2. Data .................................................................................................................... 23

 3.2.1.1. Twitter Data .............................................................................................. 23

 3.2.1.2. Volume of tweets ...................................................................................... 23

 3.2.1.3. Index of tweets .......................................................................................... 25

 3.2.1.4. HSX Data – dependent variable ................................................................. 27

 3.2.1.5. Box Office Data ........................................................................................ 28

 3.2.1.6. Wikipedia data .......................................................................................... 32

 3.2.1.7. Google Trends data.................................................................................... 33
4. Descriptive statistics ................................................................................................ 34

4.1. Summary statistics ............................................................................................... 34

4.2. Variable Specification .......................................................................................... 35

4.3. Correlation .......................................................................................................... 37

5. Panel Data Models .................................................................................................. 41

5.1. Variables ............................................................................................................. 41

5.2. Regression estimators .......................................................................................... 44

 5.2.1. Pooled OLS (Ordinary Least Square)............................................................. 44

 5.2.2. Panel models................................................................................................. 44

 5.2.2.1. Fixed effects (FE) ...................................................................................... 44

 5.2.2.2. Random Effects (RE)................................................................................. 45

 5.2.2.3. Swamy-Arora (SA).................................................................................... 45

5.3. Tests for model selection ...................................................................................... 45

5.3.1. Hausman test .................................................................................................... 45

5.3.2. F-Test .............................................................................................................. 46

5.4. Panel data estimation method ............................................................................... 46

5.5. Possible Problems with Endogeneity .................................................................... 46

5.5.1. Solution ........................................................................................................... 47

5.5.2. Testing for Endogeneity.................................................................................... 48

6. Estimation results .................................................................................................... 49

6.1. Comparison table ................................................................................................. 49

6.2. OLS regression .................................................................................................... 50

6.3. Fixed effects regression ........................................................................................ 50

6.4. Random effects regression ................................................................................... 50

6.5. Swamy-Arora regression ...................................................................................... 51

6.6. Tests for model selection ...................................................................................... 51

 6.6.1.1. Hausman test for FE vs. RE ....................................................................... 51

 6.6.1.2. F-Test for FE vs. OLS ............................................................................... 52
6.7. Results ................................................................................................................ 52

 6.8. Prediction ............................................................................................................ 53

 6.8.1. Correct Predictions (in-sample) ..................................................................... 53

 6.8.2. Correct Predictions (out-of-sample) ............................................................... 54

 6.9. Comparison to the literature ................................................................................. 54

7. Conclusio................................................................................................................... 55
 7.1. Further research ................................................................................................... 56

 7.2. Limitations .......................................................................................................... 56

8. Appendix ................................................................................................................... 57
 8.1. Data .................................................................................................................... 57

 8.2. Histograms and Q-Q plots of independent variables .............................................. 59

 8.2.1.1. Twitter volume .......................................................................................... 59

 8.2.1.2. Twitter index ............................................................................................. 60

 8.2.1.3. US weekend receipts ................................................................................. 61

 8.2.1.4. US weekend average receipts ..................................................................... 62

 8.2.1.5. US weekend rank ...................................................................................... 63

 8.2.1.6. US weekend number of screens.................................................................. 64

 8.2.1.7. Google Trends ........................................................................................... 65

 8.2.1.8. Wikipedia.................................................................................................. 66

 8.3. Correlation tables ................................................................................................. 67

 8.3.1.1. Independent variable – dependent variable ................................................. 67

 8.3.1.2. Dependent variable – dependent variable.................................................... 68

 8.4. Variation ............................................................................................................. 70

 8.5. Estimation tables.................................................................................................. 71

 8.5.1. OLS regression................................................................................................. 71

 8.5.2. Fixed effects regression .................................................................................... 72

 8.5.3. Random effects regression ................................................................................ 73

 8.5.4. Swamy-Arora regression .................................................................................. 74

8.6. F-Test for FE vs. OLS .......................................................................................... 74

Bibliography ..................................................................................................................... 75

List of Figures

Figure 1: Underlying Hypotheses ....................................................................................... 19
Figure 2: Accuracy of the HSX award options market. ....................................................... 20
Figure 3: Comparison between movies in million USD ....................................................... 22
Figure 4: Advantages and weaknesses of different Twitter data sources ............................... 26
Figure 5: US weekend receipts example for "American Sniper" in US Dollars. .................... 29
Figure 6: US weekend receipts for all movies in US Dollar. ................................................ 30
Figure 7: weekend average receipts example for "American Sniper" ................................... 31
Figure 8: weekend rank example for "American Sniper" ..................................................... 31
Figure 9: weekend number of screens example for "American Sniper" ................................ 32
Figure 10: Wikipedia hits per day example for "American Sniper" ...................................... 33
Figure 11: Google Trends data for "Gravity","Her","Nebraska","Philomena" and "Frozen".. 34
Figure 12: Summary statistics ............................................................................................ 34
Figure 13: histogram of "HSX" .......................................................................................... 36
Figure 14: histogram of "log of HSX" ................................................................................ 36
Figure 15: Q-Q plot of "HSX" ............................................................................................ 37
Figure 16: Q-Q plot of "log of HSX" .................................................................................. 37
Figure 17: Correlation table for all variables ....................................................................... 38
Figure 18: Two-way linear prediction plot “log of HSX” and “log of Twitter volume” .......... 39
Figure 19: Two-way linear prediction plot “log of HSX” and “log of Twitter index” ........... 40
Figure 20: Comparison of OLS, fixed effects, random effects, and Swamy-Arora models. ... 49
Figure 21: Comparison of in-sample predictions with and without Twitter data ................... 53
Figure 22: Comparison of out-of-sample predictions with and without Twitter data ............. 54

1. Introduction

I analyse if social media data from Twitter enhance predictions based on data from the box office, Google Trends, and Wikipedia. Prediction markets such as the Hollywood Stock Exchange or the Iowa Electronic Market generate forecasts for different kinds of settings. A major disadvantage of using prediction markets for forecasting is that they are costly, demand time to be set up, and need enough participants to generate valuable forecasts. I analyse if data from Twitter in combination with the above-mentioned sources outperform predictions from the Hollywood Stock Exchange.

The current price for a certain product on a given market offers a good indicator of the future price (Putler 1992, 287). For settings with no actual markets, researchers started “prediction markets”. The first was the IEM, the Iowa Electronic Market (Berg and Rietz 2006, 142). In these markets people bet on the outcome of a certain event (Berg and Rietz 2006a, 1). Since the start of the IEM in 1988, prediction markets have proved to be accurate in predicting future events (Berg and Rietz 2006, 142; Berg and Rietz 2006a, 12). For prediction, another interesting, yet not thoroughly researched method arose: the use of social media data. Large amounts of data are generated every day. Making use of these data could bring significant benefits. Being able to predict the winner of an Academy Award category is of economic relevance, as it would allow people to bet on the winner and make money (if the prediction model is better than the betting market). The use of a (half-)automated index increases efficiency, as it saves resources that could be better used in other ways. Kogan et al. (2020) propose an early-warning system for COVID-19 tracking using six digital data sources: (1) COVID-19-related search terms from Google Trends, (2) COVID-19-related tweets, (3) COVID-19-related searches from UpToDate (a search database with clinical knowledge used by physicians around the world), (4) predictions by GLEAM (Global Epidemic and Mobility Model, an epidemic model that tracks global disease spread), (5) human mobility data from smartphones, and (6) Smart Thermometer measurements from Kinsa. Kogan et al. found that Twitter data showed significant growth 2-3 weeks before the growth was visible in confirmed cases. Their combined model was able to anticipate an increase in COVID-19 cases with a median lead of 19.5 days. This information could be valuable for politicians who must decide when stricter or less strict regulations have to be adopted.

During the economic crisis that started in 2008, Askitas and Zimmermann (2009) researched an innovative method for predicting unemployment in Germany. They performed searches on Google Insights (later renamed Google Trends) using two clusters of keywords. The first cluster contains “Arbeitsamt” or “Arbeitsagentur” and the second cluster contains popular job search engines in Germany. The prediction matches the unemployment rate closely (R² = 0.909) and offers the benefit that it is available two weeks before the official unemployment rate is released.
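The mechanics of such a keyword-based nowcast can be sketched with a least-squares fit: regress the official series on a Google Trends style search index and report the R². The data below are simulated, not the Askitas and Zimmermann series; only the structure of the exercise is the same.

```python
import numpy as np

# Toy version of the Askitas and Zimmermann exercise: regress an
# unemployment rate on a search-intensity index and compute R-squared.
# All numbers are simulated for illustration; the original study used
# monthly German data and reached an R-squared of about 0.909.
rng = np.random.default_rng(0)
trends_index = rng.uniform(40, 100, size=24)   # monthly search intensity
unemployment = 5.0 + 0.05 * trends_index + rng.normal(0, 0.2, 24)

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones_like(trends_index), trends_index])
beta, *_ = np.linalg.lstsq(X, unemployment, rcond=None)
fitted = X @ beta

ss_res = np.sum((unemployment - fitted) ** 2)
ss_tot = np.sum((unemployment - unemployment.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"R^2 = {r_squared:.3f}")
```

The practical appeal is the timing rather than the fit itself: the search index is observable weeks before the official figure is released.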

I use data from the Hollywood Stock Exchange (HSX), box office data, and data from Twitter, Google Trends, and Wikipedia. The left-hand side variable is taken from the HSX data, which I use as an indicator for the Academy Award winner. The right-hand side consists of variables from box office, Twitter, Google Trends, and Wikipedia data. I compare pooled OLS with fixed effects and random effects models to analyse which model is consistent with my data. After performing tests for model selection, I choose a fixed effects model.

2. Literature Review

Some speculative markets offer good predictions of future events, indicated by the price of a share. Aggregated trader information captures the probability of future events (Bothos, Apostolou and Mentzas 2010, 50). Depending on the accuracy of the traders’ beliefs, every single trader makes more or less money (Bothos, Apostolou and Mentzas 2010, 51; Zitzewitz 2004, 2). Markets are not available for every kind of information. For some purposes that are not covered by “thick” markets, prediction markets arose. An example of a prediction market is the Hollywood Stock Exchange (HSX). The HSX allows people to trade shares of different movies in the fictional currency “Hollywood Dollar” (H$).

 2.1. Prediction Markets

The first real-money prediction market, also known as an “information market”, “idea futures”, “decision market” or “event futures” (Zitzewitz 2004, 108; Zhao, Wagner and Chen 2008, 285), was founded in 1988 and is called the Iowa Electronic Market (Berg and Rietz 2006, 142). Built to forecast presidential elections, the IEM expanded to predict other events in 1993 (J. N. Berg 2003, 1). Throughout the years, the IEM proved itself to be quite accurate in forecasting political elections, box office revenues, financial outcomes for companies, etc. (J. N. Berg 2003, 1; Berg and Rietz 2006a, 142-149). Besides the IEM, several other prediction markets arose. For this study, the Hollywood Stock Exchange (www.hsx.com), founded in 1996 and focusing on box office records and Academy Awards, serves as the source of the dependent variable (J. N. Berg 2003, 3; Hollywood Stock Exchange 2010). Contrary to the IEM, the HSX trades fictional shares bought with Hollywood Dollars (Levmore 2003, 592). The HSX is said to be the gold standard of predictions in the movie industry (Schoen et al. 2013, 539).

Because participants in prediction markets trade contracts on events happening in the future, trying to maximize their payoff, and thereby aggregate their information, prices in prediction markets should reflect the likelihood of future events (Zitzewitz 2004, 108; Wolfers J. 2006, 2; J. N. Berg 2003, 3; Bothos, Apostolou and Mentzas 2010, 50).

2.1.1.1. Hollywood Stock Exchange (HSX)

On the Hollywood Stock Exchange, participants receive H$2,000,000 (Hollywood Dollars) for opening an account (Hollywood Stock Exchange 2014). With this virtual money, derivatives of Hollywood movies can be bought: MovieStocks focus on domestic box office (Hollywood Stock Exchange 2017), CelebStocks are issued for celebrities in different fields of entertainment (Hollywood Stock Exchange 2017), TVStocks cover TV series (Hollywood Stock Exchange 2017), and AwardOptions focus on Academy Award nominees (Hollywood Stock Exchange 2018).

The winner of an “AwardOption” receives approximately H$25 per option while all others delist at H$0.00. The combined “AwardOptions” for one category sum up to H$25 on day one of trading. The higher the price of one option (one film, actor/actress, or director), the higher the market’s expected probability that this option wins the category. AwardOptions halt trading at 1 p.m. Pacific Standard Time on the day of the Academy Award ceremony (Hollywood Stock Exchange 2014).

HSX AwardOptions are traded in the following eight categories (Hollywood Stock Exchange 2014):

 1. Best Picture: 5-10 movies depending on the year

 2. Best Director: 5 directors

 3. Best Actor: 5 actors

 4. Best Actress: 5 actresses

 5. Best Supporting Actor: 5 actors

 6. Best Supporting Actress: 5 actresses

 7. Best Adapted Screenplay: 5 movies

 8. Best Original Screenplay: 5 movies
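The payoff rule above implies a simple probabilistic reading of AwardOption prices: because a winning option pays about H$25 and losers delist at H$0, dividing a contract’s price by 25 yields the market’s implied win probability, and the prices in one category should sum to roughly H$25. A minimal sketch, with invented prices (the movie names and values are not HSX data):

```python
# Implied win probabilities from AwardOption prices. A winner pays about
# H$25 and losers delist at H$0, so price / 25 reads as a win probability.
# All prices below are invented for illustration.
PAYOUT = 25.0

award_prices = {
    "Movie A": 14.50,
    "Movie B": 6.25,
    "Movie C": 2.75,
    "Movie D": 1.00,
    "Movie E": 0.50,
}

implied = {title: price / PAYOUT for title, price in award_prices.items()}
for title, p in sorted(implied.items(), key=lambda kv: -kv[1]):
    print(f"{title}: {p:.0%}")

# Since prices in one category sum to roughly H$25, the implied
# probabilities sum to roughly one.
print(f"total: {sum(implied.values()):.2f}")
```

The market’s favourite is simply the option with the highest price; this is the reading used later when HSX prices stand in for the Academy Award winner.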

2.1.1.2. Are prediction markets good at predicting future outcomes?

Former studies show that prediction markets are good predictors of future events. Spann and Skiera (2009) test prediction markets against betting odds and tipsters. They conclude that betting odds and prediction markets perform equally well when using data from three seasons of the German premier soccer league, and that both outperform tipsters. Levmore (2003) analysed data from the Iowa Electronic Market (IEM) and the HSX. He shows that the average error rate on the IEM was 1.37% over the last four elections. At the same time, the Wall Street Journal polled members of the Academy of Motion Picture Arts and Sciences for the six major category winners of the Academy Awards. While this poll predicted five out of six winners correctly, the HSX predicted eight out of eight winners correctly (the Wall Street Journal did not poll the winners for best supporting actor and best supporting actress). Leigh and Wolfers (2006) review the efficacy of polls (ACNielsen, Galaxy, Morgan and Newspoll) compared to prediction markets (BetFair and Centrebet) for the 2004 election in Australia. They compare three forecasting horizons: 1 year prior to the election, 3 months prior to the election, and the election eve. Both prediction markets forecast the right winner at all three forecasting horizons. One year prior to the election, only one out of four polls forecast the right winner. The same is true for 3 months before the election. On election eve, two out of four polls forecast the right winner. Leigh and Wolfers show that prediction markets performed substantially better than pollsters for the 2004 Australian elections. Pennock et al. (2001) extract probabilistic forecasts from three “online games”: the HSX, the Foresight Exchange (FX) and the Formula One Pick Six (F1P6) competition. When evaluating box office forecasts, they use a model that combines data from the HSX with forecasts from the movie expert Brandon Gray of Box Office Mojo. They collected data from 50 movie openings between March 3, 2000 and September 1, 2000. Their model correlates with box office revenue at 0.956. Pennock et al. also assessed the HSX “AwardOptions” for the 2000 Academy Awards. They compared the HSX prices to the expert opinions of five columnists of the “Hollywood Stock Brokerage and Resource”, a fan site of the HSX. From the opening of the “AwardOption” market on February 15, 2000 to the market close on March 26, 2000, the HSX score increased almost continuously. By February 19, 2000 the HSX score had surpassed all five experts.

Prediction markets do have several limitations. People tend to underestimate large probabilities and overestimate small probabilities, which may lead to an inefficient market (Zitzewitz 2004, 120). If there is an advantage to gain, people might try to manipulate the prediction market, depending on how thin the prediction market is (Zitzewitz 2004, 123). Also, participants might not trade based on objective probabilities but on personal desires and interests, thus leading to inefficient markets (Bothos, Apostolou and Mentzas 2010, 51).

 2.2. Social Media

Social media changes the way we communicate with each other (Qualman 2013, 5-12). People exchange opinions, impressions, and experiences, and they generate and share content (Hilker 2010, 11; Kalampokis, Tambouris and Tarabanis 2013, 454). This transforms the consumer into a “prosumer” who does not only consume information but also generates part of it (Kaplan 2010, 66).

The use of social media may offer a new source for researchers (Lu, Wang and Maciejewski 2014, 58). Data in social media are generated at a high frequency, enabling predictions that cannot be realized with traditional surveys or administrative resources (D. C. Antenucci 2014, 1-2). Furthermore, social media data can often be generated at lower cost than traditional sources (D. C. Antenucci 2014, 2; Bothos, Apostolou and Mentzas 2010). Another advantage of social media data is that they carry incremental information that cannot be generated using a prediction market (D. C. Antenucci 2014, 4; Diakopoulus and Shamma 2010, 1198). While prediction markets react to events, the events that cause changes in prediction markets themselves remain unclear. Twitter data offer a possibility to identify events that cause movements in prediction market graphs. These events are often marked with hashtags: “#”.

Disadvantages of social media are that the data are not structured and are not gathered for a special purpose (unlike data from the HSX). Another detrimental effect is that manipulation might occur if there are enough incentives for doing so. Most likely, Twitter data are biased because younger people and people from urban areas are overrepresented (Gayo-Avello 2011, 122, 128). Also, users on the internet tend to express extremely positive or negative experiences more often than moderate experiences (Yu and Kak 2012). Furthermore, raw social media data are very noisy and need substantial effort to be transformed into high-quality data that can be used for statistical analysis (Kalampokis, Tambouris and Tarabanis 2013, 545). According to Schoen et al. (2013), prediction markets and social media data cannot be compared directly: in prediction markets, people put their (virtual) money on the participant they think will win, while in social media people merely give mentions, which does not necessarily reflect their opinion.

Sheng Yu and Subhash Kak identify three requirements for predicting with social media (Yu and Kak 2012):

 - The event to predict must be human-related: non-human-related events, for example the development of an eclipse, have nothing to do with social media data and therefore cannot be predicted.

 - Masses of people must be involved: they act as a sample but might have a built-in bias (not everyone posts his/her opinion on social media).

 - The event to predict must be easy to talk about in public: topics that are affected by social pressure will lead to biased predictions.

Predicting Academy Award winners meets all three requirements, which makes it well suited for a study using social media data.

Several studies cover the topic of user-generated data for the prediction of certain events. Desai et al. (2012) use Google internet query share data with keywords like “stomach virus”, “stomach flu”, “stomach illness”, “stomach bug”, “stomach sickness”, and “stomach sick” and construct a model that they compare to US norovirus outbreak surveillance data. Their model shows a strong correlation (R² = 0.95). Tumasjan et al. (2010) show that, using the mere number of tweets about political parties, the share of tweets comes close to traditional election polls (MAE: 1.65%). A downside of this study is that they look at one election only, so they rely on a limited data base. Antenucci et al. (2014) create indexes of job loss using tweets containing keywords that are associated with job loss (“axed”, “canned”, “downsized”, “outsourced”, “pink slip”, “lost job”, “fired job”, “been fired”, “laid off”, “unemployment”) and close variants. Their index tracks initial claims for unemployment insurance and predicts 15-20% of the variance of the prediction error of the consensus forecast for initial claims.
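A keyword index of this kind is mechanically simple: count the share of tweets that contain at least one of the tracked phrases. The sketch below illustrates the idea in the spirit of Antenucci et al.; the example tweets are invented, and the real index uses the full Twitter stream and many more phrase variants.

```python
# Share of tweets mentioning at least one job-loss phrase, as a
# minimal keyword-index sketch. Phrases follow the Antenucci et al.
# list; the sample tweets are invented for illustration.
JOB_LOSS_PHRASES = [
    "axed", "canned", "downsized", "outsourced", "pink slip",
    "lost job", "fired job", "been fired", "laid off", "unemployment",
]

def job_loss_share(tweets):
    """Fraction of tweets containing at least one job-loss phrase."""
    if not tweets:
        return 0.0
    hits = sum(
        any(phrase in tweet.lower() for phrase in JOB_LOSS_PHRASES)
        for tweet in tweets
    )
    return hits / len(tweets)

week = [
    "just got laid off, great start to the week",
    "new coffee place downtown is amazing",
    "my whole team was outsourced overseas",
    "watching the game tonight",
]
print(job_loss_share(week))  # 2 of 4 tweets match -> 0.5
```

Computed per week, such shares form a time series that can then be compared with initial claims for unemployment insurance.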

Antenucci et al. also created a website for the so-called “University of Michigan Social Media Job Loss Index” (Antenucci, Shapiro and Cafarella 2017). It was last updated in 2017 and offers a download of all initial claims and predictions from 2011-2017. While the average weekly deviation from the initial claims was 2.86% in 2011, it increased to 4.29% in 2012, 6.76% in 2013, 14.97% in 2014, 25.16% in 2015, 21.81% in 2016 and 15.81% in 2017 (until mid-July). Asur and Huberman (2010) compare HSX data with a tweet-rate time series to predict box office revenues for 24 different movies. They show that the R² for their Twitter time series (0.973) is slightly better than the R² for the HSX time series (0.965).
Thelwall, Buckley and Paltoglou (2011) assess whether popular events lead to an increase in sentiment strength. They study the most popular events within a month of English tweets, measuring popularity by a relative increase in term usage. The authors find it surprising that negative sentiments play a much bigger role than positive sentiments. They test three hypotheses on negative sentiments, all of which deliver strong evidence at the 1% level. Of the three hypotheses on positive sentiments, two were not significant and one was significant at the 5% level. To the authors it seems that people express their opinions on Twitter and that these posts are more negative than average for the topic. Their main finding is that important events on Twitter are associated with increases in average negative sentiment strength.

 2.2.1. Twitter

Twitter is the biggest microblogging platform. A microblog (tweet) was initially a short message of up to 140 characters (280 since September 26, 2017) (Isaac 2017). In 2019, Twitter had 321 million monthly active users (Statista Inc 2019) who generated 500 million tweets a day (Twitter Inc. 2019). People communicate about their lives (D. C. Antenucci 2014, 5) and share thoughts, opinions, and behaviour on Twitter (Kalampokis, Tambouris and Tarabanis 2013, 554).

3. Econometric Model and Variables

I analyse if a model that contains Twitter data can predict the winners of Academy Awards

between 2009 and 2015 better than the same model without Twitter data.

I focus on the usefulness of user-generated data from Twitter in combination with common market data (US weekend receipts, US weekend average receipts per screen, US weekend rank, US weekend number of screens), hits on the English version of a movie's Wikipedia page, and Google Trends data, and compare the predictions to the prediction market "Hollywood Stock Exchange".

I assume that a movie that gets many mentions from users on Twitter is more likely to win an Academy Award than a movie that gets little feedback. I do not, however, claim that mentions on Twitter directly influence the opinions of members of the Academy of Motion Picture Arts and Sciences. I only assume that artistically good movies get more mentions and

are also more likely to win an Academy Award. According to Tumasjan et al. the number of

tweets without implementing any sentiment analysis represents a plausible prediction

(Tumasjan, et al. 2010, 183). Asur and Huberman show that the rate of tweets per day explained

about 80% of variance in movie revenue prediction (Asur and Huberman 2010, 495).

3.1. Underlying Hypotheses

Figure 1: Underlying Hypotheses

Figure 1 lists the underlying hypotheses, H1 to H6. While there is ample evidence for H1-H4, studies that test H5 and H6 are rare.

 3.1.1.1. Hypothesis 1: The HSX price correlates with a movie’s

 success at the Academy Awards.

Chen and Krakovsky (2010) state that the accuracy of the Hollywood Stock Exchange is

remarkably high and beats critics in most years. Pennock, Nielsen and Giles (2001) show that

HSX prices are more accurate than expert opinions in predicting the winners of the Academy

Awards. Pennock et al. (2001) state that HSX prices correlate well with actual award outcome

frequencies (see graph below).

Figure 2: Accuracy of the HSX award options market. Points show frequency versus average normalized price for
buckets of similarly priced options. The dashed line indicates perfect accuracy - figure from (Pennock, Nielsen and
Giles 2001, 179)

 3.1.1.2. Hypothesis 2 and 3: The index and volume of tweets

 about a certain movie/actor/actress/director correlate with

 success at the Academy Awards.

Movies are experience goods – you must see one before you know whether you like it. The decision to see a movie therefore depends on its perceived quality (Deuchert, Adjamah and Pauly 2005, 159). One of the most important quality signals is word of mouth, which spreads

faster the larger the user base is (Deuchert, Adjamah and Pauly 2005, 160). Another quality

signal is user ratings. Bothos, Apostolou and Mentzas (2010) investigate the correlation between user ratings and Academy Award winners but could not find statistically significant predictors. The third important quality signal, critics' reviews, is investigated by Reinstein and Snyder (2005). They used reviews from two popular movie critics (Siskel

and Ebert) to analyse if their reviews have a detectable effect at the box office. They conclude

that positive reviews (2 thumbs up) have a small effect in magnitude (25%) but are only

marginally statistically significant.

Ravid (1999) tests a sample of 180 films released between late 1991 and early 1993. He

creates three different indexes that contain reviews. Index1 = positive reviews/total reviews,

Index2 = (positive reviews + neutral reviews)/total reviews and Index4 = total number of

reviews. Only Index4 is significant at a 1% level and thus he finds evidence that the more

reviews a movie gets, the more economically successful it is – irrespective of whether they

are positive or not.

 3.1.1.3. Hypothesis 4: Commercial Success leads to Artistic

 Success.

While there is ample evidence that artistic success leads to commercial success, studies investigating the opposite relationship are rare. The Economist, for example, stated in 1995 that winning the main category "best picture" increased box office receipts by $25 million (The Economist 1995, 92). Terry et al. (2005) conclude that an Academy Award nomination leads to a six-million-dollar increase in domestic (US) revenue.

Hadida (2010) explicitly tests the hypothesis that the more commercially successful a film is,

the higher its artistic recognition is. She does not reject this hypothesis (0.150, p < .001)

(Hadida 2010, 66).

According to Deuchert et al. (2005), higher-quality movies might pull more people into the cinemas. The following table by Deuchert et al., combined with the information that movies make between 30.81% (own calculation with data from boxofficemojo.com) and 34% (Terry, DeArmond and Zachary 2009, 177) of their domestic (US) revenue on the opening weekend, makes it reasonable to believe that a higher-grossing movie is more likely to win an Academy Award (Deuchert, Adjamah and Pauly 2005).

Figure 3: Comparison between movies, Academy Award nominated movies and Academy Award winner movies in
million USD

                   Average total US box office    Average US running
                   revenue (in million USD)       time (weeks)
 All movies        25.48 (41.41)                  16.08 (9.96)
 Only nominated    41.90 (50.08)                  24.01 (10.77)
 Winner movies     90.55 (107.54)                 35.20 (12.28)

 Note. Standard deviations are given in parentheses. Deuchert et al. (2005) show that the average box office revenue of all movies in their study is 25.48 million USD. They use the 204 most successful movies of each year between 1990 and 2000. When looking only at Academy Award-nominated movies, the average box office return increases to 41.90 million USD. The last row shows only the Academy Award winner movies, with an average box office return of 90.55 million USD.

 3.1.1.4. Hypothesis 5: If more people navigate to a movie’s

 Wikipedia page it is more likely that it wins one or more

 Academy Awards.

There are hardly any studies that investigate the relationship between Wikipedia hits and movie performance. Mestyàn, Yasseri and Kertész (2013) published a prediction model based solely on Wikipedia data that was able to predict box office success a month prior to a movie's release with an R² coefficient of 0.77.

3.1.1.5. Hypothesis 6: If more people search for a movie on

 Google it is more likely that it wins one or more Academy

 Awards.

The same is true for hypothesis 6 as for hypothesis 5: such studies are rare. A simple regression model based on Yahoo!'s web search query logs for the US market correlates strongly (0.85) with the actual revenue of 119 feature films between October 2008 and September 2009 (Goel, et al. 2010, 17487).

 3.2. Data

 3.2.1.1. Twitter Data

I use a dataset that consists of tweets about movies, actors and directors of 117 Academy

Award-nominated films between 2009 and 2015. These data include 306,016 tweets.

Data from Twitter are gathered using Twitter’s advanced search feature. I create two

variables: number of tweets and an index created based on positive, negative and neutral

mentions.

 3.2.1.2. Volume of tweets

Every tweet that fulfils the requirements of a predefined query is displayed in Twitter's advanced search. One query defines a film, actor/actress or director people talk about. The general form of a query is:

“Name of film/director/actor” “Oscar” “lang:en” “since:YYYY-MM-DD until:YYYY-MM-DD” (Y stands for year, M for month and D for day). The variable “Volume of tweets” represents the number of tweets (normalized to 1 for each category) about a certain actor/director/film.

“Lang:en” means that only English tweets are gathered, and “since” and “until” mark the time frame for which data are gathered (Twitter’s advanced search does not offer the possibility to segment by geographical data, so I segment tweets by language).

For the movie “American Sniper” a query looks as follows:

American Sniper oscar lang:en since:2015-01-24 until:2015-02-22
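Assembling such query strings programmatically can be sketched as follows. The helper function and its name are hypothetical and only illustrate the query format described above:

```python
# Hypothetical helper illustrating the thesis's query format; the
# function name is illustrative, not part of the original analysis.
def build_query(name, since, until, lang="en"):
    """Return a Twitter advanced-search query for English tweets that
    mention a nominee together with the word 'oscar' in a date range."""
    return f"{name} oscar lang:{lang} since:{since} until:{until}"

print(build_query("American Sniper", "2015-01-24", "2015-02-22"))
# American Sniper oscar lang:en since:2015-01-24 until:2015-02-22
```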

The data are collected for each day and then aggregated to weekly data. In the next step, all tweets per week in each category are added up to “total tweets per category”. This number is needed to make the data comparable: in the categories “best picture” and “best actor/actress” the number of tweets is much higher than in “best director” or “best supporting actor/actress”, and if I did not account for this, the latter categories would be underrepresented. After this transformation, the tweet shares per movie/director/actor/actress in each category always add up to one.
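A pandas sketch of this aggregation and normalisation, with toy numbers and hypothetical column names (the original pipeline is not published in the thesis):

```python
import pandas as pd

# Toy daily tweet counts per nominee; column names are illustrative.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-24", "2015-01-25",
                            "2015-01-24", "2015-01-25"]),
    "category": ["best picture"] * 4,
    "nominee": ["American Sniper", "American Sniper", "Birdman", "Birdman"],
    "tweets": [300, 200, 100, 400],
})

# Aggregate daily counts to weekly totals per category and nominee.
weekly = (daily
          .groupby(["category", "nominee", pd.Grouper(key="date", freq="W")])
          ["tweets"].sum().reset_index())

# Normalise: within each category-week the shares add up to one.
weekly["share"] = weekly["tweets"] / weekly.groupby(
    ["category", "date"])["tweets"].transform("sum")
```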

The reason for including the volume of tweets in my panel data set is that some studies suggest that the number of tweets/opinions correlates significantly with different outcome data – for example HSX prices (Doshi 2010, 44) or elections (Tumasjan, et al. 2010, 184). I am sceptical, however, that the raw number of mentions provides very useful data: if there are many mentions but all of them are negative, I do not expect this number to be correlated with winning an Academy Award.

3.2.1.3. Index of tweets

I create an index consisting of the number of positive mentions, the number of negative

mentions and the number of all mentions. The formula for the index looks as follows:
 Twitter Index_{x,t} = (m_pos_{x,t} − m_neg_{x,t}) / m_all_{x,t}

where “Twitter Index” (Twitter Index_{x,t}) is the index for a movie/actor/actress/director (x) at time (t), m_pos is the number of positive mentions for a movie/actor/actress/director (x) at time (t), m_neg is the number of negative mentions for a movie/actor/actress/director (x) at time (t), and m_all is the number of all mentions for a movie/actor/actress/director (x) at time (t).

The query for m_pos: Name of film/director/actor/actress win Oscar lang:en since:YYYY-

MM-DD until:YYYY-MM-DD

The query for m_neg: Name of film/director/actor/actress win Oscar cannot OR can't OR

"should not" OR shouldn't OR "will not" OR won't OR "might not" OR don't lang:en

since:YYYY-MM-DD until:YYYY-MM-DD

The query for m_all: Name of film/director/actor/actress Oscar lang:en since:YYYY-MM-DD until:YYYY-MM-DD
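Given the counts returned by these three queries, the index reduces to a one-line computation. The helper below is a sketch; the zero-mention fallback is my own assumption, not stated in the thesis:

```python
# Sketch of the Twitter index defined above: the net share of positive
# mentions among all mentions for one nominee in one week.
def twitter_index(m_pos, m_neg, m_all):
    if m_all == 0:
        return 0.0  # assumption: weeks without mentions are treated as neutral
    return (m_pos - m_neg) / m_all

print(twitter_index(60, 20, 200))  # 0.2
```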

Sentiment-based analysis is currently quite popular, although it is difficult for computers to understand the meaning of human communication. It is especially difficult for programs to assign the right sentiment when the communication is short. Automatic sentiment analysis of tweets is therefore challenging.

Sentiment analysis is questioned by some (Gayo-Avello 2011, 128), especially widely used automatic sentiment analysis software. Metaxas et al. (2011) show that the accuracy of sentiment analysis software is only slightly better than a random classifier assigning positive, negative and neutral sentiments. When they investigate a smear campaign against a candidate for the US Senate, more than a third of the tweets it contained were tagged as positive mentions.

Advantages and Weaknesses of different Twitter data sources

For collecting data from Twitter, there are four different approaches:

Twitter Advanced Search: Twitter’s own search engine. All tweets since 2009 are

accessible.

Twitter Data Grants: Twitter started a pilot program for selected academic institutions in

2014 (Krikorian 2014). It seems this program has either ended or is not promoted anymore.

Twitter API: Twitter’s application programming interface allows people to export tweets

after defining a query.

Third Party Programs: social media monitoring programs often offer a Twitter

implementation that gathers data in real time. Most of these programs also offer a sentiment

analysis part.

Figure 4: Advantages and weaknesses of different Twitter data sources

Twitter Advanced Search
  Advantages: free; full data set; reliable
  Weaknesses: no export function for tweets

Twitter Data Grants
  Advantages: free; full data set; reliable; export function
  Weaknesses: not accessible

Twitter API
  Advantages: free; export possibility
  Weaknesses: historical data only for the last 2 weeks; requires coding skills

Third Party Programs
  Advantages: user friendly; often contain sentiment analysis software
  Weaknesses: expensive; a query must be defined before being able to gather data

 3.2.1.4. HSX Data – dependent variable

As HSX “AwardOption” data are publicly available only for the current Academy Award ceremony, I bought a dataset (2009-2015) from the Hollywood Stock Exchange that consists of the aggregated price for each option (movie/actor/actress/director) in the main 8 categories for every day from at least 4 weeks before the Academy Award ceremony. Each year, approximately four to six weeks before the ceremony, the HSX starts its “AwardOption”, where people buy and sell options on Academy Award-nominated movies.
The starting price is H$ 25 divided by the number of nominees for each movie in the main category “best picture” (8 nominees in 2015 – the number of nominees in the main category ranges from 5 to 10 depending on the year) and H$ 5 for the other 7 categories (5 nominees each). The

“AwardOption” closes on Sunday, 1 p.m. Pacific Standard Time right before the Academy

Award ceremony. Options can be traded at any time.
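The starting prices follow from splitting the H$ 25 pot of a category evenly across its nominees, which is consistent with both figures quoted above (25/8 for best picture in 2015, 25/5 = 5 for the five-nominee categories). A minimal illustrative helper:

```python
# Sketch: HSX AwardOption starting price, the H$ 25 category pot
# split evenly across its nominees (helper name is illustrative).
def starting_price(nominees, pot=25):
    return pot / nominees

print(starting_price(8))  # best picture 2015: 3.125 H$
print(starting_price(5))  # five-nominee categories: 5.0 H$
```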

3.2.1.5. Box Office Data

This dataset contains all Academy Award nominated movies in the main 8 categories for the

years 2009-2015. The data are gathered from “Wolfram Alpha” (See

https://www.wolframalpha.com or appendix for more information) and consist of “US weekend

receipts”, “US weekend average receipts per screen”, “US weekend rank”, and “US weekend

numbers of screens”.

As the Hollywood Stock Exchange only covers eight different categories (best director, best

supporting actor, best supporting actress, best picture, best original screenplay, best actor, best

actress and best adapted screenplay), I lose some information about the movies that are not

covered. As HSX prices are the dependent variable there is no other way than to drop the data

from movies that are not covered by the HSX.

US weekend receipts

US weekend receipts are measured in 100,000 US Dollars (inflation adjusted to May 2016 US

Dollars) in line with earlier studies by Robins (1993), Miller and Shamsie (2001) and Nelson,

Donihue and Waldman (2001) about the movie industry. In line with Bothos, Apostolou and

Mentzas (2010), I include the first four weeks after the wide opening of a movie. According to

Deuchert et al. this is acceptable as they find a positive and significant impact of the first week’s

box office revenues on the weekly revenues of the following weeks. Nevertheless, movies do

not automatically generate financial success if the first week was successful (Deuchert,

Adjamah and Pauly 2005, 172).

The example below shows US weekend receipts for “American Sniper”:

Figure 5: US weekend receipts example for "American Sniper" in US Dollars January to June 2015.

The graph for weekend receipts looks similar for most movies. Movies tend to generate the biggest revenue in the first week after wide release. In this graph, the wide release is in mid-January. The low amplitude before mid-January reflects the limited release in a few selected cinemas.

Figure 6: US weekend receipts for all movies in US Dollar.

This graph shows the aggregated receipts per week of all movies in US Dollars. It clearly shows that the biggest revenue is generated in the first week after wide release and then declines over time. The x-axis marks weeks since wide release.

US weekend average receipts per screen

US weekend average receipts per screen are measured in 1,000 US Dollars, inflation adjusted to May 2016 US Dollars. The example below shows the US weekend average receipts per screen for “American Sniper”.

Figure 7: weekend average receipts example for "American Sniper"

This graph again shows the limited-release phenomenon. Until mid-January the film is released only in a few selected cinemas. Judging by the average receipts per screen, screenings during the limited release attract far more viewers per screen than screenings after the wide release.

weekend rank

Figure 8: weekend rank example for "American Sniper"

The same is true for “weekend rank”. During the limited release the movie occupies a low position in the cinema charts. This is understandable, as the movie is shown only in a few selected cinemas. When “American Sniper” is released to a wide audience, it instantly ranks first on the “weekend rank” chart.

weekend number of screens

Figure 9: weekend number of screens example for "American Sniper"

At the beginning of January, the movie runs only on a few screens (limited release). Therefore, until the wide release, the weekend receipts are low, the average receipts per screen are high, and the weekend rank is poor.

 3.2.1.6. Wikipedia data

Wikipedia data are measured in hits per day on the English version of the Wikipedia page

about the film, actor, actress or director and are then aggregated to a weekly average.

Figure 10: Wikipedia hits per day example for "American Sniper"

 3.2.1.7. Google Trends data

Google Trends measures the volume of search queries entered into the Google search engine and creates a time-series index that ranges from 0 to 100 (Choi and Varian 2012, 2-3). The week with the highest number of searches is normalised to 100 (Choi and Varian 2012, 3). I include the most searched-for Academy Award-nominated movie of each year in every Google Trends query in order to generate comparable data. A query can include up to five search terms (movies) and display them visually (see below).
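The 0-100 scaling described by Choi and Varian amounts to a simple max-normalisation, sketched here with illustrative numbers:

```python
# Sketch of the Google Trends index: the week with the highest search
# volume is set to 100 and all other weeks are scaled relative to it.
def trends_index(weekly_searches):
    peak = max(weekly_searches)
    return [round(100 * v / peak) for v in weekly_searches]

print(trends_index([50, 200, 100]))  # [25, 100, 50]
```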

The idea behind using data from Google Trends is that receiving an Academy Award is a

measurement of movie quality. I assume that the higher a movie’s artistic quality is, the more

people search for it on the internet. I use the first 4 weeks after wide release as a measurement

of interest. The chart below shows the level of interest (measured in percentage of search

volume) of five selected movies. Out of these 5, “Frozen” is the most searched for movie with

a peak of 100% relative search volume in January 2014. Each peak indicates a weekend, as

people tend to watch movies in the cinema on weekends. The same happens on January 1, 2014

as it is a public holiday.

Figure 11: Example of Google Trends data for "Gravity", "Her", "Nebraska", "Philomena" and "Frozen"

4. Descriptive statistics

 4.1. Summary statistics

Figure 12: Summary statistics

During the creation of the log of Twitter volume I lose two observations, because their values are below zero, where the logarithm is undefined. The missing observation for “log of Twitter Index” derives from my data transformation before creating the logarithmic form. The one missing observation each for “log of weekend average receipts in 100,000$” and “log of weekend rank” is due to missing values for the movie “Foxcatcher” in week 3. “log of Google Trends” loses 10 observations because they have a value of “0”, and the 16 missing observations for “log of Wikipedia Hits” derive from missing values for “Up in the Air”, “Before Midnight”, “Whiplash” and “Inherent Vice”.

 4.2. Variable Specification

I display this information in the form of histograms and Q-Q plots. Creating log variables reduces skewness and kurtosis in all shown variables except “weekend rank”. Therefore, I create and implement the logged form of all variables except “weekend rank”. I show the variable “HSX” in the following section. Histograms and Q-Q plots of all other variables are attached in the appendix.

HSX

Figure 13: histogram of "HSX"
Figure 14: histogram of "log of HSX"

Figure 13 shows the frequency distribution of the variable “HSX”. For a single category the value of “HSX” can go up to 25. The reason for higher “HSX” values is that they are cumulated for each movie. The highest values are from “The King’s Speech”, which was nominated in five categories and later won four of them. Most movies are nominated in only one category and do not have a high probability of winning according to HSX values. Without the log transformation the variable “HSX” is skewed, which violates the normality assumption. After the log transformation (Figure 14) skewness is reduced.
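The effect of the log transformation on skewness can be illustrated with synthetic right-skewed data standing in for the HSX prices (the data and helper below are illustrative, not the thesis data):

```python
import numpy as np

def sample_skew(x):
    """Third standardised moment: positive values indicate right skew."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((x - x.mean()) ** 3) / x.std() ** 3)

rng = np.random.default_rng(0)
# Synthetic right-skewed sample, a stand-in for the HSX variable.
prices = rng.lognormal(mean=1.0, sigma=1.0, size=5000)

raw = sample_skew(prices)             # strongly positive (right-skewed)
logged = sample_skew(np.log(prices))  # close to zero: log data are normal
```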

Figure 15: Q-Q plot of "HSX"
Figure 16: Q-Q plot of "log of HSX"

A Q-Q plot helps to graphically verify if a variable is distributed normally. If it is, the distribution

resembles a straight line with a 45° slope. Figure 15 shows that the variable “HSX” is not normally

distributed, but right-skewed. After logging the variable (Figure 16) it resembles a normal distribution.
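The Q-Q comparison can also be checked numerically: scipy's probplot pairs the ordered sample values with theoretical normal quantiles and reports the correlation r of those points with the fitted line, where r near 1 signals normality (synthetic samples below are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
normal_sample = rng.normal(size=1000)     # stand-in for "log of HSX"
skewed_sample = rng.lognormal(size=1000)  # stand-in for raw "HSX"

# probplot returns (theoretical quantiles, ordered sample values) and
# (slope, intercept, r) of the least-squares line through the Q-Q points.
(_, _), (_, _, r_normal) = stats.probplot(normal_sample)
(_, _), (_, _, r_skewed) = stats.probplot(skewed_sample)
# r_normal lies much closer to 1 than r_skewed
```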

 4.3. Correlation

 Independent variable – dependent variable

 When I investigate the correlations between the individual variables, I see no unexpected results. The variables “log of Twitter volume” and “log of HSX price” have the highest correlation, 0.7810 (significant at the 0.05 level), which means they have a strong positive relationship. This might be valuable but could also be spurious, as correlation does not imply causation. “weekend rank” has a negative correlation with all other variables; it is the only variable where a lower number indicates greater success.

 “log of weekend screens” is the only variable that does not correlate with “log of HSX price” at the 5% significance level (the star in brackets in Figure 17 indicates significance at the 5% level, i.e. 95% confidence), so it is the first variable I will drop, as it does not offer relevant information.

Figure 17: Correlation table for all variables

The correlation table shows the strength of the linear relationship between all pairs of two

variables in my data set. The darker the green colour the stronger the relationship is

(correlation can take on values between -1 and +1, where -1 is a perfect negative relationship,

0 means no relationship at all and +1 is a perfect positive relationship). When studying the

table, I see that the variables I am especially interested in, “log of Twitter volume” and “log of

Twitter index”, have the highest correlation with the “log of HSX”, followed by “log of

Wikipedia hits” and “log of Google Trends“. Variables that show success at the box office

have the lowest correlation with “log of HSX”.
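A correlation table like Figure 17 can be produced directly with pandas. The data below are toy stand-ins with illustrative names, not the thesis data set:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 100
hsx = rng.normal(size=n)
# Toy stand-ins for three of the thesis variables (names illustrative).
df = pd.DataFrame({
    "log_hsx": hsx,
    "log_twitter_volume": 0.8 * hsx + rng.normal(scale=0.5, size=n),
    "weekend_rank": -hsx + rng.normal(scale=0.7, size=n),  # lower = better
})

corr = df.corr()  # pairwise Pearson correlations, values in [-1, +1]
# weekend_rank correlates negatively with the success measures
```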

To further examine the relationship between the variables I create a two-way scatterplot with a

line of best fit. The line indicates a strong and positive correlation between the two variables

“log of HSX” and “log of Twitter volume”.

Figure 18: Twoway linear prediction plot between “log of HSX” and “log of Twitter volume”

As indicated in the correlation table, the correlation between “log of HSX” and “log of Twitter volume” is strong and positive. I also include a correlation graph of “log of HSX” and “log of Twitter index positive”. All other correlation graphs are attached in the appendix starting on page 63.
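The line of best fit in such a two-way scatterplot is an ordinary least-squares line, which can be sketched with numpy (synthetic stand-ins for the two variables):

```python
import numpy as np

# Sketch: slope of the line of best fit in a two-way scatterplot
# (synthetic stand-ins for "log of Twitter volume" and "log of HSX").
rng = np.random.default_rng(3)
x = rng.normal(size=200)                       # "log of Twitter volume"
y = 0.9 * x + rng.normal(scale=0.4, size=200)  # "log of HSX"

slope, intercept = np.polyfit(x, y, deg=1)
# a clearly positive slope mirrors the strong positive correlation
```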
