The Impact of Ebooks on Print Book Sales: Cannibalization and Market Expansion

Hui Li

January 22, 2013

Abstract

The impact of ebooks on print book sales has received increasing attention in the publishing industry. This paper estimates a dynamic model of consumer book purchase, format choice and e-Reader adoption decisions. I combine a unique individual level purchase history panel data set with publicly available book prices and characteristics. Consumers have persistent heterogeneous general reading tastes, format utility from ebook reading, and rational expectation over book purchase when adopting an e-Reader.

My model estimates allow me to quantify the degree of cannibalization and the market expansion effect. Taking supply-side prices as exogenously given, counterfactual simulation shows that 2/3 of ebook sales come from cannibalizing print books and 1/3 come purely from market expansion. The introduction of ebooks increases consumer surplus by $709.5 million in 2011. The model also has implications for publishers' and platforms' pricing strategies under different contract schemes. Finally, I find that models that do not take the dynamic device adoption decision into account substantially underestimate the price elasticity of books.

1 Introduction

Digital distribution channels influence traditional channels in a variety of industries these days. The ebook market is one of the prominent but understudied settings. Since Amazon launched its first Kindle in 2007, there has been significant growth in the ebook market. Ebook sales in the U.S. were $90.3 million in 2011, an increase of 202% compared to 2010 [1]. Amazon.com, the largest online seller of print books and ebooks, reported that Kindle book sales surpassed Amazon's total hardcover sales in July 2010, and surpassed total print sales as of April 1, 2011 [2]. The impact of ebooks on print book sales (cannibalization or market expansion) remains controversial among both practitioners and academics.

Publishers worry that the potential cannibalization effect would hurt their hardcover sales, from which they traditionally earn most of their profits, and tend to delay ebook publishing [3]. The issue is so crucial to the optimal pricing strategies of publishers and platforms (e.g. Amazon, Apple and Barnes & Noble) that debates about who should set ebook prices frequently make headlines [4]. So far, most of what we know about the traditional and ebook channels is based on surveys. Lack of data is the main empirical obstacle. Kanna, Pope and Jain (2009) conduct an experiment on a publisher's website and collect individual-level choice data. They find that the pdf format and the print format are substitutes after controlling for unobserved reading taste correlation.

[1] The Association of American Publishers February 2011 Sales Report, http://www.publishers.org/press/30/
[2] http://news.cnet.com/amazon-kindle-books-outselling-all-print-books/8301-17938_105-20064302-1.html

Hu and Smith (2011) use aggregate sales data from a publisher that stopped releasing ebook versions of newly published print books for six months as a natural experiment. They conclude that delaying the release of ebooks causes an insignificant change in overall hardcover sales but a significant decrease in ebook sales, total sales, and likely total revenue and profit to the publisher. Most empirical work is done in reduced form. The only exception, Kanna, Pope and Jain (2009), is conducted in a static cross-sectional setting and does not model the e-Reader device adoption decision. In general, ebook reading requires device adoption in the first place [5].

The hardware side of the market should not be neglected, because ignoring it biases the book price elasticity estimates. It also carries important information about consumer tastes, such as general reading habits. Those who adopt a device earlier are more likely to be avid readers, who are crucial to the book market. Allowing for heterogeneous general reading tastes and price elasticities helps to better understand the composition of demand over time. I estimate a dynamic structural model using individual-level panel data on book purchase and e-Reader device adoption. In particular, I address two research questions: Does the ebook channel impose cannibalization or market expansion effects on traditional paperbacks? How are these effects influenced by heterogeneity in consumers' general reading tastes, price elasticities, and format preferences? Cannibalization here is defined as the books that would have been bought in paperback format in the absence of the ebook format.

Market expansion is defined as the sales purely created by ebooks: books that would not have been bought at all without an ebook version. Ebooks may cannibalize print book sales because they are cheaper and more convenient to read. There are several reasons for a market expansion effect. By offering consumers a lower-priced option, platforms attract more visits, which in turn leads to higher sales of paperbacks. The ebook channel also encourages consumers to try authors and genres they may not have otherwise tried, and saves budget for more new books. Furthermore, consumers may read faster because ebooks are more convenient to read and carry.

I focus on consumer demand alone and treat the supply side as exogenous. The prices of e-Readers and books are taken as given from the data. In my model, there is only one platform, Amazon.com, selling the e-Reading device and books in both paperback and ebook formats [6]. I abstract away from offline book sales such as local bookstores for data availability reasons. Also, while total book sales in 2011 were $13.7 billion, sales on Amazon.com were $7.96 billion [7].

[3] Publishers tend to delay ebook release in the hope of not cannibalizing hardcover sales. For instance, in early 2010, Hachette Book Group delayed the ebook release of nearly all titles by 3 to 4 months. Simon & Schuster delayed the ebook release of 35 major titles by 4 months. Hu and Smith (2011) study the effectiveness of this strategic delay in a reduced-form model.
[4] In February 2012, Amazon.com removed more than 4,000 e-books from its site after it tried and failed to get them cheaper from I.P.G., one of the country's largest book distributors (http://bits.blogs.nytimes.com/2012/02/22/amazon-pulls-thousands-of-e-books-in-dispute/). On April 11, 2012, the United States Department of Justice sued Apple and five major book publishers, accusing them of colluding to raise e-book prices (http://mediadecoder.blogs.nytimes.com/2012/04/11/justice-files-suit-against-apple-and-publishers-over-e-book-pricing/).
[5] The evidence is in the Appendix.

As Amazon.com is the major book seller in both ebooks and paperbacks, my model covers a large share of the market. I focus on the third-generation Kindle, which has no functions other than e-Reading [8]. The shopping environment, where consumers buy both ebooks and paperbacks on Amazon.com, is a good setting for studying substitution patterns. Amazon.com lists a print book and its ebook version side by side on product pages, so cannibalization is most likely to occur because paperback buyers can easily become aware of the competing ebook offers. I combine three unique data sets. The first is an individual-level panel of consumer book purchase histories from January 2011 to December 2011 gathered by comScore.

It is based on a random sample of more than 2 million Internet users in the United States. There are 2922 households buying 9570 book titles on 4978 shopping trips. The data have information on the time they buy a book, the format, the price, the quantity and household demographics like family income and zip code. The second data set is the publicly available book characteristics I collect from the Amazon website. For each title ever purchased in any format, I collect data on the price, rating, number of comments, ranking, genre, publishing date, and other book characteristics (e.g. ISBN, publisher, author) of both paperback and ebook formats.

In total, I have 15,810 pieces of title-format information. The third data set is the device adoption record gathered by comScore. It is an individual level panel data on Kindle purchase in year 2007-2011.

I model book purchase, format choice, and device adoption decisions. On the device adoption side, in every period dynamically optimizing consumers who have not yet bought a Kindle may choose to buy one or wait. The flow utility of having a device comes purely from book purchase: consumers will have an enlarged choice set including both paperbacks and ebooks. Consumers are forward-looking in device adoption decisions and have rational expectations over future book purchases. In general, ebooks are cheaper and e-Reading brings extra utility as it is more convenient. Consumers need to trade off the option value of buying the device against the current device price.

[6] There are also books with both hardcover and paperback versions available. I use the book characteristics of the hardcover version when the paperback version is not yet available (publishers often launch the hardcover first). Once the paperback is available, I consider only paperbacks. The hardcover purchases I observe in the data make up only 10% of total book sales, so I abstract from distinguishing between hardcovers and paperbacks.
[7] http://www.fonerbooks.com/booksale.htm
[8] Kindle is the dominant e-Reader in 2011. According to the survey conducted by Pew Research Center in January 2012, 62% of e-reader owners have a Kindle and 22% have a Nook. The third biggest player, Pandigital, accounts for only 2% of the market. Also, I assume that ebook reading is done on the Kindle and not on other devices, so that consumers need to buy a Kindle before buying any ebooks. In practice, people may read ebooks on multiple screens: dedicated e-Readers, PCs, iPads and smartphones. But according to the survey conducted by the Book Industry Study Group in 2011, the e-Reader is the dominant device: in March 2011, 60% of consumers read ebooks on e-Readers, 16% on PCs, 15% on iPads, and only 9% on smartphones. Another reason is that ebooks sold on different platforms (e.g. Apple iBook, Barnes & Noble Nook, and Amazon Kindle) are not compatible and are subject to digital rights management (DRM) restrictions. As Amazon.com is the dominant ebook seller, ebooks sold there are most likely read on a Kindle. I show more supporting evidence in the Appendix.

On the book purchase side, I model both the consumer's decision to buy a title and the format choice using a static nested logit model. The two formats share the same content-related characteristics but differ in price and format utility. The device adoption side and the book purchase side are linked because (1) consumers take current and future book purchase utility into account when buying the device; and (2) device adoption status affects the consumer's book choice set, in that they cannot read an ebook unless they have bought the device.

The structural estimation follows the nested fixed point algorithm proposed by Rust (1987). I make one adjustment. The state space includes the prices and other characteristics of all book titles in two formats and is therefore too large for practical estimation. To reduce the dimensionality, I use a single inclusive value for each format from the static book purchase side when solving the Bellman equation of the dynamic device adoption decision. A similar approach has been adopted in Hendel and Nevo (2006). In each iteration, I calculate the expected value functions in the inner loop and use MLE in the outer loop.

To get an initial guess for the parameters, I first estimate a static discrete choice model using only the book purchase data. This estimation yields consistent, but potentially inefficient, estimates of the parameters. In the second step, using the estimates from the first stage, I compute the inclusive values associated with each format and their transition probabilities. Finally, I solve the simplified dynamic problem, which involves the device purchase choice exclusively. Rather than having the state space include the prices and other characteristics of all the books, it includes only a single inclusive value for each format.
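The simplified dynamic problem is an optimal stopping problem over a low-dimensional state, and its inner loop can be illustrated with a short value function iteration. The Python sketch below is illustrative only and is not the estimation code behind the results in this paper. It rests on several assumptions that are not taken from the text: the inclusive values are discretized onto a small grid with a known Markov transition matrix, the device price is constant, owning a device is an absorbing state, and the adopt/wait shocks are type-1 extreme value recentred to mean zero, so that the expected maximum takes a log-sum-exp form. The function name `solve_adoption` and all numbers are hypothetical.

```python
import numpy as np


def solve_adoption(iv_no_device, iv_device, transition, price, alpha, delta,
                   tol=1e-10, max_iter=10_000):
    """Solve a stylized adopt-or-wait problem by value function iteration.

    iv_no_device[s], iv_device[s]: per-period expected book utility in state s
    without and with the device (hypothetical inclusive values).
    transition[s, s2]: probability of moving from state s to state s2.
    Returns the value of not owning a device and the adoption probability.
    """
    iv0 = np.asarray(iv_no_device, dtype=float)
    iv1 = np.asarray(iv_device, dtype=float)
    T = np.asarray(transition, dtype=float)
    n = iv0.size

    # Owning is absorbing, so W = iv1 + delta * T @ W has a closed-form solution.
    W = np.linalg.solve(np.eye(n) - delta * T, iv1)
    adopt = W - alpha * price  # pay the device price once, then own forever

    V = np.zeros(n)
    for _ in range(max_iter):
        wait = iv0 + delta * T @ V
        # Expected maximum over mean-zero extreme value shocks: log-sum-exp.
        V_new = np.logaddexp(adopt, wait)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    wait = iv0 + delta * T @ V
    p_adopt = 1.0 / (1.0 + np.exp(wait - adopt))  # logit adoption probability
    return V, p_adopt


# Made-up three-state example: higher states mean larger gains from ebooks.
T_example = np.array([[0.8, 0.2, 0.0],
                      [0.1, 0.8, 0.1],
                      [0.0, 0.2, 0.8]])
V_example, p_example = solve_adoption(
    iv_no_device=[1.0, 1.0, 1.0], iv_device=[1.1, 1.3, 1.6],
    transition=T_example, price=139.0, alpha=0.02, delta=0.95)
print(p_example)
```

In a nested fixed point routine, a solver of this kind would be called once per trial parameter vector inside the likelihood evaluation, and the resulting adoption probabilities would enter the likelihood of the observed adoption timing.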

My model estimates allow me to quantify the degree of cannibalization and the market expansion effect. Taking supply-side prices as exogenously given, counterfactual simulation shows that 2/3 of ebook sales come from cannibalizing print books and 1/3 come purely from market expansion. Interestingly, I find that the substitution patterns for popular and niche books are different. People tend to view ebooks as a stronger substitute for print books when they buy a niche book. In terms of welfare, consumer surplus increases by $709.5 million in 2011 because of the introduction of ebooks. Counterfactuals show that under the agency contract, where publishers set ebook prices, publishers' optimal ebook pricing strategies differ depending on the e-Reader price set by the platform.

In particular, if the Kindle price is low, it is better for publishers to set a high ebook price to recover the cannibalization loss. If the Kindle price is high, publishers can safely set a relatively low ebook price and still benefit from selling both ebooks and print books. This is because in that region the market expansion effect dominates the cannibalization loss. Under the wholesale contract, where platforms set ebook prices, platforms need to trade off revenues from device sales against revenues from book sales.

This paper is the first to use representative individual-level observations of actual purchases to structurally estimate the degree of cannibalization and market expansion. I start from the micro-foundation of utility maximization at the individual level and allow for consumers' heterogeneous reading tastes, different price elasticities for device adoption and book purchase, and extra format utility from e-reading. The data cover a broader range of consumers, book titles, and genres than the extant literature, where only one publisher in a particular policy setting is studied.

My model also contributes to the literature by taking the device adoption decision into account. The extant literature abstracts from this decision and concentrates only on the format choice. However, buying the device before buying an ebook is common practice in this industry and cannot be ignored. Device adoption is also one element of the strategic mix that can be used to lock in consumers in platform competition. Moreover, estimation results show that models that do not take the dynamic device adoption decision into account substantially underestimate the price elasticity of books.

The device adoption and book purchase setting relates to the literature on two-sided markets. Robin Lee (2012) builds a dynamic structural model of video game and console demand using aggregate price and sales panel data. The video game industry and the book industry differ in several features. First, most video games are exclusive to a particular console, so the main benefit of buying a console is a larger choice set of games. In the ebook market, most book titles are available in both ebook and paperback formats. Thus channel advantages in price and reading experience, rather than content exclusivity, are the driving force of e-Reader purchase.

Second, people tend to wait for game or console price drops, which is less common for books [9]. Third, people face far more book titles than game titles, and the distribution is quite dispersed [10]. I deal with this problem by assuming that each book purchase is an independent choice and that only the two formats with the same content are in the choice set. Consumers also have the outside option of not buying.

In the next section I present the literature review. In Section 3, I describe the data set and the ebook industry. Section 4 presents the dynamic demand model for both the e-Reading device and books. The estimation method is discussed in Section 5. I present the estimation results and model fit in Section 6. I evaluate own- and cross-elasticities, calculate the degree of cannibalization and market expansion, and conduct supply-side counterfactuals in Section 7. Section 8 concludes.

[9] The average book price is lower than the average game price. Price drops often happen along with the launch of a new edition of a book. Otherwise, the price remains flat.
[10] The number of games available is in the thousands while the number of books is in the millions. Besides, a hit game can drive console sales (Robin Lee 2012), which is rarely the case for ebooks and e-Readers.

2 Literature Review

My research is related to several streams of extant work. The first stream studies the interactions between the Internet and brick-and-mortar economies. There is a large marketing literature analyzing cannibalization and release timing in media channels (e.g., Lehman and Weinberg 2000, Luan and Sudhir 2006, and Prasad et al. 2004). Balasubramanian (1998) models a horizontally differentiated traditional channel and analyzes how this channel changes in the presence of an Internet retailer.

He shows that the e-retailer acts as a wedge between the competing retailers, so that the retailers compete with the e-retailer instead of with each other. Yoo and Lee (2011) extend the Balasubramanian model to account for heterogeneous customer preferences for the e-channel, and show that the introduction of the e-channel does not necessarily intensify competition, a result contrary to common intuition. In an empirical context, Deleersnyder et al. (2002) find that the introduction of online newspapers results in a relatively small cannibalization of physical newspaper sales, Biyalogorsky and Naik (2003) find that the introduction of online storefronts for music does not significantly cannibalize physical record sales, Waldfogel (2007) shows that YouTube viewing has only a small negative impact on television viewing, and Danaher et al. (2011) show that the presence of the iTunes distribution channel has no statistical impact on DVD sales, but results in a large reduction in digital piracy. In terms of the online sales of books, Hu and Smith (2011) use data from a publisher in a natural experiment setting and find that delaying the release of ebooks causes an insignificant change in overall hardcover sales, but a significant decrease in ebook sales. Structural models provide further evidence. Gentzkow (2007) studies online and print newspapers; Kanna, Pope and Jain (2009) conduct an experiment on pdf and print format choice and analyze optimal pricing strategies for dual-channel (print and ebook) publishers.

They all find a strong substitution pattern after controlling for the complementarity from positively correlated taste. Ghose, Smith, and Telang (2006) show that Internet channels for used books result in a relatively small cannibalization of new book sales.

The second stream of literature concerns two-sided markets. Two-sided (or, more generally, multi-sided) markets are defined as markets in which one or several platforms enable interactions between end-users and try to get the two (or multiple) sides on board by appropriately charging each side. There is an indirect network externality between the two sides, in that the number of end-users on one side affects the number on the other side. The ebook market is two-sided because platforms like Amazon and Barnes & Noble get books from publishers and sell books to consumers. On the demand side, Robin Lee (2012) builds a dynamic structural model of video game and console demand using aggregate price and sales panel data.

On the supply side, a key question is how platforms set prices for the users on the two sides. Armstrong (2005), Caillaud and Jullien (2003), and Rochet and Tirole (2003) each provide theoretical frameworks of two-sided markets to explain how the structure of prices is determined, with either a monopoly platform or two competing platforms setting prices. Evans (2003a, 2003b) provides more examples and discussion of such markets. But the optimal supply-side strategies depend crucially on demand-side properties, in particular the degree of cannibalization and market expansion. My paper serves to fill this gap.

The third stream of literature studies the ebook market in general. Bounie et al. (2011) find that books of different genres have different chances of success in the ebook market and that the ebook format helps save old books. Oestreicher-Singer and Sundararajan (2010) build an analytical model and run reduced-form regressions on the impact of copyright on ebook prices. Hu and Smith (2011) find that delaying the release of ebooks significantly affects hardcopy sales of popular books, but not of niche ones. Since these findings all come from reduced-form regressions, however, the underlying mechanism is not clear.

The device adoption part of my demand-side model is also related to the literature on dynamic consumer choice models for durable goods (Melnikov 2001; Gowrisankaran, Rysman 2012; Gowrisankaran, Park, Rysman 2010). The difference is that my model deals with a two-sided market.

3 Industry and Data Description

3.1 The U.S. ebook market

Since Amazon launched its first Kindle in 2007, there has been significant growth in the ebook market. Ebook sales in the U.S. were $90.3 million in 2011, an increase of 202% compared to 2010 [11]. Amazon.com, the largest online seller of print books and ebooks, reported that Kindle book sales surpassed Amazon's total hardcover sales in July 2010, and surpassed total print sales as of April 1, 2011.

The advantages of ebooks are: 1) instant delivery: there is no shipping cost, which enables frequent shopping at a low cost; 2) lower management and storage costs, which make purchasing and storing multiple books easier; 3) portability: e-Readers are easy to carry and convenient to read, especially while traveling, which potentially makes people read more and faster. The limitation is that consumers need to buy an e-Reader first.

Below I list some stylized facts about the ebook industry. The figures, unless otherwise indicated, are calculated from my data set and are consistent with recent surveys. (1) The existence of dual channels: traditional paperbacks and ebooks. Ebook sales have grown at three-digit rates while paperback sales have remained almost flat over the past few years [12]. The books that make the bestseller lists also differ between the two channels. Among bestselling print books, 27% are fiction, 23% non-fiction, and 24% practical books. The corresponding shares in the ebook market are 70%, 12%, and 8% (Bounie et al. 2011).

(2) Significant consumer heterogeneity: In my data sample, avid readers (13.8%) account for nearly half (46.8%) of total book purchases. 304 out of 2922 consumers bought an ebook in 2011, which is consistent with the survey result that 21% of American adults have read e-books. These consumers are avid readers in all formats: 88% of those who read e-books in the past 12 months also read print books. They also tend to read more and buy books more frequently [13]. Consumer heterogeneity in general reading taste can also be reflected in the device adoption decision. Intuitively, those who adopt a device are more likely to be book reading fans and to purchase more books on a more frequent basis. My model is able to capture this pattern.

Table 1: Summary Statistics of Book Purchase History

household characteristics       number of observations   mean   s.d.   min   max
number of trips                 2922                     1.7    2.1    1     52
number of genres ever bought    2922                     1.6    1.1    1     9
household income                2922                     4.7    1.9    1     7
number of books within trip     4978                     1.9    1.8    1     46

[11] The Association of American Publishers February 2011 Sales Report, http://www.publishers.org/press/30/
[12] According to the May 2012 AAP report (http://www.publishers.org/press/68/), total ebook net sales revenue for 2011 was $21.5 million, a gain of 332.6% over 2010; this represents 3.4 million ebook units sold in 2011, an increase of 303.3%. As a comparison, print formats (hardcover, paperback and mass market paperback) increased by 2.3% to $335.9 million in 2011.

(3) Seasonality: The book industry exhibits considerable seasonality in book purchases. November, December and January are the holiday months when people buy more books. The same holds for Kindle sales. (4) Prices: For 75.2% of book titles, the ebook price is lower than the paperback price. (5) Availability: Ebook format availability increases over time. The number of ebooks available in the Kindle store increases from 796,131 in January 2011 to 1,112,876 in December 2011 [14]. The share of titles in my data set without an ebook format drops from 45% to 30% (subject to small-sample error). According to the survey, this number is generally around 10% to 15% [15].

3.2 Data description

I focus on the Amazon Kindle only because it is the dominant e-Reader in 2011. I combine three unique data sets. The first is an individual-level panel of consumers' book purchase histories from January 1, 2011 to December 31, 2011 gathered by comScore. It is based on a random sample of more than 2 million Internet users in the United States. Each consumer is identified by a machine id, which indicates the machine she uses to access the website. For each consumer, each visit to the website is recorded and identified by a session id, whether or not she bought something.

A consumer can have several shopping trips over the year. I observe the time they buy a book, the book title, the price and the format. I also observe demographics such as household income, family size, zip code, etc. There are 2922 consumers, with 9570 book titles purchased on 4978 shopping trips. Among them, 304 consumers have ever bought at least one ebook and 732 ebooks are bought in total. Table 1 shows the summary statistics.

The second data set is the publicly available book characteristics I collect from the Amazon website. For each title ever purchased in any format, I collect data on the price, rating, number of comments, ranking, genre, publishing date, and other book characteristics (e.g. ISBN, publisher, author) of both formats, paperbacks and ebooks. In total, I have 15,810 pieces of title-format information. Table 2 summarizes the book characteristics.

Table 2: Summary statistics of book characteristics (#obs = 15,810)

book characteristics          mean        s.d.       min         max
rating                        4.2         0.6        1           5
# comments                    171         5320       2           11,826
price                         22          60         0           3,451
ranking                       342,475     891,584    3           1.21e07
time since publishing date    17,109      2,728      -18,262     19,176
                              (yr=2006)   (7.5yrs)   (yr=1910)   (yr=2011)

Note: time since publishing date is in Stata date format (days since 1-Jan-1960).

Figure 1: Kindle monthly sales (2007-2011)

[13] Pew Research Center's Internet & American Life Project, April 2012, http://libraries.pewinternet.org/2012/04/04/the-rise-of-e-reading/
[14] http://ilmk.wordpress.com/category/analysis/snapshots/
[15] The Association of American Publishers February 2011 Sales Report, http://www.publishers.org/press/30/

The third data set is the device adoption record gathered by comScore. It is an individual-level panel on Kindle purchases from 2007 to 2011. I observe the time they buy a Kindle, the price, and the quantity. I plot the monthly sales and cumulative sales of Kindles in my sample in Figure 1 and Figure 2. They follow a typical durable good sales pattern, with the holiday months in particular driving sales. The first and the third data sets sample different groups of consumers, some of whom overlap. From the overlapping part of the sample, I can see that the month they adopt a device and the month they start buying ebooks are the same most of the time. So for each consumer in the first (book purchase) data set, I observe the time they start buying ebooks and I assume that they adopt a device in that period.

I also observe the Kindle price throughout the year. The price path is a step function: the price of the third-generation Kindle with Wi-Fi and 3G dropped only once, in July, from $189 to $139. In the e-Reader market, the Amazon Kindle has been the dominant product since its launch in 2007. Barnes & Noble released its Nook in November 2009, and Apple started selling ebooks with the iPad in April 2010. For all of them, prices remain unchanged most of the time. To summarize, I model both book purchase decisions and device adoption decisions.

Figure 2: Kindle cumulative sales (2007-2011)

I try to fit the following observed patterns: (1) consumers' book purchase histories: whether they buy a particular title and which format they choose. There are 2922 consumers, with 9570 book titles purchased on 4978 shopping trips. Among them, 304 consumers bought at least one ebook and 732 ebooks were bought in total; (2) their device adoption timing. I divide consumers into two groups according to the number of books they bought in the initialization period (the first six months of 2011, when almost no ebooks had been bought). Figure 3 plots the adoption month by type: ordinary readers and avid readers.

An avid reader is defined as a consumer who bought more than 5 books in 2011, where 5 is the 15% quantile of the number of books bought in my data set. I varied this cutoff in my estimation and the results are robust. Most people in my data set do not have a Kindle before August [16]. An avid reader expects to benefit more from device adoption, so she buys the device earlier than others. Non-avid readers catch up in November and December because they buy more books in general during the holiday months, so device adoption benefits them more in those months.

4 Model Setup

Consumers make decisions about device adoption, book purchase and book format.

Every month, those who have not bought a Kindle consider buying one or waiting for the next period, given the current device price. Given their device adoption status, they also decide whether to buy the books available in their choice set and in which format. They can always purchase a paperback whether or not they have a Kindle. The benefit of buying a Kindle comes purely from ebook purchase: consumers will have an enlarged choice set including books in both paperback and ebook versions. Consumers are forward-looking in device adoption decisions and have rational expectations over future book purchases. In general, ebooks are cheaper and e-Reading brings extra utility as it is more convenient. Consumers need to trade off the option value of buying the device against the current device price.

Figure 3: Adoption Month Plot by Type: observed
Note: an avid reader is defined as a consumer who bought more than 5 books in 2011, where 5 is the 15% quantile of the number of books bought in my data set.

[16] This is consistent with the survey conducted by the Pew Institute: fewer than 5% of survey respondents had a Kindle by May 2011. http://pewresearch.org/databank/dailynumber/?NumberID=1275. My data sample is thus representative in this sense.

I make two assumptions on book purchase for feasibility and tractability reasons. First, consumers do not view book titles as substitutes for one another. This assumption is commonly used in the media and content industries (e.g. Robin Lee 2012). For each book title, they choose between its ebook version and its paperback version, or the outside option of not buying. They do, however, regard the same book in two formats as substitutes because the paperback and ebook formats have the same content. I use a nested logit model to capture this demand structure. The first nest is whether to purchase the book and the second nest is in which format.

Second, consumers are myopic in that they do not wait and only make static take-it-or-leave-it decisions. This assumption is reasonable for the book industry: unlike video games, where consumers often expect the price of a game to fall within several months, book prices are stable and consumers have no explicit reason for waiting. Another reason is that there are millions of books available on Amazon.com and each can be viewed as a separate market. My data set alone contains thousands of books, and consumers simply will not track every book and make dynamic decisions over time. They do have an expectation over the total utility from book purchases when buying the device, as I show in the model setup later.
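The two-level choice structure described above (whether to buy a title, then in which format) can be illustrated with a small numerical sketch. The Python code below is an illustration under assumptions, not the specification estimated in this paper: the outside option is normalized to zero utility, `v_paper` and `v_ebook` stand for hypothetical mean utilities of the two formats, and `lam` is a hypothetical nesting parameter in (0, 1] governing how closely the two formats substitute for each other. A consumer without a device has only the paperback in the purchase nest.

```python
import numpy as np


def nested_logit_probs(v_paper, v_ebook, lam, has_device):
    """Choice probabilities for one title in a two-level nested logit.

    Upper level: buy the title or take the outside option (utility 0).
    Lower level: choose a format among those in the choice set.
    Returns (probabilities by alternative, inclusive value of the buy nest).
    """
    utils = {"paperback": v_paper}
    if has_device:
        utils["ebook"] = v_ebook  # ebooks are available only to device owners

    scaled = np.array(list(utils.values()), dtype=float) / lam
    m = scaled.max()
    inclusive = m + np.log(np.exp(scaled - m).sum())  # stable log-sum-exp
    within = np.exp(scaled - inclusive)               # P(format | buy)
    p_buy = 1.0 / (1.0 + np.exp(-lam * inclusive))    # P(buy)

    probs = {name: p_buy * w for name, w in zip(utils, within)}
    probs["none"] = 1.0 - p_buy
    return probs, inclusive


# Made-up utilities: compare the same consumer with and without a device.
with_device, iv_with = nested_logit_probs(0.2, 0.5, lam=0.6, has_device=True)
without_device, iv_without = nested_logit_probs(0.2, 0.5, lam=0.6, has_device=False)
print(with_device)
print(without_device)
```

The inclusive value returned here plays the role of the format-level summary used on the dynamic side: it condenses the expected utility of a choice set into a single number. Setting `lam=1` collapses the nest to a plain multinomial logit, which is a convenient sanity check.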

I face two challenges imposed by features of my data. First, a consumer can buy several books in the same time period. I assume each book purchase is an independent decision because the purchase and consumption of books do not happen simultaneously. A consumer can make independent purchase decisions over time and buy the books all at once for later reading. This assumption allows me to abstract from the multiple discreteness feature in my data set, in the sense that I do not model how consumers choose the quantity of books they buy 17. Still, the total utility from purchasing those books is taken into account when buying the device.

So people buying several books at once are more likely to adopt a device in that period. The second challenge is that consumers do not buy books every month. Hendel and Nevo (2006) model consumers' inventory of detergents to capture the purchase arrival rate. Unlike detergents, book consumption is not that regular and is mostly need-based (e.g. for leisure or for studying). It is hard to fit it into an inventory model where there is a relatively constant consumption rate. Also, I do not observe consumers' inventory of books, and I only look at their purchasing behavior on Amazon.com. The focus of my paper is the format choice, so I take a static stance on book purchase behavior and model neither consumers' inventory of books nor their book purchase timing.

See Section 4.2 for modeling details.

17 There are papers that focus on modeling the multiple discreteness feature (Hendel 1999, Dube 2004). The focus of my paper is the format choice, so I abstract from the multiple discreteness problem.

4.1 Device Adoption

The consumer has an infinite horizon and discounts the future at rate $\delta$. At some point in time, she may buy a Kindle. Let subscript 0 denote the device adoption status in which the consumer does not have a Kindle and subscript 1 the status in which she has one. A consumer who does not have a Kindle at time $t$ receives utility

$$u_{i0t} = \sigma_f f_{i0t} + \varepsilon_{i0t} \tag{1}$$

where $f_{i0t} \equiv f(\Theta_i, \Psi_{i0t})$ is the flow utility from reading the paperbacks she buys in period $t$ and $\varepsilon_{i0t}$ is an idiosyncratic shock.

The flow utility is a function of the consumer's book purchase preference $\Theta_i$ and the characteristics of the paperbacks. In particular, $\Psi_{i0t} = \{w_k, p_k^P\}_{k \in K_{i0t}}$ denotes the characteristics of the paperbacks in the choice set $K_{i0t}$, which include the book price $p_k^P$ and other book characteristics $w_k$ such as rating, ranking, number of comments, the time since publication and a holiday month dummy. Holiday months are defined as November and December. I describe how this flow utility is calculated in Section 4.3. The coefficient $\sigma_f$ allows the error terms on the device adoption side and the book purchase side to have different standard deviations and is indispensable (cf. Train 2003).

If she buys a Kindle in the current period, she receives utility

$$u_{i1t} = \sigma_f f_{i1t} - \alpha_i^{HW} p_t + \varepsilon_{i1t} \tag{2}$$

where $f_{i1t} \equiv f(\Theta_i, \Psi_{i1t})$ is the flow utility from reading both the ebooks and the paperbacks she buys in period $t$ and $\varepsilon_{i1t}$ is an idiosyncratic shock. $\Psi_{i1t} = \{w_k, p_k^P, p_k^E\}_{k \in K_{i1t}}$ denotes both ebook and paperback characteristics in the choice set $K_{i1t}$, which include the book prices of the two formats $\{p_k^P, p_k^E\}$ and other book characteristics $w_k$ that are shared by the two formats, such as rating, ranking, number of comments, the time since publication and a holiday month dummy. $\alpha_i^{HW}$ is the price coefficient on the hardware side. $p_t$ is the Kindle price. I also include a holiday month dummy to capture the seasonality of sales.

Consumers are forward-looking in their device adoption decisions and have rational expectations over the utility from book purchase. I assume that they expect the future device price to be the same as the current one. I could allow for a richer rational expectation such as an AR(1) process (Gowrisankaran and Rysman 2012). However, Gowrisankaran and Rysman's sample period spans more than 5 years, and the dynamics they observe are much richer. In my case, (1) in year 2011, the Kindle price remains constant except for a price drop in July, and (2) the e-Reader is a new kind of durable good in the market, so it is relatively difficult for consumers to predict the exact timing of a price drop. So I assume they have a flat device price expectation. I examine another specification where consumers have perfect foresight on the device price, and the estimates hardly change.


The state space includes her current device adoption status $\iota_{it} \in \{0, 1\}$, where 0 indicates not having a Kindle and 1 indicates having one; her time-invariant book purchase preference $\Theta_i$; the characteristics of the books in her choice set $\{\Psi_{i0t}, \Psi_{i1t}\}$; the Kindle price $p_t$; and the idiosyncratic shocks. Denote $\Omega_{it} = \{\Theta_i, \Psi_{i0t}, \Psi_{i1t}, p_t\}$. I drop the subscript $i$ for notational simplicity. Denote the idiosyncratic shock vector $\vec{\varepsilon}_t \equiv \{\varepsilon_{0t}, \varepsilon_{1t}\}$. Let $V(1, \Omega_t, \vec{\varepsilon}_t)$ denote the value function of a consumer who already has a Kindle at the beginning of the period. Let $V(0, \Omega_t, \vec{\varepsilon}_t)$ denote the value function of a consumer who does not. Conditional on not having a Kindle, $D_t = 1$ indicates that she chooses to adopt the device and $D_t = 0$ indicates that she waits.

The Bellman equation for a consumer who already owns a Kindle at the beginning of the period is

$$V(1, \Omega_t, \vec{\varepsilon}_t) = \sigma_f f_{1t} + \delta E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t\right] + \varepsilon_{1t} \tag{3}$$

It is an absorbing state because the consumer keeps the Kindle in all future periods. The Bellman equation if she does not have a Kindle at the beginning of the period is

$$V(0, \Omega_t, \vec{\varepsilon}_t) = \max\Big\{ \sigma_f f_{0t} + \delta E\left[V(0, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 0\right] + \varepsilon_{0t},\; \sigma_f f_{1t} - \alpha^{HW} p_t + \delta E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 1\right] + \varepsilon_{1t} \Big\} \tag{4}$$

The first element of the max operator is the choice-specific value function of waiting and the second is the choice-specific value function of buying.

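Notice that both $f_{0t}$ and $f_{1t}$ are functions of $\Omega_t$. Assume $\vec{\varepsilon}_t$ is an independently distributed type I extreme value error with density $g(\vec{\varepsilon})$. Let $EV(\cdot, \Omega) = \int_{\vec{\varepsilon}} V(\cdot, \Omega, \vec{\varepsilon})\, dg_{\vec{\varepsilon}}$ denote the expectation of the value function integrated over $\vec{\varepsilon}$. Then apply the logit aggregation in Rust (1987) to equation (4):

$$EV(0, \Omega_t) = \ln\Big( \exp\big(\sigma_f f_{0t} + \delta E\left[V(0, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 0\right]\big) + \exp\big(\sigma_f f_{1t} - \alpha^{HW} p_t + \delta E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 1\right]\big) \Big) \tag{5}$$

where, recalling that owning a Kindle is an absorbing state, the expectation of the Bellman equation (3) is

$$E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 1\right] = E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t\right] = \sigma_f f_{1t} + \delta E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t\right] + E(\varepsilon_{1t}) \tag{6}$$

Because the error terms are assumed to be independently distributed type I extreme value errors with location parameter 0 and scale parameter 1, the mean of the error is the Euler constant, $E(\varepsilon_{1t}) = 0.5772$.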
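To make the recursion concrete, the following is a minimal numerical sketch and not the estimation code used in the paper. It assumes a stationary environment in which the flow utilities, the device price and the parameters are constant over time, so that the expected value of owning has a closed form and equation (5), taken as printed, becomes a fixed-point problem in the expected value of not owning. The function names and all numeric inputs are hypothetical.

```python
import math

EULER = 0.5772  # mean of a standard type I extreme value shock, as in the text


def ev_owner(f1, sigma_f, delta):
    """Expected value of owning a Kindle when f1 is constant over time.

    Solves EV1 = sigma_f * f1 + delta * EV1 + EULER, the stationary
    version of equation (6). Stationarity is an assumption of this sketch.
    """
    return (sigma_f * f1 + EULER) / (1.0 - delta)


def ev_non_owner(f0, f1, price, sigma_f, alpha_hw, delta,
                 tol=1e-10, max_iter=10_000):
    """Iterate equation (5) to a fixed point under stationary inputs."""
    ev1 = ev_owner(f1, sigma_f, delta)
    # Choice-specific value of buying; it does not depend on EV0.
    buy = sigma_f * f1 - alpha_hw * price + delta * ev1
    ev0 = 0.0
    for _ in range(max_iter):
        wait = sigma_f * f0 + delta * ev0
        m = max(wait, buy)  # subtract the max for a stable log-sum-exp
        new_ev0 = m + math.log(math.exp(wait - m) + math.exp(buy - m))
        if abs(new_ev0 - ev0) < tol:
            return new_ev0
        ev0 = new_ev0
    raise RuntimeError("fixed-point iteration did not converge")
```

The iteration converges because the derivative of the right-hand side of (5) with respect to the continuation value of waiting is $\delta$ times the probability of waiting, which is below 1. A call such as `ev_non_owner(1.0, 1.5, 100.0, 1.0, 0.01, 0.9)` uses purely illustrative numbers.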

Notice that $f_{1t}$ is passed into the next period's value function and it affects the expectation over $f_{i,t+1}$. The probability of buying a Kindle, given that the consumer has not bought it before, is

$$\Pr(D_t = 1 \mid \iota_t = 0, \Omega_t) = \frac{\exp\big(\sigma_f f_{1t} - \alpha^{HW} p_t + \delta E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 1\right]\big)}{\exp\big(\sigma_f f_{0t} + \delta E\left[V(0, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 0\right]\big) + \exp\big(\sigma_f f_{1t} - \alpha^{HW} p_t + \delta E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 1\right]\big)} \tag{7}$$

Intuitively, consumers may be motivated to buy a Kindle for three reasons: current-period book purchase needs, a desirable current device price, and the option value of device adoption. To see this, take the difference of the two choice-specific value functions:

$$\sigma_f (f_{1t} - f_{0t}) - \alpha^{HW} p_t + \delta \Big\{ E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 1\right] - E\left[V(0, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 0\right] \Big\}$$

The first term represents the benefit from an enlarged book choice set in the current period.

So if the consumer has more books in her shopping basket in that period and the ebook prices are much lower than the paperback prices, she is more likely to buy the device in that period. The second term indicates that consumers will respond to a device price drop, which is exactly what I observe in my data. The third term is the option value which, as I show in the next section, is increasing in the general reading taste. An avid reader expects that she will benefit more from device adoption, so she will buy the device earlier than others.
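The three-way decomposition above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the two continuation values are passed in as plain numbers rather than solved from the Bellman equations, and the function name, argument names and inputs are hypothetical.

```python
import math


def adoption_probability(f0, f1, price, cont_own, cont_wait,
                         sigma_f, alpha_hw, delta):
    """Logit probability of buying the device, as in equation (7).

    cont_own and cont_wait stand for the expected next-period values of
    owning and of not owning; they are inputs here, not solved for.
    Returns the probability and the three terms of the value difference.
    """
    current_books = sigma_f * (f1 - f0)            # enlarged choice set today
    device_cost = -alpha_hw * price                # disutility of the device price
    option_value = delta * (cont_own - cont_wait)  # gain in continuation value
    diff = current_books + device_cost + option_value
    # 1 / (1 + exp(-diff)) equals exp(buy) / (exp(wait) + exp(buy)).
    prob = 1.0 / (1.0 + math.exp(-diff))
    parts = {"books": current_books, "price": device_cost, "option": option_value}
    return prob, parts
```

Holding the other inputs fixed, lowering `price` raises the returned probability, which mirrors the response to a device price drop described in the text.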

4.2 Book Purchase

Consumers are heterogeneous in their general reading taste $\gamma_i$ and the price coefficient $\alpha_i^{SW}$. I allow for an extra format utility of ebooks $\theta$, which is an intercept interacted with the ebook format. I assume that consumers share the coefficients $\beta$ on other book characteristics such as rating, ranking, number of comments, the time since publication and the holiday month dummy. The book purchase preference parameters are summarized in $\Theta_i = \{\gamma_i, \alpha_i^{SW}, \theta, \beta\}$. The utility of consumer $i$ purchasing book title $k$ in ebook format is

$$u_{ik}^E = v_{ik}^E + \epsilon_{ik}^E = \gamma_i + \theta + \beta w_k - \alpha_i^{SW} p_k^E + \epsilon_{ik}^E$$

The utility of consumer $i$ purchasing book title $k$ in paperback format is

$$u_{ik}^P = v_{ik}^P + \epsilon_{ik}^P = \gamma_i + \beta w_k - \alpha_i^{SW} p_k^P + \epsilon_{ik}^P$$

The utility of not purchasing is

$$u_{ik}^0 = \epsilon_{ik}^0$$

$\gamma_i$ is a general reading taste parameter and captures how much the consumer loves reading in general.

$w_k$ are observable book characteristics that are shared by the paperback and the ebook of the same title. They include the online five-star rating, publishing date, ranking, the number of comments, and a dummy for holiday months. The only differences between the two formats are the format utility $\theta$ for ebooks (e.g. better reading experience, more convenient to carry; I normalize the format utility of paperbacks to 0) and the prices. $\alpha_i^{SW}$ is the price coefficient on the book purchase side, which can differ from that on the device side. I model book purchase utility as the sum of content utility and format utility. First, the two formats have the same content and thus share book characteristics $w_k$.

I estimate coefficients on book characteristics when modeling this content utility instead of including a fixed effect for every title. This is because book purchases are quite dispersed even for the top 30 bestsellers in my panel data, and I cannot identify all the title fixed effects. Second, I allow for different format utilities because ebooks are easier to carry and more convenient to read. Including the heterogeneous general reading taste parameter $\gamma_i$ is important because avid readers, who bought more than 5 books a year, tend to buy more books and adopt the device earlier. The composition of the population who have not bought a Kindle evolves over time as more and more people come to own a Kindle and buy ebooks.

On the supply side, this evolution has a strategic implication on the optimal pricing of Amazon and its competitors.

A paperback and an ebook with the same content have a different substitution pattern compared with two paperbacks with different content. To deal with the undesirable IIA problem, I assume a nested logit structure. The first nest is whether to buy the title and the second nest is which format to buy. Define the book purchase decision $d_{ik} = 0$ if she does not purchase, $d_{ik} = E$ if she buys it in ebook format and $d_{ik} = P$ if she buys a paperback. Let $\sigma$ denote the similarity parameter between the two formats, as defined in the standard nested logit literature. It can be seen as the correlation between $\epsilon_{ik}^E$ and $\epsilon_{ik}^P$. The error term of not buying, $\epsilon_{ik}^0$, is independently distributed and is not correlated with the former two errors.

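Case (i): The consumer has a Kindle or chooses to buy a Kindle in the current period, so that her device status is $\iota_i = 1$. Given the consumer's type, the probability of choosing book title $k$ in ebook format, conditional on buying the title at time $t$, is

$$\Pr(d_{ik} = E \mid d_{ik} \neq 0, \iota_i = 1) = \frac{\exp\left(\frac{v_{ik}^E}{1-\sigma}\right)}{\exp\left(\frac{v_{ik}^P}{1-\sigma}\right) + \exp\left(\frac{v_{ik}^E}{1-\sigma}\right)}$$

The probability of buying title $k$ is

$$\Pr(d_{ik} \neq 0 \mid \iota_i = 1) = \frac{\left[\exp\left(\frac{v_{ik}^P}{1-\sigma}\right) + \exp\left(\frac{v_{ik}^E}{1-\sigma}\right)\right]^{1-\sigma}}{1 + \left[\exp\left(\frac{v_{ik}^P}{1-\sigma}\right) + \exp\left(\frac{v_{ik}^E}{1-\sigma}\right)\right]^{1-\sigma}}$$

Thus the probability of purchasing title $k$ in ebook format is the product of the last two probabilities:

$$\Pr(d_{ik} = E \mid \iota_i = 1) = \frac{\exp\left(\frac{v_{ik}^E}{1-\sigma}\right) \left[\exp\left(\frac{v_{ik}^P}{1-\sigma}\right) + \exp\left(\frac{v_{ik}^E}{1-\sigma}\right)\right]^{-\sigma}}{1 + \left[\exp\left(\frac{v_{ik}^P}{1-\sigma}\right) + \exp\left(\frac{v_{ik}^E}{1-\sigma}\right)\right]^{1-\sigma}} \tag{8}$$

Similarly, the probability of purchasing title $k$ in paperback format is

$$\Pr(d_{ik} = P \mid \iota_i = 1) = \frac{\exp\left(\frac{v_{ik}^P}{1-\sigma}\right) \left[\exp\left(\frac{v_{ik}^P}{1-\sigma}\right) + \exp\left(\frac{v_{ik}^E}{1-\sigma}\right)\right]^{-\sigma}}{1 + \left[\exp\left(\frac{v_{ik}^P}{1-\sigma}\right) + \exp\left(\frac{v_{ik}^E}{1-\sigma}\right)\right]^{1-\sigma}} \tag{9}$$

The probability of not purchasing title $k$ is

$$\Pr(d_{ik} = 0 \mid \iota_i = 1) = \frac{1}{1 + \left[\exp\left(\frac{v_{ik}^P}{1-\sigma}\right) + \exp\left(\frac{v_{ik}^E}{1-\sigma}\right)\right]^{1-\sigma}}$$

For each book title, the consumer chooses between buying its ebook version, buying its paperback version, and not buying.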

So the above three probabilities sum to 1. Intuitively, the book characteristics are shared by the two formats, and thus the content utility mainly drives the first-nest decision of whether to purchase. The format utility and the different prices across formats drive the format choice in the second nest.
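The nested logit probabilities for a device owner can be computed directly from the two deterministic utilities and the similarity parameter. The sketch below is a plain transcription of equations (8) and (9) and the no-purchase probability; the function name and the numeric inputs in the usage note are hypothetical.

```python
import math


def nested_logit_probs(v_p, v_e, sigma):
    """Return (Pr ebook, Pr paperback, Pr no purchase) for one title.

    v_p, v_e: deterministic utilities of the paperback and ebook formats.
    sigma: similarity parameter of the format nest, with 0 <= sigma < 1.
    """
    if not 0.0 <= sigma < 1.0:
        raise ValueError("sigma must lie in [0, 1)")
    scale = 1.0 - sigma
    e_p = math.exp(v_p / scale)
    e_e = math.exp(v_e / scale)
    nest_sum = e_p + e_e
    denom = 1.0 + nest_sum ** scale
    pr_e = e_e * nest_sum ** (-sigma) / denom   # equation (8)
    pr_p = e_p * nest_sum ** (-sigma) / denom   # equation (9)
    pr_0 = 1.0 / denom                          # outside option
    return pr_e, pr_p, pr_0
```

For example, `nested_logit_probs(-1.0, -0.5, 0.5)` returns three probabilities that sum to 1, and setting `sigma` to 0 reduces the expressions to a standard multinomial logit over the three alternatives.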

Case (ii): The consumer does not have a Kindle and chooses to wait in the current period, so that her device status is $\iota_i = 0$. In this case, she can only choose whether or not to buy a paperback. The two nests boil down to one. The probability of purchasing title $k$ in paperback format is

$$\Pr(d_{ik} = P \mid \iota_i = 0) = \Pr(\text{purchase} \mid \iota_i = 0) = \frac{\exp\left(v_{ik}^P\right)}{1 + \exp\left(v_{ik}^P\right)}$$

The probability of not purchasing title $k$ is

$$\Pr(d_{ik} = 0 \mid \iota_i = 0) = \frac{1}{1 + \exp\left(v_{ik}^P\right)}$$

In my data set, I do not observe people not buying books. People may go online and search for books but end up buying nothing in that period. I deal with this problem by assuming that there is an exogenous arrival rate of searching for books 18.

Based on this rate, I get a set of search timings over the year for each consumer. I then model their purchase/format decisions given the search timing and the corresponding choice set. I keep all observed purchase timings and the corresponding choice sets as search timings, because in reality there is always a search before an actual purchase. For searches that do not end in an observed purchase (these are the no-purchase observations I generate), I assume that the choice set consists of representative books of the same genre and in the same quantity as in the last observed purchase. The characteristics of the representative book are the averages over all books of the same genre in that month.

18 I choose this search frequency based on two other relevant data sets: consumers' searching and purchasing histories in other product categories on Amazon.com. The details are in the Appendix.

This assumption is plausible because it is consistent with how the recommendation system on Amazon.com works. It displays similar books of the same genre based on your last purchase record.
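One way to picture the construction of search occasions is the sketch below. It is an assumption-laden illustration, not the procedure documented in the Appendix: it treats the exogenous arrival as an independent Bernoulli draw in each month, and the function name, the monthly grid and the seed are hypothetical.

```python
import random


def draw_search_months(purchase_months, arrival_rate, n_months=12, seed=0):
    """Return the sorted months (1..n_months) in which a consumer searches.

    Months with an observed purchase are always kept as search occasions.
    In every other month a search arrives with probability arrival_rate
    (a Bernoulli draw, which is an assumption of this sketch).
    """
    rng = random.Random(seed)
    observed = set(purchase_months)
    searches = []
    for month in range(1, n_months + 1):
        if month in observed or rng.random() < arrival_rate:
            searches.append(month)
    return searches
```

Months returned by the function that are not in `purchase_months` correspond to the generated no-purchase observations, whose choice sets would then be filled with representative books as described above.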

Notice that although I abstract from book inventory and the search rate by assuming exogeneity, I do model consumers' purchase/format choice, which I observe and can fit to the data. A good model fit means that in those periods when I observe people buying books, my model also predicts that they choose to purchase books. I conduct a robustness check by varying the exogenous arrival rate of searching. The estimation results are quite stable. See the Appendix for further results.

4.3 Combine the device adoption and the book purchase

The device adoption side and the book purchase side are linked in two ways. First, the device adoption status affects people's choice set for book purchase. Second, flow utility from book purchase and people's expectation over future flow utility affect their current device adoption choice. Here I describe how the flow utilities $f_{i0}$ and $f_{i1}$ that enter the consumer's Bellman equation are calculated from the book purchase side.

The timing and the information available to consumers when making decisions are as follows. A consumer who does not have a Kindle at the beginning of the period first decides whether to buy one. Her information set includes the Kindle price $p$, the book prices $\{p_k^E, p_k^P\}_k$ and characteristics $\{w_k\}_k$ in her choice sets $K_{i0}$ and $K_{i1}$, and the realized device side shocks $\{\varepsilon_{i0}, \varepsilon_{i1}\}$. The book side shocks $\{\epsilon_{ik}^P, \epsilon_{ik}^E, \epsilon_{ik}^0\}_k$ are not realized yet, so the consumer calculates the ex-ante flow utilities from purchasing books under the two device adoption statuses, $f_{i0}$ and $f_{i1}$. She also forms an expectation of the flow utilities in the next period. Based on her information set and expectations, she buys a Kindle or waits. Then she decides whether to buy a book and in which format, given the realized book side shocks $\{\epsilon_{ik}^P, \epsilon_{ik}^E, \epsilon_{ik}^0\}_k$. If she buys a Kindle, she drops out of the device side market and only makes book purchase side decisions later on.

Let $K_{i0}$ denote the set of book titles available in paperback format for consumer $i$. Let $K_{i1}$ denote the set of both paperbacks and ebooks available. They can be seen as an exogenous reading need or a booklist that is both time-varying and individual-specific. The choice set includes the titles I observe consumers buying in the data plus the average titles of the same genre as their last purchase. Consumers can buy multiple books every period, and each decision is independent. The ex-ante flow utility from a single book purchase is the inclusive value, which can be thought of as a quality and taste adjusted
