# The Impact of Ebooks on Print Book Sales: Cannibalization and Market Expansion


Hui Li, January 22, 2013

**Abstract.** The impact of ebooks on print book sales has received increasing attention in the publishing industry. This paper estimates a dynamic model of consumers' book purchase, format choice, and e-Reader adoption decisions. I combine a unique individual-level purchase history panel data set with publicly available book prices and characteristics. Consumers have persistent heterogeneous general reading tastes, format utility from ebook reading, and rational expectations over future book purchases when adopting an e-Reader.

My model estimates allow me to quantify the degree of the cannibalization and market expansion effects. Taking supply-side prices as exogenously given, counterfactual simulation shows that 2/3 of ebook sales come from cannibalizing print books and 1/3 come purely from market expansion. The introduction of ebooks increases consumer surplus by $709.5 million in 2011. The model also has implications for publishers' and platforms' pricing strategies under different contract schemes. Finally, I find that models that do not take the dynamic device adoption decision into account substantially underestimate the price elasticity of books.

## 1 Introduction

Digital distribution channels influence traditional channels in a variety of industries. Ebooks are one prominent but understudied setting. Since Amazon launched its first Kindle in 2007, the ebook market has grown significantly. Ebook sales in the U.S. were $90.3 million in 2011, an increase of 202% compared to 2010.¹ Amazon.com, the largest online seller of print and ebooks, reported that Kindle book sales surpassed Amazon's total hardcover sales in July 2010, and surpassed total print sales as of April 1, 2011.² Whether ebooks cannibalize print book sales or expand the market remains controversial among both practitioners and academics.

Publishers worry that the potential cannibalization effect would hurt their hardcover sales, from which they traditionally earn most of their profits, and they tend to delay ebook publishing.³ The issue is so crucial to publishers' and platforms' (e.g., Amazon, Apple, and Barnes & Noble) optimal pricing strategies that debates about who should set ebook prices frequently make headlines.⁴ So far, most of what we know about the traditional and ebook channels is based on surveys. Lack of data is the main empirical obstacle. Kannan, Pope and Jain (2009) conduct an experiment on a publisher's website and collect individual-level choice data. They find that the PDF and print formats are substitutes after controlling for the correlation in unobserved reading tastes.

¹ The Association of American Publishers, February 2011 Sales Report, http://www.publishers.org/press/30/

² http://news.cnet.com/amazon-kindle-books-outselling-all-print-books/8301-17938_105-20064302-1.html

Hu and Smith (2011) use aggregate sales data from a publisher that stopped releasing ebook versions of newly published print books for six months, treating this as a natural experiment. They conclude that delaying the release of ebooks causes an insignificant change in overall hardcover sales but a significant decrease in ebook sales, total sales, and likely total revenue and profit for the publisher. Most empirical work is done in reduced form. The only exception, Kannan, Pope and Jain (2009), is static and cross-sectional and does not model the e-Reader adoption decision. In general, ebook reading requires device adoption in the first place.⁵

The hardware side of the market should not be neglected, because ignoring it biases the book price elasticity estimates. It also carries important information about consumer tastes, such as general reading habits. Those who adopt a device earlier are more likely to be avid readers, who are crucial to the book market. Allowing for heterogeneous consumer reading tastes and price elasticities helps us better understand how the composition of demand changes over time. I estimate a dynamic structural model using individual-level panel data on book purchase and e-Reader adoption. In particular, I address two research questions: Does the ebook channel impose cannibalization or market expansion effects on traditional paperbacks? How are these effects influenced by heterogeneity in consumers' general reading tastes, price elasticities, and format preferences? Cannibalization here is defined as ebook purchases of books that would have been bought in paperback format in the absence of the ebook format.

Market expansion is defined as the sales purely created by ebooks: those books would not be bought at all without an ebook version. Ebooks may cannibalize print book sales because they are cheaper and more convenient to read. There are several reasons for a market expansion effect. By offering consumers a lower-priced option, platforms attract more visits, which in turn leads to higher sales of paperbacks. Ebooks also encourage consumers to try authors and genres they may not have otherwise tried, and let them save budget for more new books. Furthermore, consumers may read faster because ebooks are more convenient to read and carry.

³ Publishers tend to delay ebook release in the hope of not cannibalizing hardcover sales. For instance, in early 2010, Hachette Book Group delayed the ebook release of nearly all titles by 3 to 4 months, and Simon & Schuster delayed the ebook release of 35 major titles by 4 months. Hu and Smith (2011) study the effectiveness of this strategic delay in a reduced-form model.

⁴ In February 2012, Amazon.com removed more than 4,000 ebooks from its site after it tried and failed to get them more cheaply from I.P.G., one of the country's largest book distributors (http://bits.blogs.nytimes.com/2012/02/22/amazon-pulls-thousands-of-e-books-in-dispute/). On April 11, 2012, the United States Department of Justice sued Apple and five major book publishers, accusing them of colluding to raise ebook prices (http://mediadecoder.blogs.nytimes.com/2012/04/11/justice-files-suit-against-apple-and-publishers-over-e-book-pricing/).

⁵ The evidence is in the Appendix.

I focus on consumer demand alone and treat the supply side as exogenous. The prices of the e-Reader and of books are taken as given from the data. In my model, there is only one platform, Amazon.com, selling the e-Reading device and books in both paperback and ebook formats.⁶ I abstract away from offline book sales, such as those in local bookstores, for data availability reasons. Also, while total book sales in 2011 were $13.7 billion, sales on Amazon.com alone were $7.96 billion.⁷
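Under the definitions of cannibalization and market expansion given above, simulated counterfactual sales can be split with a simple accounting identity: print copies lost when the ebook format is offered count as cannibalization, and the remaining ebook copies count as market expansion. The sketch below is an illustration under stated assumptions, not the paper's code: it assumes that offering ebooks never raises print sales (so it ignores the platform-visit spillover mentioned above), the function name is hypothetical, and the counts are made up only to reproduce a 2/3 versus 1/3 split.

```python
def decompose_ebook_sales(print_without_ebooks: float,
                          print_with_ebooks: float,
                          ebook_sales: float) -> tuple[float, float]:
    """Return (cannibalization share, market-expansion share) of ebook sales.

    print_without_ebooks: simulated print units when the ebook format is removed
    print_with_ebooks:    simulated print units when both formats are offered
    ebook_sales:          simulated ebook units when both formats are offered
    """
    if ebook_sales <= 0:
        raise ValueError("ebook_sales must be positive")
    # Print copies the ebook format took away: these buyers switched formats.
    cannibalized = print_without_ebooks - print_with_ebooks
    # Ebook copies that did not replace any print copy: purely new sales.
    expansion = ebook_sales - cannibalized
    return cannibalized / ebook_sales, expansion / ebook_sales


# Illustrative numbers only (not the paper's simulation output).
cannibal_share, expansion_share = decompose_ebook_sales(1000, 800, 300)
print(round(cannibal_share, 3), round(expansion_share, 3))  # 0.667 0.333
```

If ebooks also raised print sales through more platform visits, `print_with_ebooks` could exceed `print_without_ebooks`, and the identity would need an extra term for that spillover.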

As Amazon.com is the major book seller of both ebooks and paperbacks, my model covers a large part of the market. I focus on the third-generation Kindle, which has no functions other than e-Reading.⁸ The shopping environment, where consumers buy both ebooks and paperbacks on Amazon.com, is a good setting for studying substitution patterns. Amazon.com lists a print book and its ebook version side by side on product pages, so cannibalization is most likely to occur here, as paperback buyers can easily become aware of the competing ebook offers. I combine three unique data sets. The first is an individual-level panel of consumers' book purchase histories from January 2011 to December 2011, gathered by comScore.

It is based on a random sample of more than 2 million Internet users in the United States. There are 2922 households buying 9570 book titles on 4978 shopping trips. The data contain information on when each book is bought, its format, price, and quantity, and on household demographics such as family income and zip code. The second data set is the publicly available book characteristics I collect from the Amazon website. For each title ever purchased in any format, I collect data on the price, rating, number of comments, ranking, genre, publishing date, and other book characteristics (e.g., ISBN, publisher, author) of both the paperback and ebook formats.

In total, I have 15,810 title-format observations. The third data set is the device adoption record gathered by comScore: an individual-level panel of Kindle purchases from 2007 to 2011.

I model book purchase, format choice, and device adoption decisions. On the device adoption side, in every period a dynamically optimizing consumer who has not yet bought a Kindle may choose to buy one or to wait. The flow utility of having a device comes purely from book purchases: the consumer has an enlarged choice set that includes both paperbacks and ebooks. Consumers are forward-looking in device adoption decisions and have rational expectations over future book purchases. In general, ebooks are cheaper, and e-Reading brings extra utility because it is more convenient. Consumers need to trade off the option value of buying the device against the current device price.

⁶ There are also books with both hardcover and paperback versions available. I use the book characteristics of the hardcover version when the paperback version is not yet available (publishers often launch the hardcover first). Once the paperback is available, I consider only paperbacks. The hardcovers actually purchased in the data account for only 10% of total book sales, so I abstract from the distinction between hardcovers and paperbacks.

⁷ http://www.fonerbooks.com/booksale.htm

⁸ Kindle is the dominant e-Reader in 2011. According to the survey conducted by the Pew Research Center in January 2012, 62% of e-reader owners have a Kindle and 22% have a Nook. The third-biggest player, Pandigital, accounts for only 2% of the market. Also, I assume that ebook reading is done on the Kindle and not on other devices, so that consumers need to buy a Kindle before buying any ebooks. In practice, people may read ebooks on multiple screens: dedicated e-Readers, PCs, iPads, and smartphones. But according to the survey conducted by the Book Industry Study Group in 2011, the e-Reader is the dominant device: in March 2011, 60% of consumers read ebooks on e-Readers, 16% on PCs, 15% on iPads, and only 9% on smartphones. Another reason is that ebooks sold on different platforms (e.g., Apple iBook, Barnes & Noble Nook, and Amazon Kindle) are not compatible with each other and are subject to digital rights management (DRM) restrictions. As Amazon.com is the dominant ebook seller, ebooks sold on Amazon.com are most likely read on a Kindle. I show more supporting evidence in the Appendix.

On the book purchase side, I model both the consumer's decision of whether to buy a title and her format choice using a static nested logit model. The two formats share the same content-related characteristics but differ in price and format utility. The device adoption side and the book purchase side are linked because (1) consumers take current and future book purchase utility into account when buying the device; and (2) device adoption status affects the consumer's book choice set, in that she cannot read an ebook unless she has bought the device.

The structural estimation follows the nested fixed-point algorithm proposed by Rust (1987), with one adjustment. The state space includes the prices and other characteristics of all book titles in both formats and is therefore too large for practical estimation. To reduce the dimensionality, I use a single inclusive value for each format from the static book purchase side when solving the Bellman equation of the dynamic device adoption decision. A similar approach is adopted in Hendel and Nevo (2006). In each iteration, I calculate the expected value functions in the inner loop and maximize the likelihood (MLE) in the outer loop.
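The inner loop of this nested fixed-point estimation solves the Bellman equation of the buy-or-wait problem. The sketch below is a minimal illustration, not the paper's implementation. It assumes type-I extreme value shocks on the buy/wait choice, a Kindle that is kept forever once bought, a constant Kindle price, and a single discretized state (a hypothetical per-period utility gain from ownership, standing in for the format inclusive values) with a known transition matrix. All names are illustrative.

```python
import numpy as np


def solve_adoption(omega_grid, P, price, alpha, delta, tol=1e-10, max_iter=10_000):
    """Buy-or-wait value function for a consumer who does not yet own a Kindle.

    omega_grid : (S,) hypothetical per-period utility gain from owning a Kindle
                 in each discretized state
    P          : (S, S) Markov transition matrix over those states (rows sum to 1)
    price      : Kindle price, held fixed in this sketch
    alpha      : price coefficient (utility per dollar)
    delta      : monthly discount factor, 0 < delta < 1
    """
    omega_grid = np.asarray(omega_grid, dtype=float)
    P = np.asarray(P, dtype=float)
    S = omega_grid.size
    # Value of owning forever: W = omega + delta * P @ W, i.e. (I - delta P) W = omega.
    W = np.linalg.solve(np.eye(S) - delta * P, omega_grid)
    u_buy = W - alpha * price

    V = np.zeros(S)
    for _ in range(max_iter):
        u_wait = delta * P @ V
        # Type-I extreme value shocks give the log-sum ("Emax") form;
        # Euler's constant is the mean of each shock.
        V_new = np.euler_gamma + np.logaddexp(u_buy, u_wait)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    u_wait = delta * P @ V
    # Logit probability of buying now, computed stably in log space.
    prob_buy = np.exp(u_buy - np.logaddexp(u_buy, u_wait))
    return V, prob_buy
```

Because the update is a contraction with modulus `delta`, the loop converges from any starting guess. In a full estimation, an outer loop would call a routine like this for each trial parameter vector and feed the implied adoption probabilities into the likelihood; the actual model also tracks the Kindle price path, which this sketch omits.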

To get an initial guess for the parameters, I estimate a static discrete choice model using only the book purchase data. This estimation yields consistent, but potentially inefficient, parameter estimates. In the second step, using the first-stage estimates, I compute the inclusive value associated with each format and its transition probabilities. Finally, I solve the simplified dynamic problem, which involves only the device purchase choice. Rather than including the prices and other characteristics of all books, the state space includes only a single inclusive value for each format.
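The second step summarizes each month's book offerings in a format by one inclusive value and then estimates how that value evolves. A minimal sketch, assuming linear-in-parameters mean utilities from the first stage and a univariate linear AR(1) law of motion for the inclusive value; the AR(1) form is a common simplification in inclusive-value approaches and is an assumption here, not taken from the paper. Function names are hypothetical.

```python
import numpy as np


def inclusive_value(x, price, beta, alpha):
    """Log-sum of first-stage mean utilities over the titles of one format in one month.

    x           : (J, K) title characteristics
    price       : (J,) prices in this format
    beta, alpha : first-stage taste and price coefficients
    """
    v = x @ beta - alpha * price
    v_max = v.max()  # subtract the max before exponentiating for numerical stability
    return v_max + np.log(np.exp(v - v_max).sum())


def fit_ar1(omega):
    """OLS fit of omega[t+1] = g0 + g1 * omega[t] + nu; returns (g0, g1, sigma_nu)."""
    omega = np.asarray(omega, dtype=float)
    y = omega[1:]
    X = np.column_stack([np.ones(y.size), omega[:-1]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma = np.sqrt(resid @ resid / (y.size - 2))
    return coef[0], coef[1], sigma
```

With the fitted `(g0, g1, sigma)`, the inclusive value could then be discretized (for example with Tauchen's method) to build the transition matrix used when solving the adoption problem.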

My model estimates allow me to quantify the degree of the cannibalization and market expansion effects. Taking supply-side prices as exogenously given, counterfactual simulation shows that 2/3 of ebook sales come from cannibalizing print books and 1/3 come purely from market expansion. Interestingly, I find that the substitution patterns for popular and niche books differ: consumers view ebooks as a stronger substitute for print books when they buy a niche book. In terms of welfare, consumer surplus increased by $709.5 million in 2011 because of the introduction of ebooks. Counterfactuals also show that under the agency contract, where publishers set ebook prices, publishers' optimal ebook pricing strategies depend on the e-Reader price set by the platform.
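This section reports the consumer surplus gain but does not spell out the welfare formula. A common way to measure surplus changes in (nested) logit models is the log-sum formula: divide the change in expected maximum utility by the price coefficient. The sketch below applies that formula to a single title for a consumer who already owns a Kindle. It assumes an outside option with utility 0, a nesting parameter `lam` in (0, 1], and a constant marginal utility of income `alpha`; it ignores the device purchase and is not the paper's code. A market-level figure would sum such terms over consumers and titles.

```python
import numpy as np


def title_logsum(v_formats, lam):
    """Expected maximum utility (up to a constant) for one title under nested logit.

    The outside option (not buying) has utility 0; the purchase nest holds the formats.
    """
    v = np.asarray(v_formats, dtype=float)
    iv = np.log(np.sum(np.exp(v / lam)))  # inclusive value of the purchase nest
    return np.log1p(np.exp(lam * iv))     # log(1 + exp(lam * iv))


def delta_cs(v_paper, v_ebook, lam, alpha):
    """Dollar change in expected consumer surplus from adding the ebook format."""
    with_ebook = title_logsum([v_paper, v_ebook], lam)
    paper_only = title_logsum([v_paper], lam)
    return (with_ebook - paper_only) / alpha
```

Adding an option can only raise the log-sum, so the per-title change is always non-negative; the size of the gain depends on how attractive the ebook is relative to the paperback and on `lam`.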

In particular, under the agency contract, if the Kindle price is low, it is better for publishers to set a high ebook price to recover the cannibalization loss. If the Kindle price is high, publishers can safely set a relatively low ebook price and still benefit from selling both ebooks and print books, because in that region the market expansion effect dominates the cannibalization loss. Under the wholesale contract, where platforms set ebook prices, the platform needs to trade off revenues from device sales against revenues from book sales.

This paper is the first to use representative individual-level observations of actual purchases to structurally estimate the degree of cannibalization and market expansion. I start from the micro foundation of utility maximization at the individual level and allow for heterogeneous consumer reading tastes, different price elasticities in device adoption and book purchase, and extra format utility from e-reading. The data cover a broader range of consumers, book titles, and genres than the extant literature, in which only one publisher in a particular policy setting is studied.

My model also contributes to the literature by taking the device adoption decision into account. The existing literature abstracts from this fact and concentrates only on format choice. However, buying the device before buying an ebook is common practice in this industry and cannot be ignored. The device is also part of the strategic mix that platforms can use to lock in consumers in platform competition. Moreover, estimation results show that models that do not take the dynamic device adoption decision into account substantially underestimate the price elasticity of books.

The device adoption and book purchase setting relates to the literature on two-sided markets. Robin Lee (2012) builds a dynamic structural model of video game and console demand using aggregate price and sales panel data. Several features differ between the video game industry and the book industry. First, most video games are exclusive to a particular console, so the main benefit of buying a console is a larger choice set of games. In the ebook market, by contrast, most book titles are available in both ebook and paperback formats. Thus channel advantages in price and reading experience, rather than content exclusivity, are the driving force of e-Reader purchase.

Second, people tend to wait for game or console price drops, which is less common for books.⁹ Third, consumers face far more book titles than games, and their distribution is quite dispersed.¹⁰ I deal with this problem by assuming that each book purchase is an independent choice and that the choice set contains only the two formats of the same title. Consumers also have the outside option of not buying.

In the next section I present the literature review. In Section 3, I describe the data set and the ebook industry. Section 4 presents the dynamic demand model for both the e-Reading device and books. The estimation method is discussed in Section 5. I present the estimation results and model fit in Section 6. I evaluate own- and cross-elasticities, calculate the degree of cannibalization and market expansion, and conduct supply-side counterfactuals in Section 7. Section 8 concludes.

⁹ The average book price is lower than the average game price. Book price drops often happen along with the launch of a new edition; otherwise, prices remain flat.

¹⁰ The number of games available is in the thousands, while books number in the millions. Besides, a hit game can drive console sales (Robin Lee 2012), which is rarely the case for ebooks and e-Readers.

## 2 Literature Review

My research is related to several streams of existing work. The first stream studies the interactions between the Internet and brick-and-mortar economies. There is a large marketing literature analyzing cannibalization and release timing in media channels (e.g., Lehman and Weinberg 2000, Luan and Sudhir 2006, and Prasad et al. 2004). Balasubramanian (1998) models a horizontally differentiated traditional channel and analyzes how this channel changes in the presence of an Internet retailer.

He shows that the e-retailer acts as a wedge between the competing retailers, so that the retailers compete with the e-retailer instead of with each other. Yoo and Lee (2011) extend the Balasubramanian model to account for heterogeneous customer preferences for the e-channel, and show that the introduction of the e-channel does not necessarily intensify competition, a result contrary to common intuition. In an empirical context, Deleersnyder et al. (2002) find that the introduction of online newspapers results in relatively small cannibalization of physical newspaper sales; Biyalogorsky and Naik (2003) find that the introduction of online storefronts for music does not significantly cannibalize physical record sales; Waldfogel (2007) shows that YouTube viewing has only a small negative impact on television viewing; and Danaher et al. (2011) show that the presence of the iTunes distribution channel has no statistical impact on DVD sales but results in a large reduction in digital piracy. In terms of online book sales, Hu and Smith (2011) use data from a publisher in a natural experiment setting and find that delaying the release of ebooks causes an insignificant change in overall hardcover sales but a significant decrease in ebook sales. Structural models provide further evidence. Gentzkow (2007) studies online and print newspapers; Kannan, Pope and Jain (2009) conduct an experiment on PDF versus print format choice and analyze optimal pricing strategies for dual-channel print and ebook publishers.

Both find a strong substitution pattern after controlling for the complementarity arising from positively correlated tastes. Ghose, Smith, and Telang (2006) show that Internet channels for used books result in relatively small cannibalization of new book sales.

The second stream of literature studies two-sided markets. Two-sided (or, more generally, multi-sided) markets are defined as markets in which one or several platforms enable interactions between end-users and try to get the two (or multiple) sides on board by appropriately charging each side. There are indirect network externalities between the two sides, in that the number of end-users on one side affects the number on the other side. The ebook market is two-sided because platforms like Amazon and Barnes & Noble get books from publishers and sell them to consumers. On the demand side, Robin Lee (2012) builds a dynamic structural model of video game and console demand using aggregate price and sales panel data.

On the supply side, a key question is how platforms set prices for users on the two sides. Armstrong (2005), Caillaud and Jullien (2003), and Rochet and Tirole (2003) each provide theoretical frameworks for two-sided markets that explain how the price structure is determined, either when a monopoly platform sets prices or when two platforms compete. Evans (2003a, 2003b) provides more examples and discussion of such markets. But the optimal supply-side strategies crucially depend on demand-side properties, in particular the degree of cannibalization and market expansion. My paper serves to fill this gap.


The third stream of literature studies the ebook market in general. Bounie et al. (2011) find that books of different genres have different chances of success in the ebook market and that the ebook format helps save old books. Oestreicher-Singer and Sundararajan (2010) build an analytical model and conduct reduced-form regressions on copyright's impact on ebook prices. Hu and Smith (2011) find that delaying the release of ebooks significantly affects hardcopy sales of popular books, but not those of niche books. Since these findings all come from reduced-form regressions, however, the underlying mechanism is not clear.

The device adoption part of my demand-side model is also related to the literature on dynamic consumer choice models for durable goods (Melnikov 2001; Gowrisankaran and Rysman 2012; Gowrisankaran, Park and Rysman 2010). The difference is that my model deals with a two-sided market.

## 3 Industry and Data Description

### 3.1 The U.S. ebook market

Since Amazon launched its first Kindle in 2007, the ebook market has grown significantly. Ebook sales in the U.S. were $90.3 million in 2011, an increase of 202% compared to 2010.¹¹ Amazon.com, the largest online seller of print and ebooks, reported that Kindle book sales surpassed Amazon's total hardcover sales in July 2010, and surpassed total print sales as of April 1, 2011.

The advantages of ebooks are (1) instant delivery: there is no shipping cost, which enables frequent shopping at low cost; (2) lower management and storage costs, which make purchasing and storing multiple books easier; and (3) portability: e-Readers are easy to carry and convenient to read, especially while traveling, which potentially makes people read more and faster. The limitation is that consumers need to buy an e-Reader first.

Below I list some stylized facts about the ebook industry. Unless otherwise indicated, the figures are calculated from my data set and are consistent with recent surveys.

(1) The existence of dual channels: traditional paperbacks and ebooks. Ebook sales have grown at triple-digit rates, while paperback sales have remained almost flat over the past few years.¹² The books that make the bestseller lists also differ between the two channels. Among the bestselling print books, 27% are fiction, 23% nonfiction, and 24% practical books; the corresponding shares in the ebook market are 70%, 12%, and 8% (Bounie et al. 2011).

(2) Significant consumer heterogeneity: in my data sample, avid readers (13.8% of consumers) account for nearly half (46.8%) of total book purchases.

304 out of 2922 consumers bought at least one ebook in 2011, which is consistent with the survey result that 21% of American adults have read ebooks. These consumers are avid readers in all formats: 88% of those who read ebooks in the past 12 months also read print books. They also tend to read more and buy books more frequently.¹³ Consumer heterogeneity in general reading taste is also reflected in the device adoption decision. Intuitively, those who adopt a device are more likely to be book lovers who purchase more books more frequently. My model is able to capture this pattern.

Table 1: Summary Statistics of Book Purchase History

| Household characteristic | Observations | Mean | S.D. | Min | Max |
|---|---|---|---|---|---|
| Number of trips | 2922 | 1.7 | 2.1 | 1 | 52 |
| Number of genres ever bought | 2922 | 1.6 | 1.1 | 1 | 9 |
| Household income | 2922 | 4.7 | 1.9 | 1 | 7 |
| Number of books within trip | 4978 | 1.9 | 1.8 | 1 | 46 |

¹¹ The Association of American Publishers, February 2011 Sales Report, http://www.publishers.org/press/30/

¹² According to the May 2012 AAP report (http://www.publishers.org/press/68/), total ebook net sales revenue for 2011 was $21.5 million, a gain of 332.6% over 2010; this represents 3.4 million ebook units sold in 2011, an increase of 303.3%. As a comparison, print formats (hardcover, paperback, and mass market paperback) increased by 2.3% to $335.9 million in 2011.

(3) Seasonality: book purchases exhibit considerable seasonality. November, December, and January are the holiday months, when people buy more books. The same holds for Kindle sales.

(4) Prices: for 75.2% of the book titles, the ebook price is lower than the paperback price.

(5) Availability: ebook availability increases over time. The number of ebooks available in the Kindle store increased from 796,131 in January 2011 to 1,112,876 in December 2011.¹⁴ The percentage of titles in my data set without an ebook format drops from 45% to 30% (subject to small-sample error). According to the survey, this figure is around 10% to 15% in general.¹⁵

### 3.2 Data description

I focus on the Amazon Kindle only, because it is the dominant e-Reader in 2011. I combine three unique data sets. The first is an individual-level panel of consumers' book purchase histories from January 1, 2011 to December 31, 2011, gathered by comScore. It is based on a random sample of more than 2 million Internet users in the United States. Each consumer is identified by a machine id, which indicates the machine she uses to access the website. For each consumer, every visit to the website is recorded and identified by a session id, whether or not she bought something.

A consumer can make several shopping trips over the year. I observe when she buys a book, the book title, the price, and the format. I also observe demographics such as household income, family size, and zip code. There are 2922 consumers, with 9570 book titles purchased on 4978 shopping trips. Among them, 304 consumers bought at least one ebook, and 732 ebooks are bought in total. Table 1 shows the summary statistics.

The second data set is the publicly available book characteristics I collect from the Amazon website. For each title ever purchased in any format, I collect data on the price, rating, number of comments, ranking, genre, publishing date, and other book characteristics (e.g., ISBN, publisher, author) of both formats: paperbacks and ebooks. In total, I have 15,810 title-format observations. Table 2 summarizes the book characteristics.

Table 2: Summary Statistics of Book Characteristics (15,810 title-format observations)

| Book characteristic | Mean | S.D. | Min | Max |
|---|---|---|---|---|
| Rating | 4.2 | 0.6 | 1 | 5 |
| Number of comments | 171 | 5,320 | 2 | 11,826 |
| Price ($) | 22 | 60 | 0 | 3,451 |
| Ranking | 342,475 | 891,584 | 3 | 1.21e07 |
| Publishing date (Stata date format: days since 1 Jan 1960) | 17,109 (yr = 2006) | 2,728 (7.5 yrs) | -18,262 (yr = 1910) | 19,176 (yr = 2011) |

Figure 1: Kindle monthly sales (2007-2011)

¹³ Pew Research Center's Internet & American Life Project, April 2012, http://libraries.pewinternet.org/2012/04/04/the-rise-of-e-reading/

¹⁴ http://ilmk.wordpress.com/category/analysis/snapshots/

¹⁵ The Association of American Publishers, February 2011 Sales Report, http://www.publishers.org/press/30/

The third data set is the device adoption record gathered by comScore: an individual-level panel of Kindle purchases from 2007 to 2011. I observe when consumers buy a Kindle, the price, and the quantity. I plot the monthly and cumulative Kindle sales in my sample in Figure 1 and Figure 2. They follow a typical durable-good sales pattern, with the holiday months driving sales in particular. The first and third data sets sample different groups of consumers, some of whom overlap. In the overlapping part of the sample, the month consumers adopt a device and the month they start buying ebooks are the same most of the time.

So for each consumer in the first (book purchase) data set, I observe when she starts buying ebooks, and I assume that she adopts a device in that period.
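The timing rule above translates directly into a data-processing step. A minimal sketch, assuming purchase records are available as (consumer id, date, format) tuples; the record layout and function name are hypothetical and do not reflect comScore's actual schema. A consumer who already owned a Kindle before January 2011 would be assigned her first 2011 ebook month, so the result is only as good as the assumption stated in the text.

```python
from datetime import date


def adoption_months(purchases):
    """Map consumer id -> (year, month) of her first observed ebook purchase.

    purchases: iterable of (consumer_id, purchase_date, fmt), fmt in {"ebook", "paperback"}.
    Consumers who never buy an ebook are treated as non-adopters and are omitted.
    """
    first = {}
    for consumer_id, day, fmt in purchases:
        if fmt != "ebook":
            continue
        if consumer_id not in first or day < first[consumer_id]:
            first[consumer_id] = day
    return {cid: (d.year, d.month) for cid, d in first.items()}


rows = [
    ("a", date(2011, 9, 3), "paperback"),
    ("a", date(2011, 11, 20), "ebook"),
    ("a", date(2011, 8, 1), "ebook"),
    ("b", date(2011, 2, 1), "paperback"),
]
print(adoption_months(rows))  # {'a': (2011, 8)}
```

Records do not need to be sorted by date, because the function keeps the earliest ebook date seen for each consumer.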

I also observe the Kindle price throughout the year. The price path is a step function: the price of the third-generation Kindle with Wi-Fi and 3G dropped only once, in July, from $189 to $139. In the e-Reader market, the Amazon Kindle has been the dominant product since its launch in 2007. Barnes & Noble released its Nook in November 2009, and Apple started selling ebooks with the iPad in April 2010. For all of these devices, prices remain unchanged most of the time.

Figure 2: Kindle cumulative sales (2007-2011)

To summarize, I model both book purchase decisions and device adoption decisions.

I try to fit the following observed patterns: (1) consumers' book purchase histories, that is, whether they buy a particular title and which format they choose. There are 2922 consumers, with 9570 book titles purchased on 4978 shopping trips; among them, 304 consumers bought at least one ebook, and 732 ebooks are bought in total; and (2) their device adoption timing. I divide consumers into two groups according to the number of books they bought in the initialization period (the first six months of 2011, when almost no ebooks were bought). Figure 3 plots the adoption month by type: ordinary readers and avid readers.
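The split into ordinary and avid readers, and a robustness check that moves the cutoff, can be sketched as follows. A minimal sketch, assuming each consumer's book count has already been tallied over the relevant window; the default cutoff (more than 5 books) follows the paper's definition of an avid reader, while the function name and the counts below are made up for illustration.

```python
def avid_split(books_per_consumer, cutoff=5):
    """Split consumers into avid (> cutoff books) and ordinary readers.

    books_per_consumer: dict mapping consumer id -> number of books bought.
    Returns (avid ids, share of consumers who are avid, share of books bought by avid readers).
    """
    avid = {cid for cid, n in books_per_consumer.items() if n > cutoff}
    total_books = sum(books_per_consumer.values())
    avid_books = sum(books_per_consumer[cid] for cid in avid)
    return avid, len(avid) / len(books_per_consumer), avid_books / total_books


counts = {"a": 1, "b": 2, "c": 10, "d": 7}  # made-up counts
ids, consumer_share, book_share = avid_split(counts)
print(sorted(ids), consumer_share, book_share)  # ['c', 'd'] 0.5 0.85
```

Calling the function again with a different `cutoff` argument mirrors the robustness check on the avid-reader threshold; the two shares it returns are the same kind of statistic as the avid readers' share of consumers and of purchases in the stylized facts.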

An avid reader is defined as a consumer who bought more than 5 books in 2011, where 5 is the cutoff for roughly the top 15% of consumers by number of books bought in my data set. I varied this cutoff in estimation, and the results are robust. Most people in my data set do not have a Kindle before August.¹⁶ An avid reader expects to benefit more from device adoption, so she buys a device earlier than others. Non-avid readers catch up in November and December, because during the holiday months they buy more books in general, and device adoption therefore benefits them more in those months.

Figure 3: Adoption month by type (observed). Note: an avid reader is defined as a consumer who bought more than 5 books in 2011, where 5 is the cutoff for roughly the top 15% of consumers by number of books bought in my data set.

¹⁶ This is consistent with the survey conducted by the Pew Research Center: less than 5% of survey respondents had a Kindle by May 2011 (http://pewresearch.org/databank/dailynumber/?NumberID=1275). My data sample is thus representative in this sense.

## 4 Model Setup

Consumers make decisions about device adoption, book purchase, and book format. Every month, those who have not bought a Kindle consider buying one or waiting until the next period, given the current device price. Given their device adoption status, they also decide whether to buy the books available in their choice set and in which format. They can always purchase a paperback, whether or not they have a Kindle. The benefit of buying a Kindle comes purely from ebook purchases: a Kindle owner has an enlarged choice set that includes both the paperback and the ebook version of each book. Consumers are forward-looking in device adoption decisions and have rational expectations over future book purchases. In general, ebooks are cheaper, and e-reading brings extra utility because it is more convenient. Consumers need to trade off the option value of buying the device against the current device price.

I make two assumptions about book purchase for feasibility and tractability. First, consumers do not view book titles as substitutes for one another. This assumption is commonly used for media and content industries (e.g., Robin Lee 2012). For each book title, consumers choose among its ebook version, its paperback version, and the outside option of not buying. They do, however, regard the same book in two formats as substitutes, because the paperback and ebook formats have the same content. I use a nested logit model to capture this demand structure: the first level is whether to purchase the book, and the second level is which format to buy.
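The two-level demand structure for a single title can be sketched as follows. A minimal sketch, assuming the mean utilities `v_paper` and `v_ebook` are already computed (in the model they depend on book characteristics, prices, and format utility), an outside option normalized to 0, and a nesting parameter `lam`; the function and argument names are hypothetical.

```python
import numpy as np


def title_choice_probs(v_paper, v_ebook, lam, has_kindle):
    """Nested logit probabilities (no purchase, paperback, ebook) for one title.

    Upper level: buy the title vs. the outside option, whose utility is normalized to 0.
    Lower level: paperback vs. ebook; the ebook is in the choice set only with a Kindle.
    lam in (0, 1] is the nesting parameter; lam = 1 reduces to a plain multinomial logit.
    """
    v = np.array([v_paper, v_ebook if has_kindle else -np.inf], dtype=float)
    iv = np.log(np.sum(np.exp(v / lam)))             # inclusive value of the "buy" nest
    p_buy = 1.0 / (1.0 + np.exp(-lam * iv))          # = e^(lam*iv) / (1 + e^(lam*iv))
    p_paper, p_ebook = p_buy * np.exp(v / lam - iv)  # within-nest format shares
    return float(1.0 - p_buy), float(p_paper), float(p_ebook)


print(title_choice_probs(0.0, 0.0, 1.0, has_kindle=False))  # (0.5, 0.5, 0.0)
```

Setting the ebook utility to minus infinity for non-owners is one simple way to encode the device-dependent choice set: the ebook then gets zero probability, and the purchase nest collapses to the paperback alone.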

Second, consumers are myopic in book purchase: they do not wait and only make static take-it-or-leave-it decisions. This assumption is reasonable for the book industry. Unlike video games, where consumers often expect the price of a game to fall within several months, book prices are stable, and consumers have no explicit reason to wait. Another reason is that there are millions of books available on Amazon.com, and each can be viewed as a separate market. My data set alone contains thousands of books, and consumers simply do not track every book and make dynamic decisions over time. They do, however, form expectations over the total utility from book purchases when buying the device, as I show later in the model setup.

The features of my data pose two challenges. First, a consumer can buy several books in the same period. I assume that each book purchase is an independent decision, because the purchase and the consumption of books do not happen simultaneously: a consumer can make independent purchase decisions over time and buy the books all at once for later reading. This assumption allows me to abstract from the multiple-discreteness feature of my data, in the sense that I do not model how many books consumers choose to buy (footnote 17). Still, the total utility from purchasing those books is taken into account in the device adoption decision.

Consumers who buy several books at once are therefore more likely to adopt a device in that period. The second challenge is that consumers do not buy books every month. Hendel and Nevo (2006) model consumers' inventories of detergents to capture the purchase arrival rate. Unlike detergent consumption, book consumption is irregular and mostly need-based (e.g., for leisure or for study), so it is hard to fit into an inventory model with a relatively constant consumption rate. In addition, I do not observe consumers' inventories of books; I observe only their purchases on Amazon.com. Because the focus of my paper is format choice, I take a static stance on book purchase behavior and model neither consumers' book inventories nor the timing of their book purchases.

See Section 4.2 for modeling details.

Footnote 17: Several papers focus on modeling multiple discreteness (Hendel 1999, Dube 2004). Because the focus of my paper is format choice, I abstract from the multiple-discreteness problem.

### 4.1 Device Adoption

The consumer has an infinite horizon and discounts the future at rate $\delta$. At some point in time, she may buy a Kindle. Let subscript 0 denote the device adoption status in which the consumer does not have a Kindle, and subscript 1 the status in which she has one. A consumer who does not have a Kindle at time $t$ receives utility

$$u_{i0t} = \sigma_f f_{i0t} + \varepsilon_{i0t} \tag{1}$$

where $f_{i0t} \equiv f(\Theta_i, \Psi_{i0t})$ is the flow utility from reading the paperbacks she buys in period $t$ and $\varepsilon_{i0t}$ is an idiosyncratic shock.

The flow utility is a function of the consumer's book purchase preference $\Theta_i$ and the characteristics of the paperbacks. In particular, $\Psi_{i0t} = \{w_k, p^P_k\}_{k \in K_{i0t}}$ denotes the characteristics of the paperbacks in the choice set $K_{i0t}$, which include the book price $p^P_k$ and other book characteristics $w_k$ such as rating, ranking, number of comments, time since publication and a holiday month dummy. Holiday months are defined as November and December. I describe how this flow utility is calculated in Section 4.3. The coefficient $\sigma_f$ allows the error terms on the device adoption side and on the book purchase side to have different standard deviations, and is therefore necessary (cf. Train 2003).

If she buys a Kindle in the current period, she receives utility

$$u_{i1t} = \sigma_f f_{i1t} - \alpha^{HW}_i p_t + \varepsilon_{i1t} \tag{2}$$

where $f_{i1t} \equiv f(\Theta_i, \Psi_{i1t})$ is the flow utility from reading both the ebooks and the paperbacks she buys in period $t$, and $\varepsilon_{i1t}$ is an idiosyncratic shock. $\Psi_{i1t} = \{w_k, p^P_k, p^E_k\}_{k \in K_{i1t}}$ denotes the characteristics of both the ebooks and the paperbacks in the choice set $K_{i1t}$, which include the prices of the two formats, $p^P_k$ and $p^E_k$, and other book characteristics $w_k$ shared by the two formats, such as rating, ranking, number of comments, time since publication and a holiday month dummy. $\alpha^{HW}_i$ is the price coefficient on the hardware side, and $p_t$ is the Kindle price. I also include a holiday month dummy to capture the seasonality of sales.

Consumers are forward-looking in device adoption decisions and have rational expectations over the utility from book purchases. I assume that they expect the future device price to equal the current one. I could allow for richer expectations, such as an AR(1) process for the device price (Gowrisankaran and Rysman 2012). However, Gowrisankaran and Rysman's sample period spans more than 5 years, and the dynamics they observe are much richer. In my case, (1) the Kindle price remained constant in 2011 except for a price drop in July, and (2) the e-Reader is a new kind of durable good, so it is relatively difficult for consumers to predict the exact timing of a price drop. I therefore assume that consumers have flat device price expectations. In an alternative specification in which consumers have perfect foresight about the device price, the estimates hardly change.


The state space includes her current device adoption status $\iota_{it} \in \{0, 1\}$, where 0 indicates not having a Kindle and 1 indicates having one; her time-invariant book purchase preference $\Theta_i$; the characteristics of the books in her choice sets, $\{\Psi_{i0t}, \Psi_{i1t}\}$; the Kindle price $p_t$; and the idiosyncratic shocks. Denote $\Omega_{it} = \{\Theta_i, \Psi_{i0t}, \Psi_{i1t}, p_t\}$. I drop the subscript $i$ for notational simplicity. Denote the idiosyncratic shock vector $\vec{\varepsilon}_t \equiv \{\varepsilon_{0t}, \varepsilon_{1t}\}$. Let $V(1, \Omega_t, \vec{\varepsilon}_t)$ denote the value function of a consumer who already has a Kindle at the beginning of the period.

Let $V(0, \Omega_t, \vec{\varepsilon}_t)$ denote the value function of a consumer who does not. Conditional on not having a Kindle, $D_t = 1$ indicates that she chooses to adopt the device and $D_t = 0$ indicates that she waits.

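The Bellman equation for a consumer who already owns a Kindle at the beginning of the period is

$$V(1, \Omega_t, \vec{\varepsilon}_t) = \sigma_f f_{1t} + \delta E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t\right] + \varepsilon_{1t} \tag{3}$$

Owning a Kindle is an absorbing state, because the consumer keeps the Kindle in all future periods. The Bellman equation for a consumer who does not have a Kindle at the beginning of the period is

$$V(0, \Omega_t, \vec{\varepsilon}_t) = \max\Big\{\sigma_f f_{0t} + \delta E\left[V(0, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 0\right] + \varepsilon_{0t},\ \sigma_f f_{1t} - \alpha^{HW} p_t + \delta E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 1\right] + \varepsilon_{1t}\Big\} \tag{4}$$

The first element of the max operator is the choice-specific value function of waiting, and the second is the choice-specific value function of buying.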

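Both $f_{0t}$ and $f_{1t}$ are functions of $\Omega_t$. Assume that the elements of $\vec{\varepsilon}_t$ are independently distributed type I extreme value errors with density $g(\vec{\varepsilon})$. Let $EV(\cdot, \Omega) = \int V(\cdot, \Omega, \vec{\varepsilon})\, dg(\vec{\varepsilon})$ denote the value function integrated over $\vec{\varepsilon}$. Applying the logit aggregation of Rust (1987) to equation (4) gives

$$EV(0, \Omega_t) = \ln\Big\{\exp\big(\sigma_f f_{0t} + \delta E\left[V(0, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 0\right]\big) + \exp\big(\sigma_f f_{1t} - \alpha^{HW} p_t + \delta E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 1\right]\big)\Big\} \tag{5}$$

Recall that owning a Kindle is an absorbing state, so $E[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 1] = E[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t]$, and taking the expectation of the Bellman equation (3) over $\vec{\varepsilon}_t$ gives

$$EV(1, \Omega_t) = \sigma_f f_{1t} + \delta E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t\right] + E(\varepsilon_{1t}) \tag{6}$$

Because the error terms are assumed to be independently distributed type I extreme value errors with location parameter 0 and scale parameter 1, the mean of the error is Euler's constant, $E(\varepsilon_{1t}) \approx 0.5772$.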

Note that $f_{1t}$ is passed into the next period's value function and affects the expectation over $f_{i,t+1}$. The probability of buying a Kindle, given that the consumer has not bought one before, is

$$\Pr(D_t = 1 \mid \iota_t = 0, \Omega_t) = \frac{\exp\big(\sigma_f f_{1t} - \alpha^{HW} p_t + \delta E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 1\right]\big)}{\exp\big(\sigma_f f_{0t} + \delta E\left[V(0, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 0\right]\big) + \exp\big(\sigma_f f_{1t} - \alpha^{HW} p_t + \delta E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 1\right]\big)} \tag{7}$$

Intuitively, consumers may be motivated to buy a Kindle for three reasons: current-period book purchase needs, a desirable current device price, and the option value of device adoption. To see this, take the difference between the two choice-specific value functions:

$$\sigma_f (f_{1t} - f_{0t}) - \alpha^{HW} p_t + \delta \Big\{E\left[V(1, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 1\right] - E\left[V(0, \Omega_{t+1}, \vec{\varepsilon}_{t+1}) \mid \Omega_t, D_t = 0\right]\Big\}$$

The first term represents the benefit from an enlarged book choice set in the current period.

So if the consumer has more books in her shopping basket in a given period and ebook prices are much lower than paperback prices, she is more likely to buy the device in that period. The second term indicates that consumers respond to a device price drop, which is exactly what I observe in my data. The third term is the option value which, as I show in the next section, is increasing in the general reading taste. An avid reader expects to benefit more from device adoption, so she buys the device earlier than others.
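Equation (7) is a binary logit over the two choice-specific values, and the three-term difference above is exactly its logit index. The sketch below is a minimal Python illustration under stated assumptions: the continuation values $E[V(1,\cdot)\mid\cdot, D_t=1]$ and $E[V(0,\cdot)\mid\cdot, D_t=0]$ are supplied as known numbers (in the model they come from solving the Bellman equation), `alpha_hw` is treated as a positive magnitude so that a higher device price lowers the value of buying under the $-\alpha^{HW} p_t$ convention, and all numerical inputs are hypothetical.

```python
import math

def choice_values(sigma_f, f0, f1, alpha_hw, price, cont_wait, cont_buy, delta):
    """Choice-specific values of waiting and buying (the arguments of eq. 7)."""
    v_wait = sigma_f * f0 + delta * cont_wait
    v_buy = sigma_f * f1 - alpha_hw * price + delta * cont_buy
    return v_wait, v_buy

def adoption_probability(**kw):
    """Pr(D_t = 1 | iota_t = 0): binary logit over the two values."""
    v_wait, v_buy = choice_values(**kw)
    # exp(v_buy) / (exp(v_wait) + exp(v_buy)) written as a logistic function
    return 1.0 / (1.0 + math.exp(v_wait - v_buy))

def value_gap_terms(sigma_f, f0, f1, alpha_hw, price, cont_wait, cont_buy, delta):
    """The three terms of v_buy - v_wait discussed in the text."""
    current_books = sigma_f * (f1 - f0)            # enlarged current choice set
    device_price = -alpha_hw * price               # current device price
    option_value = delta * (cont_buy - cont_wait)  # option value of adoption
    return current_books, device_price, option_value

# Hypothetical inputs, for illustration only.
params = dict(sigma_f=2.4, f0=1.0, f1=1.5, alpha_hw=0.15, price=100.0,
              cont_wait=50.0, cont_buy=62.0, delta=0.95)
p_base = adoption_probability(**params)
p_cheaper = adoption_probability(**{**params, "price": 80.0})
```

Lowering the hypothetical price from 100 to 80 raises the adoption probability, and feeding the sum of the three terms through the logistic function reproduces the same probability, which is the decomposition argument in the text.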

### 4.2 Book Purchase

Consumers are heterogeneous in their general reading taste $\gamma_i$ and their price coefficient $\alpha^{SW}_i$. I allow for an extra format utility of ebooks, $\theta$, which is an intercept interacted with the ebook format. I assume that consumers share the coefficients $\beta$ on other book characteristics, such as rating, ranking, number of comments, time since publication and the holiday month dummy. The book purchase preference parameters are summarized in $\Theta_i = \{\gamma_i, \alpha^{SW}_i, \theta, \beta\}$. The utility of consumer $i$ purchasing book title $k$ in ebook format is

$$u^E_{ik} = v^E_{ik} + \epsilon^E_{ik} = \gamma_i + \theta + \beta w_k - \alpha^{SW}_i p^E_k + \epsilon^E_{ik}$$

The utility of consumer $i$ purchasing book title $k$ in paperback format is

$$u^P_{ik} = v^P_{ik} + \epsilon^P_{ik} = \gamma_i + \beta w_k - \alpha^{SW}_i p^P_k + \epsilon^P_{ik}$$

The utility of not purchasing is $u^0_{ik} = \epsilon^0_{ik}$. Here $\gamma_i$ is a general reading taste parameter that captures how much the consumer loves reading in general.

$w_k$ are observable book characteristics shared by the paperback and the ebook of the same title. They include the online five-star rating, publishing date, ranking, number of comments, and a dummy for holiday months. The only differences between the two formats are the format utility $\theta$ for ebooks (e.g., a better reading experience, being more convenient to carry; I normalize the format utility of paperbacks to 0) and the prices. $\alpha^{SW}_i$ is the price coefficient on the book purchase side, which can differ from that on the device side. I model book purchase utility as the sum of content utility and format utility. First, the two formats have the same content and thus share the book characteristics $w_k$.

I estimate coefficients on book characteristics when modeling this content utility, instead of including a fixed effect for every title, because book purchases are quite dispersed even for the top 30 bestsellers in my panel and I cannot identify all the title fixed effects. Second, I allow for different format utilities because ebooks are easier to carry and more convenient to read. Including the heterogeneous general reading taste parameter $\gamma_i$ is important because avid readers, who bought more than 5 books a year, tend to buy more books and adopt the device earlier. The composition of the population that has not yet bought a Kindle therefore evolves over time as more and more people acquire a Kindle and buy ebooks.

On the supply side, this evolution has strategic implications for the optimal pricing of Amazon and its competitors.

A paperback and an ebook with the same content have a different substitution pattern than two paperbacks with different content. To avoid the undesirable IIA property, I assume a nested logit structure: the first nest is whether to buy the title, and the second nest is which format to buy. Define the book purchase decision $d_{ik} = 0$ if she does not purchase, $d_{ik} = E$ if she buys the ebook, and $d_{ik} = P$ if she buys the paperback. Let $\sigma$ denote the similarity parameter between the two formats, as defined in the standard nested logit literature. It can be seen as the correlation between $\epsilon^E_{ik}$ and $\epsilon^P_{ik}$.

The error term of not buying, $\epsilon^0_{ik}$, is independently distributed and uncorrelated with the other two errors.

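Case (i): The consumer already has a Kindle or chooses to buy one in the current period, so that her device status is $\iota_i = 1$. Given the consumer's type, the probability of choosing book title $k$ in ebook format, conditional on buying the title at time $t$, is

$$\Pr(d_{ik} = E \mid d_{ik} \neq 0, \iota_i = 1) = \frac{\exp\left(\frac{v^E_{ik}}{1-\sigma}\right)}{\exp\left(\frac{v^P_{ik}}{1-\sigma}\right) + \exp\left(\frac{v^E_{ik}}{1-\sigma}\right)}$$

The probability of buying title $k$ is

$$\Pr(d_{ik} \neq 0 \mid \iota_i = 1) = \frac{\left[\exp\left(\frac{v^P_{ik}}{1-\sigma}\right) + \exp\left(\frac{v^E_{ik}}{1-\sigma}\right)\right]^{1-\sigma}}{1 + \left[\exp\left(\frac{v^P_{ik}}{1-\sigma}\right) + \exp\left(\frac{v^E_{ik}}{1-\sigma}\right)\right]^{1-\sigma}}$$

Thus the probability of purchasing title $k$ in ebook format is the product of the last two probabilities:

$$\Pr(d_{ik} = E \mid \iota_i = 1) = \frac{\exp\left(\frac{v^E_{ik}}{1-\sigma}\right)\left[\exp\left(\frac{v^P_{ik}}{1-\sigma}\right) + \exp\left(\frac{v^E_{ik}}{1-\sigma}\right)\right]^{-\sigma}}{1 + \left[\exp\left(\frac{v^P_{ik}}{1-\sigma}\right) + \exp\left(\frac{v^E_{ik}}{1-\sigma}\right)\right]^{1-\sigma}} \tag{8}$$

Similarly, the probability of purchasing title $k$ in paperback format is

$$\Pr(d_{ik} = P \mid \iota_i = 1) = \frac{\exp\left(\frac{v^P_{ik}}{1-\sigma}\right)\left[\exp\left(\frac{v^P_{ik}}{1-\sigma}\right) + \exp\left(\frac{v^E_{ik}}{1-\sigma}\right)\right]^{-\sigma}}{1 + \left[\exp\left(\frac{v^P_{ik}}{1-\sigma}\right) + \exp\left(\frac{v^E_{ik}}{1-\sigma}\right)\right]^{1-\sigma}} \tag{9}$$

The probability of not purchasing title $k$ is

$$\Pr(d_{ik} = 0 \mid \iota_i = 1) = \frac{1}{1 + \left[\exp\left(\frac{v^P_{ik}}{1-\sigma}\right) + \exp\left(\frac{v^E_{ik}}{1-\sigma}\right)\right]^{1-\sigma}}$$

For each book title, the consumer chooses between buying its ebook version, buying its paperback version, and not buying, so the three probabilities above sum to 1. Intuitively, because book characteristics are shared by the two formats, content utility mainly drives the first-nest decision of whether to purchase. The format utility and the price difference across formats drive the format choice in the second nest.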
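Equations (8) and (9), together with the no-purchase probability, can be evaluated directly for a single title. The following minimal Python sketch takes the deterministic utilities `v_e`, `v_p` and the nesting parameter `sigma` as hypothetical inputs; `paperback_only_probs` covers the binary logit that applies when the consumer has no Kindle (case (ii)).

```python
import math

def nested_logit_probs(v_e, v_p, sigma):
    """Case (i): Pr(ebook), Pr(paperback), Pr(no purchase) for one title.

    v_e, v_p: deterministic utilities of the two formats; the outside
    option has utility 0; sigma in [0, 1) is the similarity parameter.
    """
    s = 1.0 - sigma
    inner = math.exp(v_e / s) + math.exp(v_p / s)  # sum inside the "buy" nest
    denom = 1.0 + inner ** s                        # 1 + exp(nest inclusive value)
    pr_e = math.exp(v_e / s) * inner ** (-sigma) / denom  # equation (8)
    pr_p = math.exp(v_p / s) * inner ** (-sigma) / denom  # equation (9)
    pr_none = 1.0 / denom
    return pr_e, pr_p, pr_none

def paperback_only_probs(v_p):
    """Case (ii): no Kindle, so a plain binary logit over buy / not buy."""
    pr_p = math.exp(v_p) / (1.0 + math.exp(v_p))
    return pr_p, 1.0 - pr_p
```

With `sigma = 0` the formulas reduce to a plain multinomial logit over {ebook, paperback, no purchase}, which is a convenient sanity check given that the estimated $\sigma$ in Table 3 is close to zero.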

Case (ii): The consumer does not have a Kindle and chooses to wait in the current period, so that her device status is $\iota_i = 0$. In this case, she can only buy the paperback or not buy, and the two nests collapse into one. The probability of purchasing title $k$ in paperback format is

$$\Pr(d_{ik} = P \mid \iota_i = 0) = \Pr(\text{purchase} \mid \iota_i = 0) = \frac{\exp(v^P_{ik})}{1 + \exp(v^P_{ik})}$$

and the probability of not purchasing title $k$ is

$$\Pr(d_{ik} = 0 \mid \iota_i = 0) = \frac{1}{1 + \exp(v^P_{ik})}$$

In my data set, I do not observe consumers not buying books: people may go online and search for books but end up buying nothing in that period. I deal with this problem by assuming an exogenous arrival rate of book searches (footnote 18). Based on this rate, I generate a set of search times over the year for each consumer. I then model their purchase and format decisions given the search times and the corresponding choice sets. I treat every observed purchase time, with its corresponding choice set, as a search time, because in reality a search always precedes an actual purchase. For searches that do not end in an observed purchase (these are the no-purchase observations I generate), I assume that the choice set consists of representative books of the same genres, and in the same number, as in the last observed purchase. The characteristics of a representative book are the averages over all books of the same genre in that month.

Footnote 18: I choose this search frequency based on two other relevant data sets: consumers' search and purchase histories for other product categories on Amazon.com. The details are in the Appendix.

This assumption is plausible because it is consistent with how the recommendation system on Amazon.com works: it displays similar books of the same genre based on the user's last purchase.
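The choice-set construction for generated no-purchase searches can be sketched as follows. This is a minimal Python illustration, not the paper's code: the record layout (a list of dicts with `genre`, `month` and numeric characteristic fields) and the field names are hypothetical, and every title is assumed to carry both a paperback and an ebook price.

```python
from statistics import mean

# Hypothetical characteristic fields; the paper's w_k include ranking, rating,
# number of comments and time since publication, plus the two prices.
FIELDS = ("ranking", "rating", "comments", "months_since_pub", "price_p", "price_e")

def representative_book(catalog, genre, month):
    """Average characteristics of all books of `genre` observed in `month`."""
    same = [b for b in catalog if b["genre"] == genre and b["month"] == month]
    if not same:
        raise ValueError(f"no {genre!r} books in month {month}")
    return {field: mean(b[field] for b in same) for field in FIELDS}

def choice_set_for_search(catalog, last_purchase_genres, month):
    """Choice set for a generated search with no observed purchase.

    last_purchase_genres lists the genre of each title in the consumer's most
    recent observed purchase, so the result matches it in genre and quantity.
    """
    return [representative_book(catalog, g, month) for g in last_purchase_genres]
```

For a consumer whose last purchase was two titles, one fiction and one history, `choice_set_for_search(catalog, ["fiction", "history"], month)` returns two representative books whose characteristics are the genre averages for that month.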

Although I abstract from book inventories and the search rate by assuming them exogenous, I do model consumers' purchase and format choices, which I observe and can fit to the data. A good model fit means that in the periods in which I observe people buying books, the model also predicts that they purchase books. As a robustness check, I vary the exogenous arrival rate of searches; the estimation results are quite stable (see the Appendix for further results).

### 4.3 Combining Device Adoption and Book Purchase

The device adoption side and the book purchase side are linked in two ways.

First, the device adoption status affects the choice set for book purchases. Second, the flow utility from book purchases, and consumers' expectations over future flow utility, affect their current device adoption choice. Here I describe how the flow utilities $f_{i0}$ and $f_{i1}$ that enter the consumer's Bellman equation are calculated from the book purchase side.

The timing and the information available to consumers when making decisions are as follows. A consumer who does not have a Kindle at the beginning of the period first decides whether to buy one. Her information set includes the Kindle price $p$, the book prices $\{p^E_k, p^P_k\}_k$ and characteristics $\{w_k\}_k$ in her choice sets $K_{i0}$ and $K_{i1}$, and the realized device-side shocks $\{\varepsilon_{i0}, \varepsilon_{i1}\}$. The book-side shocks $\{\epsilon^P_{ik}, \epsilon^E_{ik}, \epsilon^0_{ik}\}_k$ are not yet realized, so the consumer calculates the ex-ante flow utilities from purchasing books under the two device adoption statuses, $f_{i0}$ and $f_{i1}$. She also forms an expectation over the flow utilities in the next period.

Based on her information set and expectations, she buys a Kindle or waits. She then decides whether to buy each book, and in which format, given the realized book-side shocks $\{\epsilon^P_{ik}, \epsilon^E_{ik}, \epsilon^0_{ik}\}_k$. Once she buys a Kindle, she drops out of the device market and makes only book purchase decisions thereafter.

Let $K_{i0}$ denote the set of book titles available in paperback format to consumer $i$, and let $K_{i1}$ denote the set of titles available as paperbacks or ebooks. These sets can be seen as an exogenous reading need, or a booklist, that is both time-varying and individual-specific. The choice set includes the titles I observe the consumer buying in the data plus the average titles of the same genre as her last purchase. Consumers can buy multiple books every period, and each decision is independent. The ex-ante flow utility from a single book purchase is the inclusive value, which can be thought of as a quality- and taste-adjusted index:

$$f_{i0k} = \log\left[\exp\left(\gamma_i + \beta w_k - \alpha^{SW}_i p^P_k\right) + 1\right]$$

$$f_{i1k} = \log\left\{\left[\exp\left(\frac{\gamma_i + \theta + \beta w_k - \alpha^{SW}_i p^E_k}{1-\sigma}\right) + \exp\left(\frac{\gamma_i + \beta w_k - \alpha^{SW}_i p^P_k}{1-\sigma}\right)\right]^{1-\sigma} + 1\right\}$$

Summing over all the book titles in the period's choice set gives the ex-ante flow utilities that enter the device adoption equations:

$$f_{i0} = \sum_{k \in K_{i0}} \log\left[\exp\left(\gamma_i + \beta w_k - \alpha^{SW}_i p^P_k\right) + 1\right] \tag{10}$$

$$f_{i1} = \sum_{k \in K_{i1}} \log\left\{\left[\exp\left(\frac{\gamma_i + \theta + \beta w_k - \alpha^{SW}_i p^E_k}{1-\sigma}\right) + \exp\left(\frac{\gamma_i + \beta w_k - \alpha^{SW}_i p^P_k}{1-\sigma}\right)\right]^{1-\sigma} + 1\right\} \tag{11}$$

Given parameter values and book characteristics, $f_{i0}$ and $f_{i1}$ can be computed directly. Consumers form expectations about the future flow utility from reading books.
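Equations (10) and (11) translate directly into array code. The sketch below assumes, for simplicity, that the same titles appear in $K_{i0}$ and $K_{i1}$ (the text notes they can differ), that `w` stacks the characteristics $w_k$ row by row, and that `alpha_sw` is a positive magnitude entering as $-\alpha^{SW} p$; all numbers are hypothetical. `np.log1p(x)` computes $\log(1 + x)$.

```python
import numpy as np

def flow_utilities(gamma, theta, beta, alpha_sw, sigma, w, p_paper, p_ebook):
    """Ex-ante flow utilities f_i0 and f_i1 for one consumer-period (eqs. 10-11).

    w: (K, J) book characteristics; beta: (J,) coefficients;
    p_paper, p_ebook: (K,) prices; sigma in [0, 1): nesting parameter.
    """
    content = gamma + w @ beta                       # utility shared by both formats
    v_p = content - alpha_sw * p_paper               # paperback
    v_e = content + theta - alpha_sw * p_ebook       # ebook adds format utility theta
    s = 1.0 - sigma
    f0 = np.sum(np.log1p(np.exp(v_p)))               # equation (10)
    nest = (np.exp(v_e / s) + np.exp(v_p / s)) ** s  # exp of the buy-nest inclusive value
    f1 = np.sum(np.log1p(nest))                      # equation (11)
    return f0, f1

w = np.array([[4.5, 2.0], [3.8, 6.0]])  # hypothetical (rating, months since publication)
beta = np.array([0.05, -0.003])
p_paper = np.array([12.0, 9.0])
p_ebook = np.array([9.99, 7.99])
f0, f1 = flow_utilities(1.4, 3.6, beta, 0.0035, 0.2, w, p_paper, p_ebook)
```

Because adding the ebook option can only enlarge the choice set, $f_{i1} > f_{i0}$ whenever the two sets contain the same titles; this is the "enlarged choice set" term in the adoption decision.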

The uncertainty comes from next period's choice set. Note that some titles are available only as paperbacks or only as ebooks, so $K_{i0}$ and $K_{i1}$ are not necessarily the same set of titles.

### 4.4 The Likelihood Function

The probability of consumer $i$'s device status history $\iota_i = \{\iota_{it}\}_{t=1}^{T}$ is given by

$$\Pr(\iota_{it} = 0) = \prod_{\tau=1}^{t} \Pr(D_{i\tau} = 0 \mid \iota_{i,\tau-1} = 0, \Omega), \qquad \Pr(\iota_{it} = 1) = 1 - \Pr(\iota_{it} = 0)$$

where $D_{it}$ is the device adoption decision at time $t$. The probability of having no Kindle by time $t$ is the product of the probabilities of not buying a Kindle in every period $\tau = 1, \dots, t$, and the probabilities of having and not having a Kindle sum to 1.
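Because owning a Kindle is absorbing, the status probabilities are a cumulative product of per-period waiting probabilities. A minimal sketch, assuming the per-period adoption probabilities (for example from equation (7)) are already available as an array; the inputs shown in the usage note are hypothetical.

```python
import numpy as np

def device_status_probs(adopt_probs):
    """Pr(iota_t = 0) and Pr(iota_t = 1) for t = 1..T.

    adopt_probs[t-1] = Pr(D_t = 1 | iota_{t-1} = 0, Omega).
    """
    p = np.asarray(adopt_probs, dtype=float)
    pr_no_kindle = np.cumprod(1.0 - p)   # product of waiting probabilities up to t
    return pr_no_kindle, 1.0 - pr_no_kindle
```

For example, per-period adoption probabilities of 0.1, 0.2 and 0.5 give probabilities of still having no Kindle of 0.9, 0.72 and 0.36.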

Given the observed device status history $\{\iota_{it}\}_{i,t}$ and book purchase and format choices $\{d_{ikt}\}_{i,k,t}$, the likelihood function consists of two kinds of probabilities: book purchase probabilities and device status probabilities. For consumer $i$, title $k$ and time $t$, there are five cases:

(i) having a Kindle, purchase in ebook format: $\Pr(d_{ikt} = E \mid \iota_{it} = 1)\Pr(\iota_{it} = 1)$; (ii) having a Kindle, purchase in paperback format: $\Pr(d_{ikt} = P \mid \iota_{it} = 1)\Pr(\iota_{it} = 1)$; (iii) having a Kindle, no purchase: $\Pr(d_{ikt} = 0 \mid \iota_{it} = 1)\Pr(\iota_{it} = 1)$; (iv) no Kindle, purchase in paperback format: $\Pr(d_{ikt} = P \mid \iota_{it} = 0)\Pr(\iota_{it} = 0)$; (v) no Kindle, no purchase: $\Pr(d_{ikt} = 0 \mid \iota_{it} = 0)\Pr(\iota_{it} = 0)$.
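Taking logs, each $(i, k, t)$ observation falls into exactly one of the five cases and contributes the log of the corresponding product. A minimal Python sketch, assuming the conditional book-choice probability for the observed choice and the status probability $\Pr(\iota_{it} = 1)$ have already been computed for each observation; the dictionary layout is hypothetical.

```python
import math

def log_likelihood(obs):
    """Sum of log-contributions over (i, k, t) observations.

    Each observation is a dict with keys:
      'iota'  : device status (0 or 1)
      'd'     : observed choice, one of 'E', 'P', '0'
      'pr_d'  : Pr(d_ikt = d | iota_it) for the observed choice
      'pr_on' : Pr(iota_it = 1)
    """
    ll = 0.0
    for o in obs:
        if o["iota"] == 0 and o["d"] == "E":
            raise ValueError("an ebook purchase requires a Kindle")
        pr_status = o["pr_on"] if o["iota"] == 1 else 1.0 - o["pr_on"]
        ll += math.log(o["pr_d"]) + math.log(pr_status)
    return ll
```

Rejecting an ebook purchase without a Kindle mirrors the fact that no such case appears among the five above.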

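The total likelihood function can be written as

$$L = \prod_{i,k,t} \left[\Pr(d_{ikt} = E \mid \iota_{it} = 1)\Pr(\iota_{it} = 1)\right]^{\iota_{it} \mathbf{1}\{d_{ikt} = E\}} \cdot \left[\Pr(d_{ikt} = P \mid \iota_{it} = 1)\Pr(\iota_{it} = 1)\right]^{\iota_{it} \mathbf{1}\{d_{ikt} = P\}} \cdot \left[\Pr(d_{ikt} = 0 \mid \iota_{it} = 1)\Pr(\iota_{it} = 1)\right]^{\iota_{it} \mathbf{1}\{d_{ikt} = 0\}} \cdot \left[\Pr(d_{ikt} = P \mid \iota_{it} = 0)\Pr(\iota_{it} = 0)\right]^{(1-\iota_{it}) \mathbf{1}\{d_{ikt} = P\}} \cdot \left[\Pr(d_{ikt} = 0 \mid \iota_{it} = 0)\Pr(\iota_{it} = 0)\right]^{(1-\iota_{it}) \mathbf{1}\{d_{ikt} = 0\}}$$

## 5 Estimation

I use discrete types to capture consumer heterogeneity. In particular, the price coefficients on the device adoption side and the book purchase side can take two values: both high, $(\alpha^{HW,High}, \alpha^{SW,High})$, or both low, $(\alpha^{HW,Low}, \alpha^{SW,Low})$. The general reading taste $\gamma_i$ likewise takes two values, $(\gamma^{High}, \gamma^{Low})$.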

Thus there are four time-invariant types of consumers, which depend on observed household characteristics such as household income and the number of books bought in the initialization period (footnote 19). The parameters to be estimated are

$$\alpha^{HW,High}, \alpha^{HW,Low}, \alpha^{SW,High}, \alpha^{SW,Low}, \theta, \gamma^{High}, \gamma^{Low}, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \sigma_f, \sigma, \beta_x$$

The first four parameters are the price coefficients on device and book purchase. $\theta$ is the e-format utility. $\gamma$ is the general reading taste. The $\beta$'s are the coefficients on book characteristics: ranking, rating, number of comments, time since publication, and a holiday dummy. $\sigma_f$ is the coefficient on the flow utility when it enters the device adoption utility.

$\sigma$ is the nested logit parameter. $\beta_x$ is the coefficient on the holiday dummy in the device adoption utility. The structural estimation follows the nested fixed-point algorithm proposed by Rust (1987), with one adjustment. The state space includes the prices and other characteristics of all book titles in both formats and is therefore too large for practical estimation. To reduce its dimensionality, I use a single inclusive value for each format from the static book purchase side when solving the Bellman equation of the dynamic device adoption decision. A similar approach is adopted in Hendel and Nevo (2006). I describe the assumption this approach requires below.

Footnote 19: In my data set, most consumers start to buy ebooks in August 2011. Because consumers may read faster and more frequently owing to the convenience of e-reading, I take the first six months as the initialization period, so that purchases in that period are not influenced by e-reading.

### 5.1 A Simplifying Assumption on Flow Utility

The individual-level panel data allow me to estimate the model by maximum likelihood (MLE). To obtain the likelihood function, I need to calculate two kinds of probabilities: the device adoption probability $\Pr(D_{it} \mid \iota_{i,t-1} = 0, \Omega)$ and the book purchase probability $\Pr(d_{ikt} \mid \iota_{it})$. The book purchase probability comes from a static problem, while the device adoption decision requires solving a dynamic programming problem. To solve the dynamic problem, I need to calculate the expected value function in the inner loop given the parameter guess; it is the fixed point of the expected value function equation (5).

Note that $f$ is a function of the consumer type $\Theta_i$ and the book characteristics $\{\Psi_{i0t}, \Psi_{i1t}\}$. Consumers have expectations over the transition probabilities of the state space. Since $\Theta_i$ is time-invariant, the transition probability of $\{\Psi_{i0t}, \Psi_{i1t}\}$ is the only relevant input for the evolution of $f_t \equiv \{f_{0t}, f_{1t}\}$. In practice, the book characteristics $\{\Psi_{i0t}, \Psi_{i1t}\}$ form a high-dimensional vector whose dimension is proportional to the number of books available. I proceed by using the fact that the inclusive value $f$, rather than $\{\Psi_{i0t}, \Psi_{i1t}\}$, is what directly matters in the dynamic programming problem.

I use an approach similar to Hendel and Nevo (2006), so that the dynamic problem can be rewritten in terms of the inclusive values and the state space collapses to a single index per format. For this to work, I make the following assumption.

Assumption (Inclusive Value Sufficiency, IVS): $F(f_t \mid \Omega_{t-1})$ can be summarized by $F(f_t \mid f_{t-1})$.

The IVS assumption and the expected value function equation (5) imply that all states with the same $f$ have the same expected value function. Instead of keeping track of all book characteristics and forming a multidimensional expectation, the consumer follows only two quality-adjusted indices, $f_{0t}$ and $f_{1t}$.

While this assumption can be interpreted literally as an assumption on how the industry evolves, it is perhaps more attractive to think of it as an assumption on how boundedly rational consumers perceive this market. The IVS assumption is valuable because I can now replace $\Omega_t$ with $f_t$ (along with the device price $p_t$) in the state space, rewriting (5) as

$$EV(0, f_t, p_t) = \ln\Big\{\exp\big(\sigma_f f_{0t} + \delta E\left[V(0, f_{t+1}, p_{t+1}) \mid f_t, D_t = 0\right]\big) + \exp\big(\sigma_f f_{1t} - \alpha^{HW} p_t + \delta E\left[V(1, f_{t+1}, p_{t+1}) \mid f_t, D_t = 1\right]\big)\Big\} \tag{12}$$

which provides a tractable three-dimensional state space. Gowrisankaran and Rysman (2011) and Melnikov (2001) formally prove that Bellman equations (5) and (12) are equivalent. Because the tastes $\Theta_i$ are heterogeneous and household-specific, the inclusive values and their future distributions $F(f_{t+1} \mid f_t)$ are calculated separately by type. I also solve the dynamic program and obtain the expected value functions separately for each household type.

Within this context, I assume rational expectations on book purchase in that the consumer is on average correct about the future.

I let $F(f_{t+1} \mid f_t)$ be its actual empirical density, fitted to a simple AR(1) process:

$$\ln f_{t+1} = (1 - \rho)\mu + \rho \ln f_t + \upsilon_t \tag{13}$$

where $\upsilon_t$ is normally distributed with mean 0. I allow for different parameter values for $f_{1t}$ and $f_{0t}$. In the Appendix, I check the validity of this assumption by plotting the fitted error terms over time. No serial correlation is found, indicating that this is an acceptable assumption. Similar assumptions have been used in the existing dynamic literature (Melnikov 2001; Hendel and Nevo 2006; Gowrisankaran and Rysman 2012) (footnote 20). With IVS and rational expectations, the optimal consumer decisions given an industry environment are defined by the joint solution to the expected value Bellman equations (5) and (6), the logit inclusive values (10) and (11), and the industry evolution regression (13).
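Equation (13) can be fitted by least squares on the log inclusive values and then discretized for the dynamic program. The sketch below pairs an OLS fit with the standard Tauchen (1986) discretization, using the 21 grid points and the coverage of 2.5 that footnote 21 reports; reading that coverage as 2.5 unconditional standard deviations is an assumption, and the simulated series is hypothetical. The normal CDF is built from `math.erf` to avoid extra dependencies.

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard normal CDF, elementwise, via math.erf."""
    z = np.asarray(x, dtype=float) / math.sqrt(2.0)
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))

def fit_ar1(log_f):
    """OLS fit of ln f_{t+1} = (1 - rho) mu + rho ln f_t + u_t."""
    y, x = log_f[1:], log_f[:-1]
    X = np.column_stack([np.ones_like(x), x])
    (a, rho), *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma_u = (y - (a + rho * x)).std(ddof=2)
    return a / (1.0 - rho), rho, sigma_u            # (mu, rho, sigma_u)

def tauchen(mu, rho, sigma_u, n=21, m=2.5):
    """Tauchen grid and transition matrix for the AR(1) in equation (13)."""
    sd = sigma_u / math.sqrt(1.0 - rho ** 2)        # unconditional std. dev.
    grid = np.linspace(mu - m * sd, mu + m * sd, n)
    half = (grid[1] - grid[0]) / 2.0
    cond_mean = (1.0 - rho) * mu + rho * grid       # E[ln f' | ln f = grid[i]]
    diff = grid[None, :] - cond_mean[:, None]
    upper = norm_cdf((diff + half) / sigma_u)
    lower = norm_cdf((diff - half) / sigma_u)
    P = upper - lower
    P[:, 0] = upper[:, 0]                           # left tail goes to first node
    P[:, -1] = 1.0 - lower[:, -1]                   # right tail goes to last node
    return grid, P

# Simulated log inclusive values from a hypothetical AR(1) (mu=0.5, rho=0.6).
rng = np.random.default_rng(0)
log_f = np.empty(2000)
log_f[0] = 0.5
for t in range(1, log_f.size):
    log_f[t] = (1 - 0.6) * 0.5 + 0.6 * log_f[t - 1] + rng.normal(0.0, 0.1)

mu_hat, rho_hat, sig_hat = fit_ar1(log_f)
grid, P = tauchen(mu_hat, rho_hat, sig_hat)
```

Because the AR(1) is in logs, the grid the dynamic program needs for $f$ itself would be `np.exp(grid)`, with `P` as its transition matrix, computed separately for $f_0$, $f_1$ and each consumer type.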

The following subsection discusses the computation in detail; note for now that computing optimal device adoption decisions requires jointly solving these three sets of equations, not just the Bellman equation. The reason is that a different Bellman equation (as would arise under a counterfactual policy environment) implies different values of $f$, which imply different AR(1) coefficients, which in turn imply a different Bellman equation.
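A minimal value-function-iteration sketch of the simplified problem (12), under assumptions labeled here rather than taken from the paper: the device price is held flat in expectation, as in Section 4.1; $f_0$ and $f_1$ sit on small grids with their own Markov transition matrices (in practice from the Tauchen step); the two indices are assumed to evolve independently, a simplification for illustration; and the shocks are normalized to mean zero, which drops Euler's constant from both value functions without changing the choice probabilities. `alpha_hw` is a positive magnitude, and all numbers are hypothetical.

```python
import numpy as np

def solve_adoption_dp(f0_grid, f1_grid, P0, P1, sigma_f, alpha_hw, price,
                      delta, tol=1e-10, max_iter=10_000):
    """Value function iteration for EV(0, f0, f1; p) with a flat price expectation.

    Returns EV0 (n0 x n1), EV1 (n1,), and Pr(buy) on the (f0, f1) grid.
    """
    n1 = len(f1_grid)
    # Owners (absorbing state): EV1 = sigma_f * f1 + delta * P1 @ EV1, a linear system.
    ev1 = np.linalg.solve(np.eye(n1) - delta * P1, sigma_f * f1_grid)
    v_buy = (sigma_f * f1_grid - alpha_hw * price + delta * (P1 @ ev1))[None, :]

    def v_wait_of(ev0):
        # E[EV0(f0', f1') | f0, f1] = (P0 @ EV0 @ P1.T) under independent transitions
        return sigma_f * f0_grid[:, None] + delta * (P0 @ ev0 @ P1.T)

    ev0 = np.zeros((len(f0_grid), n1))
    for _ in range(max_iter):
        ev0_new = np.logaddexp(v_wait_of(ev0), v_buy)  # log-sum form of equation (12)
        converged = np.max(np.abs(ev0_new - ev0)) < tol
        ev0 = ev0_new
        if converged:
            break
    else:
        raise RuntimeError("value function iteration did not converge")

    prob_buy = 1.0 / (1.0 + np.exp(v_wait_of(ev0) - v_buy))
    return ev0, ev1, prob_buy

# Hypothetical grids and transitions.
f0 = np.array([0.5, 1.0, 1.5])
f1 = np.array([1.0, 2.0, 3.0])
P = np.array([[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]])
ev0, ev1, pb = solve_adoption_dp(f0, f1, P, P, 2.4, 0.15, 100.0, 0.95)
```

The iteration is a contraction with modulus $\delta$, so convergence is guaranteed; solving the owners' value as a linear system is possible because owning a Kindle is absorbing and involves no further choice on the device side.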

### 5.2 Computation Algorithm

The book purchase side is a static model in which people make purchase and format choices on the spot. The device adoption side is a dynamic model, for which I need to solve the Bellman equation. For each parameter guess, I calculate the expected value functions in the inner loop and maximize the likelihood in the outer loop.

Footnote 20: The IVS assumption is restrictive. For example, the inclusive value could be high either because many books are bought, all with low quality and high prices, or because a single book is bought with high quality and a low price. While dynamic profit maximization might lead these two states to have different patterns of industry evolution, consumers in my model lump them into the same state. Hendel and Nevo (2006) and Gowrisankaran and Rysman (2012) provide a similar discussion of the implications of the IVS assumption. Because I assume exogenous supply-side pricing and the book market is highly dispersed, this limitation is not severe in my model.

The first step maximizes the likelihood of the observed book format choices conditional on e-Reader adoption status, to obtain the book reading taste parameters $\beta$, the price coefficients $\alpha^{SW}_i$, and the format preference $\theta$. I do not need to solve the dynamic programming problem to compute this probability. To get an initial guess for the parameters, I estimate a static discrete choice model using only book purchase data. This estimation yields consistent, but potentially inefficient, estimates of the book purchase parameters. In the second step, using the first-stage estimates, I compute the inclusive values associated with each format and their transition probabilities.

Finally, I solve the simplified dynamic problem, which involves only the device purchase choice, by value function iteration. Rather than including the prices and other characteristics of all books, the state space includes only a single inclusive value for each device adoption status. The estimation proceeds as follows: (i) Start with the book purchase side. Given a parameter guess, calculate the book purchase probabilities $\Pr(d_{kt} \mid \iota_t)$ and the inclusive values associated with the different book formats, $f_{0t}$ and $f_{1t}$. The transition probability matrix $F(f_{t+1} \mid f_t)$ is also calculated, using the Tauchen method.

The coefficients of the AR(1) process are calculated directly, rather than estimated, by fitting the values of $f$ over time (footnote 21).

(ii) Feed the flow utilities into the device adoption side. Given the inclusive values $f_{0t}$ and $f_{1t}$, solve the simplified dynamic programming problem to get the expected value functions. Because tastes are heterogeneous and household-specific, the inclusive values and their future distributions $F(f_{t+1} \mid f_t)$ are calculated separately by type. (iii) Given the expected value functions, calculate the device adoption probability

$$\Pr(D_t = 1 \mid \iota_t = 0, \Omega_t) = \frac{\exp\big(\sigma_f f_{1t} - \alpha^{HW} p_t + \delta E\left[V(1, f_{t+1}, p_{t+1}) \mid f_t, D_t = 1\right]\big)}{\exp\big(\sigma_f f_{0t} + \delta E\left[V(0, f_{t+1}, p_{t+1}) \mid f_t, D_t = 0\right]\big) + \exp\big(\sigma_f f_{1t} - \alpha^{HW} p_t + \delta E\left[V(1, f_{t+1}, p_{t+1}) \mid f_t, D_t = 1\right]\big)}$$

(iv) Combine the probabilities from book purchase and device adoption to form the total likelihood.

Search over the parameter space with the simplex algorithm in Matlab to find the maximizer of the likelihood function.
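The two-loop structure of steps (i)-(iv) can be illustrated on a deliberately tiny problem. The sketch below is a hypothetical one-state version of (12) in which $f_0$, $f_1$ and the price are constant within each market cell: the inner loop solves the fixed point by iteration, and the outer loop maximizes a binomial log-likelihood of adoption counts with SciPy's Nelder-Mead simplex. Matlab's `fminsearch` also implements Nelder-Mead, but the text does not name the routine it used. The data are simulated from hypothetical parameters, and `alpha_hw` is a positive magnitude.

```python
import math
import numpy as np
from scipy.optimize import minimize

DELTA = 0.95  # illustrative discount factor

def logaddexp(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def expit(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def adoption_prob(sigma_f, alpha_hw, f0, f1, price, tol=1e-10):
    """Inner loop: one-state analogue of equation (12), solved by iteration."""
    ev1 = sigma_f * f1 / (1.0 - DELTA)              # owners: absorbing state
    v_buy = sigma_f * f1 - alpha_hw * price + DELTA * ev1
    ev0 = 0.0
    while True:                                     # contraction with modulus DELTA
        new = logaddexp(sigma_f * f0 + DELTA * ev0, v_buy)
        if abs(new - ev0) < tol:
            break
        ev0 = new
    v_wait = sigma_f * f0 + DELTA * new
    return expit(v_buy - v_wait)

def neg_log_lik(params, cells):
    """Outer objective; cells = [(f0, f1, price, n_obs, n_adopt), ...]."""
    sigma_f, alpha_hw = params
    ll = 0.0
    for f0, f1, price, n, k in cells:
        p = min(max(adoption_prob(sigma_f, alpha_hw, f0, f1, price), 1e-12), 1 - 1e-12)
        ll += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return -ll

# Synthetic data from hypothetical "true" parameters sigma_f = 1.0, alpha_hw = 0.01.
rng = np.random.default_rng(0)
cells = []
for f1 in (1.1, 1.2, 1.3):
    for price in (80.0, 100.0, 120.0):
        p_true = adoption_prob(1.0, 0.01, 1.0, f1, price)
        cells.append((1.0, f1, price, 500, int(rng.binomial(500, p_true))))

start = np.array([0.5, 0.02])
result = minimize(neg_log_lik, start, args=(cells,), method="Nelder-Mead")
```

Nelder-Mead needs only function values, which suits a likelihood whose evaluation requires solving a fixed point at every trial parameter; in the full model, each evaluation would also recompute the inclusive values and their transition matrices, as in steps (i) and (ii).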


Table 3: Parameter Estimates of Demand System

| Variable | (i) Estimate | (i) s.e. | (ii) Estimate | (ii) s.e. | (iii) Estimate | (iii) s.e. |
|---|---|---|---|---|---|---|
| *Device* | | | | | | |
| $\alpha^{HW,High}$ | - | | -0.1448\*\* | 0.0084 | -0.1503\*\* | 0.0079 |
| $\alpha^{HW,Low}$ | - | | -0.1544\*\* | 0.0113 | - | |
| $\sigma_f$ flow utility coefficient | - | | 2.3982\*\* | 0.1636 | 2.4754\*\* | 0.0968 |
| $\beta_x$ holiday_dummy | - | | 1.1270\*\* | 0.2506 | 1.0713\*\* | 0.1885 |
| *Book* | | | | | | |
| $\beta_1$ ranking | -0.0013\*\* | 6.5275e-5 | -0.0010\*\* | 4.0469e-6 | -0.0010\*\* | 9.9125e-6 |
| $\beta_2$ rating | 0.0673\*\* | 0.0140 | 0.0466\*\* | 0.0002 | 0.0455\*\* | 0.0059 |
| $\beta_3$ comments | -0.0141 | 0.0173 | 0.0029 | 0.0048 | 0.0029 | 0.0124 |
| $\beta_4$ since publication | -0.0624\*\* | 0.0052 | -0.0029\*\* | 1.467e-5 | -0.0028\*\* | 9.847e-6 |
| $\beta_5$ holiday_dummy | 1.4280\*\* | 0.0527 | 1.1169\*\* | 0.0157 | 1.1733\*\* | 0.0351 |
| $\gamma^{High}$ general reading taste | 1.6889\*\* | 0.0610 | 1.3897\*\* | 0.0542 | 1.4266\*\* | 0.0030 |
| $\gamma^{Low}$ | 0.3242\*\* | 0.0316 | 0.0006\*\* | 9.2702e-5 | 0.0003\*\* | 2.0822e-5 |
| $\theta$ e-format utility | 2.3637\*\* | 0.1995 | 3.6176\*\* | 0.1489 | 3.6199\*\* | 0.1629 |
| $\alpha^{SW,High}$ | -0.0001\*\* | 7.5431e-6 | -0.0037\*\* | 1.1244e-5 | -0.0034\* | 2.2359e-3 |
| $\alpha^{SW,Low}$ | - | | -0.0031\*\* | 1.9940e-5 | - | |
| $\sigma$ nested logit parameter | 0.0018 | 0.0829 | 0.0000 | 0.0436 | 0.0027 | 0.0454 |

Log-likelihood: -10330.1, -10294.2. Number of book observations: 14,230, 14,230. Number of device observations: 35,064, 35,064.

\*\* significant at 5% level; \* significant at 10% level.

## 6 Estimation Results

### 6.1 Parameter Estimates

I use a monthly discount rate of $\delta = 0.985$ (footnote 22). Parameter estimates are presented in Table 3. In model specification (i), I estimate only the book purchase parameters, taking the device inventory as exogenously given. This result serves as a baseline for comparison with the full model; later I show the importance of a model in which device adoption decisions, book purchases and format choices are made simultaneously over time. The coefficients have the expected signs. A higher rating increases the probability of book purchase. A lower ranking (higher popularity) and a more recent publishing date help book sales. The holiday dummy has a positive coefficient, which is consistent with people having more time to read during holiday months (November, December, and January) and thus buying more books. The coefficient on the number of comments is not significantly different from zero. This is reasonable because comments can be good or bad and do not necessarily help sales. For all types, there is a positive format utility for ebooks, $\theta$, which captures the fact that e-reading is more convenient.

Footnote 21: The Tauchen coverage is 2.5 around the mean of $f_0$ and $f_1$. I discretize the state space of $f$ into 21 grid points. Because the value function is almost a straight line with little curvature, 21 grid points capture its shape well; varying the number of grid points gives robust results.

Footnote 22: Consumers do not buy books every month. I assume that they have rational expectations over the future flow utility from book purchases, captured by the AR(1) process $\ln f_{t+1} = (1 - \rho)\mu + \rho \ln f_t + \upsilon_t$. Furthermore, I assume that they are also on average correct about the arrival rate of exogenous searches, which is once every 4 months. A consumer who has a non-zero flow utility today expects that she may have zero flow utility next period. How does that affect the device adoption decision? The device adoption Bellman equation accumulates the discounted sum of all flow utilities, $\sum_{\tau=0}^{\infty} \delta^{\tau} f_\tau$. Since $f + \delta \cdot 0 + \delta^2 \cdot 0 + \delta^3 \cdot 0 + \delta^4 \cdot f' = f + \delta^4 f'$, I use $\tilde{\delta} = \delta^4 = 0.95$ in the Bellman equation to obtain the estimation results. Neglecting rational expectations over the search arrival rate would bias upward consumers' tendency to buy a Kindle, because it overestimates their benefit from ebook purchases.
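The effective discount factor in footnote 22 can be checked numerically. The sketch below takes the monthly discount rate as 0.985 and the 4-month search gap stated in the footnote, and verifies that a flow stream of $f$ now, three zero months, and $f'$ at the next search discounts to $f + \delta^4 f'$. The flow values are hypothetical. Evaluated at $\delta = 0.985$, $\delta^4$ is about 0.941, while the footnote states the value used in estimation as 0.95.

```python
DELTA = 0.985   # monthly discount rate, as read from Section 6.1
SEARCH_GAP = 4  # months between searches, per footnote 22

def discounted_stream(stream, delta):
    """Present value of a monthly flow-utility stream starting today."""
    return sum(delta ** s * f for s, f in enumerate(stream))

f_now, f_next = 2.0, 3.0  # hypothetical flow utilities at two consecutive searches
stream = [f_now] + [0.0] * (SEARCH_GAP - 1) + [f_next]
delta_tilde = DELTA ** SEARCH_GAP

# The zero-utility months drop out, leaving f + delta**4 * f'.
assert abs(discounted_stream(stream, DELTA) - (f_now + delta_tilde * f_next)) < 1e-12
print(round(delta_tilde, 4))  # 0.9413
```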

The similarity parameter $\sigma$ of the nested logit model is not significantly different from zero. In this specification there is heterogeneity only in general reading taste $\gamma$ and none in price elasticity.[^23] Avid readers have a much larger $\gamma$ and thus, holding everything else equal, a higher probability of buying books. The price elasticity is extremely small. As the full model estimates below show, this static model underestimates the price elasticity because it ignores the device adoption side.

Column 2 reports a full model estimation with heterogeneity in both general reading taste and price elasticity. I group consumers into four types based on their income level and the number of books bought in the initialization period; the grouping criteria are given in the Appendix. A Wald test shows that the price elasticities of the high- and low-income groups are not statistically different from each other, so I drop the heterogeneity in price elasticities in model specification (iii).[^24] Column 3 reports this specification, which is the one I finally use; it has heterogeneity only in reading taste.[^25]

On the book purchase side, the magnitudes and signs of the parameters are close to those from model specification (i), except for the price elasticity. Ignoring the device adoption decision seriously biases the price elasticity estimates downward, because it shuts down a channel through which consumers can respond to price changes; this suggests the importance of modeling device adoption and book purchase jointly. On the device adoption side, (1) the price elasticity for device adoption, $\alpha^{HW}$, is much higher than that for book purchase, $\alpha^{SW}$; and (2) $\sigma_f$ is larger than 1, indicating that the error term on the device adoption side has a higher variance than the error terms on the book purchase side.

In other words, idiosyncratic preferences over any particular book title vary less than those over the device. Both results are consistent with the device being a substantially larger purchase decision than any single book purchase.

Several results are worth noting. First, ignoring the device adoption side seriously biases downward the price elasticity estimates for book purchase. Second, the price elasticity for device adoption, $\alpha^{HW}$, is much higher than that for book purchase, $\alpha^{SW}$. Third, there is considerable heterogeneity in general reading taste among consumers. This is informative for the supply side, because avid readers are the core consumers in this market: they read more books and buy devices earlier. Because ebook formats are incompatible across e-Readers (for instance, Kindle ebooks cannot be read on a Nook), consumers are locked in to a particular e-Reader once they buy the device. The number of avid readers who have not bought any device therefore shrinks over time. This evolution in demand composition will affect the optimal pricing strategies of competing platforms such as Amazon.com and Barnesandnoble.com. My model allows for heterogeneity in general reading tastes and quantifies its magnitude, providing a foundation for supply side counterfactuals.

[^23]: I also estimate a model with price elasticity heterogeneity. The two price elasticities turn out not to be statistically different, so I do not report it here.

[^24]: In the Appendix, I plot the observed device adoption timing by income level (1-7). The timing shows no positive or negative correlation with income level, which means that no matter how I group consumers by income level, the price elasticities across groups will not be statistically different from each other. This is reasonable to the extent that reading habits in general are not strongly correlated with income.

[^25]: I tried different cutoffs for grouping consumers into avid readers and general readers. The results are robust, so I do not report them here. The grouping criterion I finally adopted is the one with the largest log-likelihood, which is consistent with the AIC and BIC rules.

#### 6.2 Model Fit

##### 6.2.1 Fit of device adoption and book format choice

In my data, I observe both book format choices and device adoption decisions over time.

Table 4: Model Fit: Device Adoption and Book Purchase

| | # people with a device | # ebooks purchased | # paperbacks purchased |
|---|---|---|---|
| observed | 304 | 995 | 8575 |
| predicted | 309 | 899 | 8671 |
| difference in percentage | 1.6% | 9.6% | 1.1% |

Table 5: Model Fit: Book Format Choice

| observed (rows) / predicted (columns) | format = paperback | format = ebook |
|---|---|---|
| format = paperback | 4327 (98.4%) | 70 (1.6%) |
| format = ebook | 306 (26.9%) | 689 (73.1%) |

Table 4 and Table 5 show the model fit along these two dimensions. Recall that shared book characteristics drive the book purchase decision, while format utility and differences in prices drive format choice. The model predicts the number of device adopters and paperback purchases within 2% and ebook purchases within 10%, and the predicted total number of books purchased (9,570) equals the observed total. For book format choice, the model predicts correctly 98.4% of the time when the observed format is paperback and 73.1% of the time when the observed format is ebook.

Figure 4 presents the model fit for the purchase hazard rate, that is, the hazard rate of purchasing books by the number of months since the last purchase. The hazard rate is the probability that a consumer purchases in a given month, conditional on not having purchased since her last purchase. In my model, the arrival rate of searching for books is assumed to be exogenous; I conduct a robustness check for this rate in the Appendix. Although my model abstracts from book inventory and the arrival rate of searching, it does allow consumers to make endogenous purchase decisions, and I use the model to fit the observed purchase rate. The data display a clear pattern: the probability of purchase gradually increases with the number of months since the last purchase.
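The empirical hazard plotted in Figure 4 could be computed from purchase histories roughly as follows. This is a sketch under stated assumptions: purchases are recorded by calendar month in 2011, consumers with a single purchase are dropped (as footnote 26 describes), and the spell after a consumer's last purchase is treated as right-censored at December. The function name and input format are hypothetical.

```python
from collections import Counter


def empirical_hazard(purchase_months, last_month=12):
    """Discrete-time hazard of buying again d months after the previous purchase.

    purchase_months maps a consumer id to the months (1..last_month) in which
    she bought at least one book. Returns {d: hazard} for every duration d
    with at least one spell still at risk.
    """
    events = Counter()   # spells that end with a purchase exactly d months later
    at_risk = Counter()  # spells still without a repeat purchase at the start of d
    for months in purchase_months.values():
        months = sorted(set(months))
        if len(months) < 2:          # single-purchase consumers are excluded
            continue
        for prev, nxt in zip(months, months[1:]):
            gap = nxt - prev
            events[gap] += 1
            for d in range(1, gap + 1):
                at_risk[d] += 1
        # Spell after the last purchase: at risk until last_month, no event observed.
        for d in range(1, last_month - months[-1] + 1):
            at_risk[d] += 1
    return {d: events[d] / at_risk[d] for d in sorted(at_risk)}
```

For example, `empirical_hazard({"a": [1, 3, 4], "b": [2, 5], "c": [6]})` uses only consumers `a` and `b`; treating the post-purchase spells as censored (rather than dropping them) keeps long durations from being overstated.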

My model is able to capture this increasing pattern: a good fit to the purchase rate yields a good fit to the hazard rate in Figure 4.[^26]

Figure 4: Model Fit: Book Purchase Hazard Rate

Figure 5 plots the device adoption month over the year 2011 (the x-axis is the month). My model captures the pattern in the data that device adoption increases after the Kindle price drop in July and during the holiday months.

##### 6.2.2 Fit of heterogeneous consumer behavior

I plot the device adoption month by type, both in the data and in the model prediction, in Figure 6 and Figure 7. My model is able to predict that avid readers adopt the device earlier.

Intuitively, consumers may be motivated to buy the device for three reasons: current-period book purchase needs, a desirable current device price, and the option value of device adoption. Taking the difference of the two choice-specific value functions gives

$$\sigma_f (f_1 - f_0) - \alpha^{HW} p + \delta \left\{ E\left[V(1, \Omega', \vec{\varepsilon}\,') \mid \Omega, D = 1\right] - E\left[V(0, \Omega', \vec{\varepsilon}\,') \mid \Omega, D = 0\right] \right\}$$

where $p$ is the device price. The first term represents the benefit of the enlarged book choice set in the current period: if the consumer has more books in her shopping basket that period and the ebook prices are much lower than the paperback prices, she is more likely to buy the device in that period.
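To see how the three terms combine, the sketch below evaluates the adopt-minus-wait gap and converts it into an adoption probability. It assumes i.i.d. type-I extreme value shocks on the two device choices (so the probability is a binary logit), treats `alpha_hw` as a positive price sensitivity so that the term $-\alpha^{HW} p$ lowers the value of adopting as written above, and takes the two continuation values as given numbers instead of solving the Bellman equation. All inputs are hypothetical.

```python
import math


def adoption_gap(sigma_f, f1, f0, alpha_hw, price, delta, ev_adopt, ev_wait):
    """Adopt-minus-wait value gap, term by term as in the expression above.

    ev_adopt and ev_wait stand in for E[V(1, ...)] and E[V(0, ...)]; the full
    model obtains them by solving the dynamic program, which is not done here.
    """
    current_books = sigma_f * (f1 - f0)          # larger choice set this month
    price_term = -alpha_hw * price               # alpha_hw > 0: price sensitivity
    option_value = delta * (ev_adopt - ev_wait)  # value of owning the device later
    return current_books + price_term + option_value


def adoption_probability(gap):
    """Binary logit probability of adopting this month (i.i.d. Gumbel shocks assumed)."""
    return 1.0 / (1.0 + math.exp(-gap))


# Hypothetical inputs: same month and price, different option values by reader type.
common = dict(sigma_f=2.4, f1=1.0, f0=0.8, alpha_hw=0.03, price=109.0, delta=0.985)
avid = adoption_probability(adoption_gap(ev_adopt=5.0, ev_wait=2.0, **common))
general = adoption_probability(adoption_gap(ev_adopt=2.5, ev_wait=2.0, **common))
```

With the same price and current-period needs, the reader with the larger option value (`avid`) gets the higher adoption probability, and lowering `price` raises the probability for either type.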

The second term indicates that consumers respond to a device price drop, which is exactly what I observe in my data. The third term is the option value, which is increasing in general reading taste. An avid reader (one who bought more than five books in the initialization period) expects to benefit more from device adoption, so she buys the device earlier than others. Non-avid readers catch up in November and December because during the holiday months they buy more books in general, so device adoption benefits them more in those months.

[^26]: In my data, not all consumers buy books more than once a year: 46% of consumers buy books only once. The observed hazard rate of purchase is therefore calculated from the remaining 54% of consumers.

Figure 5: Model Fit: Device Adoption Month Comparison. Note: 1-12 are the months from January 2011 to December 2011.

### 7 Model Implications

The central research question of this paper is the impact of the ebook channel on traditional print book sales. My dynamic discrete choice model of device adoption, book purchase, and format choice helps quantify the degree to which the ebook channel cannibalizes print book sales. It also enables the calculation of any market expansion effect, as ebooks serve as a lower-priced and higher-quality option.

Ebooks may cannibalize print book sales because they are cheaper and convenient to read. There are also reasons for a market expansion effect: by offering consumers a lower-priced option, platforms attract more visits, which in turn can lead to higher sales of paperbacks. Ebooks also encourage consumers to try authors and genres they might not otherwise have tried, and they save budget for more new books. Consumers may also read faster because ebooks are more convenient to read and carry. Before presenting the supply side implications, let me first describe some background on this industry. Publishers and platforms care about different things: publishers maximize only book sales (both ebook and paperback), while platforms care about both device sales and book sales.

The contract between publishers and platforms has shifted from the original wholesale model initiated by Amazon in 2007 to the agency model proposed by Apple in 2010. Under the agency contract, publishers set the ebook prices and take 70% of the revenue, and Apple takes 30%. Under the wholesale contract, platforms pay only a fixed amount to the publishers and set the ebook prices themselves. Different contract schemes provide different, and to some extent conflicting, pricing incentives for the two sides. As a result, publishers and platforms have been competing for ebook pricing rights in recent years. In February 2012, Amazon.com removed more than 4,000 e-books from its site after it tried and failed to get them more cheaply from I.P.G., one of the country's largest book distributors.[^27] On April 11, 2012, the United States Department of Justice sued Apple and five major book publishers, accusing them of colluding to raise e-book prices.[^28]

Figure 6: Adoption Month Plot by Type: observed

Figure 7: Adoption Month Plot by Type: model predicted

The Department of Justice settlement requires the settling publishers to end their agency contracts and gives platforms pricing freedom for the next two years, in the hope that the average price of ebooks will fall again. Whether the market will respond as expected depends on the demand side characteristics that I explore in this paper.

In the simulations below, I do not assume any specific pricing contract. Instead, I try different pricing possibilities by changing Kindle and ebook prices.

#### 7.1 Own- and Cross-Price Elasticities

I first calculate the own- and cross-price elasticities of demand for ebooks. I use the multinomial logit model of book purchase without the nested structure, because the nested parameter estimate $\sigma$ is not significantly different from zero in any of the model specifications. The own-price elasticity of demand is given by

$$\eta^{own}_{ik} = \alpha^{SW}_i p^E_k \left(1 - \Pr(d_{ik} = E)\right)$$

where $\alpha^{SW}_i$ is the estimated book price parameter, $p^E_k$ is the ebook price, and $\Pr(d_{ik} = E)$ is the probability of purchasing title $k$ in ebook format.

I calculate the own-price elasticity imputed for books in each time period and average it across ebooks to obtain an own-price elasticity of -0.0018. This elasticity is high, suggesting that the ebook market is competitive and that small price changes have a large impact on the probability of a book being sold. Indeed, consumers perceive ebooks as having almost zero marginal production cost and are very sensitive to price changes. The cross-price elasticity of print book sales with respect to the ebook price is similarly given by

$$\eta^{cross}_{ik} = \alpha^{SW}_i p^E_k \Pr(d_{ik} = E)$$

which is 0.0378. This is much higher in magnitude than the ebook own-price elasticity, indicating that print book sales are more sensitive to ebook price changes than ebook sales are.
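Both elasticities can be checked numerically for a single title with three options (no purchase, paperback, ebook). The sketch writes the price term as $\alpha p$ with $\alpha < 0$, the sign of the $\alpha^{SW}$ estimates in Table 8; under that convention the standard logit own-price elasticity is $\alpha p (1 - P)$, as in the formula above, and the cross-price elasticity of the paperback with respect to the ebook price is $-\alpha p P$, which is positive. The utility values and prices are hypothetical, and the finite-difference function is only a verification device.

```python
import math


def choice_probs(v_paper, v_ebook):
    """Logit probabilities of (no purchase, paperback, ebook); outside utility is 0."""
    denom = 1.0 + math.exp(v_paper) + math.exp(v_ebook)
    return 1.0 / denom, math.exp(v_paper) / denom, math.exp(v_ebook) / denom


def utilities(base, theta, alpha, p_paper, p_ebook):
    """Paperback and ebook utilities with the price term written as alpha * p, alpha < 0."""
    return base + alpha * p_paper, base + theta + alpha * p_ebook


def analytic_elasticities(alpha, p_ebook, prob_ebook):
    """Own-price elasticity of ebook demand and cross-price elasticity of
    paperback demand, both with respect to the ebook price."""
    own = alpha * p_ebook * (1.0 - prob_ebook)
    cross = -alpha * p_ebook * prob_ebook
    return own, cross


def numeric_elasticities(base, theta, alpha, p_paper, p_ebook, h=1e-6):
    """Finite-difference check: raise the ebook price by a factor (1 + h)."""
    _, pp0, pe0 = choice_probs(*utilities(base, theta, alpha, p_paper, p_ebook))
    _, pp1, pe1 = choice_probs(*utilities(base, theta, alpha, p_paper, p_ebook * (1 + h)))
    return (pe1 - pe0) / pe0 / h, (pp1 - pp0) / pp0 / h


# Hypothetical values; theta and alpha are of the same order as in Table 8.
base, theta, alpha, p_paper, p_ebook = -1.0, 3.6, -0.0031, 15.0, 10.0
prob_ebook = choice_probs(*utilities(base, theta, alpha, p_paper, p_ebook))[2]
own, cross = analytic_elasticities(alpha, p_ebook, prob_ebook)
own_fd, cross_fd = numeric_elasticities(base, theta, alpha, p_paper, p_ebook)
```

The analytic and finite-difference values agree; when the ebook share is large, the cross-price elasticity exceeds the own-price elasticity in magnitude, the same ordering as reported in the text.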

An interesting question is how the own- and cross-price elasticities differ by book popularity. I use the 25% quantile of book ranking as the cutoff between popular and niche books. In the data, popular and niche books are not systematically different in book characteristics other than ranking.

Table 6: Own- and Cross-Price Elasticities

| | own-price elasticity | cross-price elasticity |
|---|---|---|
| in total | -0.0018 | 0.0378 |
| popular book | -0.0006 | 0.0155 |
| niche book | -0.0020 | 0.0412 |

The own-price elasticity for popular ebooks is only -0.0006, while it is -0.0020 for niche books. The cross-price elasticity is 0.0155 for popular books and 0.0412 for niche books. These results, summarized in Table 6, indicate that the substitution patterns for popular and niche books differ. Hu and Smith (2011) also find that popularity can moderate the cross-channel effect between ebooks and print books.

[^27]: http://bits.blogs.nytimes.com/2012/02/22/amazon-pulls-thousands-of-e-books-in-dispute/

[^28]: http://mediadecoder.blogs.nytimes.com/2012/04/11/justice-files-suit-against-apple-and-publishers-over-e-book-pricing/

For niche books, print copies are more easily substituted by ebooks, as the value of keeping a paperback is lower. The own-price elasticity comparison is consistent with the notion that people are less price-sensitive when buying a bestseller.
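The popularity split behind Table 6 can be sketched as follows: titles with a sales rank at or below the 25% quantile of ranks (lower rank means more popular) are classed as popular, the rest as niche, and per-title elasticities are averaged within each group. The input format and the use of Python's `statistics.quantiles` (default "exclusive" method) for the cutoff are assumptions; the text does not say how the quantile is computed.

```python
import statistics


def mean_elasticities_by_popularity(books):
    """books: list of dicts with keys 'rank', 'own', 'cross' (one per title).

    Returns {'popular': (mean own, mean cross), 'niche': (...)}; popular titles
    have a sales rank at or below the 25% quantile of ranks.
    """
    cutoff = statistics.quantiles([b["rank"] for b in books], n=4)[0]
    groups = {"popular": [], "niche": []}
    for b in books:
        groups["popular" if b["rank"] <= cutoff else "niche"].append(b)
    return {
        name: (statistics.mean(b["own"] for b in g), statistics.mean(b["cross"] for b in g))
        for name, g in groups.items()
        if g
    }
```

Group means are simple averages across titles here; weighting titles by sales would be an equally plausible choice.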

#### 7.2 Cannibalization and Market Expansion

Define cannibalization as ebook purchases of books that would have been bought in paperback format if ebooks did not exist. In particular, denote

$$v^E_{ik} = \gamma_i + \theta + \beta w_k - \alpha^{SW}_i p^E_k, \qquad v^P_{ik} = \gamma_i + \beta w_k - \alpha^{SW}_i p^P_k;$$

then cannibalization means that $v^E_{ik} + \varepsilon^E_{ik} > v^P_{ik} + \varepsilon^P_{ik}$ and $v^P_{ik} + \varepsilon^P_{ik} > \varepsilon^0_{ik}$. Define market expansion as ebook purchases of books that would not have been bought in paperback anyway if ebooks did not exist, i.e., $v^E_{ik} + \varepsilon^E_{ik} > \varepsilon^0_{ik} > v^P_{ik} + \varepsilon^P_{ik}$. These extra sales arise because ebooks are generally cheaper and carry positive format utility.

Before calculating these two effects, note that their magnitudes depend on device adoption status.

We can expect a higher rate of device adoption to enhance both the cannibalization and market expansion effects. Figure 8 first shows the situation in which device adoption status is taken from the observed data. Ebooks account for 9% of book purchases: 6% come from cannibalizing paperbacks, while 3% come purely from market expansion. I analyze the implications for the publishers and the platform company in the next section.
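The two definitions can be made concrete for a single title. Assuming i.i.d. type-I extreme value shocks (the logit case) and normalizing the outside utility to zero, cannibalization and market expansion have closed forms: removing the ebook leaves the paperback-versus-outside comparison unchanged, so cannibalization is the drop in the paperback share and market expansion is the drop in the no-purchase share. The sketch checks these against a simulation that applies the inequalities above directly. The utilities are hypothetical, and scaling by a device-owner share is a simplification of how adoption enters the full model.

```python
import math
import random


def decompose(v_paper, v_ebook):
    """Closed-form logit split of the ebook purchase probability for a device owner."""
    e_p, e_e = math.exp(v_paper), math.exp(v_ebook)
    with_ebook, without_ebook = 1.0 + e_p + e_e, 1.0 + e_p
    cannibalization = e_p / without_ebook - e_p / with_ebook   # lost paperback share
    expansion = 1.0 / without_ebook - 1.0 / with_ebook         # lost no-purchase share
    return cannibalization, expansion


def simulate(v_paper, v_ebook, draws=100_000, seed=0):
    """Monte Carlo version that applies the inequalities in the text to Gumbel draws."""
    rng = random.Random(seed)

    def gumbel():
        u = rng.random()
        while u == 0.0:              # random() may return 0.0, where log is undefined
            u = rng.random()
        return -math.log(-math.log(u))

    cann = expa = 0
    for _ in range(draws):
        u0, up, ue = gumbel(), v_paper + gumbel(), v_ebook + gumbel()
        if ue > up and ue > u0:      # the ebook is bought
            if up > u0:
                cann += 1            # the paperback would have been bought instead
            else:
                expa += 1            # nothing would have been bought
    return cann / draws, expa / draws


def with_adoption(owner_share, v_paper, v_ebook):
    """Only device owners can buy ebooks, so both effects scale with the owner share."""
    c, e = decompose(v_paper, v_ebook)
    return owner_share * c, owner_share * e


# Hypothetical utilities, not the paper's estimates.
c, e = decompose(v_paper=-1.0, v_ebook=0.5)
c_mc, e_mc = simulate(v_paper=-1.0, v_ebook=0.5)
```

The two pieces always sum to the ebook choice probability, and `with_adoption` shows why a higher adoption rate raises both effects at once.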

Figure 9 shows simulated outcomes based on the demand side estimates. Holding the ebook prices exogenously fixed, I vary the Kindle price from \$0 to \$200. This counterfactual is motivated by the platform's pricing incentives: during the data year 2011, platforms like Amazon.com did not have pricing rights over ebooks and could control only device prices. The x-axis is the Kindle price. The upper two graphs plot the number of books sold, and the lower two graphs plot revenue. An increase in the Kindle price decreases the device adoption rate, which in turn hurts ebook sales.

Both the cannibalization and market expansion effects shrink, and so does the total number of books bought. In terms of dollars spent on books, though, total revenue from ebook and paperback sales increases. Because ebooks are priced much lower in this time period, total revenue actually drops when ebook sales increase. This is consistent with publishers' worry that ebook sales will hurt total revenue.

Figure 8: Cannibalization and Market Expansion under observed device adoption status

#### 7.3 Who Benefits from Introducing Ebooks?

In this section, I analyze the benefits and losses in detail from three perspectives: consumers, publishers, and the platform company Amazon.com. Again, I do not assume any specific pricing contract between publishers and platforms. Instead, I try different pricing possibilities and examine how they would affect publishers and platforms, based on consumers' price elasticities for devices and books, their format preference, and the evolving heterogeneous composition of consumers over time.

##### 7.3.1 Consumer

In my sample, the change in consumer surplus in dollar terms can be calculated as

$$\Delta CS = -\frac{E\left[\max\{u^E_{ik}, u^P_{ik}, u^0_{ik}\} - \max\{u^P_{ik}, u^0_{ik}\}\right]}{\alpha^{SW}_i}$$

The number is \$9.18 million. To extrapolate this number to all ebooks sold on Amazon, given that ebook sales in my sample are \$21,828 and ebook sales on Amazon were \$1.687 million in 2011,[^29] my estimate of the total increase in consumer surplus is \$709.5 million. Consumers do benefit from the introduction of this low-priced and high-quality option.
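Under the logit assumption, the expectation in $\Delta CS$ has the familiar log-sum closed form, because the expected maximum of utilities with i.i.d. type-I extreme value shocks is the log of summed exponentials plus a constant that cancels in the difference. The sketch below computes it for one choice occasion with hypothetical utilities, then applies the scaling step described above, reading it as multiplication by the ratio of Amazon's ebook sales to in-sample ebook sales; with the figures quoted in the text this reproduces the reported total.

```python
import math


def delta_cs(v_paper, v_ebook, alpha_sw):
    """Dollar gain in expected surplus from adding the ebook option (one choice occasion).

    With i.i.d. type-I extreme value shocks, E[max_j (v_j + eps_j)] equals
    log(sum_j exp(v_j)) plus Euler's constant; the constant cancels in the
    difference. alpha_sw < 0 is the signed price coefficient, so the leading
    minus sign turns the utility gain into a positive dollar amount.
    """
    logsum_with = math.log(1.0 + math.exp(v_paper) + math.exp(v_ebook))
    logsum_without = math.log(1.0 + math.exp(v_paper))
    return -(logsum_with - logsum_without) / alpha_sw


def extrapolate(cs_sample, ebook_sales_sample, ebook_sales_total):
    """Scale in-sample surplus by the ratio of total to in-sample ebook sales."""
    return cs_sample * ebook_sales_total / ebook_sales_sample


# Hypothetical utilities; alpha_sw is of the order reported in Table 8.
gain = delta_cs(v_paper=-1.0, v_ebook=0.5, alpha_sw=-0.0031)
# Figures quoted in the text; the result is about 709.5 million.
total = extrapolate(9.18e6, 21_828, 1.687e6)
```

Because the gain is divided by the price coefficient, a more price-sensitive consumer (larger $|\alpha^{SW}|$) values the same utility gain at fewer dollars.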

[^29]: Citi Investment Research report, 2011; http://techcrunch.com/2011/06/07/kindle-10-percent-sales-amazon/

Figure 9: Cannibalization and Market Expansion when changing Kindle prices

##### 7.3.2 Publisher

Publishers sign contracts with Amazon and take a fixed percentage of revenue (under the agency contract signed by Apple and the publishers in 2010, publishers take 70% and Apple takes 30%; the same split applies to other platforms such as Amazon). Publishers care only about total book revenue. How does total book revenue vary with the ebook price and the Kindle price? Figure 10 plots total book revenue with the Kindle price on the x-axis and the ebook price (as a fraction of the paperback price, from 0.1 to 1) on the y-axis.

Print book prices are fixed at the baseline list prices, and only the ebook price, as a fraction of this list price, changes. If publishers can set ebook prices in response to the Kindle price set by Amazon, then the optimal ebook price differs across Kindle prices. In particular, if the Kindle price is low, it is better for publishers to set a high ebook price to recover from the cannibalization loss. If the Kindle price is high, publishers can safely set a relatively low ebook price and still benefit from selling both ebooks and print books, because in that region the market expansion effect dominates the cannibalization loss.

##### 7.3.3 Platform

Amazon's objective is broader than the publishers'. Not only does it profit from book sales, it also earns from device sales. Under the current agency contract, Amazon gets 30% of the book revenue. Let $p_{kindle}$, $Q_{kindle}$, and $c_{kindle}$ denote the Kindle price, sales, and production cost; $p^E_k$ and $q^E_k$ the price and sales of title $k$ in ebook format; and $p^P_k$ and $q^P_k$ the price and sales of title $k$ in paperback format. Then Amazon's profit can be expressed as

$$\Pi^{Amazon} = Q_{kindle}\left(p_{kindle} - c_{kindle}\right) + 0.3\left(\sum_k p^E_k q^E_k + \sum_k p^P_k q^P_k\right)$$

I simulate the market evolution given the demand side estimates by changing the device price and the ebook price (as a fraction of the paperback price).
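The profit expression translates directly into a small function. This sketch takes quantities as given inputs rather than simulating them from the demand model, and the function name, argument layout, and example quantities are hypothetical; `platform_share = 0.3` is the 30% cut stated in the text, applied to both ebook and paperback revenue as the formula is written.

```python
def amazon_profit(p_kindle, q_kindle, c_kindle, ebooks, paperbacks, platform_share=0.3):
    """Pi = Q_kindle * (p_kindle - c_kindle) + share * (sum pE*qE + sum pP*qP).

    ebooks and paperbacks are iterables of (price, quantity) pairs, one per title.
    """
    device_margin = q_kindle * (p_kindle - c_kindle)
    book_revenue = sum(p * q for p, q in ebooks) + sum(p * q for p, q in paperbacks)
    return device_margin + platform_share * book_revenue


# Hypothetical quantities; c_kindle uses the 201.70 teardown cost quoted in this section.
profit = amazon_profit(p_kindle=100.0, q_kindle=10, c_kindle=201.70,
                       ebooks=[(10.0, 5)], paperbacks=[(15.0, 4)])
```

Because the quoted cost exceeds every device price in the \$0-\$200 range considered, each additional device sold lowers the device margin; this is one way to read the monotone profit curve in Figure 14.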

First, I hold the ebook and paperback prices fixed and vary the device price from \$0 to \$200. Figure 11 plots the number of books sold, the revenue from books and devices, and total revenue, with the device price on the x-axis. Total revenue is not a monotonic function of the device price, indicating a trade-off between revenue from device sales and revenue from book sales. This is the typical inverse-U relationship between revenue and price.

Second, I vary both the device price (\$0-\$200) and the ebook price (as a fraction of the paperback price, from 0.1 to 1). Figure 12 shows that the optimal device price changes as the ebook price changes, which indicates that the two variables interact strategically. Below, I further introduce cost information to present a picture of profitability. According to news reports, the manufacturing cost of the Kindle (specifically, the Kindle Fire) is \$201.70.[^30] The profit-sharing contract between the publishers and Apple is an agency model, in which publishers set book prices and Apple has kept 30% of the revenue since 2010, when the iPad started to sell ebooks through iBooks.[^31] Taking this cost information into account, I recalculate total profit. Figure 13 plots the number of books sold, profit from books and devices, and total profit.

[^30]: http://www.isuppli.com/Teardowns/News/Pages/Amazon-Kindle-Fire-Costs-$201-70-to-Manufacture.aspx

Figure 10: Total Book Revenue as a Function of Kindle Price and Ebook Price

Figure 11: Number of books sold, revenue from books and device, and total revenue

Figure 12: Total Revenue as a function of device price and ebook price

Figure 13: Number of books sold, profit from books and device, and total profit

Figure 14 plots total profit as the device and ebook prices change. Total profit is a monotonically increasing function of the device price. However, because I plot profit only for the year 2011 and device sales can be viewed as an early investment by the platform, we can expect Amazon to have an optimal device price in the long run. I do not simulate longer time periods because my demand side model considers only a monopoly platform. With platform competition, platforms like Amazon and Barnesandnoble compete for avid readers, and the supply side would involve more strategic interactions, which I plan to explore in future work.

[^31]: http://online.wsj.com/article/SB10001424052970203961204577267831767489216.html

Figure 14: Total profit as a function of device price and ebook price

### 8 Conclusion

In this paper, I analyze the impact of ebooks on print book sales on Amazon.com. I estimate a dynamic model of consumer book purchase, format choice, and e-Reader adoption decisions, combining a unique individual-level purchase history panel data set with publicly available book prices and characteristics. Consumers have persistent heterogeneous general reading tastes, format utility from ebook reading, and rational expectations over book purchases when adopting an e-Reader.

My model estimates allow me to quantify the degree of the cannibalization and market expansion effects. Taking supply side prices as exogenously given, counterfactual simulation shows that 2/3 of ebook sales come from cannibalizing print books and 1/3 come purely from market expansion. The introduction of ebooks increases consumer surplus by \$709.5 million in the U.S. in 2011. Finally, I find that models that do not take the dynamic device adoption decision into account substantially underestimate the price elasticity of books.

The implications for publishers and platform companies differ. For publishers, only revenue from ebook and paperback sales matters. Counterfactuals show that publishers have different optimal ebook pricing strategies under different e-Reader prices set by the platform. If publishers can set ebook prices in response to the Kindle price set by Amazon, then the optimal price differs across Kindle prices. In particular, if the Kindle price is low, it is better for publishers to set a high ebook price to recover from the cannibalization loss. If the Kindle price is high, publishers can safely set a relatively low ebook price and still benefit from selling both ebooks and print books.

This is because in that region the market expansion effect dominates the cannibalization loss. The platform company, on the other hand, needs to trade off revenue from device sales against revenue from book sales. The optimal device price changes as the ebook price changes, which indicates that the two variables interact strategically. Publishers and platforms have been competing for ebook pricing rights in recent years. In February 2012, Amazon.com removed more than 4,000 e-books from its site after it tried and failed to get them more cheaply from I.P.G., one of the country's largest book distributors.[^32] On April 11, 2012, the United States Department of Justice sued Apple and five major book publishers, accusing them of colluding to raise e-book prices.[^33]

The contract between publishers and platforms has shifted from the original wholesale model initiated by Amazon in 2007 to the agency model proposed by Apple in 2010. Different contract schemes provide different pricing incentives for the two sides. The Department of Justice settlement requires the settling publishers to end their agency contracts and gives platforms pricing freedom for the next two years, in the hope that the average price of ebooks will fall again. Whether the market will respond as hoped depends crucially on the demand side characteristics that I explore in this paper.

This paper is the first to use representative individual-level observations of actual purchasing data to structurally estimate the degree of cannibalization and market expansion. I start from the micro foundation of utility maximization at the individual level and allow for consumers' heterogeneous reading tastes, price elasticities for both device adoption and book purchase, and extra format utility from e-reading. The data cover a broad range of consumers, book titles, and genres, compared with the extant literature, most of which deals with only one publisher in a particular policy setting. My model also contributes to the literature by taking the device adoption decision into account.

The estimation results show that models that do not take the dynamic device adoption decision into account substantially underestimate the price elasticity of books. The demand side estimation helps build a solid foundation for supply side policy evaluations.

There are also limitations. First, this is a demand side paper, and I take supply side prices as exogenously given. Publishers and platforms have different optimal pricing strategies with and without ebooks. In particular, print book prices, which I hold fixed throughout my counterfactuals, could have been different had publishers anticipated the impact of ebooks. A better counterfactual analysis would compute a full equilibrium rather than a partial equilibrium. Incorporating the supply side is an interesting path for future research. Second, platforms face competition in practice.

Amazon and Barnesandnoble compete for device owners and book buyers. As shown in the estimation results, there is considerable heterogeneity in general reading taste among consumers. This is informative for the supply side, because avid readers are the core consumers in this market: they read more books and buy devices earlier. Because ebook formats are incompatible across e-Readers (for instance, Kindle ebooks cannot be read on a Nook), consumers are locked in to a particular e-Reader once they buy the device. The number of avid readers who have not bought any device shrinks over time. This evolution in composition will affect the optimal pricing strategies of Amazon.com and Barnesandnoble.com. My model allows for heterogeneity in general reading tastes and quantifies its magnitude, providing a foundation for further supply side analysis.

[^32]: http://bits.blogs.nytimes.com/2012/02/22/amazon-pulls-thousands-of-e-books-in-dispute/

[^33]: http://mediadecoder.blogs.nytimes.com/2012/04/11/justice-files-suit-against-apple-and-publishers-over-e-book-pricing/

### References

[1] Yu (Jeffrey) Hu, Michael D. Smith. 2011. The Impact of Ebook Distribution on Print Sales: Analysis of a Natural Experiment. Mimeo.

[2] Matthew A. Gentzkow. 2007. Valuing New Goods in a Model with Complementarity: Online Newspapers. American Economic Review, 97(3), pp. 713-744.

[3] P. K. Kannan, Barbara Kline Pope, Sanjay Jain. 2009. Pricing Digital Content Product Lines: A Model and Application for the National Academies Press. Marketing Science, 28(4), pp. 620-636.

[4] G. Oestreicher-Singer, A. Sundararajan. 2010. Are Digital Rights Valuable? Theory and Evidence from Ebook Pricing. CeDER Working Paper No. 06-01.

[5] F. Oberholzer-Gee, K. Strumpf. 2007. The Effect of File Sharing on Record Sales: An Empirical Analysis. Journal of Political Economy, 115(1), pp. 1-42.

[6] K. S. Moorthy, I. P. L. Png. 1992. Market segmentation, cannibalization, and the timing of product introductions. Management Science, 38(3), pp. 345-359.

[7] R. Venkatesh, R. Chatterjee. 2006. Bundling, unbundling and pricing of hybrid products: The case of magazine content. Journal of Interactive Marketing, 20(2), pp. 21-40.

[8] David Bounie, B. Eang, Marvin A. Sirbu, Patrick Waelbroeck. 2012. Superstars and Outsiders in Online Markets: An Empirical Analysis of Electronic Books. Mimeo. Available at SSRN: http://ssrn.com/abstract=1967426.

[9] Igal Hendel, Aviv Nevo. 2006. Measuring the Implications of Sales and Consumer Inventory Behavior. Econometrica, 74(6), pp. 1637-1673.

[10] Robin Lee. 2012. Vertical Integration and Exclusivity in Platform and Two-Sided Markets. Working paper.

[11] Gautam Gowrisankaran, Marc Rysman. 2012. Dynamics of Consumer Demand for New Durable Goods. Journal of Political Economy.

### Appendix

#### A. Grouping criteria for heterogeneous consumers

I group consumers according to the criteria in Table 7 in model specification (ii), where heterogeneity in both general reading taste and price elasticity is allowed.

Table 7: Heterogeneity Grouping Criteria

| type | income level | numbook | coefficients | percentage of consumers | percentage of books |
|---|---|---|---|---|---|
| type 1 | 0-4 | 0-5 | $\alpha^{HW,High}$, $\alpha^{SW,High}$, $\gamma^{Low}$ | 32.89 | 20.80 |
| type 2 | 0-4 | >5 | $\alpha^{HW,High}$, $\alpha^{SW,High}$, $\gamma^{High}$ | 5.00 | 15.66 |
| type 3 | 5-7 | 0-5 | $\alpha^{HW,Low}$, $\alpha^{SW,Low}$, $\gamma^{Low}$ | 53.25 | 32.36 |
| type 4 | 5-7 | >5 | $\alpha^{HW,Low}$, $\alpha^{SW,Low}$, $\gamma^{High}$ | 8.86 | 31.17 |

\* numbook: number of books bought in the initialization period, which is the first six months of 2011.

#### B. Reason for dropping price elasticity heterogeneity

In Figure 15, I plot the device adoption timing by income level (1-7). The timing shows no positive or negative correlation with income level, which means that no matter how I group consumers by income level, the price elasticities across groups will not be statistically different from each other. This is reasonable to the extent that reading habits in general are not strongly correlated with income.

#### C. Construction of the no-purchase data

I do not observe people not buying books in my data set: consumers may visit Amazon.com, search for some book titles, and end up not buying them. To figure out how frequently they search for books in general, and to construct the outside option data, I supplement the actual book purchase data with two other streams of data from comScore. For the same sample of consumers as in my main data set, I observe their search and purchase behavior on Amazon.com beyond books.

(a) Search record data: every search that consumers conduct on Amazon is recorded, whether or not it ends in a transaction. I observe when they visit Amazon.com and how long they stay. Throughout 2011, 3,453 consumers made 14,156 search trips in total. What I need is the number of search trips in which they actually searched for books; however, I do not observe what they searched for unless the search ended in a transaction.

(b) Shopping record data beyond book purchases: all purchase records are documented, whether for books or for other goods. I observe when they buy a good, the category, the price, and the quantity. Throughout 2011, the total number of shopping trips is 11,999, and the number of shopping trips with a book purchase is 5,940.

I need the number of search trips in which consumers actually search for books. So far, I know (1) the number of search trips in general, (2) the number of shopping trips in general, and (3) the number of shopping trips with a book purchase. If we are willing to assume that

$$\frac{\#\,\text{search trips with books}}{\#\,\text{search trips}} = \frac{\#\,\text{shop trips with books}}{\#\,\text{shop trips}},$$

then

$$\frac{3453\,x}{14156} = \frac{5940}{11999},$$

where $x$, the average number of months in which a household searches for books, is 2.0. Thus I assume that households search for books at least every six months. This is the baseline exogenous arrival rate I use; I conduct a robustness check for this arrival rate, and the results are stable. For each trip added, the number of books searched and the genre are the same as in the reference month (the closest actual shopping trip).

Figure 15: Observed Device Adoption Month by Income Level

This is consistent with the Amazon recommendation system on the product webpage. The book characteristics used are the average characteristics of the same genre in the same month.
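The arithmetic behind $x$ can be checked directly. The sketch solves the proportionality equation for $x$, reading it as book-search trips per household over 2011 as the equation is set up, and converts it to an average interval between searches; the variable names are illustrative.

```python
# Counts reported in Appendix C for 2011.
consumers = 3453
search_trips = 14_156
shop_trips = 11_999
shop_trips_with_books = 5_940

# Assumed proportionality: the book share of search trips equals the book share
# of shopping trips, i.e. consumers * x / search_trips = shop_trips_with_books / shop_trips.
x = search_trips * shop_trips_with_books / (shop_trips * consumers)
months_between_searches = 12 / x

print(f"x = {x:.2f}")                                                  # prints: x = 2.03
print(f"one book search every {months_between_searches:.1f} months")  # prints: ... 5.9 months
```

About two book searches per household per year corresponds to roughly one every six months, which is the baseline interval stated above.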

#### D. Robustness check for the exogenous arrival rate of book purchases

In my data set, I do not observe people not buying books. People may go online and search for books but end up buying nothing in that period. I deal with this problem by assuming an exogenous arrival rate of searching for books. Based on this rate, I obtain a set of search times over the year for each consumer. I then model their purchase and format decisions given the search timing and the corresponding choice set. I keep all observed purchase times, with their corresponding choice sets, as search times, because in reality there is always a search before an actual purchase.

For searches that do not end in an observed purchase (these are the no-purchase observations I generate), I assume that the choice set consists of representative books of the same genre and quantity as in the last observed purchase. The characteristics of the representative book are the averages over all books of the same genre in that month. This assumption is plausible because it is consistent with how the recommendation system on Amazon.com works: it displays similar books of the same genre based on the consumer's last purchase record.
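The search-timing step can be sketched as a gap-filling rule. The Python code below is a minimal sketch under stated assumptions, not the paper's implementation. It assumes monthly time periods. It adds a no-purchase search whenever `interval` months have passed since the last search. It follows the "last observed purchase" rule for the added search's genre and quantity, and it falls back to the first purchase for searches that come before any purchase. The `Purchase`/`Search` classes and `impute_searches` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Purchase:
    month: int      # month of an observed book purchase (1-12)
    genre: str
    quantity: int   # number of books bought on that trip


@dataclass
class Search:
    month: int
    genre: str
    quantity: int
    observed: bool  # True if the search ended in an observed purchase


def impute_searches(purchases: List[Purchase], interval: int,
                    horizon: int = 12) -> List[Search]:
    """Hypothetical rule: keep every observed purchase as a search, and add a
    no-purchase search whenever `interval` months pass without a search."""
    observed = sorted(purchases, key=lambda p: p.month)
    if not observed:
        return []
    searches: List[Search] = []
    last_search = 0  # month 0 stands for the start of the year
    for month in range(1, horizon + 1):
        bought = [p for p in observed if p.month == month]
        if bought:
            # Every observed purchase is preceded by a search, so keep it.
            searches.extend(Search(p.month, p.genre, p.quantity, True) for p in bought)
            last_search = month
        elif month - last_search >= interval:
            # Copy genre and quantity from the last observed purchase;
            # before the first purchase, fall back to the first one (assumption).
            earlier = [p for p in observed if p.month < month]
            ref = earlier[-1] if earlier else observed[0]
            searches.append(Search(month, ref.genre, ref.quantity, False))
            last_search = month
    return searches


history = [Purchase(2, "mystery", 1), Purchase(10, "romance", 2)]
baseline = impute_searches(history, interval=6)
added = {k: sum(not s.observed for s in impute_searches(history, k)) for k in (5, 4, 3)}
```

Looping over `interval` values, as in the last line, mirrors the design of the robustness check: only the number of added no-purchase records changes, while the observed purchases stay fixed.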

Note that although I abstract from book inventory and the search rate by treating them as exogenous, I do model consumers' purchase and format choices, which I observe and can fit to the data. A good model fit means that in periods when I observe people buying books, the model also predicts that they purchase books. I conduct a robustness check by varying the exogenous search arrival rate. Table 8 lists the estimation results; the three columns assume that people search for books every 5, 4, and 3 months on average, respectively. The estimates are quite stable across the different search intervals.

I run a Wald test, and the result shows that the coefficients are not statistically different across columns. The actual number of purchase records is 9,570. The numbers of search records I add under the 5-month, 4-month, and 3-month criteria (4,755, 4,660, and 3,590, respectively) do not differ substantially, which is a potential reason for the robustness of the results. The number of book observations in each column of Table 8 is the 9,570 purchase records plus the added search records. In my analysis, I use the second column, where I assume consumers search for books at least every 6 months.

Table 8: Parameter Estimates of Demand System

| | Variable | Every 5 months: Estimate | s.e. | Every 4 months: Estimate | s.e. | Every 3 months: Estimate | s.e. |
|---|---|---|---|---|---|---|---|
| Device | $\alpha_{HW,High}$ | -0.1564\*\* | 0.0114 | -0.1448\*\* | 0.0084 | -0.1866\*\* | 0.0032 |
| | $\alpha_{HW,Low}$ | -0.1598\*\* | 0.0128 | -0.1544\*\* | 0.0113 | -0.1872\*\* | 0.0268 |
| | $\sigma_f$ flow utility coefficient | 2.2807\*\* | 0.1874 | 2.3982\*\* | 0.1636 | 2.8008\*\* | 0.0967 |
| | $\beta_x$ holiday_dummy | 1.1277\*\* | 0.1447 | 1.1270\*\* | 0.2506 | 1.0659\*\* | 0.2981 |
| Book | $\beta_1$ ranking | -0.0010\*\* | 4.9904e-6 | -0.0010\*\* | 4.0469e-6 | -0.0010\*\* | 1.0546e-6 |
| | $\beta_2$ rating | 0.0453\*\* | 0.0050 | 0.0466\*\* | 0.0002 | 0.0485\*\* | 0.0012 |
| | $\beta_3$ comments | 0.0029 | 0.0013 | 0.0029 | 0.0048 | 0.0029 | 0.0057 |
| | $\beta_4$ since publication | -0.0029\*\* | 3.4856e-5 | -0.0029\*\* | 1.467e-5 | -0.0029\*\* | 1.2471e-5 |
| | $\beta_5$ holiday_dummy | 1.1427\*\* | 0.0132 | 1.1169\*\* | 0.0157 | 0.9815\*\* | 0.0332 |
| | $\gamma_{High}$ general reading taste | 1.5427\*\* | 0.0418 | 1.3897\*\* | 0.0542 | 1.7281\*\* | 0.0414 |
| | $\gamma_{Low}$ | 0.0003\*\* | 2.8513e-5 | 0.0006\*\* | 9.2702e-5 | 0.0662\*\* | 6.9825e-6 |
| | $\theta$ e-format utility | 3.5533\*\* | 0.1612 | 3.6176\*\* | 0.1489 | 4.2406\*\* | 0.0242 |
| | $\alpha_{SW,High}$ | -0.0036\*\* | 2.6764e-5 | -0.0037\*\* | 1.1244e-5 | -0.0029\* | 5.4323e-5 |
| | $\alpha_{SW,Low}$ | -0.0026 | 1.3294e-5 | -0.0031\*\* | 1.9940e-5 | -0.0035\*\* | 1.1185e-5 |
| | $\sigma$ nested logit parameter | 0.0001 | 0.0459 | 0.0000 | 0.0436 | 0.0035 | 0.0313 |
| | Log-likelihood | -10012.7 | | -10330.1 | | -9682.5 | |
| | Number of book obs. | 14,325 | | 14,230 | | 13,160 | |
| | Number of device obs. | 35,064 | | 35,064 | | 35,064 | |

\*\* significant at 5% level; \* significant at 10% level.
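The paper does not spell out the exact form of its Wald test. As an illustration only, the Python sketch below tests equality of a single coefficient across two columns. It assumes that the two estimates are independent. That is a simplification: the columns share the same 9,570 purchase records, so their estimates are correlated, and a joint test over all coefficients would need the full covariance matrices. The function name `wald_equality_test` is hypothetical.

```python
import math


def wald_equality_test(b1, se1, b2, se2):
    """Wald test of H0: b1 == b2 for one coefficient estimated in two columns.

    Assumes the two estimates are independent, so Var(b1 - b2) = se1^2 + se2^2.
    Returns the Wald statistic and its chi-square(1) p-value.
    """
    w = (b1 - b2) ** 2 / (se1 ** 2 + se2 ** 2)
    # Chi-square(1) survival function: P(chi2_1 > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


# beta_x (holiday_dummy): "every 5 months" vs. "every 4 months" columns of Table 8.
w_stat, p = wald_equality_test(1.1277, 0.1447, 1.1270, 0.2506)
```

For this coefficient the statistic is essentially zero, so equality is not rejected. The same pairwise calculation applied to other rows, without the cross-column covariance, need not reproduce the paper's joint conclusion.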

### E. Assumption Validity Check: AR(1) process of the book characteristics

I plot the fitted error term $\hat{\upsilon}_t$ of the AR(1) process

$$\ln f_t = (1-\rho)\,\mu + \rho \ln f_{t-1} + \upsilon_t.$$

The residuals have mean zero and show no serial correlation over time, which supports the assumption.

Figure 16: AR(1) residual

### F. Industry facts

I focus on Amazon's Kindle only because it was the dominant e-Reader in 2011. According to a survey conducted by the Pew Research Center in January 2012, 62% of e-reader owners have a Kindle and 22% have a Nook; the third-largest player, Pandigital, accounts for only 2% of the market. I also assume that ebook reading is done on the Kindle and not on other devices, so that consumers have to buy a Kindle before buying any ebooks. This is based on survey results showing that consumers buy and read ebooks on their Kindle most of the time. Figures 17, 18, and 19 display supporting evidence about this industry.

Figure 17: Kindle Dominates the e-Reader market

Figure 18: Kindle is the dominant device across all kinds of screens: books read

Figure 19: Kindle is the dominant device across all kinds of screens: device users
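The residual check in Section E can be illustrated on simulated data. The Python sketch below uses a hypothetical series standing in for the log of a book characteristic $\ln f_t$, with made-up parameter values. It fits the AR(1) model $\ln f_t = (1-\rho)\mu + \rho \ln f_{t-1} + \upsilon_t$ by ordinary least squares of $\ln f_t$ on $\ln f_{t-1}$, recovers $\mu$ from the intercept, and then computes the two diagnostics the text reports: the residual mean and the first-order residual autocorrelation.

```python
import random


def simulate_ar1(n, rho, mu, sigma, seed=0):
    """Simulate ln f_t = (1 - rho) * mu + rho * ln f_{t-1} + v_t, v_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    y = [mu]  # start the series at its long-run mean
    for _ in range(n - 1):
        y.append((1 - rho) * mu + rho * y[-1] + rng.gauss(0.0, sigma))
    return y


def fit_ar1(y):
    """OLS of y_t on y_{t-1}; returns (rho_hat, mu_hat, residuals)."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    rho_hat = sxz / sxx
    intercept = mz - rho_hat * mx      # estimates (1 - rho) * mu
    mu_hat = intercept / (1 - rho_hat)
    resid = [b - intercept - rho_hat * a for a, b in zip(x, z)]
    return rho_hat, mu_hat, resid


def lag1_autocorr(e):
    """First-order sample autocorrelation of a residual series."""
    m = sum(e) / len(e)
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, len(e)))
    den = sum((v - m) ** 2 for v in e)
    return num / den


log_f = simulate_ar1(5000, rho=0.8, mu=2.0, sigma=0.1)  # hypothetical data
rho_hat, mu_hat, resid = fit_ar1(log_f)
resid_mean = sum(resid) / len(resid)
resid_ac1 = lag1_autocorr(resid)
```

Because the regression includes an intercept, the residual mean is zero by construction. The informative diagnostic is therefore the autocorrelation, which should be close to zero if the AR(1) specification is adequate.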