Privacy and Children: What drives digital data protection for very young children?

               G. Cecere∗, F. Le Guel†, V. Lefrere ‡, C. Tucker§, P.L. Yin¶
                                             June 4, 2019

                                                 Abstract

            Mobile platforms have provided children and their parents widespread access to edu-
        cational and other helpful apps in the past decade. The easy entry into mobile apps also
        creates opportunities for developers to take advantage of the increasing time children
        spend playing on apps. We use an original dataset of apps commercialized in the USA
        and targeting children to explore the types and scope of the data collected via children’s
        use of online mobile applications. Rules related to the collection of sensitive data vary
        with the developer’s geographical location and size. We find that apps that opt in to
        Google’s “Designed for Families” program generally comply with US children’s privacy
        regulation unless the developer is located in a country with weak privacy regulation.
        However, big developers from countries with weak regulation collect less sensitive data,
        suggesting that they may be better able to bear the costs of privacy regulation.

       JEL CODE: D82, D83, M31, M37
   ∗
     Institut Mines Telecom, Business School Email: grazia.cecere@imt-bs.eu
   †
     University of Paris Sud. Email: fabrice.le-guel@u-psud.fr
   ‡
     Institut Mines Telecom, Business School-University of Paris Sud. Email: vincent.lefrere@u-psud.fr
   §
     Massachusetts Institute of Technology (MIT) - Management Science (MS). Email: cetucker@mit.edu
   ¶
     Greif Center for Entrepreneurial Studies, Marshall School of Business, University of Southern California.
Email: pailingy@marshall.usc.edu
     We are grateful to Qinhou Li, Ginger Zhe for helpful comments and suggestions. An earlier version of
this paper was presented at IIOC Indiana 2018, AFSE Paris 2018, the Peex Lab, ZEW ICT Conference 2018,
11th Conference on Digital Economics Paris 2019, Summer Institute Munich 2019. The underlying work
for this paper was supported by the DAPCODS/IOTics ANR 2016 project (ANR-16-CE25-0015). We are
grateful to Theo Marquis, Hugo Allouard and Enxhi Leka for their research assistance. All errors are the
sole responsibility of the authors.

1        Introduction

Many mobile applications are targeted at very young children, including toddlers (i.e. 2 to
3-year-olds and preschool-age children generally). While mobile applications offer valuable
learning opportunities, they are also able to collect large amounts of user data, since children
do not have a good grasp of safety and privacy. Both the USA and the EU have specific
regulations in place designed to protect children’s privacy. While they may protect children,
these regulations may constrain developers’ opportunities to monetize their apps. Since
consumer and developer access to the Android app market is international, the effect of
regulation is unclear. To encourage developers to comply with the USA Children’s Online
Privacy Protection Act (COPPA), Google Play (the largest app store worldwide) introduced
a self-regulation program called “Designed for Families” (DfF). This program helps parents
identify child-appropriate content.1 Given children’s widespread use of mobile applications,
what sensitive data are developers collecting from children? What institutions influence
whether and how much sensitive data developers collect? To our knowledge, there is no
empirical study on the extent and influence of the digital data collected on child users.2
        The US COPPA and the EU General Data Protection Regulation (GDPR) are both in-
struments designed to protect children’s privacy in digital markets. The USA Federal Trade
Commission (FTC) has stated that in order to collect data from a child, the app developer
must have documented consent from the parents. Developers should also require parental
consent if they want to use ads that target children on the basis of their behaviour (behavioural
    1
     This program was introduced in May 2015 by Google Play (see https://developer.android.com/google-play/guides/families
for more information, last retrieved May 12, 2019). In 2013, the iOS App Store introduced
a kids app category (Apple’s WWDC 2013 Keynote).
   2
     The recent Mobile Kids Report published by Nielsen (2017) shows that 59% of the children interviewed
use mobile devices to download apps, retrieved January 8, 2018.

advertising). The FTC has brought law enforcement actions against several app developers located
both in the USA and other countries. The most recent action involved the Chinese app
“TikTok” and resulted in a fine of $5.7 million for failure to seek and obtain parental consent
to collect children’s sensitive data.3 This paper provides empirical evidence on how the US
COPPA laws have spillover effects on developers originating from countries with weak regu-
lation enforcement. We also explore whether app compliance with COPPA legislation varies
with the developer’s size and location. We collected weekly data from July to September
2017 on apps available in Google Play in the DfF category. We compared them to apps produced
by developers who had chosen not to adopt this categorization but whose apps targeted
children based on app descriptions that included keywords such as “preschool” and “tod-
dler”. Our dataset includes 9,799 apps corresponding to 4,442 different developers located in
89 countries, generating a panel of 92,746 observations. To measure the effects of developer
country’s regulation, we identify developer location based on the address provided by the
developer. The dynamic structure of our data combined with the large set of controls and
fixed effects included in our main models allow us to interpret the effect of USA regulation
on the features of apps commercialized in this market.
      We find that developers located in regions with no privacy regulation collect more sen-
sitive child data relative to developers based in the USA or European countries. However,
developers that opt in to the Google self-certification program are less likely to collect child
data, which suggests that the program is at least an informative label for parents and may
even promote privacy protection spillovers from US regulation to developers from other coun-
tries. We also find that big developers located in countries with weak legislation are less likely
  3
      , last retrieved, March 3, 2019.

to collect sensitive data. If the cost of compliance with COPPA legislation differs between
big and small developers and the marginal costs of compliance and security control are de-
creasing, larger developers may be more able to bear the cost of compliance with COPPA.
The results are robust to a broader definition of sensitive data and the granularity of the
location data. However, developers from Hong Kong and countries with no privacy laws are
likely to collect sensitive user data aimed at accurately identifying unique users - e.g. IMEI
(International Mobile Equipment Identity) - and to take children’s photos and record voices,
although they are less likely to collect location data.
   We contribute to three literature streams: the economics of privacy regulation, the economics
of mobile applications, and more general work on children’s Internet usage. The
literature on the economic effects of privacy regulation highlights a trade-off between protecting
individuals and developing further innovations. Goldfarb and Tucker (2012) focus on the effects
of privacy regulation on firm performance. Campbell et al. (2015) examine how regulation
influences competition and show that privacy regulation is likely to affect small and new firms
adversely. An important regulatory enforcement tool in the context of privacy legislation is
industry self-regulation, which can affect the competitive structure (Acquisti et al., 2016;
Brill, 2011). Johnson et al. (2017) evaluate the disadvantages of self-regulation initiatives in
the ad industry. Jia et al. (2018) show that regulation increases firms’ costs and reduces their
performance. Miller and Tucker (2009) examine the welfare outcomes of privacy regulation.
To our knowledge, the present study is the first to document the effects of privacy regulation
in the context of protecting children’s privacy. It builds on Rochelandet and Tai (2016),
who find that privacy regulation and location are related. We show that in the global app
economy, developers are influenced by the existence or lack of regulation, and that there can

be international spillovers from privacy regulation on behavior.
   Our findings have direct relevance for the second literature stream on the economics of
mobile applications. Bresnahan, Davis, and Yin (2015) and Li, Bresnahan, and Yin (2017)
characterize the monetization challenges facing app developers. Yin et al. (2014) find that
successful developer strategies differ by app category (game vs. non-game). Ghose and Han
(2014) use a structural model to assess the factors influencing consumers’ demand for apps.
Their results suggest that demand for children’s apps is higher than demand for adults’ apps.
They show also that children’s apps have lower marginal production costs compared to apps
designed for other age categories. These papers suggest that developers have strong incentives
to find fast ways to monetize apps outside of the app stores (to avoid the 30% revenue
share with Google Play), so we may find conflicts between privacy and commercialization.
Furthermore, the extent of this behavior may differ across apps, so children’s apps should
be examined as a distinct category. We also build on the body of work which demonstrates
the role played by platform design in the strategies of app developers. Ershov (2017)
investigates how the design of the Google Play platform changed the entry dynamics, and
shows that splitting the game category into different subcategories reduces search costs and
lowers the quality of new entrants. Kummer and Schulte (2018) provide evidence of a trade-off
between app demand and supply: the amount of personal information collected to monetize a
given app reduces app success as measured by the number of downloads. Kesler et al. (2017)
show that apps that target the 13+ and 16+ age categories are more intrusive compared to
apps targeting the “everyone” category (which includes children and adults). While there is
empirical evidence showing the importance of the game category in the smartphone market,
there is almost no published economics or management research on the characteristics of apps

aimed at children, or how platform policies can support regulation and influence individual
behavior. The exception (in computer science) is the recent paper by Reyes et al. (2018)
which analyzes popular free mobile apps. The authors show that the majority of apps do
not comply with USA child privacy regulation. Our paper extends this analysis by using a
larger, pooled sample, and taking account of both developers’ location and the impact of the
DfF program on compliance with COPPA.
      Finally, our research contributes to two broader streams of research on children’s use of
the Internet. Internet access has mixed effects on education outcomes (Bulman and Fairlie,
2016; Belo et al., 2013). There is empirical evidence showing that Internet use in schools
affects the level of household Internet penetration (Belo et al., 2016). We contribute to this
work by highlighting the participation of children in the mobile app economy.
      In the context of the existing literature, our study makes an important contribution
related to estimating the effect of USA child privacy regulation on national and foreign de-
velopers that want to sell their applications in the USA market. First, our statistics based
on the scope and depth of the data collected on children are an improvement on those used
in several existing examinations of policy. To our knowledge the FTC (FTC, 2012a,b) policy
reports provide some initial summary statistics on data collection by apps and focus on the
extent to which these apps disclose their data collection activity via privacy policies. A study
conducted by the Global Privacy Enforcement Network (GPEN, 2015) analyzes the privacy
practices of 1,494 websites worldwide targeting children.4 We show that in the mobile applications
economy (which is increasingly replacing desktop access to websites), collection of data
on very young children may be even more pervasive. Indeed, compared to websites accessed
  4
     GPEN includes 29 Data Protection Authorities worldwide - “2015 GPEN Sweep - Children’s Privacy”:
, last retrieved, January 8, 2018.

its collection is automated without distinction between users because they implicitly accept
this collection when they download the app, such that data can be collected even on very
young children. As well as providing some preliminary and very comprehensive information
on automated data collection practices related to very young children, our empirical analysis
provides evidence that should be informative for future policy. Second, we identify spillover
effects from platform compliance efforts related to USA policy regulation on the behavior of
foreign developers through the platform’s self-regulation program. Our findings suggest that
the effect is extremely heterogeneous across countries, country regulation and developer firm
sizes. Third, our analysis suggests that in the global app economy, although some developers
are subject to regulation, collection of child data is pervasive in non-regulated countries.
Many international developers appear not to comply with any child privacy regulation.
The paper is structured as follows. Section 2 describes the data sources and presents the
descriptive statistics. Section 3 presents the econometric model. Section 4 discusses the
econometric results based on different country specifications. Section 5 provides some robustness
checks related to the probability of collecting sensitive data and providing ad targeting
via third-party ad providers. Section 6 concludes.

2    Description of the sample

We collected weekly data on smartphone applications for children from the USA Google
Play. Thus, we are able to study apps commercialized in the US, but which may have
been developed elsewhere. Developers who produce children’s apps can decide to opt in to

DfF, or they can commercialize their apps in Google Play without belonging to the DfF
program. Our data collection strategy allows us to collect both groups of children’s apps.
First, we collected the characteristics of apps in the DfF program aimed at children aged
under 13.5 The apps indicate appropriate age categories: children aged 5 & under, children
aged 6-8 years, children aged 9 years & over, or mixed audience. Within the DfF program, we
collected 4,679 different apps. Developers who opt in to this program self-declare compliance
with COPPA, along with other requirements specified by Google. Apps submitted to DfF
should comply with all Google requirements, which are directly in line with COPPA legislation.
      Second, we constructed a benchmark group of applications aimed at children by simulating
the user’s (parent’s) likely keyword search process in Google Play. Using the Google
Adwords keyword planner tool, we identified the list of keywords most frequently associated
with children’s applications: children, children’s, kids, baby, babies, toddlers, educational, toddler,
preschool, preschoolers, child monitoring, kindergarten, kindergartners, boys, girls, kid
monitoring, 2 year old, 3 year old, 4 year old, 5 year old, 6 year old, 7 year old, 8 year old, 9
year old, 10 year old, 11 year old, 12 year old. Apps identified at least once by keyword search
in Google Play during the study period generated our list of apps targeted towards children
in Google Play but which do not belong to the DfF program. The benchmark group of
apps includes 4,031 apps. There are 1,089 apps which have been collected both within the
DfF program and via keyword search.
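
As a quick arithmetic check on the sample composition, the sketch below adds up the three app counts reported above and compares them with the 9,799 apps and 92,746 app-week observations reported for the full sample. Reading the three counts as non-overlapping groups is an inference from the fact that they sum to the total, not something the text states, and the variable names are illustrative.

```python
# Counts as reported in the text; the variable names are illustrative.
dff_apps = 4_679        # apps collected within the DfF program
benchmark_apps = 4_031  # benchmark apps identified via keyword search
in_both = 1_089         # apps collected both within DfF and via keyword search
total_apps = 9_799      # apps in the full sample
observations = 92_746   # app-week observations in the panel
weeks = 12              # length of the tracking window

# The three groups add up to the reported total number of apps.
groups_sum = dff_apps + benchmark_apps + in_both

# Upper bound on the panel size if every app were observed in every week.
max_observations = total_apps * weeks

# Average number of weekly observations per app (the panel is unbalanced).
avg_weeks_per_app = observations / total_apps

print(groups_sum == total_apps, max_observations, round(avg_weeks_per_app, 2))
```

The gap between the observed panel and the balanced upper bound is consistent with apps appearing and becoming unavailable during the sample period.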
      Our sample consists of apps included in the Google Play DfF program and our benchmark
apps. We tracked each application over a period of 12 weeks, starting from its first appearance
to the end of the sample period. New apps appear over time while others become unavailable;
  5
   The DfF program also includes six broad categories: Action & Adventure, Brain Games, Creativity,
Education, Music and Video, and Pretend Play.

the number of apps available in the DfF program or identified by the keyword searches
increased from 5,137 to 9,799. We excluded apps found only in the last period.6 Our sample
includes 92,746 observations;7 79.65% of the applications included a clear developer address.
Developers originated from 89 different countries.
       Table 1 presents the statistics of the overall sample. Table 2 presents a breakdown
of the app and developer statistics. Column (1) presents the statistics of the apps
that do not collect sensitive data, and shows that 69% of the apps in this sub-sample are inside
DfF, whereas column (2) presents the statistics of the apps that collect sensitive data, only
31% of which are inside DfF. Column (4) presents the statistics of the apps that do not opt
in to DfF. Column (5) presents the statistics of the apps that opt in to DfF. The USA, China
and Hong Kong are not included in the institutional and geographical variables because
they are the largest producer countries after the European Union. China is one of the largest
markets; it accounted for nearly 50% of total downloads across iOS and other Android stores
in 2018 (App Annie, 2019).
       Our empirical strategy allows us to measure whether the platform policy related to chil-
dren’s content provides effective protection for their personal data compared to the bench-
mark group. We collected over time all publicly available data such as app characteristics
(e.g. user ratings, freemium, free, paid), developer’s name and address, type of interactive
elements utilized by the app, and number and type of permissions required by developers.
       We are interested in 1) measuring the effectiveness of the platform policy in protecting
children, and 2) testing whether national privacy regimes correspond to developers’ collection
   6
    We exclude 481 applications that appear only once.
   7
    We delete from our sample the applications which appeared only in the last week of the data collection. An
observation is at the week and app level.

of less sensitive data.

                Table 1: Summary statistics for the full sample of apps

     Variable                             Mean          SD         Min.    Max.         N
     Sensitive Data                      0.358       (0.479)         0       1     92746
     DfF                                 0.722       (0.448)         0       1     92746
     USA                                 0.225       (0.418)         0       1     92746
     Europe                              0.266       (0.442)         0       1     92746
     Recognized by the EU                0.060       (0.238)         0       1     92746
     Independent authority               0.040       (0.197)         0       1     92746
     With legislation                    0.125       (0.331)         0       1     92746
     No privacy law                      0.027       (0.161)         0       1     92746
     China                               0.021       (0.142)         0       1     92746
     Hong Kong                           0.033       (0.178)         0       1     92746
     Without developer address           0.203       (0.402)         0       1     92746
     Child specialization                0.44        (0.335)         0       1     92746
     Log number of reviews               5.318       (3.480)         0      18     92746
     User rating                         3.820       (1.228)         0       5     92746
     App age (Month)                 20200.854     (672.065)     18263   21075     89375
     Users interact                      0.048       (0.213)         0       1     92746
     Contains ad                         0.578       (0.494)         0       1     92746
     Freemium                            0.318       (0.466)         0       1     92746
     Price                               0.799       (2.317)         0     100     92746
     Top ranking                         0.361       (0.348)         0       1     92746
     High Download                       0.58        (0.494)         0       1     92746
     Number of developers by country    15.55       (29.417)         1     156     92746
     Developer: Downloads               13.033       (5.213)         0      23     92746
     Developer: User ratings             3.560       (1.378)         0       5     92746
     Developer: No User ratings          0.060       (0.238)         0       1     92746
     Developer: Missing information      0.088       (0.283)         0       1     92746
    Notes: Descriptive statistics of the full sample.
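
The location and regulation indicators in Table 1 appear to partition the sample. A minimal check, using only the means reported in the table, is that the nine shares sum to one. Treating these nine rows as the categories of a single location variable is an assumption made for this illustration.

```python
# Sample means of the location and regulation dummies, copied from Table 1.
location_shares = {
    "USA": 0.225,
    "Europe": 0.266,
    "Recognized by the EU": 0.060,
    "Independent authority": 0.040,
    "With legislation": 0.125,
    "No privacy law": 0.027,
    "China": 0.021,
    "Hong Kong": 0.033,
    "Without developer address": 0.203,
}

# Round to three decimals, the precision at which the table reports the means.
total_share = round(sum(location_shares.values()), 3)
print(total_share)
```

If the shares did not sum to one, some observations would belong to none or to several of the categories.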

Table 2: Summary statistics

                                             Do not collect     Collect     Not In DfF      In DfF
            DfF                                  0.69            0.31
            Sensitive Data                                                     0.38          0.62
            High download                         0.60           0.40          0.35          0.65
            Users interact                        0.39           0.61          0.55          0.45
            Contains ad                           0.61           0.39          0.32          0.68
            Presence                              0.64           0.36          0.30          0.70
            Freemium                              0.62           0.38          0.25          0.75
            Child Specialization                  0.46           0.41          0.35          0.47
            China                                 0.33           0.67          0.12          0.88
            Hong Kong                             0.29           0.71          0.11          0.89
            Without developer address             0.63           0.37          0.53          0.47
            Europe                                0.71           0.29          0.19          0.81
            Recognized by the EU                  0.61           0.39          0.20          0.80
            Independent authority                 0.65           0.35          0.22          0.78
            With legislation                      0.65           0.35          0.37          0.63
            No privacy law                        0.56           0.44          0.29          0.71
            United States                         0.67           0.33          0.18          0.82
            Log number of reviews                 4.90           6.06          6.47          4.87
            User rating                           3.81           3.85          3.87          3.80
            Top ranking                           0.36           0.37          0.33          0.37
            Price                                 0.87           0.66          0.22          1.02
            App age                             20216.81       20174.31      19963.72      20292.16

2.1       Dependent variables: Sensitive data and IMEI

COPPA regulation defines the list of child-sensitive data covered by the law. It includes
geolocation details (sufficiently precise to identify street name and city), photos, videos, and
audio files that contain children’s images or voices, and usernames and persistent identifiers
that can recognize an app user over time and across different applications.8 To measure whether
children’s apps possibly violate the US children’s privacy legislation, we identify the Google
Play permissions that allow apps to collect these child-sensitive data. We created the binary
  8
      The complete list of children’s personal data is available at , last retrieved, November 6, 2018.

variable Sensitive Data which indicates whether the app collects any sensitive data covered
by COPPA regulation. More precisely, the law requires verifiable parental consent for the
collection, use, and disclosure of personal information on children aged under 13. This
information is not available to the researchers: only developers and users who actually use
the app have access to it. Thus, we are only able to measure whether any personal data
covered by law is required by each app.
      We identify seven permissions and one interactive element that require personal data
covered by the COPPA regulation. The permission Read Phone Status and Identity allows
developers to identify a smartphone’s unique IMEI which is considered a persistent unique
identifier by COPPA and the GDPR (Reyes et al., 2018). The IMEI can be used to recognize
a user over time and across different online services,9 and it could be used to log all kinds
of personal data and target the consumer. The IMEI number also allows developers to know
which advertisements have already been seen by a user. A child’s image and voice can be captured
via the permissions Take Pictures and Videos and Record Audio. There are different sets of
permissions that allow location data collection: ALEC (Access Location Extra Commands)
is used to determine a user’s location based on various device capabilities, and ANBL (Approximate
Network Based Location) is used to access approximate location deriving from
network location sources such as cell towers and Wi-Fi. MLST (Mock Location Sources for
Testing) is used to facilitate developers’ testing of geolocation data applications. Location data can also
be collected via the permission Precise GPS Location and the interactive element Share Location. Appendix Table 17 shows the
permissions and interactive elements required to construct the dependent variable Sensitive
Data. Column (1) presents the statistics for the entire sample, and Columns (2) and (3)
  9
      Complying with COPPA: Frequently Asked Questions. Available at , last retrieved, December 12, 2018.

present the respective app statistics for the USA and EU developers, Column (4) presents
the statistics for China, and Column (5) presents the statistics for Hong Kong.
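
The construction of Sensitive Data described above can be sketched as a simple membership test: the variable equals 1 if an app requests at least one of the seven permissions or the one interactive element. The labels below follow the names used in the text and may not match the exact strings shown on Google Play; the function name and input format are assumptions for illustration.

```python
# Labels follow the names used in the text; the exact strings exposed by
# Google Play may differ (assumption).
SENSITIVE_PERMISSIONS = frozenset({
    "Read Phone Status and Identity",      # gives access to the IMEI
    "Take Pictures and Videos",
    "Record Audio",
    "Access Location Extra Commands",      # ALEC
    "Approximate Network Based Location",  # ANBL
    "Mock Location Sources for Testing",   # MLST
    "Precise GPS Location",
})
SENSITIVE_INTERACTIVE_ELEMENTS = frozenset({"Share Location"})


def sensitive_data(permissions, interactive_elements):
    """Return 1 if the app requests any item covered by COPPA, else 0."""
    requested = set(permissions) & SENSITIVE_PERMISSIONS
    shared = set(interactive_elements) & SENSITIVE_INTERACTIVE_ELEMENTS
    return int(bool(requested or shared))


# Hypothetical apps: one records audio, one only requests network access.
print(sensitive_data(["Record Audio", "Full Network Access"], []))
print(sensitive_data(["Full Network Access"], []))
```

An indicator of this kind captures whether the data can be collected, not whether parental consent was obtained, which matches the limitation noted above.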

2.2       App characteristics

Google Play provides a large set of information for each app: the number of reviews, the
users’ rating, the price, whether the app contains ads, etc. (Table 2). To measure app
success, we include the variable Log Number of Reviews. To indicate whether the app has a
high number of downloads, we create a dummy variable High Download that takes value 1
if the app has more than 10,000 downloads. Table 20 shows a breakdown of
all download categories, from 0 downloads to more than 1 million.
       The variable Top ranking measures the highest search rank of each app in Google
Play. To measure app popularity, we use the rating given by users (variable User
Rating) for the app’s quality, on a 0 to 5 scale, where 0 indicates that the app has no rating. We
also include a dummy variable No app rating for apps without any stars. To measure the
age of the applications, we include the variable App age which measures the age of the app
in months since its launch on Google Play.
       The variable Price indicates the price of the apps, ranging from zero for free apps to $99.99,
the maximum price.10 While many apps are free, paid apps represent 26.8% of applications.
Finally, the binary variable Freemium indicates whether the application offers IAP (in-app
purchases). This applies to 31.8% of the applications in the sample. In order to measure if
the apps provide advertising, we include the binary variable Contains Ad which takes the
value 1 if the app displays advertisements to users. Overall, 57.8% of apps include ads.
  10
     In our sample, paid apps are over-represented compared to the overall Play Store. Indeed, within
the DfF category we collected the Top Paid and Top New Paid categories.

The binary variable Users Interact indicates whether the app allows users to interact and exchange content with one another.
This feature can expose the app’s users to unfiltered/uncensored user-generated content
including user-to-user communications and media sharing via social media and networks.11
A dummy variable, Presence in the Search, indicates whether each week the app is present
either in the DfF program or in the benchmark search.
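
The app-level variables defined in this subsection can be derived from a raw app record along the following lines. This is a minimal sketch: the function and field names are hypothetical, and computing the review variable as log(1 + n), so that apps with no reviews map to zero, is an assumption suggested by the minimum of 0 in Table 1 rather than a definition given in the text.

```python
import math


def app_features(downloads, n_reviews, price, has_iap, has_ads):
    """Build the app-level variables from one raw app record.

    All argument and key names are hypothetical.
    """
    return {
        # 1 if the app has more than 10,000 downloads.
        "high_download": int(downloads > 10_000),
        # Assumption: log(1 + n), so that zero reviews map to 0.
        "log_number_of_reviews": math.log1p(n_reviews),
        # Price in dollars; zero for free apps.
        "price": price,
        # 1 if the app offers in-app purchases.
        "freemium": int(has_iap),
        # 1 if the app displays advertisements.
        "contains_ad": int(has_ads),
    }


# Hypothetical free app with in-app purchases and no ads.
example = app_features(downloads=50_000, n_reviews=0, price=0.0,
                       has_iap=True, has_ads=False)
print(example)
```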

2.3       Developers’ geographical location

To explore regulation spillovers to other countries, we retrieved geographical information
disclosed by developers of apps available in Google Play. In fact, each app developer’s
address has to indicate its country. Although the platform has mandated a developer address
(since September 30, 2014)12 for those offering paid apps, in-app purchases, and payment
through the app, there are several apps without a geographical address. To retrieve developers’
countries, we used different strategies. First, we used Maps APIs to collect the latitudes and
longitudes of the given address to identify the country.13 Second, we used a Python library
(Libpostal) to search for a country name in the developer’s address. Third, we checked the
match between the location identified using the Google Maps APIs and the country name
identified via Libpostal. Fourth, among the subset of apps without any developer’s address,
we identified their location using the email extension.14 Finally, we did a manual check
for certain addresses. To indicate the apps without any developer location information, we
created the variable Without developer address; 20.3% fall into this category. Regarding apps
  11
     , last retrieved, January 8, 2018.
  12
     Among the developers not declaring an address, 16.7% of those in our dataset were registered on the
platform before September 2014.
  13
     We retrieve the geographical address using four different geographical APIs: Google Maps, Bing Maps,
OpenStreetMap, Geo Map.
  14
     Using this procedure, we identify the origin countries of 100 apps.

without a developer address, it is important to emphasize that COPPA legislation requires
that parents be informed about the companies that collect children’s data, and in particular,
that companies indicate their contact details such as email or geographical location.
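
The sequence of location strategies described in this subsection can be sketched as a small decision rule. The sketch takes the outputs of the geocoding and address-parsing steps as given and does not call any mapping or parsing library. The function name, the status labels, the email-extension table, and the handling of the case where only one source returns a country are all assumptions made for illustration.

```python
# Hypothetical lookup from country-code email extensions to countries.
TLD_TO_COUNTRY = {"fr": "France", "de": "Germany", "cn": "China", "hk": "Hong Kong"}


def resolve_country(geocoded=None, parsed=None, email=None):
    """Combine the location sources into (country, status).

    geocoded: country obtained from geocoding the address, or None.
    parsed:   country name found by parsing the address text, or None.
    email:    developer contact email, used only when no address is usable.
    """
    if geocoded and parsed:
        if geocoded == parsed:
            return geocoded, "address"
        # The two sources disagree: flag the app for a manual check.
        return None, "manual check"
    if geocoded or parsed:
        # Assumption: a single available source is accepted as is.
        return geocoded or parsed, "address (single source)"
    if email and "." in email:
        extension = email.rsplit(".", 1)[-1].lower()
        if extension in TLD_TO_COUNTRY:
            return TLD_TO_COUNTRY[extension], "email extension"
    # Generic extensions such as .com carry no country information.
    return None, "without developer address"


print(resolve_country(geocoded="France", parsed="France"))
print(resolve_country(email="contact@example.hk"))
print(resolve_country(email="contact@example.com"))
```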

2.3.1      National privacy regulation

Privacy regulation rules vary across countries, and we exploit this variation to characterize
countries’ privacy policies. To assess differences in national regulatory frameworks, we augment
these data with a vector of institutional framework measures associated with the
developer’s address. Since developers in the US develop apps for their domestic market, it
is reasonable to believe that their behavior may differ from others. We created the dummy
variable USA to indicate whether the developer is located in the US. US developers produced
22.5% of the apps in our sample.
      We use a measure of privacy regulation to indicate the country’s level of compliance with
EU privacy legislation. Appendix Table 11 presents countries categorized according to their
level of compliance with EU privacy legislation. This index was computed by the French
Privacy Regulation Authority (CNIL).15 The dummy variable Europe (Table 2) identifies
developer countries that belong to the EU and have privacy laws compatible with EU
legislation; they account for 26.6% of apps (21.44% of these apps are produced
by developers in the United Kingdom). The variable Recognized by the EU indicates that
the privacy laws of a country outside the EU are compatible with EU privacy laws;
it corresponds to 6% of the sample. The binary variable Independent authority indicates the
existence of an independent privacy regulation authority in the app developer’s country and
 15
      , last retrieved, January 8, 2018.

the presence of a privacy legislation framework; it corresponds to 4% of apps. The binary variable
With Legislation indicates that the country has privacy legislation only (but no independent
authority regulating privacy), which accounts for 12.5% of apps, and the dummy variable No
Privacy Law indicates the absence of privacy laws in the developer’s country (2.6% of apps).
       We also created a dummy variable China, which takes the value 1 if the developer is
located in China, and a dummy variable Hong Kong, which takes the value 1 if the developer
is located in Hong Kong. After European countries such as Germany, the United Kingdom, and
France, the largest producers of apps commercialized in the USA are located in Hong Kong and
China, with 3.3% and 2.1% of the whole sample, respectively. Apps on Google Play are automatically
released worldwide, with automated translation of the app’s description, unless the developer
specifies otherwise.
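
As a quick consistency check on the location groups defined in this section, the reported shares can be added up. The sketch below assumes that the nine groups are mutually exclusive and exhaustive, which the text suggests but does not state explicitly; the dictionary simply restates the percentages given above.

```python
# Shares of apps (in %) by developer location group, as reported in the text.
shares = {
    "Europe": 26.6,
    "Recognized by the EU": 6.0,
    "Independent authority": 4.0,
    "With Legislation": 12.5,
    "No Privacy Law": 2.6,
    "USA": 22.5,
    "Hong Kong": 3.3,
    "China": 2.1,
    "Without developer address": 20.3,
}

# Round to one decimal to avoid floating-point noise in the sum.
total = round(sum(shares.values()), 1)
print(f"Total share across groups: {total}%")
```

Under that assumption the groups cover essentially the whole sample, with the small remainder attributable to rounding of the individual shares.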
       The developer’s strategy might also be associated with the institutional framework of its
home country. In particular, we consider whether developers in OECD countries (excluding the
USA and Europe) behave differently from developers located in non-OECD
countries (excluding China and Hong Kong), which have weaker institutions and regulation.

2.3.2      Graphical evidence: Collection of sensitive data by developer location

Figure 1 depicts the percentage of apps per country group that collect children’s personal
data.16 The graphical evidence shows that overall, developers located in China and Hong
Kong collect more data compared to developers in the USA and other country groups. The
bar graph (1) in Figure 1 shows the distribution of Sensitive Data items for OECD and
non-OECD countries, with USA, China and Hong Kong separated from the group of coun-
  16
    For example, considering Figure 1, bar graph 1, among developers with no address about 37% collect
sensitive data.

Figure 1: Collection of sensitive data by developer location

[Two bar graphs. Vertical axis in both panels: Average Sensitive data, from 0 to .8.
 Panel (1) groups: Without developer address, OECD, No OECD, United States, China, Hong Kong.
 Panel (2) groups: Without developer address, Member of the EU, Recognized by the EU,
 Independent authority & law, With legislation, No privacy law, USA, China, Hong Kong.]

                                  Notes: The vertical axis is the percentage of apps collecting sensitive data.

tries. Developers in Hong Kong and China collect more sensitive data compared to all other
locations, followed by developers who do not list an address.
   The bar graph (2) in Figure 1 shows the distribution of the number of Sensitive Data
items collected according to the EU privacy regulation regime, again with USA, China and
Hong Kong separated out. Developers in Hong Kong and China collect more sensitive data
compared to all other locations, followed by developers in countries with no privacy laws.

2.4    Developer characteristics

Besides geographical data, we collected several measures of developer characteristics
(Table 2). In particular, it is useful to distinguish experienced developers. These data
include the overall ratings associated with developers (Developer: User ratings), the overall
number of downloads (Developer: Downloads), and the date of entry on Google Play
(Developer: Entry). We also collect the total number of apps produced by each developer and
create the variable Number of app by developers. To measure developer specialization
in children’s apps, we compute an index, Developer: Child specialization, defined as
the number of apps that each developer produces in the children’s market
divided by the overall number of apps the developer has produced. We avoid dropping observations
in the regressions by setting missing values equal to 0 and defining corresponding dummy
variables Developer missing information equal to 1 for these observations. The main results
are very similar even if we exclude these observations.
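
The specialization index and the treatment of missing values can be illustrated with a short sketch. The function name and the choice to treat a zero or unknown app count as missing are assumptions made for illustration only.

```python
from typing import Optional, Tuple


def child_specialization(
    children_apps: Optional[int], total_apps: Optional[int]
) -> Tuple[float, int]:
    """Return (index, missing_flag).

    index:        children's apps divided by all apps of the developer.
    missing_flag: 1 when the ratio cannot be computed; the index is then set to 0
                  so that the observation is kept in the regression.
    """
    if children_apps is None or not total_apps:
        # Missing information (or a zero denominator, by assumption): value 0, dummy 1.
        return 0.0, 1
    return children_apps / total_apps, 0


# A developer with 3 children's apps out of 12 apps in total.
print(child_specialization(3, 12))    # (0.25, 0)
# A developer whose children's app count is unknown.
print(child_specialization(None, 12)) # (0.0, 1)
```

Including the missing-information dummy alongside the zero-filled variable lets the regression absorb any systematic difference between developers with and without complete information.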

3     Model specification

Our econometric analysis estimates the effect of the regulation in the developer’s country of
origin on the collection of children’s sensitive data. Our dependent variable is binary, and
we use probit estimation with robust standard errors clustered at the app level. We model
the probability of collecting sensitive data using the following specification:

$$P(\text{Sensitive Data}_{it}) = \alpha_0 + X_{it}\beta + Z_{ijt}\gamma + D_{ijt}\theta + \rho_t + \epsilon_{ijt} \qquad (1)$$

Our primary vector of interest is country privacy regulation, X. We measure the level of
privacy protection using the distinction between OECD and non-OECD countries. We also
capture the stringency of regulation relative to European legislation with a set of
dummy variables for the country’s level of compliance with EU regulation,
one of the strictest regulatory frameworks. We also include dummies for developers from
China, Hong Kong, and the USA. Z is the vector of characteristics of app i at time t developed in
country j. D is the vector of characteristics of the developer of app i at time t in country j. The
regressions are weighted by the number of developers in each country. $\epsilon_{ijt}$ is an independent
and identically distributed random error term. The equation also includes time (week) effects
$\rho_t$. Since we have very few time-varying variables, we estimate the model using pooled
cross-section data.
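
Because the model is estimated by probit, Equation (1) can be read as shorthand for a specification in which the linear index enters through the standard normal distribution function $\Phi$. The explicit form below is a sketch of that reading, using the same symbols as Equation (1); the link function is an assumption implied by the choice of estimator rather than a formula given in the text.

$$P(\text{Sensitive Data}_{it} = 1 \mid X_{it}, Z_{ijt}, D_{ijt}) = \Phi\left(\alpha_0 + X_{it}\beta + Z_{ijt}\gamma + D_{ijt}\theta + \rho_t\right)$$

Under this reading, the reported coefficients shift the index inside $\Phi$, so their signs indicate the direction of the effect on the probability, while the magnitudes of the probability changes are given by the marginal effects.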

4     Estimation of the probability to collect sensitive data

4.1    Baseline model: Effect of DfF and developer home country regulation

In order to estimate the impact of COPPA regulation on foreign developers, we begin with a
pooled cross-section framework. COPPA precisely defines the sensitive data covered by the
legislation and requires that each company or third party that collects user data provide
information such as a name and address so that parents can contact it. We investigate
the impact of privacy regulation and macro-economic characteristics on the probability
that developers collect sensitive data. Table 3 presents the coefficients of the probit
estimation. All regressions include app characteristics to control for observable differences in

the apps underlying the probability of collecting sensitive data. By including the full set of
app characteristics in these models, we can abstract away from app heterogeneity. We also
include developer characteristics to measure the developer’s experience and size. All
specifications include time fixed effects. The omitted category for the institutional variables is
Europe; thus, the OECD dummy variable does not include Europe. To investigate how DfF
moderates the effect of developer country, and thereby measure US legislation spillovers,
Column (2) of Table 3 adds a set of interaction terms between developer country and DfF.
Developers in both China and Hong Kong are more likely to collect sensitive data once they opt
in to the DfF program. Column (3) of Table 3 includes measures of institutional privacy regulation.
Column (4) adds the interaction terms between the regulation measures and DfF.
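
To see how the interaction terms modify the DfF effect, the DfF coefficient can be added to the relevant interaction coefficient. The sketch below uses the Column (4) estimates reported in Table 3 and works on the probit index scale only; it ignores the High download interaction and says nothing about statistical significance of the combined terms, so it is an illustration rather than a result.

```python
# Column (4) coefficients from Table 3 (probit index scale).
coef = {
    "DfF": -0.385,
    "DfF x USA": -0.198,
    "DfF x China": 0.597,
    "DfF x Hong Kong": 1.830,
}


def dff_index_shift(group: str) -> float:
    """Change in the probit index when an app opts in to DfF, for a given developer group.

    Groups without an interaction term in `coef` (e.g. the omitted category, Europe)
    receive the DfF main effect only.
    """
    return round(coef["DfF"] + coef.get(f"DfF x {group}", 0.0), 3)


for group in ["Europe", "USA", "China", "Hong Kong"]:
    print(group, dff_index_shift(group))
```

The combined shift is negative for Europe and the USA but positive for China and Hong Kong, which is the pattern described in the text.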
   Overall, there is a negative association between the decision to opt in to the DfF program
and sensitive data collection, which is statistically significant in all specifications:
developers that opt in to the DfF program show a lower probability of collecting users’
sensitive data. This finding is aligned with the intention of the platform to encourage
compliance with COPPA legislation. Columns (2) and (4) show that the interaction
term High Download × DfF is negative and statistically significant, suggesting that apps with
a large number of downloads are less likely to collect sensitive data when they opt in to the
DfF program and that the self-regulation program is effective overall for popular apps.
   Columns (2) and (4) show that developers located in China, Hong Kong, and Without
privacy legislation have a higher probability of collecting users’ data relative to developers
in Europe. Table 4 reports the marginal effects at the mean for the main specifications.
Column (4) shows that developers who decide to opt in to the DfF program and originate

from China, Hong Kong, or countries with no privacy law are more likely to collect sensitive
data, by 19.3%, 60.6%, and 20.6% respectively. The result related to the Chinese apps is in line
with the finding of Wang et al. (2018), who show that apps commercialized in the Chinese market
are more intrusive compared to other apps available on Google Play.
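
For readers less familiar with probit marginal effects, the following sketch shows two standard ways of translating a coefficient into a change in probability: the derivative evaluated at the mean index, and the discrete change for a dummy variable. The text does not say which formula underlies Table 4, and the index values used here are hypothetical, so the numbers are illustrative only.

```python
import math


def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def marginal_effect_at_mean(beta_k: float, index_at_mean: float) -> float:
    """Derivative of the probit probability with respect to x_k, at the mean index."""
    return norm_pdf(index_at_mean) * beta_k


def discrete_change(beta_k: float, index_without: float) -> float:
    """Change in probability when a dummy switches from 0 to 1."""
    return norm_cdf(index_without + beta_k) - norm_cdf(index_without)


# Hypothetical baseline index of 0 (probability 0.5) and a coefficient of 1.
print(marginal_effect_at_mean(1.0, 0.0))
print(discrete_change(1.0, 0.0))
```

The two measures differ more as the coefficient grows, which is worth keeping in mind when comparing large dummy coefficients, such as those for China and Hong Kong, with their reported marginal effects.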

Table 3: Probit estimation: Sensitive Data
                                         (1)           (2)         (3)          (4)
 High download                       -0.194***       -0.030    -0.194***      -0.031
                                       (0.066)      (0.089)     (0.066)      (0.089)
 OECD                                0.276***        0.137
                                       (0.058)      (0.116)
 No OECD                             0.358***      0.258**
                                       (0.062)      (0.109)
 USA                                  0.190***    0.335***     0.189***     0.335***
                                       (0.048)      (0.102)     (0.048)       (0.102)
 China                               1.123***      0.623**     1.124***      0.623**
                                       (0.106)      (0.266)     (0.106)       (0.266)
 Hong Kong                            1.139***      -0.420*    1.139***       -0.420*
                                       (0.092)      (0.243)     (0.092)       (0.243)
 Without developer address           0.208***         0.100    0.208***        0.101
                                       (0.057)      (0.092)     (0.057)       (0.093)
 DfF                                 -0.462***    -0.386***    -0.461***    -0.385***
                                       (0.047)      (0.109)     (0.047)       (0.109)
 DfF × High Download                              -0.267***                 -0.267***
                                                    (0.086)                   (0.086)
 DfF × USA                                          -0.198*                   -0.198*
                                                    (0.115)                   (0.115)
 DfF × China                                       0.596**                   0.597**
                                                    (0.289)                   (0.289)
 DfF × Hong Kong                                   1.830***                  1.830***
                                                    (0.260)                   (0.260)
 DfF × OECD                                          0.178
                                                    (0.134)
 DfF × No OECD                                       0.129
                                                    (0.130)
 DfF × Without developer address                     0.176                      0.176
                                                    (0.107)                   (0.107)
 Recognized by the EU                                           0.250***       0.105
                                                                 (0.073)      (0.159)
 Independent authority & law                                    0.239***       -0.027
                                                                 (0.084)      (0.174)
 With legislation                                               0.347***      0.271**
                                                                 (0.062)      (0.108)
 No privacy law                                                 0.635***       0.201
                                                                 (0.107)      (0.187)
 DfF × Recognized by the EU                                                     0.180
                                                                              (0.179)
 DfF × Independent authority & law                                             0.348*
                                                                              (0.197)
 DfF × With legislation                                                         0.088
                                                                              (0.131)
 DfF × No privacy law                                                        0.591***
                                                                              (0.220)
 Constant                               -0.024       0.044        -0.024        0.043
                                       (0.192)      (0.213)      (0.192)      (0.213)
 Week fixed effect                       Yes          Yes          Yes           Yes
 App characteristics                     Yes          Yes          Yes           Yes
 Developer characteristics               Yes          Yes          Yes           Yes
 Observations                           92746        92746        92746        92746
 Log-likelihood                      -3.154e+07   -3.137e+07   -3.154e+07   -3.137e+07

Notes: Probit estimation. Sensitive Data is the dependent variable. Robust stan-
dard errors clustered at app level reported in parentheses. The omitted category
in all specifications is Europe. Weighted by the number of developers by country.
Significance levels: ∗p < .10, ∗∗p < .05, ∗∗∗p < .01
Table 4: Probit Marginal effect: Sensitive Data main equation
                                               (1)          (2)        (3)          (4)
 High download                               -0.041       0.030      -0.041       0.030
                                            (0.025)      (0.034)    (0.025)      (0.034)
 DfF                                      -0.236***    -0.184***   -0.236***    -0.181***
                                            (0.020)      (0.054)    (0.020)      (0.054)
 OECD                                     0.092***        0.012
                                            (0.023)      (0.050)
 No OECD                                  0.102***        0.058
                                            (0.025)      (0.048)
 USA                                      0.084***      0.098**     0.084***     0.099**
                                            (0.018)      (0.039)     (0.018)      (0.039)
 China                                    0.384***       0.199**    0.384***     0.201**
                                            (0.034)      (0.097)     (0.034)      (0.097)
 Hong Kong                                 0.401***     -0.148**    0.401***     -0.146**
                                            (0.030)      (0.058)     (0.030)      (0.058)
 Without developer address                 0.050**        0.001      0.050**       0.003
                                            (0.023)      (0.040)     (0.023)      (0.040)
 High download x DfF                                   -0.085***                -0.085***
                                                         (0.033)                  (0.033)
 Without developer address x DfF                         0.104**                 0.101**
                                                         (0.045)                  (0.044)
 OECD x DfF                                              0.098*
                                                         (0.055)
 No OECD x DfF                                            0.059
                                                         (0.052)
 United States x DfF                                      -0.023                  -0.025
                                                         (0.047)                 (0.046)
 China x DfF                                            0.196**                  0.194**
                                                         (0.094)                 (0.094)
 Hong Kong x DfF                                        0.627***                0.625***
                                                         (0.087)                 (0.087)
 Recognized by the EU                                               0.096***      0.018
                                                                     (0.027)     (0.058)
 Independent authority & law                                         0.073**      -0.042
                                                                     (0.033)     (0.060)
 With legislation                                                   0.096***      0.061
                                                                     (0.025)     (0.047)
 No privacy law                                                     0.203***    0.202***
                                                                     (0.038)     (0.038)
 Recognized by the EU x DfF                                                       0.093
                                                                                 (0.063)
 Independent authority x DfF                                                     0.148**
                                                                                 (0.072)
 With legislation x DfF                                                           0.046
                                                                                 (0.052)
 App characteristics                         Yes         Yes          Yes          Yes
 Developer characteristics                   Yes         Yes          Yes          Yes
 Week fixed effect                           Yes         Yes          Yes          Yes
 Observations                               92746       92746        92746        92746
Notes: Probit Marginal effects. Sensitive Data is the dependent variable. Robust standard
errors clustered at app level reported in parentheses. The omitted category in all specifi-
cations is Europe. Weighted by the number of developers by country. Significance levels:
∗p < .10, ∗ ∗ p < .05, ∗ ∗ ∗p < .01
4.2      Probability to collect Read phone status and Identity

We show the robustness of our results to an alternative dependent variable: IMEI. Table 5
shows the results of estimating the econometric specification presented in Equation
(1). The permission “Read phone status and identity” allows the developer to collect the
IMEI number of the user’s smartphone. If there were unobserved heterogeneity issues related to
the collection of all sensitive data, we would expect to see different results for these estimations.
   Column (1) includes the vector of the variables measuring the OECD institutional frame-
work. Column (2) adds the interaction terms between OECD institutional variables and
DfF. The interaction term DfF× USA is negative and statistically significant suggesting
that within the DfF program, developers in the USA are less likely to collect personal data.
Column (3) includes a set of dummies measuring compliance with EU legislation. Developers
in countries recognized by the EU or countries whose privacy laws are compatible with EU
legislation are more likely to collect IMEI data compared to developers in Europe.
   Across all specifications, the decision to collect IMEI sensitive data appears to decrease
with participation in DfF. Overall, developers in the USA are less likely to collect IMEI
information. Importantly, developers in China and Hong Kong are more likely to collect this
information. This corroborates the intuition of the previous statistical evidence presented in
Table 17. There are several possible explanations. First, several applications targeting adults
and produced in China, such as Meitu,17 WeChat, and Taobao,18 have
raised privacy concerns in the USA and in Europe because they collect users’ IMEI information.
This might suggest that this permission is commonly required by Chinese developers. Second,
while US and European regulations consider IMEI data to be personal information, the Hong
 17
 18

Kong regulation “Personal Data Privacy Ordinance” does not consider persistent identifiers
as personal data (Hargreaves and Tsui, 2017).
      Third, the Chinese government’s current plan to expand surveillance might
encourage app developers to collect unique identifiers such as the IMEI number. Similarly,
IMEI collection also serves security purposes.19 Fourth, to our knowledge, there have been
very few recent FTC allegations against foreign developers for violating COPPA;
FTC enforcement might therefore not be considered a credible threat by foreign developers.

 19
      , last retrieved, December 9, 2018.

Table 5: Probit estimation: IMEI sensitive data
                                         (1)           (2)         (3)          (4)
 High download                        -0.158**       -0.085     -0.159**      -0.085
                                       (0.069)      (0.094)      (0.069)     (0.094)
 OECD                                0.243***        0.095
                                       (0.061)      (0.120)
 No OECD                             0.378***     0.430***
                                       (0.064)      (0.112)
 USA                                  0.119**     0.301***      0.118**      0.301***
                                       (0.051)      (0.105)     (0.051)       (0.105)
 China                               1.300***       0.561**    1.301***       0.560**
                                       (0.105)      (0.262)     (0.105)       (0.262)
 Hong Kong                            1.141***        0.017    1.142***        0.018
                                       (0.089)      (0.240)     (0.089)       (0.240)
 Without developer address           0.220***         0.058    0.221***        0.060
                                       (0.061)      (0.098)     (0.061)       (0.098)
 DfF                                 -0.272***    -0.313***    -0.271***    -0.312***
                                       (0.050)      (0.117)     (0.050)       (0.117)
 DfF × high download                                 -0.113                    -0.114
                                                    (0.092)                   (0.092)
 DfF × USA                                         -0.261**                  -0.260**
                                                    (0.120)                   (0.120)
 DfF × China                                      0.866***                   0.868***
                                                    (0.286)                   (0.286)
 DfF × Hong Kong                                   1.318***                  1.318***
                                                    (0.257)                   (0.257)
 DfF × OECD                                           0.194
                                                    (0.139)
 DfF × No OECD                                       -0.128
                                                    (0.136)
 DfF × Without developer address                    0.272**                   0.272**
                                                    (0.113)                   (0.113)
 Recognized by the EU                                           0.195**        -0.176
                                                                 (0.078)      (0.168)
 Independent authority & law                                    0.187**         0.034
                                                                 (0.089)      (0.178)
 With legislation                                               0.364***     0.444***
                                                                 (0.065)      (0.111)
 No privacy law                                                 0.743***      0.433**
                                                                 (0.107)      (0.190)
 DfF × Recognized by the EU                                                   0.479**
                                                                              (0.189)
 DfF × Independent authority & law                                             0.198
                                                                              (0.203)
 DfF × With legislation                                                        -0.193
                                                                              (0.138)
 DfF × No privacy law                                                          0.428*
                                                                              (0.224)
 Constant                              -0.441**      -0.226      -0.441**      -0.228
                                        (0.209)     (0.229)       (0.209)     (0.229)
 App characteristics                      Yes         Yes           Yes          Yes
 Developer characteristics                Yes         Yes           Yes          Yes
 Week fixed effect                        Yes         Yes           Yes          Yes
 Observations                            92746       92746         92746       92746
 Log-likelihood                      -2.684e+07   -2.666e+07   -2.683e+07   -2.665e+07

Notes: Probit estimation. The dependent variable is the dummy variable IMEI.
Robust standard errors clustered at app level reported in parentheses. The omitted
category in all specifications is Europe. Weighted by the number of developers by
country. Significance levels: ∗p < .10, ∗∗p < .05, ∗∗∗p < .01
4.3       Third parties and targeted ads

COPPA legislation also regulates the distribution of targeted ads. To identify whether the
apps offered targeted ads, we identify the third parties that allow their distribution. For this
purpose, we collected data on third parties and categorized them.20
       To measure whether the inclusion of targeted ads in apps is associated with the collection
of sensitive user data, we created an alternative dependent variable to Sensitive Data, named
Sensitive Data Ad Targeting which takes the value 1 if the app collects sensitive data (as
defined by the variable Sensitive Data) and at the same time has third parties that offer
targeted ads. Table 16 shows the percentage of third party ads used by developers in each
group of countries and by apps collecting sensitive data.
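
The construction of Sensitive Data Ad Targeting can be sketched in a few lines. The set below lists only a handful of the third parties named in the footnote, and the function name and input format (a list of third-party names per app) are assumptions made for illustration.

```python
from typing import Iterable

# Illustrative subset of the third parties that provide targeted ads.
TARGETED_AD_THIRD_PARTIES = {"AdMob", "Chartboost", "InMobi", "MoPub", "Tapjoy"}


def sensitive_data_ad_targeting(sensitive_data: int, third_parties: Iterable[str]) -> int:
    """Return 1 if the app collects sensitive data AND embeds a targeted-ad third party."""
    has_targeted_ads = any(tp in TARGETED_AD_THIRD_PARTIES for tp in third_parties)
    return int(sensitive_data == 1 and has_targeted_ads)


# An app that collects sensitive data and embeds AdMob.
print(sensitive_data_ad_targeting(1, ["AdMob", "SomeAnalyticsSDK"]))  # 1
# An app that embeds AdMob but collects no sensitive data.
print(sensitive_data_ad_targeting(0, ["AdMob"]))                      # 0
```

Because the variable requires both conditions, it is equal to 1 for a subset of the apps for which Sensitive Data equals 1.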
       Table 6 shows the results of the main estimation with the alternative dependent variable
Sensitive Data Ad Targeting. Overall, the coefficient on DfF is negative in all specifications,
suggesting that the platform design aimed at compliance with COPPA, including its rules on
personalized ads, is effective. Column (2) shows the estimation with the interaction terms
between the set of institutional variables and DfF. While developers in non-OECD countries
that opt in to the DfF program are likely to collect sensitive data, only developers in Hong
Kong and China fail to comply with COPPA when they opt in to the DfF program. Column (4) shows
the estimation with the interaction terms between the set of regulation variables and DfF.
The interaction term Without developer address × DfF is positive and significant, suggesting
that developers without an address collect more sensitive data and are more likely to offer
  20
   The third parties providing targeted ads are: AdMarvel, AdMob, AdsMogo, AdWhirl, AirPush, AppLovin,
CaulyAds, Chartboost, InMobi, Inneractive, Jumptap, LeadBolt, Madhouse SmartMAD, MdotM, Mediba
Admaker, Millennial Media, MobClix, MobFox, MobWIN, MoPub, Nexage, Noqoush AdFalcon, Revmob,
Smaato, Smart AdServer, Sponsorpay, Tap for Tap, Tapjoy, YuMe.

targeted ads.

Table 6: Probit estimation: Sensitive Data Ad Targeting
                                         (1)           (2)         (3)          (4)
 High download                       0.353***      0.541***    0.353***      0.542***
                                       (0.107)      (0.134)     (0.107)       (0.134)
 DfF                                 -0.607***     -0.440**    -0.606***     -0.438**
                                       (0.070)      (0.202)     (0.070)       (0.202)
 OECD                                0.275***         0.062
                                       (0.083)      (0.163)
 No OECD                             0.358***      0.394***
                                       (0.087)      (0.149)
 USA                                    0.110         0.115       0.110        0.115
                                       (0.078)      (0.141)      (0.078)      (0.141)
 China                               1.342***       0.585**     1.343***     0.585**
                                       (0.116)      (0.298)      (0.116)      (0.298)
 Hong Kong                            1.232***       -0.046     1.231***       -0.045
                                       (0.098)      (0.258)      (0.098)      (0.258)
 Without developer address              0.093        -0.036       0.092        -0.036
                                       (0.092)      (0.137)      (0.092)      (0.137)
 DfF × high download                                 -0.265                    -0.265
                                                    (0.168)                   (0.168)
 DfF × USA                                           -0.035                    -0.035
                                                    (0.169)                   (0.169)
 DfF × China                                       0.891***                  0.892***
                                                    (0.322)                   (0.322)
 DfF × Hong Kong                                   1.464***                  1.464***
                                                    (0.277)                   (0.277)
 DfF × OECD                                           0.302
                                                    (0.191)
 DfF × No OECD                                       -0.105
                                                    (0.186)
 DfF × Without developer address                    0.336**                   0.335**
                                                    (0.164)                   (0.164)
 Recognized by the EU                                           0.325***       0.034
                                                                 (0.094)      (0.187)
 Independent authority & law                                      -0.165       -0.196
                                                                 (0.133)      (0.237)
 With legislation                                               0.347***     0.398***
                                                                 (0.087)      (0.148)
 No privacy law                                                 0.676***      0.430**
                                                                 (0.116)      (0.207)
 DfF × Recognized by the EU                                                    0.407*
                                                                              (0.215)
 DfF × Independent authority & law                                             0.046
                                                                              (0.285)
 DfF × With legislation                                                        -0.142
                                                                              (0.187)
 DfF × No privacy law                                                          0.351
                                                                              (0.245)
 Constant                             -1.253***    -1.265***    -1.253***    -1.265***
                                       (0.287)      (0.305)      (0.287)      (0.305)
 App characteristics                     Yes          Yes          Yes          Yes
 Developer characteristics               Yes          Yes          Yes          Yes
 Week fixed effect                       Yes          Yes          Yes          Yes
 Observations                           92746        92746        92746        92746
 Log-likelihood                      -3.454e+09   -3.432e+09   -3.452e+09   -3.430e+09

Notes: Probit estimation. Sensitive Data Ad Targeting is the dependent variable.
Robust standard errors clustered at app level reported in parentheses. The omitted
category in all specifications is Europe. Weighted by the number of developers per
country. Significance levels: ∗p < .10, ∗∗p < .05, ∗∗∗p < .01
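
The significance stars in the tables can be checked from each coefficient and its standard error. The following is a minimal sketch, assuming the stars come from a two-sided z-test on the ratio coefficient / standard error (the usual convention for probit output; the notes do not state the test explicitly). The function names `two_sided_p` and `stars` are illustrative only.

```python
import math

def two_sided_p(coef, se):
    """Two-sided p-value of z = coef / se under a standard normal (assumed test)."""
    z = abs(coef / se)
    # 2 * (1 - Phi(z)) equals erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2.0))

def stars(coef, se):
    """Map a p-value to the star convention given in the table notes."""
    p = two_sided_p(coef, se)
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""

# Coefficient / standard-error pairs copied from Table 6, column (2).
rows = {
    "DfF x China": (0.891, 0.322),
    "DfF x Without developer address": (0.336, 0.164),
    "DfF x OECD": (0.302, 0.191),
}
for name, (coef, se) in rows.items():
    print(name, stars(coef, se))
```

Under this assumption the three rows reproduce the stars printed in Table 6: three stars, two stars, and none, respectively.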
4.4    Tax haven countries

In order to investigate whether the probability of collecting sensitive data might be affected
by a country’s legal framework, we identify which of our sample countries were considered
“tax havens” at the time of data collection. Hong Kong has been on the list of tax haven
countries. We construct a binary variable called Tax haven which takes the value 1 if the
country is considered a tax haven jurisdiction. Tax regulation can affect collection of various
types of personal data. Developing an app in a tax haven country would suggest that the
developer is intentionally trying to subvert the regulation: collecting personal information
can be profitable and is likely to increase the incentive to violate COPPA. Column (1) of
Table 7 presents the estimates of the tax haven country specification. The estimation in
column (2) includes the interaction terms between the set of tax regulation settings and DfF.
Developers in countries considered tax havens are more likely to collect COPPA sensitive data
even if they comply with DfF. The other results are consistent with our previous comments.
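
As an illustration of how the Tax haven indicator and its interaction with DfF could be built, the following is a minimal sketch. The field names (`country`, `dff`) and the function names are hypothetical. The set of jurisdictions is a placeholder that contains only Hong Kong, the one jurisdiction the text names; the full list is not given here, and whether Hong Kong enters this dummy or only its own country indicator is not clear from Table 7.

```python
# Placeholder list: only the jurisdiction named in the text (assumption).
TAX_HAVENS = {"Hong Kong"}

def tax_haven(country, havens=TAX_HAVENS):
    """Binary variable: 1 if the developer's country is on the list, else 0."""
    return 1 if country in havens else 0

def add_tax_haven_terms(app, havens=TAX_HAVENS):
    """Return a copy of an app record with Tax haven and DfF x Tax haven added."""
    out = dict(app)
    out["tax_haven"] = tax_haven(app["country"], havens)
    # The interaction is 1 only for DfF apps from a listed jurisdiction.
    out["dff_x_tax_haven"] = app["dff"] * out["tax_haven"]
    return out

# Hypothetical records, for illustration only.
apps = [
    {"country": "Hong Kong", "dff": 1},
    {"country": "Hong Kong", "dff": 0},
    {"country": "France", "dff": 1},
]
for app in apps:
    print(add_tax_haven_terms(app))
```

The interaction column is what lets column (2) of Table 7 separate the effect of being in a tax haven from the effect of being in a tax haven and opting in to DfF.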

5     Developers and app heterogeneity

5.1    Developer specialization in children’s apps

Compliance with the regulation can entail additional costs for companies, especially in the
children’s apps market.
    To assess whether developers specializing in children’s apps are likely to bear the
regulation costs if they join DfF, we include the variable Developer: Child specialization in our
regression, measured as the ratio between the number of children’s apps that each developer
produces in Google Play and the overall number of apps she produces. Table 8 presents the estimates.
All the specifications include the interaction term DfF × Child specialization, along with
variables for app and developer characteristics and time fixed effects. The regressions are
weighted by the number of apps produced in each country. Table 8 column (1) presents the
estimation for the overall sample. The interaction term DfF × Child specialization is nega-
tive and significant, suggesting that apps created by children’s app specialists who decided
to opt in to DfF are likely to bear regulation costs.
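
The specialization measure can be sketched as follows. This is a minimal illustration, reading the ratio as the number of a developer's children's apps divided by the developer's total number of apps; the record fields (`developer`, `is_child_app`) and the sample data are hypothetical.

```python
from collections import defaultdict

def child_specialization(apps):
    """Share of each developer's apps that are children's apps (assumed definition)."""
    total = defaultdict(int)
    child = defaultdict(int)
    for app in apps:
        total[app["developer"]] += 1
        if app["is_child_app"]:
            child[app["developer"]] += 1
    return {dev: child[dev] / total[dev] for dev in total}

# Hypothetical catalogue: developer A is a specialist, developer B is not.
apps = [
    {"developer": "A", "is_child_app": True},
    {"developer": "A", "is_child_app": True},
    {"developer": "B", "is_child_app": True},
    {"developer": "B", "is_child_app": False},
    {"developer": "B", "is_child_app": False},
    {"developer": "B", "is_child_app": False},
]
ratios = child_specialization(apps)
print(ratios)
```

A value near 1 marks a developer whose catalogue consists almost entirely of children's apps; the interaction DfF × Child specialization then multiplies this ratio by the DfF indicator of each app.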
    To further study the heterogeneity of the specialization effect, we estimate the same
specification on different sub-samples. Columns (2), (3) and (4) respectively present the
estimates for the sub-sample of apps produced in the USA, China and Hong-Kong; Columns
(5) and (6) present the estimates for the sub-samples of apps produced in the OECD and
the non-OECD countries. Column (7) shows estimates for countries without a privacy law.
In columns (1), (2) and (5), the interaction terms DfF × Child specialization are negative
and significant, which could indicate that children’s apps developers of the most advanced
countries are able to cover costs of compliance. In Column (6), the interaction term DfF
× Child specialization is positive and significant, suggesting that developers in non-OECD
countries will be less likely to bear the costs of regulation.

Table 7: Probit estimation: The effect of Tax haven on compliance

                                                    (1)             (2)
       DfF                                      -0.463***      -0.449***
                                                  (0.047)        (0.112)
       Other countries                           0.388***       0.233**
                                                  (0.054)        (0.101)
       United States                             0.234***       0.333***
                                                  (0.050)        (0.104)
       China                                     1.169***       0.621**
                                                  (0.107)        (0.266)
       Hong Kong                                 1.185***        -0.421*
                                                  (0.093)        (0.244)
       Tax haven countries                       0.372***         -0.107
                                                  (0.081)        (0.218)
       Without developer address                 0.253***          0.099
                                                  (0.059)        (0.094)
       High download                            -0.195***         -0.030
                                                  (0.066)        (0.088)
       DfF × Other countries                                      0.205*
                                                                 (0.118)
       DfF × United States                                        -0.135
                                                                 (0.118)
       DfF × China                                               0.663**
                                                                 (0.291)
       DfF × Hong Kong                                          1.894***
                                                                 (0.262)
       DfF × Tax haven countries                                 0.565**
                                                                 (0.233)
       DfF × Without developer address                          0.241**
                                                                 (0.111)
       DfF × High download                                     -0.268***
                                                                 (0.086)
       Constant                                   -0.069           0.047
                                                 (0.193)         (0.214)
       App characteristics                         Yes              Yes
       Developer characteristics                   Yes              Yes
       Week fixed effect                           Yes              Yes
       Observations                               92746           92746
       Log-likelihood                          -3.153e+07     -3.136e+07
      Notes: Probit estimation. Sensitive Data is the dependent variable.
      Robust standard errors clustered at app level reported in parentheses.
      The omitted category in all specifications is Europe. Weighted by
      the number of developers per country. Significance levels: ∗p < .10,
      ∗∗p < .05, ∗∗∗p < .01
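
To read the interaction terms in column (2) of Table 7, it helps to add the DfF coefficient to the relevant DfF × country coefficient. The sketch below does this for four locations, under stated assumptions: all other interaction dummies (without developer address, high download, tax haven, other countries) are held at zero, and `baseline_index` is a hypothetical value of the probit index, since the coefficients on app and developer characteristics are not reported. The function names are illustrative.

```python
import math

def norm_cdf(x):
    """Standard normal CDF, the link function of a probit model."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Coefficients copied from Table 7, column (2).
B_DFF = -0.449
B_DFF_X_COUNTRY = {
    "Europe": 0.0,           # omitted category, no interaction term
    "United States": -0.135,
    "China": 0.663,
    "Hong Kong": 1.894,
}

def dff_index_shift(country):
    """Change in the probit index when DfF goes from 0 to 1 for this country."""
    return round(B_DFF + B_DFF_X_COUNTRY[country], 3)

def dff_probability_change(country, baseline_index):
    """Change in predicted probability at a hypothetical baseline index."""
    shift = dff_index_shift(country)
    return norm_cdf(baseline_index + shift) - norm_cdf(baseline_index)

for country in B_DFF_X_COUNTRY:
    print(country, dff_index_shift(country))
```

Under these assumptions the shift is negative for Europe and the United States and positive for China and Hong Kong, which is the pattern the text describes: opting in to DfF lowers the probability of collecting sensitive data except where the country interaction outweighs the DfF coefficient.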
Table 8: Is developer’s children specialization reducing costs of compliance?
                                Overall         USA        China     Hong-Kong       OECD      Non OECD     No privacy law
                                   (1)           (2)         (3)           (4)         (5)          (6)          (7)
 Child Specialization            0.066         -0.045       1.937        -0.707      0.077       -0.370*       1.577**
                                (0.115)       (0.196)     (2.684)       (0.914)     (0.123)      (0.203)       (0.668)
 DfF                           -0.360***    -0.508***      -1.136      1.137***    -0.328***    -0.271***      -0.411*
                                (0.076)       (0.130)     (1.054)       (0.387)     (0.072)      (0.099)       (0.250)
 DfF × Child Specialization    -0.509***     -0.558**      -0.488        0.586     -0.617***     0.417**       -1.139*
                                (0.134)       (0.221)     (2.797)       (0.927)     (0.137)      (0.211)       (0.689)
 high download                  -0.130*        -0.144     1.389*        0.642*     -0.179***      -0.063        0.371
                                (0.076)       (0.108)     (0.714)       (0.356)     (0.067)      (0.097)       (0.253)
 Constant                        0.117        0.700**   -11.574***    -7.230***      0.261        -0.507        1.854
                                (0.211)       (0.325)     (4.006)       (2.792)     (0.205)      (0.344)       (1.775)
 App characteristics              Yes           Yes          Yes          Yes         Yes          Yes           Yes
 Developer characteristics        Yes           Yes          Yes          Yes         Yes          Yes           Yes
 Week fixed effect                Yes           Yes          Yes          Yes         Yes          Yes           Yes
 Observations                    92746         20863        1876          2985       52816        21040         2484
 Log-likelihood               -7.021e+09   -11686.934    -588.358     -1264.556   -31147.419   -13375.824     -1446.375

Notes: Probit estimation. Sensitive Data is the dependent variable. App characteristics include all download
categories. Robust standard errors clustered at app level reported in parentheses. Weighted by the number of
developers per country. Significance levels: ∗p < .10, ∗ ∗ p < .05, ∗ ∗ ∗p < .01

5.2    Do big developers collect more data?

We examine developer size to investigate whether it is associated with the probability of
collecting sensitive data. To do this, we add to our regression the variable Number of apps
by developer, which counts the developer’s apps available on Google Play. This measure
allows us to determine whether the behaviors of large professional developers and small
developers differ. Table 9 presents the results.
Column (1) includes the interaction term between Number of apps by developer and the
regulation variables. Column (2) presents the full specification including the interaction
term between DfF × Number of apps by developer and a vector of the regulation measures.
For developers located in countries with strong regulation, subscription to the DfF program is
unlikely to affect the probability of the apps collecting sensitive data. In countries with weak
privacy legislation, the size of the developer has an effect. Large developers including those
in countries with weak privacy regulation seem better able to bear the costs of regulation.
This result is confirmed in the case of big developers originating in China and is in line with
the theoretical findings in Campbell et al. (2015) that privacy regulation imposes costs on
all firms but that small firms are less likely to be able to bear these costs.
