The Betrayal At Cloud City: An Empirical Analysis Of Cloud-Based Mobile Backends - Usenix

Page created by Clyde Osborne
 
CONTINUE READING
The Betrayal At Cloud City: An Empirical Analysis Of Cloud-Based Mobile Backends - Usenix
The Betrayal At Cloud City: An Empirical Analysis
       Of Cloud-Based Mobile Backends
 Omar Alrawi, Georgia Institute of Technology; Chaoshun Zuo, Ohio State University;
 Ruian Duan and Ranjita Pai Kasturi, Georgia Institute of Technology; Zhiqiang Lin,
  Ohio State University; Brendan Saltaformaggio, Georgia Institute of Technology
           https://www.usenix.org/conference/usenixsecurity19/presentation/alrawi

           This paper is included in the Proceedings of the
                  28th USENIX Security Symposium.
                     August 14–16, 2019 • Santa Clara, CA, USA
                                     978-1-939133-06-9

                                               Open access to the Proceedings of the
                                                28th USENIX Security Symposium
                                                     is sponsored by USENIX.
The Betrayal At Cloud City: An Empirical Analysis Of Cloud-Based Mobile Backends - Usenix
The Betrayal At Cloud City:
                       An Empirical Analysis Of Cloud-Based Mobile Backends

            Omar Alrawi*                             Chaoshun Zuo*                            Ruian Duan
    Georgia Institute of Technology              The Ohio State University           Georgia Institute of Technology
         Ranjita Pai Kasturi                          Zhiqiang Lin                      Brendan Saltaformaggio
    Georgia Institute of Technology              The Ohio State University           Georgia Institute of Technology

                             Abstract                             mobile app developers often disregard prudent security
   Cloud backends provide essential features to the mobile        practices when choosing cloud infrastructure, building, or
app ecosystem, such as content delivery, ad networks, ana-        renting these backends.
lytics, and more. Unfortunately, app developers often disre-         Recent backend breaches of the British Airways [1] app
gard or have no control over prudent security practices when      and Air Canada [2] app demonstrate how wide-spread these
choosing or managing these services. Our preliminary study        incidents are. More recently, the hijacking of the Fortnite
of the top 5,000 Google Play Store free apps identified 983       mobile game [3] showed how incrementally-downloaded
instances of N-day and 655 instances of 0-day vulnerabilities     content from mobile backends can allow an attacker to in-
spanning across the software layers (OS, software services,       stall additional mobile apps without the user’s consent. Ad-
communication, and web apps) of cloud backends. The mo-           ditional cases [4] involving the exposure of 43TB of enter-
bile apps using these cloud backends represent between 1M         prise customer names, email addresses, phone numbers, PIN
and 500M installs each and can potentially affect hundreds        reset tokens, device information, and password lengths was
of thousands of users. Further, due to the widespread use of      due to insecure mobile backends and not the developer’s mo-
third-party SDKs, app developers are often unaware of the         bile app code.
backends affecting their apps and where to report vulnera-           Even for security-conscious developers, it is not clear
bilities. This paper presents SkyWalker, a pipeline to auto-      what backends their mobile app will interact with because
matically vet the backends that mobile apps contact and pro-      of third-party libraries. Third-party libraries do not expose
vide actionable remediation. For an input APK, SkyWalker          their backends to developers, instead, they offer an applica-
extracts an enumeration of backend URLs, uses remote vet-         tion program interface (API) that developers use. Many of
ting techniques to identify software vulnerabilities and re-      these vulnerabilities can be identified ahead of time if de-
sponsible parties, and reports mitigation strategies to the app   velopers have the right tools and resources to evaluate the
developer. Our findings suggest that developers and cloud         security of their backends. Further, identifying vulnerable
providers do not have a clear understanding of responsibil-       software layers and the responsible party can expedite reme-
ities and liabilities in regards to mobile app backends that      diation and therefore lower the risk of exposure.
leave many vulnerabilities exposed.                                  To deal with the complexities in cloud infrastructure, the
                                                                  research community surveyed [5] and proposed several tax-
                                                                  onomies [6], ontologies [7], assessment models [8], and
1    Introduction                                                 threat classifications [9]. Unfortunately, these approaches
                                                                  provide few practical recommendations for mobile app de-
Cloud-based mobile backends provide a wide array of               velopers. Recent works on server-side vulnerability discov-
features, such as ad networks, analytics, content delivery,       ery of mobile apps [10]–[12] have shown that a lack of secu-
and much more. These features are supported by multiple           rity awareness among app developers is a growing problem.
layers of software and multiple parties including content         Yet, these works only scratch the surface by examining only
delivery networks (CDNs), hosting providers, and cloud            the software service layer of mobile backends.
providers who offer virtual/physical hardware, provisioned           A systematic study is needed to identify the most pressing
operating systems, and managed platforms. Due to the in-          issues facing mobile backends. Moreover, to conduct such
herent complexity of cloud-based backends, deploying and          a study, the analysis must be reproducible, transparent, and
maintaining them securely is challenging. Consequently,           easy to interpret for developers. The study should be done
    *Authors contributed equally.                                 on a representative mobile app ecosystem to provide real in-

USENIX Association                                                                   28th USENIX Security Symposium        551
sight into the backend vulnerability landscape. Finally, the       or Java libraries, i.e., the Unity3D backends. In most cases,
study should offer practical steps to guide and inform app         the developer will first have to employ static binary analy-
developers on the security of their mobile backends.               sis tools or dynamically instrument the app to track multiple
   To this end, this paper presents the design and implemen-       levels of SDK inclusion. SkyWalker automatically identified
tation of SkyWalker, an analysis pipeline to study mobile          13 unique backends from this app’s APK (shown in Table 1)
backends. Using SkyWalker, we conducted an empirical               and mapped them to the modules they were found in, i.e.,
analysis of the top 5,000 free mobile apps in the Google Play      library backends versus developer backends.
store from August 2018. Based on this study, we uncovered
                                                                       Party   Vendor                Backend                  Purpose
655 0-day instances and 983 N-day instances affecting
                                                                               Vasco                                          Game
thousands of apps. We used Google Play Store metadata to              Hybrid
                                                                               Games
                                                                                         androidha.vascogames.com
                                                                                                                              Content
measure the impact of our findings and estimate the number                               api.uca.cloud.unity3d.com            Telemetry
                                                                                         cdn-highwinds.unityads.unity3d.com   Ads
of affected users. We propose mitigation strategies for dif-                   Unity3D
                                                                                         config.uca.cloud.unity3d.com         Telemetry
ferent types of vulnerabilities and guidelines for developers                            impact.applifier.com                 Telemetry
                                                                                         bs.serving-sys.com                   Ads
to follow. Lastly, we offer the SkyWalker analysis pipeline                    Sizmek
                                                                                         secure-ds.serving-sys.com            Ads
                                                                       Third
as a free public web-service to help developers identify                        Moat
                                                                                         px.moatads.com                       Analytics
                                                                                         z.moatads.com                        Ads
what backends their mobile apps interact with, the security                              googleads.g.doubleclick.net          Ads
state of the backends, and recommendations to address any                      Google
                                                                                         pagead2.googlesyndication.com        Ads
                                                                                         tpc.googlesyndication.com            Ads
detected issues.                                                                         www.google-analytics.com             Analytics
   Our empirical study found 983 N-day instances of 52
vulnerabilities affecting hypervisors, operating systems,          Table 1: Backends identified for Crime City Real Police
databases, mail servers, DNS servers, web servers, scripting       Drive and their purpose. The red cells indicate vulnerable
language interpreters, and others. We found 655 0-day in-          backends.
stances of SQL injection (SQLi), cross-site-scripting (XSS),
and external XML entity (XXE). These affected thousands               Backends have layers of software (components) that sup-
of mobile apps, with some apps having over 50M+ installs           port the web application software (AS), including an oper-
and more than 332,000 reviews. We present two case studies         ating system (OS), software services (SS), and communi-
to demonstrate the vulnerabilities affecting a specific devel-     cation services (CS). The developer now has to fingerprint
oper and vulnerabilities affecting a platform that is used by      the backends to inventory the software layers and identify
many developers.                                                   the software type, version, and its purpose. Using this in-
   We found these backends to be geographically distributed        formation, the developer can then check to see if any of
across the globe and hosted on 6,869 different networks.           their software is outdated or affected by a known vulnera-
We notified all affected parties about the findings, and           bility [13], a laborious and time-consuming task. SkyWalker
were careful to follow ethical and legal guidelines when           identified that the game content backend runs Debian 6 for
conducting this study, additional details are in Section 8. We     the OS; OpenSSH 6.5p1, Apache httpd 2.2.22, PHP/5.4.4-
propose mitigation strategies for developers to follow based       14, and Apache-Coyote/1.1 for the SS; and uses the HTTP
on the issues found and the types of backends. We conclude         protocol for CS. SkyWalker’s search of the national vulner-
with recommendations for deploying and maintaining secure          ability database (NVD) [13] and correlation with the finger-
backends.                                                          print results showed multiple common vulnerability expo-
                                                                   sure (CVE) entries affecting PHP 5.4.4-14. Further, the De-
2     A Motivating Example                                         bian version running on the backend is no longer supported
                                                                   and does not receive any updates from the vendor.
Mobile apps use cloud-based backend services to support ex-           In addition to these issues, the developer’s AS can contain
tensive functions like ads, telemetry, content delivery, and       bugs that must be audited. The developer can check the AS
analytics. Unfortunately, a mobile app developer who wants         by auditing the parameters passed to each API and testing
to audit the backends their app uses will quickly find that this   for SQLi, XSS, XXE, or any other applicable vulnerabilities
is harder than it seems. The first thing the developer must        from OWASP’s top 10 common issues [14]. This task re-
do is simply enumerate those mobile backends. Consider             quires secure programming experience and security domain
the Crime City Real Police Driver (com.vg.crazypoliceduty)         expertise to identify bugs in the source code. SkyWalker
app, a mobile game with over 10M+ installs and 126,257             found that the game content backend interface is vulnera-
reviews. The mobile app uses several third-party SDK li-           ble to SQLi for some parameters passed by the mobile app,
braries including Amazon In-App Billing, SupersonicAds,            which is due to the AS not properly sanitizing the input.
Google AdMob, Unity3D, Nuance Speech Recognition Kit,                 The developer must now remediate or mitigate these risks,
and Xamarin Mono. The developer may not be aware of                but each backend layer may be operated by different entities
many of the backends that are invoked from imported native         that provide hardware and software as a service. Therefore,

552    28th USENIX Security Symposium                                                                            USENIX Association
before fixing any issues, they must figure out what party is re-     • Hardware (HW ) refers to the physical or virtual hard-
sponsible for each component. SkyWalker fingerprinted the              ware that hosts the backend.
Crime City Real Police Driver game content backend, an-
droidha.vascogames.com, as being hosted on a Google Com-             • Operating System (OS) refers to the OS running on the
pute Engine Flexible Environment instance (which provides              hardware, i.e., Linux or Windows.
virtual hardware, operating system, and PHP). We refer to            • Software Services (SS) refers to software services run-
this type of backend model as hybrid since Google is par-              ning in the OS, i.e., database service, web service, etc.
tially responsible for the virtual environment and the devel-
oper is responsible for the AS and CS.                               • Application Software (AS) refers to the custom appli-
   The developer must come up with a remediation strategy              cation interface used by mobile apps to interact with the
to address these problems. Google advertises that they patch           running services.
any vulnerable software affecting the OS and SS, but this is
only applicable to non-deprecated versions. In the case of           • Communication Services (CS) refers to the communi-
Crime City Real Police Driver app, the developer is respon-            cation channel supported between the mobile app and
sible for all the software layers since the OS and SS versions         the mobile backend.
are deprecated. The developer must upgrade to a supported             Our approach does not consider the hardware layer
OS, apply patches to the PHP interpreter (SS), patch the AS        because 1) we would need root-level access on the backend
source code against SQLi, and support HTTPS for secure CS.         to evaluate the hardware and 2) mobile app developers have
   The Unity3D, Sizmek, and Moat backends shown in Ta-             no direct way of addressing hardware vulnerabilities, i.e.,
ble 1 are called third-party, since the developer has no con-      manufacturers must issue firmware updates or replace the
trol over them. This evaluation must also be carried out           hardware. It is important to note that this work does not
on third-party backends to identify additional vulnerabilities     consider the mobile app security, instead we leverage the
(potentially affecting all apps which use those shared ser-        mobile app to study the backend.
vices). SkyWalker found that the Crime City Real Police               We differentiate between mobile backends by ownership,
Driver app uses the config.uca.cloud.unity3d.com backend,          which provides a granular mapping between stakeholders
which contains an XXE vulnerability, and the bs.serving-           and resources. We define four labels for the mobile back-
sys.com backend that contains an XSS vulnerability. Ideally,       ends with respect to the app developer:
the developer could report those vulnerabilities to the plat-
form through a bug bounty program or migrate their app to            • First-Party (B1st ) refers to backends that are fully man-
backends that are not vulnerable.                                      aged by the mobile app developers (i.e., full control
   This manual assessment procedure is very involved and               over the backend).
requires extensive security domain knowledge, which many
app developers may not have. Instead, SkyWalker gives all            • Third-Party (B3rd ) refers to backends that are fully
mobile app developers the ability to identify the backends             managed by third-parties (i.e., no control over the
invoked by their app, assess their software layers, and sug-           backend).
gest remediation strategies to improve the security for their        • Hybrid (Bhyb ) refers to backends that are co-managed
mobile app backends.                                                   by third-parties and developers such as cloud infras-
                                                                       tructure (i.e., some control over the mobile backend).
3     Background
                                                                     • Unknown (Bukn ) refers to backends that ownership
This section defines an abstraction to model mobile backends           could not be established with high confidence.
for our empirical study. We also define our labeling for back-        In our model, there are two primary stakeholders, the app
ends and create a mapping between responsible stakeholders         developers (D) and the cloud service providers (SP). There
and resources. We outline how we count vulnerabilities and         are additional stakeholders, like app users and internet ser-
define them in the context of this work.                           vice providers (ISP), but they do not have direct remediation
                                                                   oversight. We define a mapping between backends layers,
3.1    Mobile App Backend Model                                    labels, and ownership, shown in Table 2.
                                                                      The final piece of the model is the mitigation compo-
We follow the standard definition for mobile backends used         nent that maps vulnerable backends to the proper mitigation
by industry leaders [15]–[18], which encompass many cloud          strategies. There are five mitigation strategies for developers:
features, such as storage, user management, notifications,
and APIs for various services, regardless of who maintain-           • Upgrade (u) the software to vendor supported versions.
s/owns them. We breakdown mobile backends into a stack
representation that consists of five layers:                         • Patch (p) vulnerable software with a vendor patch.

USENIX Association                                                                     28th USENIX Security Symposium         553
Label                 HW   OS   SS    AS   CS              4     Methodology
         First-Party (B1st )   #    #    #     #    #
         Third-Party (B3rd )                                        In this section, we provide an overview of our assessment
         Hybrid (Bhyb )             #
                                    H    #
                                         H     #    #
                                                                    and details about implementing SkyWalker. Figure 1 is an
Table 2: Backend labels (first-party - B1st , third-party           overview of SkyWalker’s internal components. We divide
B3rd , and hybrid - Bhyb ) and cloud layers (hardware - HW ,        the implementation into four phases, namely binary analy-
operating system - OS, software services - SS, application          sis, labeling, fingerprinting, and vulnerability analysis. Each
software - AS, and communication services - CS) mapping             phase provides input to the next phase, starting from an input
to stakeholders (developers - #, service providers - , and          app APK to the final vulnerability/mitigation report.
shared - #
         H)
                                                                    4.1    Binary Analysis
  • Block (b) incoming internet traffic to exposed services.        SkyWalker leverages our prior work, Smartgen [19], to per-
                                                                    form the binary analysis and extract query messages from
  • Report (r) the vulnerability to the responsible party.          an APK binary. SkyWalker dynamically executes the code
                                                                    paths to the network functions and extracts the native usage
  • Migrate (m) the backend to secure infrastructure.
                                                                    of the backend APIs. The native usage of an API includes
In many cases, the developer may not have control or                the URI path and their parameter types/values.
authority to fix the issues but still has the option to report it
(r) or change service provider (m).                                 4.2    Backend Labels
                                                                    Backend labeling assigns one of the four labels defined in our
3.2    Counting Vulnerabilities                                     model. The labels are used to map the responsible parties
This work considers vulnerabilities which are software bugs         and the mitigation strategies needed (excluding unknown),
that exist in the backend software stack, including the oper-       shown in Table 2. Moreover, the labels are used to iden-
ating system (OS), services (SS), application (AS), and com-        tify where the most common issues are found. To perform
munication (CS). We consider N-Day vulnerabilities to be            the labeling, we curate three unique lists using the ipcat [20]
those vulnerabilities which have an associated common vul-          datacenter dataset. The first list is called CP and contains
nerabilities and exposure (CVE) number assigned by the na-          cloud providers, content delivery networks (CDNs), and mo-
tional institute of standards and technology (NIST) and in-         bile platform cloud services. The second list, Colo, contains
dexed in the national vulnerability database (NVD) [13]. In         a list of collocation centers. The third list is a list of SDK
our findings, we count N-Day vulnerabilities by class and           libraries that we extracted using LibScout [21] (Table 3),
instance, where class refers to the CVE number of a partic-         which help SkyWalker identify third-party backends. OS-
ular vulnerability and an instance refers to the vulnerabil-        SPolice [22] provides a more comprehensive list, including
ity affecting a specific interface or software component on         native libraries used by the mobile app, but our binary analy-
a mobile backend. For example, Apache Struts vulnerabil-            sis technique only instruments Java code, therefore, we limit
ity CVE-2017-5638 that affects Apache Struts 2.3.x before           the third-party SDK identification to LibScout.
2.3.32 and 2.5.x before 2.5.10.1 is counted as a single vul-           To perform the labeling we generate a tuple for each
nerability (class), but it can affect multiple backends that run    extracted backend B that contains the effective-second level
different versions of Apache Struts (instances).                    domain d, IP address ip, a boolean flag lib indicating if
   Some software versions are affected by multiple CVEs, in         the backend belongs to an SDK library, and the developer
this case, we do not count every CVE as an instance. We             or vendor name v. We define a function owner() that
generally assume patching the latest CVE should address all         parses WHOIS, MaxMind [23], and ASN records to extract
previous unpatched CVEs. We only consider the latest CVE            ownership information. The owner() function uses text
affecting the vulnerable software and count it once. Further,       tokenization, normalization, and aliasing to consolidate
a vulnerability instance is a tuple of the backend’s domain         varying records.
name, IP address, and the vulnerable software version. As              SkyWalker uses Algorithm 1 to assign labels to each back-
for 0-Day vulnerabilities, they are associated with the soft-       end. Algorithm 1 takes as input a list of backends, β , con-
ware application (AS) running on the backend. This work             taining tuples B = {d, ip, lib, v} and returns a list of labeled
looks at three classes of 0-Day vulnerabilities, SQLi, XSS,         backends β 0 . The algorithm uses the CP and Colo list to
and XXE and counts each instance per API interface end-             check membership for the domains and IPs to determine the
point on the mobile backend. The defined model, labels,             appropriate label. The first check is to determine the origin
mitigations, mappings, and vulnerabilities are the basis for        of the backend (was it extracted from an SDK library?) then
our methodology, which we describe next.

554   28th USENIX Security Symposium                                                                          USENIX Association
Binary Analysis                            Backend Labeling             Service Discovery and Fingerprinting                Vulnerability Analysis
                      Static Analysis                                   Populate Backend
                                             Network API                                                                    Interactive Service               Fingerprint – CVE
         App IR                                                        Tuple {d, ip , lib, v}
                       (Build ECG)           Identification                                                                    Identification                    Correlation
 10101                                                                                                       Remote Ping
 10001
                                                                                 Ownership                                                        Fingerprinted       Vulnerability
 01011                                                                                        Labeled
                                                                                 Extraction Backend URL                                             Backend F         Verification
App                                                                                                          Port Scan
                        Constraint Instrumented      Backend URL Backend URL                                                         NASL
APK       Selective                                                                                                                                     Fingerprint      Report
                        Dynamic Analysis              Extraction                  Label                                       Layered
          SymbEx                                                                                                                                      Confidence         Backend
                                                                                Assignment                                    Software Identification   Score            Issues

Figure 1: SkyWalker Overview. Phase 1 (Binary Analysis) extracts backend URLs through a dynamic binary instrumentation
technique. Phase 2 labels backends into first-party, third-party, and hybrid. Phase 3 discovers and fingerprints the backend
services to collect cloud layer information. Phase 4 (vulnerability analysis) uses the fingerprints and correlates them with
public vulnerabilities to identify vulnerable backends.

                                                                                                                           Third-Party SDKs
 Algorithm 1: Assigning Labels to Backends
                                                                                                 ACRA               CleverTap          InMobi                     Supersonic
    Input: β = List of backend tuple B = {d, ip, lib, v}
                                                                                                 AMoAd              Crashlytics        JSch                       Syrup
    Output: β 0 = Ownership labeled backend list
    SDK: List of backend domains found in the SDK libraries;                                     AdColony           Crittercism        Joda-Time                  Tapjoy
    CP: List of cloud and hosting providers (domains, net prefix, and ASNs);                     AdFalcon           Dagger             MdotM                      Tremor Video
    Colo: List of collocation providers (domains, net prefix, and ASNs);                         Adrally            EventBus           Millennial Media           Twitter4J
    for ∀B ∈ β do                                                                                Amazon             ExoPlayer          Mixpanel                   Urban-Airship
         if B.lib ∨ B.d ∈ SDK then                                                               Android            Facebook           MoPub                      Vungle
               // Backend from Java lib
               B.label ← “third-party”;                                                          Apache             Firebase           New-Relic                  WeChat
               continue                                                                          AppBrain           Flurry             OkHttp                     flickrj
         end                                                                                     AppFlood           Fresco             Parse                      heyZap
         if owner(B.d) 6= v ∧ owner(B.d) ∈ / CP then                                             AppsFlyer          Fyber              Paypal                     ironSource
               // Backend domain not owned by developer or CP
                                                                                                 BeaconsInSpace     Google             Picasso                    jsoup
               B.label ← “third-party”;
               continue                                                                          Bolts              Gson               Pollfish                   roboguice
         end                                                                                     Brightroll         Guava              Retrofit                   scribe
         if B.ip ∈ CP then                                                                       Butter-Knife       Guice              Segment                    smaato
               // Backend IP hosted by cloud provider                                            Chartboost         HockeyApp          Stetho                     vkontakte
               B.label ← “hybrid”;
               continue
         end
         if B.ip ∈ Colo then                                                                    Table 3: A list of third-party SDKs extracted by LibScout
               // Backend IP hosted by collocation center                                       from the top 5,000 apps, which is used to curate third-party
               B.label ← “first-party”;
               continue                                                                         backends.
         end
         B.label ← “unknown”;
    end
                                                                                                and configuration of each service. Our approach is a multi-
                                                                                                tier approach that starts by remotely pinging the backend,
                                                                                                then port scanning it, then interacting with the discovered
assigns “third-party” label if lib‘s value is true or the back-                                 service, and finally collecting service configurations. For in-
end domain belongs to the list of SDK backends.                                                 stance, the scan first checks to see if the host is reachable,
   If none of the previous statements are true about the do-                                    then it scans for all ports to identify available services, then
main, then SkyWalker checks the IP membership against the                                       it tries to connect to the service to collect its banner, and fi-
CP and Colo list. If the IP address belongs to a network                                        nally, if the services use TLS/SSL, it would collect their con-
on the CP list SkyWalker assigns “hybrid” label. If the IP                                      figurations and supported ciphers. For each step, our scanner
address belongs to a network on the Colo list SkyWalker as-                                     is configured to be non-intrusive, throttled (slow scan speed
signs “first-party” label. Otherwise, SkyWalker assigns an                                      and a light load on the remote server), and conservative (us-
“unknown” label since it cannot be determined. It is im-                                        ing techniques that yield low to no false positives).
portant to note that SkyWalker’s labeling approach relies on                                        First, SkyWalker groups all IP addresses into their net-
LibScout [21] to identify third-party backends based on the                                     work prefixes and in a random order picks a prefix and a ran-
SDK libraries. SkyWalker performs an additional check be-                                       dom IP from the selected prefix to scan. Prefixes are grouped
fore setting the lib flag to exclude SDK libraries built by the                                 by the autonomous system number (ASN) for each network.
same vendor (Google, Facebook, etc.).                                                           If a network spans multiple ASNs, SkyWalker keeps each
                                                                                                ASN as a separate prefix to distribute the scanning uniformly
4.3        Service Discovery and Fingerprinting                                                 across different IP segments. SkyWalker does a TCP ping
                                                                                                against common service ports (FTP, SSH, HTTP/S, IMAP,
Service discovery identifies internet-facing services on back-                                  SMTP, RDP, etc.) by sending out a SYN packet followed by
ends and fingerprinting identifies the software type, version,

USENIX Association                                                                                                       28th USENIX Security Symposium                          555
a RST packet. TCP ping scans are more reliable in detecting        exposure present a high risk that can violate legal obliga-
the availability of the remote server (backend) because they       tions. Adding a module to SkyWalker to support additional
are not filtered by firewalls like ICMP scans.                     vulnerabilities is trivial and can be easily implemented.
   Once SkyWalker establishes the host is reachable, Sky-             For each backend interface, a number of parameters
Walker conducts a TCP SYN scan (SYN-SYN/ACK-RST)                   (p) are associated with each request. SkyWalker tests
across all ports. This process identifies candidate ports on       each interface p times to check every parameter for SQLi
the target backend that will be used for a more thorough scan      and XSS. The XXE check is performed on all interfaces
(TCP connect). To be efficient, SkyWalker uses the list of         because some AS can accept JSON or XML requests. As
ports identified in the TCP SYN scan to conduct a TCP con-         mentioned earlier, the scan is slow and randomly done to
nect scan (SYN-SYN/ACK-ACK) i.e., establish a complete             avoid congestion and degradation of service on production
connection. Based on the port/service identified, SkyWalker        backends. SkyWalker creates two queues, a job queue and
interactively grabs the banner, the header response, and any       a processing queue. SkyWalker generates p requests for a
available configuration. The retrieved information varies per      given backend interface and stores them in the job queue.
service type, for example, HTTP will have header informa-          The job queue contains all backend requests, which are
tion unlike SSH, nonetheless, both help fingerprint the host.      shuffled and loaded into the processing queue in batches
Moreover, SkyWalker looks for TLS/SSL connections on all           (128 requests per batch). Batches that contain requests with
candidate ports because many services like HTTP and IMAP           the same domain or IP address are removed and replaced by
can run over TLS/SSL. Finally, to obtain the backend IP ad-        non-overlapping domains and IP address requests. There are
dress fronted by CDNs, SkyWalker looks up the IP address           32 workers that ingest from the processing queue and store
in a manually curated CDN list and uses passive DNS to find        the results for vulnerability analysis.
historical records that existed just before the current records.
When SkyWalker cannot locate such record, the backend is           4.4    Vulnerability Analysis
excluded from fingerprinting.
   Once SkyWalker discovers all the services running on a          The vulnerability analysis is two parts, N-day analysis, and
backend, SkyWalker uses the result to fingerprint the back-        0-day analysis. For the N-day analysis, SkyWalker correlates
end. The fingerprint identifies the OS, SS, and CS type            CVE entries with results from the fingerprinting to identify
(Linux, Windows; PHP, .NET, Python, Perl; FTP, SFTP,               possible issues. The confidence level of the fingerprint re-
HTTP, HTTPS, SSH, IMAP, etc.), version, and configura-             sults is also used to verify each vulnerability. SkyWalker
tion information if available. The fingerprinting uses open        uses NASL scripts that take the output of the service dis-
source and commercial Nessus Attack Scripting Language             covery, OS identification, SS identification, and CS identifi-
(NASL) scripts to identify the different layers of software        cation as input and match them against known vulnerabili-
on the backend. For example, to identify the OS, the NASL          ties (CVEs). The NASL results are considered if they have
script inspects the banner string, analyzes the SSL certificate,   90% confidence level or higher for OS detection, which pro-
checks additional running services (SMB, RDP, SSH), per-           vides high accuracy for vulnerability matching. Note, that
forms structured ICMP pings, inspects HTTP headers, and            the confidence level is calculated based on pre-profiled OSes
uses TCP/IP fingerprinting algorithms [24]. Based on these         by matching the fingerprint signals (collected from all lay-
signals a confidence score is provided based on matching a         ers) to the profile signals.
set of pre-profiled OSes. For example, if 90% of the signals          We manually verified all 983 N-days and found them to
match a Windows Server 2008 R2 Service Pack 1 profile, we          be all true positives. The zero false positive results are due
consider the OS layer for that backend in the vulnerability        to the Nessus configuration, which allows us to tune how
analysis. Any confidence level below 90% or ambiguity be-          the scans are done and how they should be reported. For
tween the same OS but different versions will not be consid-       example, we configure Nessus to perform the scan types
ered for the vulnerability analysis phase.                         described above, consider OS type and version detection of
                                                                   90% or higher and consider SS that have banner information
Web Applications. Web apps (AS) are generally tailored
                                                                   with version numbers. On the other hand, when we used
per mobile app, unlike OS, SS, and CS layers. The binary
                                                                   UDP scanning techniques and consider generic service
phase performs in-context analysis for each API interface on
                                                                   banner information we find over 6, 500 candidate N-day
the backend, which provides API information used for fin-
                                                                   instances with a large false positive rate. In theory, the back-
gerprinting. We reference the OWASP’s top 10 vulnerability
                                                                   end can be configured to lie about the banner information,
issues [14] that can be passively tested within the ethical and
                                                                   which would make it hard for us to verify.
legal bounds discussed in Section 8. Specifically, SkyWalker
                                                                      For the 0-day analysis, SkyWalker carefully triggers the
uses side-channel SQLi through time delay, reflective XSS,
                                                                   candidate vulnerability to verify the findings. For each vul-
and XXE callback to identify candidate issues in web apps.
                                                                   nerable parameter, SkyWalker generates a pair of request
It is important to note that other vulnerabilities such as au-
                                                                   messages, the original message and the vulnerable message.
thentication bypass, broken access control, and sensitive data

556   28th USENIX Security Symposium                                                                         USENIX Association
For SQLi, SkyWalker baselines the original request message        this process. The SkyWalker service can be found at:
several times throughout the week and at different times of       https://MobileBackend.vet.
the day. Then SkyWalker performs the same measurement
on the vulnerable message in the same week but in non-
                                                                  5     Assessment Findings
overlapping time intervals by triggering the vulnerable pa-
rameter through an SQLi sleep injection. SkyWalker calcu-
                                                                  5.1    Experiment Setup
lates the response time deviation based on the sleep param-
eter passed in the SQL statement and the average response
                                                                  Environmnet. We use a local workstation running Ubuntu
time of the message pairs. If the deviation is equal to the
                                                                  14.04 with 24GB memory and 16 x 2.393GHz Intel Xeon
time delay parameter in the SQL statement, SkyWalker con-
                                                                  CPUs and four Nexus phones to run and instrument the mo-
cludes that the interface and parameter pair is vulnerable.
                                                                  bile apps. We use an Amazon Web Service (AWS) Elastic
   Similarly for XSS, SkyWalker triggers the vulnerable pa-
                                                                  Compute (EC2) instance with a reserved IP address to con-
rameter and includes JavaScript code to creates a new div
                                                                  duct the fingerprinting and run a web server with informa-
element with a unique name attribute. SkyWalker checks
                                                                  tion about our study along with an email address for backend
the returned content by parsing the document object model
                                                                  hosts to contact us if they want to opt-out.
(DOM) to find the div element containing the unique name
attribute. If the div element with the set name attribute ex-     Tools and Data Sets.        For the binary analysis tool im-
ists SkyWalker concludes that the interface and parameter         plementation, we relied on Soot [25], FlowDroid [26],
are vulnerable. Note that SkyWalker matches the returned          Z3-str [27], and Xposed [28] with custom code written in
content with parameters sent to ensure that the XSS candi-        Java (7, 000 lines of code) and Python (900 lines of code).
date vulnerability is of type 2 (reflected). For XXE, Sky-        For our backend labeling implementation, we relied on Team
Walker generates a request message that contains an HTTP          Cymru IP-to-ASN [29], MaxMind Geolocation [23], Alexa
callback request to a server we operate. The request mes-         ranking [30], ipcat list [20], and Domaintools WHOIS [31]
sage is passed to the backend, which will parse the specially     with custom code written in Python (480 lines of code).
crafted XML document. If the parser is vulnerable to XXE,         For fingerprinting, we relied on the Nessus scanner and
SkyWalker will log an HTTP request from the backend un-           commercial plugins [32], sqlmap [33], and Acunetix [34].
der analysis, which indicates the interface is vulnerable. In     We used Nessus plugins and custom Python code (1010 lines
addition, we manually reviewed the request/return pairs for       of code) to perform the vulnerability analysis. For internet
all 655 0-day instances and found no false positives.             measurements, we utilized honeypot scanning activity from
                                                                  Greynoise [35].
4.5    Open Access for Developers
                                                                  5.2    Software Vulnerability Details
One of our primary goals for this work is to empower app
developers with open access to SkyWalker via a free-to-use        Table 4 shows the distribution of 0-day and N-day instances
web-service. The service currently supports Android mobile        across the software layers. We categorize the apps using the
apps but can be extended to support other mobile platforms,       Google Play store groups and present the number of vulnera-
e.g., Apple iOS. The web-interface takes as input a link to an    bilities and backend labels. Overall, we analyzed 4, 980 apps
Android app in the Google Play store or a direct APK upload.      with cloud-based backends and successfully extracted back-
SkyWalker then performs binary analysis to extract the back-      ends for 4, 740 mobile apps. The remaining 240 mobile apps
ends, label them based on our curated dataset, fingerprint        crashed and did not complete the full binary analysis.
them, and identify vulnerabilities that affect them. In addi-        Interestingly, the OS component reports the least vulner-
tion to the analysis, the output report provides guidelines on    abilities, while the AS component reports the most vulnera-
how to mitigate the identified issues using the strategies dis-   bilities, across all mobile app categories. Recall from Sec-
cussed earlier (upgrade, patch, block, report, and migrate).      tion 3.2, vulnerabilities affecting AS components are all con-
   SkyWalker summarizes vulnerability findings across all         sidered 0-day. The OS, SS, and SC components account for
observed SDK and Java library backends, which developers          N-day vulnerabilities. Although the number of apps is not
can turn to proactively to make an informed decision when         uniform across the categories, we use the raw vulnerability
choosing third-party libraries to include in their future         count for ranking. For 0-day vulnerabilities, the top three
apps. It is important to note that attackers can abuse this       mobile app categories are tools, entertainment, and games.
system to attack mobile app backends. Therefore we                For N-day vulnerabilities, the top three mobile app categories
require the developers to disclose their affiliation with the     are entertainment, tools, and games.
target app before the analysis results are provided. Once
                                                                  Ownership. Table 4 presents the labels for the backends
a user is manually vetted, they can only submit apps that
                                                                  used by mobile apps. The most common label is hybrid,
they develop. We do not consider third-party SDKs in
                                                                  where 3,336 backends use hybrid infrastructure. The second

USENIX Association                                                                   28th USENIX Security Symposium         557
Vulnerabilties                               Labels
               Category              # Mob. Apps
                                                     # OS    # SS   # AS    # CS   Total   # B1st   # B3rd   # Bhyb   # Bukn    Total
               Books & Reference     332             15      49      55    71      190     365      653      501      354       1,873
               Business              145             5       22      10    37      74      93       258      150      113       614
               Entertainment         1,177           36      108     158   170     472     746      913      942      783       3,384
               Games                 1,283           34      81      147   106     368     290      804      651      444       2,189
               Lifestyle             363             20      50      79    72      221     262      665      311      237       1,475
               Misc                  199             6       21      45    46      118     76       422      163      105       766
               Tools                 792             19      84      184   115     402     729      796      812      464       2,801
               Video & Audio         689             24      46      89    98      257     267      648      434      357       1,706
               Total                 4,980           121     356     655   506     1,638   2,492    1,089    3,336    2,506     9,423

      Table 4: An overview of the vulnerable mobile apps per genre along with the raw counts of vulnerabilities and labels.

                Party
                          Vulnerable Component                               Operating System (OS). The OS component issues can
                          OS    SS    AS   CS       Total                    be summarized into two categories: legacy unsupported OS
                B1st      37   87       155   211   490                      or unpatched OS. The difference is that the legacy OS are
                B3rd      6    21       200   42    269
                Bhyb      47   150      154   184   535
                                                                             no longer supported by the vendor, hence vulnerabilities will
                Bukn      55   135      146   173   509                      not be addressed. We see from Table 6 that both Linux (vari-
                                                                             ous flavors) and Windows backends use expired lifecycle ver-
Table 5: Count of apps affected by vulnerabilities per cloud                 sions and 133 apps use these backends. The second most
layer and their corresponding labels.                                        common issue is the Windows Server vulnerability MS15-
                                                                             034 affecting 64 apps, which have patches by the vendor.
      Comp.     Vulnerability (Top 3)                        #Apps           Overall, the top three OS vulnerabilities listed in Table 6 af-
                Expired Lifecycle for Linux OS (various)     124             fect 197 mobile apps.
      OS        Windows Server RCE (MS15-034)                64                 We found the MS15-034 vulnerability affecting hybrid
                Expired Lifecycle for Windows Server         9               backends (Bhyb ) that run on Amazon AWS, Akamai, OVH, Go
                Vulnerable PHP Version                       357             Daddy, Digital Ocean, and other smaller hosting providers.
      SS        Expired Lifecycle for Web Server (various)   181
                Vulnerable Apache Version                    76              Further, some of the backends appear on CDN networks, like
                XSS (various)                                262             Akamai, Fastly, and CloudFlare, that offer “EdgeComput-
      AS        SQLi (various)                               160             ing” services [36] which provide web app accelerator ser-
                XXE (various)                                86
                                                                             vices. This insight shows that some developers who deploy
                Support for Vulnerable SSL Version 2 and 3   997
      CS        OpenSSH Bypass (CVE-2015-5600)               16              vanilla versions of Windows Server OS are not maintaining
                Vulnerable OpenSSL (various)                 15              them. In Table 5 the first-party OS component has 37 vulner-
                                                                             able backends, which is much higher than third-party back-
Table 6: The top three vulnerabilities found per cloud layer                 ends (6). App developers who run and maintain their own
along with the number of affected mobile apps.                               backends (B1st ) have to be mindful of these bugs, which in
                                                                             some cases require provisioning new backends with newer
                                                                             OSes causing incompatibilities with existing services (SS)
largest is first-party with 2,492 backends followed by third-                and applications (AS). SkyWalker can inform the developer
party with 1,089 backends. There are 2,506 backends that                     of these issues and report mitigation strategies.
we were not able to label due to ambiguities, but we labeled
approximately 73% of all backends we encountered.                            Software Services (SS). SkyWalker identified multiple vul-
   More important is providing remediation guidance to the                   nerabilities affecting a range of PHP versions, which can be
responsible party. Table 5 shows the mapping between the                     used to cause denial of service (CVE-2017-6004), disclose
backend labels and the vulnerable apps. We cannot say much                   memory content (CVE-2017-7890), disclose sensitive infor-
about the Unknown category since the vulnerabilities may                     mation (CVE-2016-1903), and execute arbitrary code (CVE-
belong to either first-party or hybrid categories. We observe                2017-11145). Backends of 357 mobile apps affected by PHP
that for first-party backends, the highest number of vulnera-                vulnerabilities, significantly higher than the other two vul-
ble apps are found in AS and CS components with 155 and                      nerabilities. Further, even though some mobile app back-
211 instances, respectively. Similarly, the hybrid backends                  ends had no 0-day vulnerabilities, an attacker can still craft
have 154 0-day vulnerability instances and 184 N-day in-                     special requests to trigger deep bugs within the interpreter to
stances. In general, we observe that the components that app                 compromise the backend. Although this might be a difficult
developers are responsible for (AS and CS in the B1st and                    task, recent advancement in vulnerability fuzzing [37] can
Bhyb ) have more vulnerabilities.                                            uncover these deep bugs.

558        28th USENIX Security Symposium                                                                                   USENIX Association
The second most common SS vulnerability was unsup-                            Language     # Backends    # 0-Days
ported versions of Apache web server (1.3.x and 2.0.x),                          PHP          108           284
Tomcat server (8.0.x), and Microsoft IIS web server (5.0).                       ASP.NET      13            33
                                                                                 PERL         4             9
Similar to unsupported OS, web server vendors will not is-                       JS           4             8
sue security patches for unsupported software, which affects                     JSP          2             5
backends of 181 mobile apps. For Apache web server ver-                          Unknown      72            316
sions less than 2.2.15, they are affected by several denial of
service bugs (CVE-2010-0408, CVE-2010-0434) and TLS               Table 8: The number of identified languages associated with
injection bug (CVE-2009-3555) affecting 76 mobile apps.           0-day vulnerable backends.
Additionally, Apache servers that use Apache Struts ver-
sions 2.3.5 - 2.3.31 or 2.5.10.1 and lower are vulnerable
to CVE-2017-5638, which allows remote code execution.             ends, followed by ASP.NET with 33 0-day instances affect-
The same Apache Struts vulnerability was reportedly used          ing 13 different backends. We note that this trend does
against Equifax’s hack [38]. In total, the top three SS vulner-   not mean causation. PHP is the most popular language
abilities affect 614 mobile apps.                                 used for web application development [39], hence it is ex-
                                                                  pected to represent more vulnerabilities by being more pop-
Applications (AS). Table 7 has a breakdown of the number          ular. Furthermore, we found 9 0-day instances in PERL, 8 in
of mobile apps, their number of install categories, and the       JavaScript (NodeJS), 5 in JSP, and the rest of the 316 could
instances of 0-day bugs affecting them. Although XSS is the       not be determined.
largest category with 503 instances followed by SQLi (215)
and XXE (46), we note that not all of the bugs have the same      Communication (CS).             All mobile apps rely on the
impact and some affect the same backend. For instance, an         HTTP/HTTPS protocol for communication with their back-
SQLi can be limited to an isolated instance of the app (e.g.,     ends. The binary analysis phase extracted a total of 17, 725
a container), which would limit the attack to disclosing in-      request messages from the 4, 740 mobile apps. The request
formation from the application database or modifying preex-       messages are split into HTTP (8, 118) request messages and
isting records. Moreover, XSS vulnerabilities often have less     HTTPS (9, 607) request messages. There are 446 mobile
impact than SQLi and XXE.                                         apps that only use HTTP communication and another set of
   XXE vulnerabilities affect web apps that use XML for           147 mobile apps that only use HTTPS communication. The
their API communication. The fundamental flaw that en-            remaining set of 4,147 apps mix between HTTP and HTTPS
ables XXE vulnerabilities to exist is a faulty implementation     communication.
of the XML parser. Based on our measurement, we found                Despite using HTTPS, over 20% of the backends (1,012)
1 XXE instance in the top 100M, 5 in the top 50M, 15 in           have issues with TLS/SSL configuration (e.g., insecure ses-
the top 10M, 9 in the top 5M, and 17 in the top 1M. Table 7       sion renegotiation and resumption) or unpatched software
shows the concentration of vulnerabilities found in lower         versions (e.g., SSL version 2 and 3). These flaws can be ex-
ranking apps. For example: 1 XXE and 3 XSS vulnerabil-            ploited by an attacker to carry out a MITM attack by down-
ities in the top 132 mobile apps; 4 SQLi, 10 XSS, and 5           grading the protocol negotiation using the POODLE [40] at-
XXE vulnerabilities in the next 131 mobile apps (though           tack. Additionally, the OpenSSH Bypass vulnerability ex-
still representing over 50M+ installs each). However, AS          poses the backend to compromise via SSH credential guess-
vulnerabilities are not confined to lower ranking apps but do     ing or secret key leak. The mobile apps using these vulnera-
affect higher ranking apps.                                       ble backends do not use the SSH service and to remediate one
                                                                  can turn off, patch, or block the incoming internet traffic to it.
         # Installs   # Apps   # SQLi   # XSS   # XXE                Those backends which only use HTTP expose users to
         1B           5        0        0       0                 eavesdropping and MITM attacks because it does not of-
         500M         11       0        0       0                 fer integrity or confidentiality. We manually inspected the
         100M         116      0        3       1                 request messages sent from 3,253 apps that use HTTP and
         50M          131      4        10      5
                                                                  found personally identifiable information (PII) such as name,
         10M          1,049    25       85      15
         5M           1,047    54       89      9                 gender, birth year, user ID, password, username, and country.
         1M           2,621    132      316     17                Additionally, we found device information like MAC, IMEI,
                                                                  SDK version, make/model, SSID, Wifi signal, cell signal,
Table 7: The number of 0-day vulnerabilities found per in-        screen resolution, carrier, root access, IP Address, and co-
stall category.                                                   ordinate location. Combining this information, a network
                                                                  attacker can identify individuals and attribute behavior pro-
  Table 8 shows the AS layer implementation language and          files to them. Furthermore, 6 apps we investigated perform a
associated vulnerabilities. AS implemented in PHP have the        password reset over HTTP. Interestingly, the Apple iOS App
most 0-days instances (284) affecting 108 different back-         Store enforces strict use of HTTPS through their App Trans-

USENIX Association                                                                     28th USENIX Security Symposium          559
1.0
                                                                                                                                                                                                                                  In general, an attacker has a larger attack surface for
      0.8
                                                                                                                                                                                                                               apps that have many backends. Figure 2 shows a CDF of
                                                                                                                                                                                                                               backends per mobile app. We can see that the majority of
      0.6                                                                                                                                                                                                                      the 5,000 apps studied have between one and 25 different
                                                                                                                                                                                                                               backends and in the worst case they have up to 203 different
      0.4
                                                                                                                                                                                                                               backends. We also observe that these backends reside in
      0.2                                                                                                                                                                                                                      diverse networks as shown in Figure 3, which means the
                                                                                                                                                                                                                               infrastructure set up for the backends will be different
      0.0
         0                                25                            50                     75                      100                125                       150                   175                         200      affecting the impact of the vulnerability.
                                                   Number of Backends Per Mobile App

                                                                                                                                                                                                                                                                                                                                     Vulnerable Backends
Figure 2: The figure shows the CDF of the number of back-                                                                                                                                                                                                                                                                            All Backends

                                                                                                                                                                                                                                  Count (Log Scale)
                                                                                                                                                                                                                                                           3
                                                                                                                                                                                                                                                      10
ends per mobile app.
                                                                                                                                                                                                                                                           2
                                                                                                                                                                                                                                                      10

port Security [41] model. We recommend that the Android
platform adopt the same restriction.                                                                                                                                                                                                                  10
                                                                                                                                                                                                                                                           1

                                                                                                                                                                                                                                                           0
                                                                                                                                                                                                                                                      10
5.3                     Impact on Mobile Application Users

                                                                                                                                                                                                                                                               US
                                                                                                                                                                                                                                                                    KR
                                                                                                                                                                                                                                                                         JP
                                                                                                                                                                                                                                                                              DE
                                                                                                                                                                                                                                                                                   IE
                                                                                                                                                                                                                                                                                        CN
                                                                                                                                                                                                                                                                                             RU
                                                                                                                                                                                                                                                                                                  FR
                                                                                                                                                                                                                                                                                                       NL
                                                                                                                                                                                                                                                                                                            SG
                                                                                                                                                                                                                                                                                                                 CA
                                                                                                                                                                                                                                                                                                                      TR
                                                                                                                                                                                                                                                                                                                           GB
                                                                                                                                                                                                                                                                                                                                HK
                                                                                                                                                                                                                                                                                                                                     IN
                                                                                                                                                                                                                                                                                                                                          TW
                                                                                                                                                                                                                                                                                                                                               TH
                                                                                                                                                                                                                                                                                                                                                    ES
                                                                                                                                                                                                                                                                                                                                                         PL
                                                                                                                                                                                                                                                                                                                                                              CZ
                                                                                                                                                                                                                                                                                                                                                                   BR
                                                                                                                                                                                                                                                                                                   Country Code
The overall impact for each vulnerability varies based on the
severity, the mobile app to backend usage, and the adversary                                                                                                                                                                   Figure 4: The figure shows the distribution of all mobile
capability/visibility. Although it is important to understand                                                                                                                                                                  backends and vulnerable backends across the world.
the impact of each vulnerability, it is not trivial to quantify
the impact of each vulnerable backend on mobile apps. For                                                                                                                                                                         Finally, the geographical distribution of the backends,
N-day vulnerabilities, an attacker can perform an internet-                                                                                                                                                                    shown in Figure 4, affect the impact on mobile apps. Many
wide scan to identify and attempt to compromise these                                                                                                                                                                          mobile apps deploy multiple backends that are geographi-
backends. Even once identified, these N-days span many                                                                                                                                                                         cally distributed to provide faster content for different user
different components (OS, SS, and CS) that have varying                                                                                                                                                                        segments. In some cases, the different backends may not
impacts on the backend from basic information disclosure                                                                                                                                                                       be fully synchronized in terms of the latest software patches
to a full system compromise. For 0-day vulnerabilities the                                                                                                                                                                     for OS, SS, AS, and CS layers, which results in a vulnerable
attack impact varies based on the exploit type (SQLi, XSS, or                                                                                                                                                                  backend affecting only a segment of users for a particular
XXE) and how the backend infrastructure is set up. More-                                                                                                                                                                       mobile app. Directly quantifying the impact of each vulner-
over, how the mobile app uses the backend directly impacts                                                                                                                                                                     ability is an involved task and depends on many variables
the severity of the vulnerability. For example, if a mobile                                                                                                                                                                    such as the severity of the vulnerability, the mobile app to
app uses app slicing [3] or downloads additional libraries                                                                                                                                                                     backend usage, the adversary capability, and other nuance
from the mobile backend, an attacker who compromises the                                                                                                                                                                       factors (number of backends per app, network distribution,
backend can modify the content and attain code execution                                                                                                                                                                       and geographical distribution). We plan to perform a com-
on the mobile device.                                                                                                                                                                                                          prehensive analysis to understand this impact as future work.

                                                                                                                                                                                                                               5.4                     Vulnerability Disclosure, Bug Bounties,
   Count (Log Scale)

                            3
                       10
                                                                                                                                                                                                                                                       And In The Wild Threats
                                                                                                                                                                                                                               During our disclosure process, we identified two mobile
                                                                                                                                                                                                                               platform vendors that have a bounty program, namely
                            2
                       10
                                                                                                                                                                                                                               Unity3D [42] and Simpli.fi [43]. In addition, the top third-
                                                                                                                                                                                                                               party platform providers, Google, Facebook, Crashlytics,
                                                                                                                                                                                                                               and Flurry, all participate in or run their own bug bounty
                                amazon
                                         akamai
                                                  google
                                                           cloudflare
                                                                        KR tel
                                                                                 microsoft
                                                                                             fastly
                                                                                                      ovh

                                                                                                                                         yahoo
                                                                                                            leaseweb
                                                                                                                       dig-ocn
                                                                                                                                 adobe

                                                                                                                                                 hetzner
                                                                                                                                                           linode
                                                                                                                                                                    lg corp
                                                                                                                                                                              JP tel
                                                                                                                                                                                       yandex
                                                                                                                                                                                                level 3
                                                                                                                                                                                                          softlayer
                                                                                                                                                                                                                      cdnets

                                                                                                                                                                                                                               programs. Similarly, the cloud providers either run their
                                                                                       Organization (Network)                                                                                                                  own program or use a third-party bug bounty program like
                                                                                                                                                                                                                               Bugcrowd [44] or Bounty Factory [45]. We submitted our
Figure 3: The figure shows the distribution of backends                                                                                                                                                                        vulnerability disclosures through their bounty management
across internet networks.                                                                                                                                                                                                      program (e.g., HackerOne [46]) and received confirmation
                                                                                                                                                                                                                               of the bugs.

560                    28th USENIX Security Symposium                                                                                                                                                                                                                                                                            USENIX Association
For smaller third-party and first-party developers, they did          Label          Backend                Use
                                                                                                                          Vulns.
not have a formal way to contact them to report vulnerabil-                                                              AS CS
ities. We followed a tiered approach in our notification by                      api-news.dailyhunt.in      App Data     0   1
                                                                                                            App Data &
first notifying the app developer directly using the contact             Bhyb    acq-news.dailyhunt.in
                                                                                                            Telemetry
                                                                                                                         2   1
information in the Play Store. Our second attempt to report                      bcdn.newshunt.com          CDN          0   1
the vulnerability is by contacting the domain owner using the                    acdn.newshunt.com          CDN          0   1
WHOIS information and following the mitigation strategy.                         fonts.gstatic.com          CDN          0   0
Our third attempt to report the vulnerability is by contacting                   e.crashlytics.com          Telemetry    0   0
                                                                         B3rd    settings.crashlytics.com   Telemetry    0   0
Google directly through their issue tracker portal. For parties                  t.appsflyer.com            Ads          0   0
that did not confirm or respond to our multiple attempts, we                     api.appsflyer.com          Ads          0   0
reported the vulnerabilities to US-CERT [47].
                                                                    Table 10: A list of backends and issues found for the mobile
              Component               # IP Scanners                 app Dailyhunt.
              Operating System (OS)   341,521
              Services (SS)           445,908
              Application (AS)        206,533
                                                                    api-news.* domain registers the device and requests content,
                                                                    where the acq-news.* backend captures user behavior and
Table 9: Number of IPs observed scanning the internet for           offers promotion and the actual content is delivered by the
vulnerabilities reported by Greynoise.io [35] over a period         two CDN domains acdn.* and bcdn.*.
of a year (Sept 2017 to Sept 2018).                                    We were not able to fingerprint the OS and the SS be-
                                                                    cause the Akamai servers respond only to web app spe-
   The N-day vulnerabilities we found are discoverable              cific responses, i.e., minimal header and banner informa-
and easy to exploit due to the availability of fast internet        tion. Nonetheless, we found two 0-day vulnerabilities in the
scanners like ZMap [48] and MASSCAN [49]. We argue                  acq-news.* backend on the same API interface. Since this
that it is a matter of time until these vulnerabilities are found   web application is specific to this mobile app, we looked for
and exploited. Table 9 shows the number of active scans             other apps published by the same developer. We found that
detected on the internet through Greynoise [35] honeypots           the eBooks by Dailyhunt app (which has over 500K installs
over a period of one year (Sept 2017 to Sept 2018). There           but does not rank in the top 5,000 apps) also uses the same
are 341,521 unique IPs scanning for OS related vulnerabil-          vulnerable API interface. Additionally, the mobile apps use
ities, 445,908 unique IPs scanning for SS vulnerabilities,          HTTP to communicate with the hybrid backends and HTTPS
and 206,533 unique IPs scanning for AS vulnerabilities.             to communicate with third-party services.
Many of these scans target N-day vulnerabilities, while                As for the third-party services, we did not find any vul-
scans for 0-day vulnerabilities cannot be accounted for.            nerabilities. The third-party backends serve requests on port
Nonetheless, past events demonstrate that attackers are             443 (HTTPS). The appsflyer.com backend is a service for
prone to scan for and exploit 0-day vulnerable web apps like        ad analytics that provides different functions using the same
Wordpress [50], Drupel [51], and PHPMyAdmin [52] when               interface. The t.appsflyer.com backend is a telemetry end-
publicly disclosed. Furthermore, a recent report [53] also          point for the ad network and the api.appsflyer.com backend
pointed out that the number of vulnerabilities in web apps          authenticates and associates the app with its profile.
increased in 2018 and that support for PHP version 5.x and
7.x will end in 2019, which means we can anticipate more            Takeaway. This case highlights several challenges to secur-
unpatched and exposed backends in the future.                       ing mobile app backends. First, backends are heterogeneous
                                                                    and differ across their software stack, topology setup, config-
                                                                    uration, and custom application. Second, outsourcing cloud
6     Case Studies                                                  management and provisioning (e.g., to cloud providers and
                                                                    CDNs) benefits security but comes with a lack of visibility,
6.1    Case Study 1: Vulnerable Web App                             limited per-app customization, and unclear incident liability.
The mobile app “Dailyhunt” has more than 50M+ installs              Third, vulnerabilities can exist (and be scanned for) in any
and is part of the “Books & Reference” category. The mo-            software layer of the cloud and API interface on the web
bile app interacts with nine different backends as shown in         app, which makes them challenging to identify and fix. Un-
Table 10. The backends are split into two labels, hybrid and        fortunately, app developers do not have the resources, time,
third-party. The hybrid backends are hosted on Akamai’s             or personnel to fulfill this task. Using SkyWalker, we aim to
EdgeComputing [36] and run a custom web app to serve                provide guidance to where the most pressing issues exist and
the mobile app. The hybrid backends are used for CDN,               map them to responsible parties as shown in Table 2.
telemetry, and requesting app-specific data. Specifically, the

USENIX Association                                                                       28th USENIX Security Symposium            561
App Name                              # Reviews    # Installs   We have notified the developers about these findings and
   com.icegame.fruitlink                 332,907      50M+         awaiting remediation.
   com.unbrained.wifipasswordgenerator   151,518      10M+
   com.magdalm.wifimasterpassword        148,355      10M+         Takeaway. This case highlights multiple vulnerabilities,
   com.unbrained.wifipassgen.app         43,824       1M+          0-day and N-day, that affect three of the four software
   com.magdalm.freewifipassword          35,552       1M+
                                                                   layers. This mobile platform collects sensitive information
   apps.ignisamerica.gamebooster         23,725       500K+
   com.icegame.crazyfruit                23,631       1M+          about user behavior, including PII and device information.
   com.magdalm.wifipasswordpro           22,113       1M+          Unfortunately, these backend vulnerabilities are inherited by
   apps.ignisamerica.bluelight           16,659       500K+        multiple apps and developers, and the app developers cannot
   com.icegame.fruitsplash2              15,193       1M+
                                                                   immediately remediate the vulnerabilities in third-party
                                                                   services. The mitigation strategy for the app developer is
Table 11: A list of the top 10 mobile apps using the appnext       to report (r) these findings to appnext or migrate (m) their
platform.                                                          app to a different service. SkyWalker helps us label the
                                                                   backends, identify the vulnerability, and guide the developer
                                                     Vulns.        to a clear action (report or migrate).
      Label       Backend           Usage
                                              OS      AS CS
              admin.appnext.com    App Data   0       1    0
      B3rd
              global.appnext.com   App Data   0       0    0       7     Mitigation
              cdn.appnext.com      CDN        1       0    1
              cdn3.appnext.com     CDN        1       0    1
                                                                   The goal of our empirical analysis was to bring attention
                                                                   to this overlooked problem in mobile backends, but also to
Table 12: List of backends and vulnerable layers found in
                                                                   provide guidance to app developers for building or choosing
the appnext platform.
                                                                   secure backends. In this section, we discuss the general
                                                                   mitigation strategies which SkyWalker recommends for app
6.2     Case Study 2: Vulnerable Platform                          developers and to help improve the security posture of their
                                                                   app backends.
The appnext [54] platform integrates with mobile apps to in-
gest user behavior telemetry and provide predictive actions
                                                                   7.1    Remediation Strategies
that users might perform. Developers use this to upsell sub-
scription, ads, or recommend actions to app users. The app-        App developers who rely on first-party backends have to up-
next platform is used by 6 mobile apps from the top 5,000          grade, patch, and block as needed for each software layer
free apps. We analyzed all apps by the same developers that        on their backend. If they rely on third-party backends they
are not in the top 5,000 and found 140 additional apps using       can report the issue or migrate their backend to a more se-
the appnext platform. The top 10 most reviewed apps using          cure provider. Ambiguity arises when the backend is hosted
the appnext platform can be found in Table 11. The top app         by a cloud provider, a hybrid type backend. To resolve these
has 332,907 reviews and over 50M+ installs. These numbers          issues we further generalize the hybrid backends into IaaS
give us an indication of the the platform’s significant popu-      (cloud provider manages the virtual HW ) and PaaS (cloud
larity and daily use.                                              provider manages HW , OS, and SS).
   The appnext platform backends (shown in Table 12) are
labelled as third-party, because the backends are found in an                                 Hybrid
                                                                              Strategies   HW  OS    SS     AS     CS
SDK library. We found two CDN domains that point to the
                                                                              Upgrade           4    4             44
same server IP, which are hosted on Limelight Networks, a                     Patch             4    4      44     44
CDN provider. This CDN backend is vulnerable to an OS                         Block                  4              4
integer overflow in the HTTP protocol stack (MS15-034)                        Report       44   4    4
                                                                              Migrate      44   4    4
that can be remotely exploited. Further, the CS still offers
SSLv2 and SSLv3, which are vulnerable to insecure padding          Table 13: A mapping of mitigation strategies for developers
scheme for CBC cipher. appnext’s admin.* and global.* do-          hosting their hybrid backend on infrastructure (IaaS) or a
mains run on Amazon AWS and provide app-specific data,             platform (PaaS).
like authentication, telemetry ingestion, predictive actions,
and configuration. The infrastructures run Microsoft Win-
                                                                      Table 13 provides developers with a guideline on how to
dows Server 2008 R2 for the OS, Microsoft-IIS/7.5 for its
                                                                   mitigate vulnerable hybrid backends. For example: if the
web server (SS), and the CS uses HTTPS. The application
                                                                   hybrid backend is using a cloud provider’s platform offer-
(AS) backend is a custom web application that is written in
                                                                   ing, developers should report and/or migrate their backend
ASP and uses the ASP.NET framework. The AS has a vulner-
                                                                   if the vulnerabilities are found in HW , OS, SS and upgrade
ability that allows an attacker to run arbitrary SQL queries.
                                                                   or patch if the vulnerabilities CS or AS related, respectively.

562    28th USENIX Security Symposium                                                                       USENIX Association
You can also read