Android Malware Detection using Large-scale Network Representation Learning

Rui Zhu, Chenglin Li, Di Niu
University of Alberta
Edmonton, AB, Canada
{rzhu3,ch11,dniu}@ualberta.ca

Hongwen Zhang, Husam Kinawi
Wedge Networks Inc.
Calgary, AB, Canada
{hongwen.zhang,husam.kinawi}@wedgenetworks.com

arXiv:1806.04847v1 [cs.CR] 13 Jun 2018

ABSTRACT
With the growth of mobile devices and applications, the amount of malicious software, or malware,
has increased rapidly in recent years, which calls for the development of advanced and effective
malware detection approaches. Traditional methods such as signature-based ones cannot defend users
from an increasing number of new types of malware or rapid malware behavior changes. In this paper,
we propose a new Android malware detection approach based on deep learning and static analysis.
Instead of using Application Programming Interfaces (APIs) only, we further analyze the source code
of Android applications and create their higher-level graphical semantics, which makes it harder
for attackers to evade detection. In particular, we use a call graph from method invocations in an
Android application to represent the application, and further analyze method attributes to form a
structured Program Representation Graph (PRG) with node attributes. Then, we use a graph
convolutional network (GCN) to yield a graph representation of the application by embedding the
entire graph into a dense vector, and classify whether it is malware or not. To efficiently train
such a graph convolutional network, we propose a batch training scheme that allows multiple
heterogeneous graphs to be input as a batch. To the best of our knowledge, this is the first work
to use graph representation learning for malware detection. We conduct extensive experiments on
real-world sample collections and demonstrate that our developed system outperforms multiple other
existing malware detection techniques.

KEYWORDS
Android Malware Detection; Call Graph; Graph Convolution Networks

Permission to make digital or hard copies of part or all of this work for personal or classroom use
is granted without fee provided that copies are not made or distributed for profit or commercial
advantage and that copies bear this notice and the full citation on the first page. Copyrights for
third-party components of this work must be honored. For all other uses, contact the
owner/author(s).
© 2018 Copyright held by the owner/author(s).

1     INTRODUCTION
Recent years have witnessed the rapid growth of smart phone usage in daily life, e.g., for online
shopping, online banking, entertainment, and even for remote control. As the major operating system
for smart phones, Android is now powering tablets, TVs, wearable devices and even embedded systems
in cars and IoT devices. The large market share of Android and its open-source development
ecosystem have not only brought about opportunities for Android applications, but also posed
challenges in defending against attacks from a proliferation of malware (short for malicious
software). Due to a lack of trustworthy review methods, it is possible that some developers may
upload their Android apps with malicious components, which can be found in a number of third-party
Android markets, and even in Google’s official Android market, Google Play. According to a report
[22], the quantity of mobile malware detected in 2016 was about 18.4 million, representing an
increase of 105% from that in 2015.
    To protect users from malware threats, a number of anti-malware solution providers (e.g.,
Norton, McAfee, Symantec, Kingsoft) provide software products as a major means of defence. Their
products typically use the signature-based method to detect threats. In this method, a unique
signature is generated from a known type of malware, so that malware detection amounts to matching
a suspicious app against existing signatures in the maintained database. However, attackers can
easily evade detection, for example, by changing signatures using code obfuscation or repackaging.
To overcome these limitations of the signature-based method, the heuristic-based method was
introduced in the late 1990s, which operates based on explicit rules crafted by security analysts.
However, such rules are prone to the biases of human expertise; it is also hard to generate rules
fast enough to match the speed of malware creation.
    To overcome these challenges, there is an emerging trend of developing automatic malware
detection methods using machine learning. These techniques are capable of classifying previously
unseen malware samples as well as identifying the malware families of malicious samples. In these
systems, detection has two phases: feature extraction and classification. In the first phase,
various features, such as API calls and binary strings, are extracted from the original file
samples. In the second phase, machine learning is used to automatically categorize the file samples
into several classes based on the feature representation. Different machine-learning-based malware
detection methods differ in both phases.
    In this paper, instead of only using API calls as features, we further analyze the control flow
graphs (CFGs) that represent the control flows of Android applications. CFGs are widely used in
software analysis and have been widely studied in the literature, since they not only provide
information about API calls, but also reveal how these API calls interact in the application. Since
some APIs are more security sensitive than others, we further extract features for all APIs, such
as requested permissions or hardware resources, and represent each Android application as a graph
with node attributes.
    To make classification decisions from such graph structures, we use graph convolutional neural
networks (GCNs), a generalization of classical CNNs to handle graph structures. Convolutional
neural networks (CNNs) have proven to be successful on a wide range of

machine learning problems, including image classification, object detection, and deep reinforcement
learning. However, in these problems, data can be represented on a regular grid, e.g., pixels in
digital images and game states in Go are grids with fixed numbers of rows or columns. To overcome
the challenges of representing graph structures for classification, we use GCNs to embed the
derived control flow graphs as points in a vector space for graph classification. Since the input
graphs vary in shapes and structures, it is challenging to learn and train GCNs on arbitrary
graphs. We propose a batch training algorithm to overcome this issue.
    Our contributions in this paper are summarized as follows:

      • Novel feature representation: instead of using APIs or binary OpCode (operation code)
        only, we extract control flow graphs (CFGs) for all Android applications in question, and
        further analyze their API security-sensitive attributes, including requested permissions
        and hardware resources. Based on these features, we represent each Android application as
        a graph with node attributes, where a graph convolutional network is subsequently used to
        classify and detect malware.
      • Graph convolutional network with global context: Traditional GCNs only consider
        information from graph and node attributes. However, in Android malware detection, a wide
        range of contextual information can also be utilized. In this paper, we use diverse
        information from manifest files, which are included in all Android applications, as the
        global contextual information, and extend the traditional GCNs to take these additional
        global features into account.
      • Batch training for GCNs: A graph classifier is hard to train, since input graphs can have
        arbitrary sizes and structures. Unlike images, it is unreasonable to resize all input
        graphs into a fixed shape. As compatibility with diverse topologies is necessary for
        convolutional operations on graphs, we propose a batch training algorithm to solve this
        issue.

2     BACKGROUND AND SYSTEM OVERVIEW
In this section, we introduce preliminaries of the Android operating system, which are crucial for
further data preprocessing and designing machine learning algorithms. Then, we introduce how we
design an end-to-end Android malware detection system from large-scale representation learning.

2.1     Preliminaries
We first introduce some background on the Android system and the components in Android application
files (known as apk files). Applications in the Android system are written in Java and executed
within a custom Java virtual machine, and each application package is contained in a jar file with
the extension apk. Each Android application consists of many components of different types. These
components are the essential building blocks of an Android application. Each component is an entry
point through which the system or a user can enter the application, and applications interact via
components. Therefore, it is essential to analyze the component API for security concerns. There
are four different types of app components:

      • Activities are the entry points for interacting with the user. Activities handle events
        triggered by users and determine how users navigate within and between apps.
      • Services are general-purpose entry points for keeping an application running in the
        background for all kinds of reasons. For example, the user might listen to music, which is
        running in a background service, and use another application. In addition, components can
        be bound to services to interact with them, and even perform inter-process communication
        (IPC).
      • Broadcast receivers are components that enable the system to deliver events to the
        application outside of a regular user flow, allowing the application to respond to
        system-wide broadcast announcements. As broadcast receivers are also well-defined entry
        points, the system can deliver broadcast events to apps that are currently not running.
        Most broadcasts originate from the system; some are initiated by apps, usually to notify
        other apps.
      • Content providers are components managing a shared set of application data that can be
        stored in the file system, in a SQLite database, on the web, or on any other persistent
        storage location that the application can access. If the content provider allows, other
        apps can query or modify the data.

    All components must be declared in the application manifest file before they can be used.
Communication between different components goes through intents and intent filters. Intents are
messaging objects that can be used to request actions from other application components. An
intent filter is an expression declared in the application manifest file that specifies the intent
type that the component will receive.

2.2     System overview
An overview of our proposed malware detection system is shown in Fig. 1, which mainly consists of
four components: unpacking, static analysis, feature extraction, and classification.

    2.2.1 Unpacking and Decompiling. We first take a look at what is inside the apk file. Each apk
file is actually a zipped file that includes the application code, resources, assets, and manifest
file. The manifest file plays the central role in Android apps. It contains various important and
security-sensitive information, including components, permissions, hardware features, etc. In
fact, the Android system requires all apps to declare this information in their manifest files;
otherwise, they are not recognized by the system and will be ignored.
    The manifest file is not sufficient to provide the whole picture of an app. In addition to it,
the code is our next and major source of features. The code included in apk files is dex code,
i.e., Dalvik executable files, which can be interpreted by the Dalvik Virtual Machine (DVM).
Unfortunately, the dex file is hard to understand and we need to convert it into a human-readable
format, smali code, which is an intermediate code that can be obtained by disassembling the dex
file. Fortunately, the nature of the DVM provides tools for such decompiling purposes. In
particular, we use APKTool [5] to unpack the apk file and decompile the dex files to smali code.
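As a rough illustration of the two observations in Sec. 2.2.1 (an apk is a zip archive, and its manifest declares permissions and components), the following Python sketch lists the entries of an archive and reads permissions from a manifest that has already been decoded to plain XML, for example by APKTool (the manifest stored inside the apk itself uses a binary XML encoding). The helper names `list_apk_entries`, `read_manifest_permissions`, and `make_toy_apk`, as well as the sample manifest, are hypothetical and are not part of the system described in this paper.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace behind the android: prefix in AndroidManifest.xml attributes.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical decoded manifest, used only for illustration.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.toy">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.CAMERA"/>
  <application>
    <service android:name=".SyncService"
             android:permission="android.permission.BIND_JOB_SERVICE"/>
    <activity android:name=".MainActivity"/>
  </application>
</manifest>"""


def make_toy_apk():
    """Build a tiny in-memory zip that mimics the layout of an apk (placeholder contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("AndroidManifest.xml", b"binary xml placeholder")
        archive.writestr("classes.dex", b"dex placeholder")
        archive.writestr("res/layout/main.xml", b"")
    return buffer.getvalue()


def list_apk_entries(apk_bytes):
    """Return the file names stored in an apk, treating it as a zip archive."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        return archive.namelist()


def read_manifest_permissions(manifest_xml):
    """Collect app-level and component-level permissions from decoded manifest XML."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    perm_attr = "{%s}permission" % ANDROID_NS
    # App-level permissions are declared in <uses-permission> elements.
    requested = [e.get(name_attr) for e in root.iter("uses-permission")]
    # Component-level permissions are android:permission attributes on components.
    component = {}
    for tag in ("activity", "service", "receiver", "provider"):
        for element in root.iter(tag):
            if element.get(perm_attr) is not None:
                component[element.get(name_attr)] = element.get(perm_attr)
    return requested, component
```

Calling `list_apk_entries(make_toy_apk())` returns the archive's file names, and `read_manifest_permissions(SAMPLE_MANIFEST)` returns the requested permissions together with a mapping from component name to its declared permission.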

[Figure 1 stages]
  Unpacking:          1. Unzipping; 2. Decompiling
  Static Analysis:    1. Scanning methods; 2. Constructing call graph
  Feature Extraction: 1. Hardware; 2. Permissions; 3. Intent filters; 4. Suspicious API
  Detection:          Using graph convolution network to detect malware

Figure 1: An overview of the system architecture

    2.2.2 Construct Call Graph. As smali code is readable and close to the original Java code, we
can analyze it to extract both statical and relational features for further use. This step is
called static analysis, which aims at analyzing a program without executing it.

[Figure 2 nodes: com.fa.c.RootCommand: Execute(); com.stericson.RootTools: getShell();
android.util.Log: d(); java.lang.Throwable: getMessage();
com.stericson.RootShell.execution.Shell: add()]

Figure 2: An example of call graph

    The major feature to represent an Android application is its call graph, which represents
calling relationships between methods. In this graph, each node represents a method and each edge,
for example (f, g), denotes a method invocation in which method f calls method g. If there is a
recursive call, meaning a method f calls itself, a self-loop is used. A call graph is a good
visualization of the internal structure of any kind of computer program and has been widely used
in many fields. An example of a call graph is shown in Fig. 2. In this call graph there are five
methods and four invocations. Each node in the call graph is assigned a unique ID, which consists
of the class name, method name, function argument types and return type. Call relations are
represented as directed edges; for example, the directed edge from com.fa.c.RootCommand: Execute()
to android.util.Log: d() indicates that there is an invocation statement for android.util.Log: d()
inside the com.fa.c.RootCommand: Execute() method. However, we cannot affirm that this invocation
will be executed when the app is running com.fa.c.RootCommand: Execute(), because the programmer
may also place constraints on this invocation. We ignore such constraints in our call graph and
try to include as many invocations as possible.

    2.2.3 Node Feature Extraction. After construction of the call graph, the whole program is
represented by how methods invoke each other, and each method is represented by a node. In this
step, we further extract security-sensitive properties of each method that act as node attributes
of our constructed call graph. In other words, our goal is to construct a graph with node
attributes. In particular, we extract the following five types of attributes for each method:

      • Method type: We categorize all methods into four types: Android system API, third-party
        API, component methods and others. Android system API includes all APIs listed in the
        official Android references, and libraries provided by Java. Third-party APIs include
        some widely used APIs, including Google Map APIs, Facebook APIs, and Yahoo! Weather APIs.
        Since all components should be declared in the manifest file, all methods in such
        component classes are categorized as component methods. All other methods are categorized
        as others.
      • Hardware features: If a method requests some hardware resources, like camera, GPS,
        sensors, etc., these hardware resources are considered as the method’s features.
      • Requested permissions: Like the previous one, if a method needs special permissions to
        execute, these permissions are considered as the method’s features.
      • Component permissions: Sometimes it is the component class, not the method, that requests
        permissions. If so, all methods in this component have permission features.
      • Intent filter: Like the previous one, if a component declares some intent filters, these
        intent filters are features for all methods in this component class.

    2.2.4 Classification. The outcome of the previous steps is a graph with node attributes. With
proper transformation, we can apply

grpah convolutional networks (GCN) for malware detection. Tradi-              Finally, we introduce how we extract these features by static
tional methods involve hand-craft feature extraction on the top of        analysis. Most of these node attributes can be extracted from mani-
such graph structured data in order to measure the neighborhood           fest file, for example, intent filters. Requested permissions can be
of two nodes in the graph. In this paper, however, we turn to ap-         found in the tag uses-permissions and component permission
ply graph convolutional networks that are capable of end-to-end           can be found as an attribute android:permission. In our system,
training. In this approach, a graph will be embedded as a point           we use some offline tools to obtain system API permission map-
in a vector space, and classification can be done on such vector          ping, including official references [3] and using PSCout [8], then
space. A key benefit of this approach is that learning the mapping        assign component permissions to corresponding methods, and fi-
of embedding and the classification scheme can be done jointly.           nally assign all permissions shown in uses-permissions to the
                                                                          dummyMain node. Note that we only gather permissions or hard-
                                                                          ware features that a method requests, no matter where to collect
2.3     Additional details                                                them. A tricky thing is about hardware features. In Android, the
In the last part of this section, we put some details of our proposed     tag uses-feature is used to declare hardware or software features.
system. We start from construction of call graphs. For many reasons,      Sometimes we may not find any hardware features in manifest
generating a precise call graph is challenging for reason as follows.     file, since they are implicitly declared by permissions. A common
                                                                          practice is to use a tool aapt from Android SDK to determine what
     (1) When a calling statement is found, the binding between two       hardware features are declared.
         methods may be resolved at compilation time or runtime.
         An example is when a method is inherited from its parent
         class.                                                           3     A GRAPH APPROACH FOR MALWARE
     (2) Unlike Java programs, Android applications do not have a               DETECTION
         main method but multiple entry points instead. These entry       In this section, we exploit the graph representation of Android mal-
         points are implicitly called by the Android framework in the     ware samples provided by control flow graph, and propose the Call
         back end.                                                        Graph based Graph Convolutional Network (CG-GCN) for malware
     (3) Callbacks are prevalent in Android applications. There are       detection. We illustrate the overall architecture of our proposed
         some existing work like FlowDroid [7] to solve these issues.     model in Fig. 3. We firstly formulate the malware detection problem
         However, we found that these tools are quite complicated         as a classification problem. Then we apply graph convolutional net-
         and time consuming with limited benefits for malware de-         work (GCN) to solve it. To speed up training, we further propose a
         tection. Therefore, we use a simple yet effective call graph     batch training scheme that allows to simultaneously learn graph
         construction way by adding an additional dummyMain node          representation vectors in a batch.
         that connects to all methods listed in smali codes.

   Once the call graph is constructed, we need to extract features        3.1    Feature Transformation and Graph
for all nodes, including the dummyMain node. As discussed above,                 Representation Learning
each node contains four kinds of attributes, including method types,      So far, we obtain a number of call graphs as well their corresponding
permissions, and hardware features. Permissions are definitely most       node attributes. Recall that the nodes in a call graph are actually
security sensitive attributes; in fact, many operations need specific     methods, and each method may have certain permission to request,
permissions to execute, and these permissions are granted by the user at installation; malware samples tend to request a particular set of permissions. The manifest file actually provides two types of permissions, namely requested permissions and component permissions. As the dummyMain node connects to all methods in the apk file, we assign the requested permissions as its attributes, and assign component permissions to the corresponding methods in each component.

   Other attributes are also crucial for malware detection. For example, we collect all intent filters, since they can be used to eavesdrop on specific intents. Malware samples are sensitive to a particular set of system events, so intent filters can serve as hints.

   Note that a special set of APIs can lead to malicious behavior without requesting any permissions. For example, cryptography functions in the Java library are treated as ordinary math functions, so no permissions are needed. However, malware samples can use these functions for code obfuscation, so unusual usage of them deserves attention. We mark these types of functions as suspicious APIs, as in [6].

or hardware resources to use, etc. Each node is associated with a set of such attributes, and the empty set is also allowed.

   We now formulate the above idea as follows. A call graph is denoted by $G = (V, E)$, and we use $A$ to denote its adjacency matrix. Each node $v \in V$ is associated with a set $F_v$ extracted in the previous stage, and the goal of feature transformation is to find a proper function $\phi(\cdot)$ that converts the node attribute set $F_v$ into a vector $x_v \in \mathbb{R}^{h_0}$, where $h_0$ is the dimension of the destination vector space. This gives a new matrix $X$ whose $v$-th row is $x_v$, and the role of the graph convolutional network (GCN) is to classify the input tuple $(A, X)$ as malicious or benign.

   Traditional approaches often rely on careful feature engineering to design $\phi$ and to measure local neighborhood structures from $A$, after which existing machine learning algorithms for non-structural data can be applied. However, these hand-crafted approaches are inflexible and limited given how rapidly Android malware changes. In fact, as we will see in Sec. 4, the call graph structure can vary a lot, both in scale and in complexity, and designing these features can be time-consuming.
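
As a minimal sketch of this formulation, the tuple $(A, X)$ could be assembled as shown below. It assumes a toy two-node call graph, treats links as undirected, and takes $\phi$ to be a plain indicator encoding over a fixed vocabulary, reusing the two permission sets that appear in Figure 4. The helper names `build_adjacency` and `phi`, the node names and the single edge are hypothetical and are not taken from the authors' implementation.

```python
def build_adjacency(nodes, edges):
    """Adjacency matrix (list of rows) for a call graph given as caller/callee pairs."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for caller, callee in edges:
        i, j = index[caller], index[callee]
        matrix[i][j] = 1
        matrix[j][i] = 1  # assumption: links are treated as undirected
    return matrix


def phi(attributes, vocabulary):
    """Indicator encoding: entry i is 1 if vocabulary[i] is in the attribute set F_v."""
    return [1 if s in attributes else 0 for s in vocabulary]


# Toy input: two method nodes carrying the permission sets from Figure 4.
nodes = ["A", "B"]
edges = [("A", "B")]
F = {
    "A": {"SEND_SMS", "BIND_ADMIN", "BLUETOOTH"},
    "B": {"SEND_SMS", "CHANGE_WIFI_STATE", "NFC"},
}
S = ["SEND_SMS", "BIND_ADMIN", "BLUETOOTH", "CHANGE_WIFI_STATE", "NFC"]

adjacency = build_adjacency(nodes, edges)      # plays the role of A
features = [phi(F[v], S) for v in nodes]       # row v is x_v, so h_0 = len(S)
```

With this vocabulary order, each row of `features` has one entry per element of `S`, and most entries are zero once the vocabulary grows, which is why such encodings are sparse in practice.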

[Figure 3 diagram: Layer Combo (GCN Layer, BN Layer) → Aggregation Layer → Combination Layer → Softmax → Detection Result]

Figure 3: Architecture of Graph Convolution Network for Malware
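
The pipeline named in Figure 3 can be sketched in a few lines of plain Python. This is only an illustrative reading of the figure, not the authors' TensorFlow implementation: it assumes a single GCN layer with ReLU activation and symmetric normalization, omits batch normalization, aggregates node vectors by summation, and concatenates the result with the feature row of node 0, which stands in for the dummyMain node. All weights and inputs are made-up toy values, and every function name is hypothetical.

```python
import math


def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def gcn_layer(A, H, W):
    """One GCN layer: add self-loops, normalize symmetrically, multiply, apply ReLU."""
    n = len(A)
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    A_norm = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
              for i in range(n)]
    return [[max(0.0, v) for v in row] for row in matmul(matmul(A_norm, H), W)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(A, X, W, U):
    Z = gcn_layer(A, X, W)                # GCN layer: node embeddings
    z_G = [sum(col) for col in zip(*Z)]   # aggregation layer: sum over nodes
    combined = z_G + X[0]                 # combination layer: concatenate with node 0 features
    scores = matmul([combined], U)[0]     # linear scores for the two classes
    return softmax(scores)                # class probabilities


A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]     # node 0 plays the role of dummyMain
X = [[1, 1], [0, 1], [1, 0]]
W = [[0.5, -0.2], [0.1, 0.3]]             # toy GCN weights (2 -> 2)
U = [[0.2, -0.1], [0.4, 0.3], [-0.3, 0.2], [0.1, 0.1]]  # toy classifier (4 -> 2)
probs = forward(A, X, W, U)
```

The two entries of `probs` sum to one and can be read as scores for the two classes; which index means "malicious" is a labeling convention left open here.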

   Recently, a surge of new approaches attempt to learn a representation of the graph by learning a mapping that embeds nodes or entire (sub)graphs as points in a vector space, usually of low dimension. A good mapping should reflect the graph structure through geometric relationships among the learned vectors in this space; these vectors are called embeddings and can be used as feature inputs for further machine learning tasks. In these approaches, representations of nodes and of the whole graph (or subgraph) are no longer designed from kernel functions or other carefully engineered schemes; instead, we design algorithms that learn them automatically.

   In this spirit, a good feature transformation should keep as much raw information as possible. Therefore, we use a one-hot encoding scheme to convert sets into vectors. Specifically, we denote by $S = \{s_1, \ldots, s_{|S|}\}$ the set of all possible values in $F_v$. Then, we assign a vector $x_v \in \mathbb{R}^{|S|}$ such that the $i$-th entry $x_v(i) = 1$ if $s_i$ appears in $F_v$, and $x_v(i) = 0$ otherwise. Fig. 4 illustrates the details of the one-hot encoding used in our system.

           A = {SEND_SMS, BIND_ADMIN, BLUETOOTH}
        +) B = {SEND_SMS, CHANGE_WIFI_STATE, NFC}
           S = {SEND_SMS, BIND_ADMIN, BLUETOOTH, CHANGE_WIFI_STATE, NFC}

Figure 4: One-hot encoding for node features. Here we use different color blocks to represent different permissions. In this example, node A requests three permissions, and node B also requests three permissions. A colored block is encoded as 1 and a white block is encoded as 0. Such a one-hot encoding scheme results in sparse node features.

   Once we have the tuple $(A, X)$ on hand, the next task is to learn a representation of graph $G$ that embeds $G$ into a low-dimensional vector space. More formally, the goal is to find $z_G \in \mathbb{R}^d$ for every $G$ given its adjacency matrix $A$ and node feature matrix $X$; $z_G$ is then used for classification. This is the role of the GCN.

3.2 Deep Graph Convolutional Networks for Malware Detection

Unlike conventional network representation learning algorithms, which attempt to learn node representations in unsupervised settings, learning the graph representation $z_G$ is quite challenging and usually requires a supervised setting. In the rest of this section, we will see how our proposed model CG-GCN can be used for graph classification while learning low-dimensional vectors for all graphs.

   The basic idea in graph neural networks is to generate a node embedding vector by iteratively aggregating vectors from its neighbor nodes. The operations at each layer are illustrated in Fig. 5. In each layer, node $v$ is associated with a hidden vector $h_v$, with $h_v^0 = x_v$ at the beginning. At layer $k$, the hidden vector of node $v$ aggregates hidden vectors from its neighbors as follows:

$$\tilde{h}_v = \sum_{v' \in N(v) \cup \{v\}} h_{v'}^{k-1}, \qquad (1)$$

$$h_v^k = \sigma(\tilde{h}_v W^k), \qquad (2)$$

where $N(v)$ denotes the set of neighbors of node $v$, $\sigma$ denotes the activation function at layer $k$, and $W^k \in \mathbb{R}^{d_{k-1} \times d_k}$ is the weight matrix at layer $k$. Here $d_k$ denotes the dimension of hidden vectors at layer $k$. By iteratively applying the above equations to all nodes at all layers, we finally obtain node embedding vectors at the last layer $K$ for all nodes. These final representation vectors are regarded as the embedding vectors: for node $v$, its embedding vector is defined as $z_v := h_v^K$.

Figure 5: Aggregation of Graph Convolution Network

   Similar to $X$, we can stack all node embedding vectors into an embedding matrix $Z$. As each node aggregates from its neighbors, node features propagate to more distant nodes in the graph at deeper layers, so $Z$ implicitly captures the local neighborhood structure. After obtaining node

embeddings $Z$, we sum all the individual node embeddings in the graph to form the representation vector $z_G$:

$$z_G = \sum_{v \in V} z_v. \qquad (3)$$

   As discussed in the previous section, some special information is encoded as the node feature of the dummyMain node, including permission requests and hardware requests. These features provide a global context for other methods when learning node embedding vectors. For malware detection, however, they are also important features for discriminating whether the app is malicious. For this reason, we create a shortcut from the node feature of the dummyMain node to the last layer, so that the input vector for classification is the concatenation of the graph embedding vector and the node feature vector of the dummyMain node. More formally, we denote by $x_G$ the row vector of the dummyMain node in $X$, and the vector used for graph classification is $[z_G, x_G]$, where $[\cdot, \cdot]$ denotes the concatenation operator.

   A deep GCN model is well suited to our malware detection problem for the following reasons. First, it enables us to capture structural information. A complete malicious behavior in an app often corresponds to a long trail on the call graph. For example, to eavesdrop on messages in a smartphone, malware must first call the API that reads messages and then send them out. In the call graph, this simple action involves two sites: a source site that obtains the user's message, and a sink site that sends the message out. Considering individual API calls alone is not enough to analyze such malicious behavior. Also, we can obtain a graph embedding as well as node embeddings using GCN. This means we simultaneously have representations for both methods and the entire app, which encode structural information for both.

3.3 Batch Training GCN

We now describe how we train the GCN and propose our batch training approach to speed it up. Suppose we have a training set $D := \{(A_i, X_i, y_i)\}$ of $n$ graphs, where $(A_i, X_i)$ denotes the input tuple $(A, X)$ for the $i$-th graph in $D$. The GCN takes $(A_i, X_i)$ as input and produces $[z_i, x_i]$ as the vector for graph classification. Adopting the sigmoid loss, we obtain the following optimization problem for the embedding parameters and the discriminative classifier:

$$\min_{u, \{W\}} \; -\frac{1}{n} \sum_{i=1}^{n} -y_i \log\big(\sigma(\langle [z_i, x_i], u \rangle)\big), \qquad (4)$$

where $u$ is the weight parameter for classification and $\{W\}$ is the collection of weight parameters of the GCN. Here $[z_i, x_i]$ is the concatenation of the graph representation vector $z_i$ of the $i$-th input graph and $x_i$, the node feature of the dummyMain node in the same graph. A regularization term can also be added to (4) to prevent overfitting.

   One of the greatest challenges in performing convolution on graph-structured data is the difficulty of training on graphs in a batch [18]. Because of their irregular structure and shapes, some techniques from conventional CNNs, such as resizing or reshaping, are not suitable for GCN, which weakens its compatibility. Here we propose our approach for training GCN in batch mode.

   We denote by $H^k$ the matrix of hidden vectors whose $v$-th row is the hidden vector of node $v$. From graph theory we can rewrite (1) as

$$\tilde{H}^k = A H^{k-1} + H^{k-1},$$

since $A$ only accounts for links to neighbor nodes. Denote $\hat{A} := A + I$, and let $\hat{D}$ be the diagonal node degree matrix of $\hat{A}$, which is used for normalization. Then, we can combine (1) and (2) as follows [17]:

$$H^k = \sigma(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{k-1} W^k). \qquad (5)$$

   We denote $\tilde{A} = \hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}$, which is the same for all layers. Therefore, we can simply precompute $\tilde{A}$ before passing it into the neural network, and at layer $k$ we have:

$$H^k = \sigma(\tilde{A} H^{k-1} W^k). \qquad (6)$$

In summary, sequential training first computes $\tilde{A}_i$ for the $i$-th sample, then passes it through the GCN to obtain $[z_i, x_i]$ as the final representation. By calculating the loss on this individual sample, we obtain its derivatives and in turn update the weights using stochastic gradient descent (SGD) or one of its variants.

   Now let us extend the above idea from single-sample training to batch training. Suppose we want to train on $m$ samples as a minibatch, denoted as $(A_1, X_1, y_1), \ldots, (A_m, X_m, y_m)$. Similarly, we can precompute $\tilde{A}_i$ for all samples in this minibatch. As graphs can have different numbers of nodes, we concatenate all $\tilde{A}_i$ in the minibatch along the diagonal as follows:

$$\tilde{A} = \begin{bmatrix} \tilde{A}_1 & & & \\ & \tilde{A}_2 & & \\ & & \ddots & \\ & & & \tilde{A}_m \end{bmatrix}. \qquad (7)$$

We can then compute a minibatch of graphs simultaneously:

$$H^k = \sigma\left( \begin{bmatrix} \tilde{A}_1 & & & \\ & \tilde{A}_2 & & \\ & & \ddots & \\ & & & \tilde{A}_m \end{bmatrix} \begin{bmatrix} H_1^{k-1} \\ H_2^{k-1} \\ \vdots \\ H_m^{k-1} \end{bmatrix} W^k \right).$$

If we denote by $\hat{H}^{k-1}$ the concatenation of all $H_i^{k-1}$, for $i = 1, \ldots, m$, the update of $\hat{H}^k$ can be written simply as

$$\hat{H}^k = \sigma(\tilde{A} \hat{H}^{k-1} W^k), \qquad (8)$$

which is now similar to the sequential case. By iteratively applying the above equation, we obtain node embeddings for all nodes in all input graphs, and from them the graph embeddings.

4 EXPERIMENTAL RESULTS

In this section, we evaluate the performance of our proposed CG-GCN model on the Android malware detection task. We first introduce the dataset of malware samples and clean files for this task. After that, to evaluate our model's efficiency, we compare our model with a wide variety of existing machine-learning based malware detection approaches as well as some commercial anti-virus engines. Finally, we qualitatively evaluate the final representations learned by our proposed CG-GCN model.
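
Before turning to the experiments, the batch construction of Section 3.3 can be illustrated with a small check. Eq. (7) places the normalized adjacency matrices along the diagonal of one large matrix, so a single multiplication as in Eq. (8) processes all graphs at once. The sketch below uses plain Python, two toy graphs, ReLU as an assumed choice of $\sigma$, and hypothetical helper names; it compares the batched result with applying Eq. (6) to each graph separately.

```python
import math


def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def normalize(A):
    """Return D^{-1/2} (A + I) D^{-1/2} for an adjacency matrix A."""
    n = len(A)
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    return [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def block_diag(blocks):
    """Place square matrices along the diagonal of one large zero matrix (Eq. 7)."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(b)
    return out


def layer(A_norm, H, W):
    """One propagation step H^k = ReLU(A_norm H^{k-1} W^k)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(A_norm, H), W)]


# Two toy graphs with 2 and 3 nodes; features and weights are made-up values.
A1 = [[0, 1], [1, 0]]
A2 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X1 = [[1.0, 0.0], [0.0, 1.0]]
X2 = [[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
W = [[0.5, -1.0], [0.25, 0.75]]

# Per-graph propagation (Eq. 6), with the resulting rows stacked afterwards.
separate = layer(normalize(A1), X1, W) + layer(normalize(A2), X2, W)
# Batched propagation (Eq. 8) on the block-diagonal matrix and stacked features.
batched = layer(block_diag([normalize(A1), normalize(A2)]), X1 + X2, W)
```

Because the off-diagonal blocks are zero, no information flows between graphs, and `batched` matches `separate` row by row. Graph embeddings can then be recovered by summing the rows that belong to each graph, as in Eq. (3).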

4.1    Experiment Setup
In this paper, we evaluate our algorithm on the DREBIN dataset [6], which contains 5,560 malware files collected from August 2010 to October 2012. Each malware sample is labeled with one of 179 malware families. Along with this malware dataset, we also gathered a number of real-world Android applications from the Internet. The sources of these files include Apkpure [4] with 5,400 samples, 700 samples from 360.com, and over 13,000 commercial applications from the HKUST Wake Lock Misuse Detection Project [19]. In summary, we have collected 19,100 real-world applications.

    Although these Android applications were mostly collected from well-known Android markets and research projects, we still need to verify that they are clean. To do so, we uploaded all collected files to the VirusTotal service, a public anti-virus service with 78 popular engines, and inspected the scanning report for each file. Each engine in VirusTotal reports one of three detection results: True for "malicious", False for "clean", and NK for "not known". If an application has more than one True result, we label it as malware; otherwise, we label it as clean. As a result, only 16,753 out of the 19K collected samples passed all scanners on the VirusTotal service, and we take 5,877 of them as clean files for the evaluation in this paper.

    Table 2 shows some statistics of the DREBIN Android malware dataset and the dataset of clean files. A key observation is the high skewness of the node and edge distributions among Android apk files. Because Android application developers are highly diverse, file sizes on the Internet range from KB to GB. In the dataset we use for our evaluation, the largest file is 29 MB in the DREBIN dataset and 62 MB among the clean files. Such diversity brings severe challenges for training and learning GCN.

Table 2: Malware and clean file datasets.

    Dataset           DREBIN       Benign
    # samples         5,560        5,877
    # nodes (avg.)    9,590.23     28,973.35
    # nodes (max)     41,905       65,439
    # edges (avg.)    19,377.96    39,031.57
    # edges (max)     132,731      207,997

    Table 3 shows the number of features extracted as described in Sec. 2. As hardware resources are restricted by devices and the Android system, their value set is quite small. The most frequently used permissions are also provided by the Android system, so considerable overlap of permissions among apps is expected. Intent filters, however, can easily be created by users and developers, so they have the most variety and sparsity.

Table 3: Size of extracted node feature sets on DREBIN datasets.

    Feature set           DREBIN
    Hardware features     86
    Permissions           3,830
    Intent filters        9,317
    Total features        13,233

    All experiments were conducted on a Compute Engine instance on Google Cloud with 4 cores and 16 GB RAM running Ubuntu 16.04. The instance is also equipped with an NVidia Tesla P100 GPU to speed up the graph convolutional network, which is implemented on top of TensorFlow [2]. We evaluate the Android malware detection performance of different methods using the measures shown in Table 1. Note that precision and false positive rate are the two most important measures for a security system.

Table 1: Performance metrics of Android malware detection

    Metric      Description
    TP          # of malicious apps correctly detected
    TN          # of benign apps correctly classified
    FP          # of apps falsely predicted as malicious
    FN          # of apps falsely predicted as clean
    ACC         (TP + TN)/(TP + TN + FP + FN)
    Precision   TP/(TP + FP)
    Recall      TP/(TP + FN)
    F1          2 * Precision * Recall/(Precision + Recall)
    ER          Error rate, which is 1 − ACC
    FPR         False positive rate, FP/(TN + FP)
    DF          Detection failure rate, FN/(TP + FN)

4.2    Performance Evaluation on Benchmark Dataset

In this experiment, we randomly select 80% of the data for training and the remaining 20% for testing. During the training stage, all training samples are used in 4-fold cross validation to train our model and tune the hyperparameters, and the testing samples are used only for performance evaluation at the testing stage. We repeat this procedure 5 times and average the results. All baseline methods as well as our proposed one follow the same procedure.

   To evaluate our model, we first compare it with various machine-learning based malware detection algorithms. In particular, we compare against algorithms that use static analysis without graph structure. For those baseline algorithms, we extract features from static analysis of both the manifest and the source code for all samples, similar to Arp et al. [6], except that we do not extract network addresses here. All features were encoded in one-hot fashion as shown in Fig. 4.

   We compare with four other typical classification methods on these features: Random Forest (RF), Support Vector Machine (SVM), and Naive Bayes (NB) with three kernel variants: Gaussian (NB-G), Bernoulli (NB-B) and Multinomial (NB-M). For RF, we set the maximum depth to 6 to trade off time and performance. For SVM, which is also the classifier used in [6], we use LibSVM in our experiment and set the penalty to 2.

   The results of this experiment are shown in Table 4. In this table, "GCN" refers to the algorithm without concatenating the feature vector x, and "GCN+" refers to the one we introduced in Sec. 3. From this table, we can clearly observe that our proposed GCN significantly outperforms the other approaches by nearly 2.94%

of precision in prediction. All the baseline algorithms also achieve good performance, and most of them have a precision above 90%. However, without modeling the semantics between API nodes, these algorithms cannot reach performance comparable to our GCN approach. This would become even more obvious as malware becomes more complicated.

   The most significant improvement of GCN is in the false positive rate. Table 4 shows that our proposed method drops the false positive rate from 5% to 0.09%, corresponding to nearly 100 fewer false alarms when evaluating 2346 samples. This is remarkable because false alarms have always been a big concern in security and cost system users considerable time and energy to deal with. Although Naive Bayes with the Bernoulli kernel provides the highest detection rate of 99.91% and the best detection failure rate of 0.09%, this comes at the expense of a high false positive rate (10.12%), which diminishes its effectiveness and overall performance. Note that in this table GCN+ outperforms GCN, which indicates that we can enhance detection performance by incorporating global contextual information.

   This can be attributed to two characteristics of our model. First, the input Android apk files are transformed into call graphs, which provide a more detailed picture of Android applications, such as data flow. In contrast, traditional machine learning based malware detection algorithms only account for static features. Second, some features are more expressive in our model. Consider permissions as an example. A malware sample may declare more permissions than necessary in order to let a remote server control a device. Traditional methods only scan the permissions that are actually used, while our model can explicitly learn the "permission distribution" in call graphs. When permissions are requested by isolated methods, which are often executed by command & control servers, the apk file is more likely to be malware. Such information can only be exploited by learning from structural data as our model does, which traditional models lack.

[Figure 6 plot: "ROC curves on AMD set"; x-axis: False Positive Rate, y-axis: True Positive Rate. Legend: RF (area = 0.988), NB-Multinomial (area = 0.998), NB-Bernoulli (area = 0.997), GCN (area = 0.998), SVM (area = 0.998), NB-Gaussian (area = 0.936)]

Figure 6: ROC curves for all the baseline algorithms and our proposed GCN method.

our experiments are actually labeled by these AV engines using the rule described in subsection 4.1. Therefore, AV engines are expected to show a better false positive rate here than in their normal performance. In addition, even though we obtained the scan results of all 78 AV engines from VirusTotal, here we list only the ones with the best performance or ones that are already popular and widely used in security programs, such as Kaspersky, Cylance and McAfee.

   Table 5 lists part of the AV engines' scanning results on the testing split of our experiment dataset. Compared with these results, our GCN method outperforms most of the AV engines, with a precision of 99.91% and an FPR of 0.06%. In terms of false positive rate, our method outperforms 7 out of 10 antivirus engines, which is remarkable because all the engines have an advantage in FPR on this dataset. Most of the antivirus engines that achieve a better recall or F1 score than our method end up with either much worse precision or a higher FPR, e.g. AV5, AV7 and AV10. Only a few engines have comparable overall results, e.g. AV4 and AV2.
   Note that the thresholds of prediction malicious or benign appli-
cations in Table 4 are all set as 0.5. To further investigate detection
                                                                            4.4                      Qualitative evaluation of interpretable
performance under different threshold, we plot the receiver oper-                                    representations
ating characteristic curves (ROC curves) in Fig. 6. From Fig. 6, we         A core problem of machine learning is to interpret the trained
can see that given a certain false positive level, our detection rate       model. To qualitatively assess how much interpretable our model
is always higher than other methods in almost all region, which is          has, we randomly chose 2, 000 samples in our dataset, in which we
a significant improvement and means that we can get very high               have 1, 000 malware samples and 1, 000 clean samples. We firstly
malware detection rate and give little false alarms at the same time.       generate t-SNE plots [20] of all samples using feature vectors xG of
We also notice that the performance given by the classical “flat            their dummyMain nodes in Fig. 7(a). From this figure, we can clearly
features + SVM” model is relatively not bad. Actually this is reason-       see clusters of clean samples and malware samples, which implies
able, as the features are carefully extracted. However, our model           that the features we have extracted are highly relevant of detecting
provides a method to detect Android malware without manually                malware. However, the boundary is not separable. A number of
designed features: we only need to construct a call graph and the           points on the right half plane are entangled a lot, and any simple
method specifications like permissions and hardware resources. We           hyperplane or curves on this would fail to classify these points.
can also easily incorporate handcrafted features into our model for         This helps explain why using flat features for traditional machine
specific purpose by concatenating them with our embedded vector             learning algorithms are hard to improve performance.
at the last layer.                                                             By further exploiting graph structure from call graphs, our pro-
                                                                            posed CG-GCN model can simultaneously learn classification and
                                                                            graph representation. We generate t-SNE plots for all samples us-
4.3    Compare with AV engines                                              ing their embedding vectors zG in Fig. 7(b). At this time, malware
We also compared the performance of our malware detection al-               points and clean points are separable by a significant margin. An
gorithm with existing Anti-Virus engines on VirusTotal [1]. The             interesting finding is that clean points are clustered at the top, while
critical point to mention is that all of the ‘truly’ clean files used in    malware points are forming several small spirals that disconnected

Table 4: Test result on DREBIN set. All values are in percentage for better readability. The boldface denotes the best algorithms
in terms of corresponding metric.

                                     Algorithm     GCN+      GCN       RF    SVM       NB-G     NB-B     NB-M
                                     Precision     99.91     98.83   95.75   97.93     88.46    90.32    94.54
                                       Recall      99.45     99.82   93.34   97.75     99.37    99.91    99.73
                                         F1        99.68     99.32   94.53   97.84      93.6    94.87    97.07
                                       ACC         99.69     99.34   94.75    97.9      93.4    94.75    97.07
                                        FPR         0.09      1.11    3.91    1.96     12.24    10.12     5.44
                                        DF          0.51     0.17    6.15    2.12      0.67     0.09      0.27
                                        ER          0.31      0.66    5.25     2.1       6.6     5.25     2.93
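
The rows of Table 4 and the threshold sweep behind an ROC curve can be illustrated with a minimal sketch. It assumes each app receives a score in [0, 1] and is predicted as malware when the score reaches the threshold (0.5 for Table 4). The function names and the toy scores are hypothetical and are not taken from the paper's implementation. ER is computed as 100 − ACC, which matches the tabulated values; DF is not defined in this section and is therefore omitted.

```python
from typing import Dict, List, Tuple


def confusion_counts(scores: List[float], labels: List[int],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); label 1 means malware, 0 means benign."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _percent(numerator: int, denominator: int) -> float:
    """Ratio in percent, defined as 0 when the denominator is 0."""
    return 100.0 * numerator / denominator if denominator else 0.0


def metrics_percent(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Metrics in percent, named as in Table 4 (DF omitted, see lead-in)."""
    precision = _percent(tp, tp + fp)
    recall = _percent(tp, tp + fn)
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0
    acc = _percent(tp + tn, tp + fp + tn + fn)
    return {"Precision": precision, "Recall": recall, "F1": f1,
            "ACC": acc, "FPR": _percent(fp, fp + tn), "ER": 100.0 - acc}


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs in percent, one per distinct threshold, sorted by FPR."""
    points = {(0.0, 0.0)}  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
        points.add((_percent(fp, fp + tn), _percent(tp, tp + fn)))
    return sorted(points)


# Toy data, for illustration only (not results from the paper).
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
toy_metrics = metrics_percent(*confusion_counts(scores, labels))
toy_roc = roc_points(scores, labels)
```

Lowering the threshold moves along the ROC curve toward higher detection rate and higher false positive rate, which is why a single threshold such as 0.5 gives only one point of the comparison.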

                                   Table 5: Performance of VirusTotal scanners on DREBIN test set.

                    Scanner      GCN       AV1     AV2       AV3      AV4      AV5      AV6      AV7      AV8    AV9     AV10
                   Precision%    99.91    99.91    99.64     62.84   99.91     59.03    99.63    50.09   99.84   97.78   61.42
                    Recall%      99.45    98.74    99.46     99.10   99.28    100.0     97.21    99.91   54.64   94.96   100.0
                      F1%        99.68    99.32    99.55     76.91   99.59    74.24     98.41    66.73   70.62   96.35   76.10
                     FPR%         0.09    0.089    0.357    58.073   0.089    68.778    0.357    98.66   0.089   2.141   62.27

from each other. This implies that the rich semantics encoded in the call graph and node features
can bring more information for malware detection.

5   RELATED WORK
To keep combating the increasing number of malicious applications, there have been a number of
research studies on developing Android malware detection systems using machine learning and data
mining, e.g., [6, 11, 15, 24, 25]. The major difference among them lies in how features are
extracted from packed applications. One category uses dynamic analysis to capture API calls or
environmental variables during execution and to obtain the original code from packed Android
applications. For example, DroidDolphin [25] uses DroidBox and APE to record thirteen activity
features. Another example is CopperDroid [23], a Virtual Machine Introspection (VMI) based dynamic
analysis system that extracts operating system interactions and process communications as features,
considering both intra-process and inter-process communication. However, the coverage of dynamic
analysis is limited, since not all malicious behaviors appear in a single execution, so dynamic
analysis usually takes a long time.
   In contrast, static analysis focuses on analyzing the internal components of an application, and
it is able to explore all possible execution paths in malware samples. For example, DroidMat [24]
and DREBIN [6] perform static analysis on the manifest file and source code to extract multiple
features including permissions, hardware resources and API calls, where the former uses k-means
clustering and k-NN classification and the latter uses a support vector machine (SVM) trained on
one-hot encoded feature vectors for Android malware detection. There are other classifiers in the
literature. For example, Peiravian and Zhu [21] consider SVM, decision tree and ensemble
classifiers. Different from existing works, we analyze the method invocations to form a call graph
and extract attributes for all methods in an application, which provides a more complete picture of
the application. Based on these extracted features, each Android app is represented by a graph with
node features and a sparse vector consisting of global contextual information.
   Another trend of static analysis in Android security is to detect specific malicious behaviors
such as privacy breaches and over-privilege. For example, [13, 16] go through source code with
predefined sources and sinks to find potential privacy breaches. Fu et al. [12] attempt to prevent
the theft of private information by examining all URL addresses in source code. However, we note
that static taint analysis and over-privilege detection are prone to false positives.
   To differentiate malicious apps from clean ones, we use graph convolutional neural networks for
machine learning on graph-structured data. Several convolutional neural network architectures have
been proposed for learning over graphs in recent years, and most of them can be categorized as
spectral graph convolutional neural networks. The seminal work was done by Bruna et al. [9] and
later extended by Defferrard et al. [10] with fast localized convolutions. Kipf and Welling [17]
propose a first-order approximation scheme to reduce the computational cost of spectral graph
filtering. One interesting point about these two works is that, although they consider spectral
convolution, all convolution operations in their papers are actually carried out in the spatial
domain only, which is convenient to implement on various deep learning frameworks. A more recent
work by Hamilton et al. [14] further extends GCNs by considering more generic forms of aggregation
functions and allowing nodes to sample their neighborhoods. In our application, we extend these
existing works, which focus on node embedding, to graph embedding models. Another extension is that
we jointly train the deep GCN model with a wide model for global contextual features. Finally, to
efficiently train on a large number of graphs with arbitrary shapes, we propose a batch training
algorithm that allows multiple graphs as input in a minibatch.
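
To make the last two ideas concrete, the following is a minimal pure-Python sketch, not the paper's implementation. It shows the first-order propagation rule of Kipf and Welling [17], H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), and one common way to place several graphs of different sizes in a single minibatch by stacking their adjacency matrices block-diagonally. The block-diagonal scheme is an assumption made here for illustration; this section does not specify the paper's batch training algorithm. All function names, toy graphs and weights are hypothetical, and the adjacency matrices are taken to be symmetric for simplicity.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # at least 1 because of the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj_norm: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One propagation step: ReLU(adj_norm @ features @ weights)."""
    hidden = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, value) for value in row] for row in hidden]


def block_diagonal(adjs: List[Matrix]) -> Matrix:
    """Stack adjacency matrices on the diagonal; no edges cross between graphs."""
    total = sum(len(a) for a in adjs)
    out = [[0.0] * total for _ in range(total)]
    offset = 0
    for a in adjs:
        for i, row in enumerate(a):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(a)
    return out


# Two toy graphs of different sizes with 2-dimensional node features.
g1 = [[0.0, 1.0], [1.0, 0.0]]                              # a single edge
g2 = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]   # a 3-node path
x1 = [[1.0, 0.0], [0.0, 1.0]]
x2 = [[1.0, 1.0], [0.0, 2.0], [3.0, 0.0]]
w = [[1.0, -1.0], [0.5, 2.0]]                              # shared layer weights

# One forward pass over the whole minibatch.
batched_out = gcn_layer(normalize_adjacency(block_diagonal([g1, g2])), x1 + x2, w)
```

Because the batched adjacency has no edges between graphs, the rows of `batched_out` equal the per-graph outputs stacked in order, so a single matrix product serves graphs of arbitrary shapes.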

[Figure 7 panels: t-SNE view of feature representation. (a) Before embedding (legend: Malicious, Benign). (b) After embedding (legend: clean, malware).]

Figure 7: Scatter plot of app files before embedding and after embedding. (a) Scatter plot of sparse representations from
dummyMain node feature vectors. (b) Scatter plot of final representations zG learned from our proposed GCN.

6    CONCLUSIONS
In this paper, we present an Android malware detection framework based on deep graph convolutional
networks. Instead of using API calls only, we utilize static analysis to generate call graphs and
method attributes to represent Android applications. Such a feature representation not only
provides higher-level semantics but also includes detailed execution information that makes it hard
for attackers to evade detection. Based on the extracted features, we extend existing graph
convolutional networks to graph classification and incorporate global contextual information
extracted from manifest files. To further enhance training efficiency, we propose a batch training
algorithm that accepts multiple graphs of various shapes as an input minibatch. To the best of our
knowledge, this is the first work to use GCN for Android malware detection. A comprehensive
experimental study on real sample collections is performed to compare various malware detection
approaches, and the results reveal that our algorithm outperforms state-of-the-art techniques.

REFERENCES
 [1] [n. d.]. VirusTotal. ([n. d.]). https://www.virustotal.com/#/home/upload [Online;
     accessed 9-May-2018].
 [2] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey
     Dean, et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In
     12th USENIX Symposium on Operating Systems Design and Implementation (OSDI
     16). USENIX Association, 265–283.
 [3] Android API Reference. [n. d.]. https://developer.android.com/reference/index.html.
     ([n. d.]).
 [4] Apkpure.com. 2018. apkpure. (2018). https://apkpure.com/ [Online; accessed
     9-May-2018].
 [5] APKTool. [n. d.]. https://ibotpeaches.github.io/Apktool/. ([n. d.]).
 [6] Daniel Arp, Michael Spreitzenbarth, Malte Hubner, Hugo Gascon, Konrad Rieck,
     and CERT Siemens. 2014. DREBIN: Effective and Explainable Detection of
     Android Malware in Your Pocket. In NDSS, Vol. 14. 23–26.
 [7] Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel,
     Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel. 2014.
     Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint
     analysis for android apps. ACM SIGPLAN Notices 49, 6 (2014), 259–269.
 [8] Kathy Wain Yee Au, Yi Fan Zhou, Zhen Huang, and David Lie. 2012. Pscout:
     analyzing the android permission specification. In Proceedings of the 2012 ACM
     conference on Computer and communications security. ACM, 217–228.
 [9] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2013. Spectral
     networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203
     (2013).
[10] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional
     neural networks on graphs with fast localized spectral filtering. In Advances
     in Neural Information Processing Systems. 3844–3852.
[11] Marko Dimjašević, Simone Atzeni, Ivo Ugrina, and Zvonimir Rakamaric. 2016.
     Evaluation of android malware detection based on system calls. In Proceedings of
     the 2016 ACM on International Workshop on Security And Privacy Analytics. ACM,
     1–8.
[12] Hao Fu, Zizhan Zheng, Somdutta Bose, Matt Bishop, and Prasant Mohapatra. 2017.
     Leaksemantic: Identifying abnormal sensitive network transmissions in mobile
     applications. In INFOCOM 2017-IEEE Conference on Computer Communications,
     IEEE. IEEE, 1–9.
[13] Clint Gibler, Jonathan Crussell, Jeremy Erickson, and Hao Chen. 2012.
     AndroidLeaks: automatically detecting potential privacy leaks in android applications
     on a large scale. In International Conference on Trust and Trustworthy Computing.
     Springer, 291–307.
[14] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation
     learning on large graphs. In Advances in Neural Information Processing Systems.
     1025–1035.
[15] Shifu Hou, Yanfang Ye, Yangqiu Song, and Melih Abdulhayoglu. 2017. Hindroid:
     An intelligent android malware detection system based on structured heterogeneous
     information network. In Proceedings of the 23rd ACM SIGKDD International
     Conference on Knowledge Discovery and Data Mining. ACM, 1507–1515.
[16] Jinyung Kim, Yongho Yoon, Kwangkeun Yi, Junbum Shin, and SWRD Center.
     2012. ScanDal: Static analyzer for detecting privacy leaks in android applications.
     MoST 12 (2012).
[17] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph
     convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[18] Ruoyu Li, Sheng Wang, Feiyun Zhu, and Junzhou Huang. 2018. Adaptive Graph
     Convolutional Neural Networks. arXiv preprint arXiv:1801.03226 (2018).
[19] Yepang Liu, Chang Xu, Shing-Chi Cheung, and Valerio Terragni. 2016.
     Understanding and detecting wake lock misuses for android applications. In Proceedings
     of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software
     Engineering. ACM, 396–409.
[20] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE.
     Journal of machine learning research 9, Nov (2008), 2579–2605.
[21] Naser Peiravian and Xingquan Zhu. 2013. Machine learning for android malware
     detection using permission and api calls. In Tools with Artificial Intelligence
     (ICTAI), 2013 IEEE 25th International Conference on. IEEE, 300–305.
[22] Symantec Internet Threat Report. [n. d.].
     https://www.symantec.com/content/dam/symantec/docs/reports/istr-22-2017-en.pdf.
     ([n. d.]).

[23] Kimberly Tam, Salahuddin J Khan, Aristide Fattori, and Lorenzo Cavallaro. 2015.
     CopperDroid: Automatic Reconstruction of Android Malware Behaviors.. In
     NDSS.
[24] Dong-Jie Wu, Ching-Hao Mao, Te-En Wei, Hahn-Ming Lee, and Kuo-Ping Wu.
     2012. Droidmat: Android malware detection through manifest and api calls
     tracing. In Information Security (Asia JCIS), 2012 Seventh Asia Joint Conference on.
     IEEE, 62–69.
[25] Wen-Chieh Wu and Shih-Hao Hung. 2014. DroidDolphin: a dynamic Android
     malware detection framework using big data and machine learning. In Proceedings
     of the 2014 Conference on Research in Adaptive and Convergent Systems. ACM,
     247–252.