Rethinking Antivirus: Executable Analysis in the Network Cloud

Page created by Warren Greene
 
CONTINUE READING
Rethinking Antivirus: Executable Analysis in the Network Cloud
                               Jon Oberheide, Evan Cooke, Farnam Jahanian
                         Electrical Engineering and Computer Science Department
                               University of Michigan, Ann Arbor, MI 48109
                                 {jonojono, emcooke, farnam}@umich.edu

A BSTRACT                                                        host-based antivirus software.
Antivirus software installed on each end host in an or-
ganization has become the de-facto security mechanism             1. More is better: A network service can employ a
used to defend against unwanted executables. We argue                cluster of servers to quickly analyze executables us-
that the executable analysis currently provided by host-             ing multiple techniques. For example, instead of us-
based antivirus software can be more efficiently and ef-             ing a single antivirus engine, a network service can
fectively provided as an in-cloud network service. In-               run a wide range of antivirus engines in parallel
stead of running complex analysis software on every end              while also performing behavioral analysis [3, 8].
host, we suggest that each end host run a lightweight pro-           Additional detection engines can easily be inte-
cess to acquire executables entering a system, send them             grated into a network service, allowing for consider-
into the network for analysis, and then run or quarantine            able extensibility. Such comprehensive analysis can
them based on a threat report returned by the network                significantly increase the detection coverage of ma-
service. An executable analysis service run inside an en-            licious software.
terprise network or by a service provider could integrate         2. Keep it simple stupid: By significantly lowering
antivirus software, behavioral simulation, and other anal-           the complexity of host-based monitoring software,
ysis engines from multiple vendors providing better de-              a number of advantages are realized. Clients no
tection of malware and simplify client software enabling             longer need to continually update their local signa-
deployment on a broader range of devices. To explore                 ture database, reducing administrative cost. Simpli-
this idea we construct a prototype composed of a Win-                fying the client software also decreases the chance
dows based host agent and an in-cloud analysis service               that it could contain exploitable vulnerabilities [5,
and evaluate it using a diverse dataset of 5066 unique               13]. Finally, lightweight client software allows the
malicious executables. By correlating information be-                service to be extended to mobile and resource-
tween multiple detection engines, our system provides                limited devices that lack sufficient processing power
over 98% detection coverage of the malicious executa-                but remain an enticing target for malware.
bles using eight antivirus engines and two behavioral en-
gines compared to a 54% to 86% detection rate using the           3. Sharing is caring: Correlating information be-
latest commercial antivirus products.                                tween engines is enabled when multiple engines are
                                                                     run within a network service and can be beneficial
1   I NTRODUCTION                                                    to detection coverage. For example, if a behavioral
Detecting malicious software has become an increas-                  system finds that the behavior of an unknown exe-
ingly challenging problem. While a single antivirus en-              cutable has similar behavior to an executable pre-
gine may be able to detect many types of malware, 0-day              viously classified as malicious by antivirus engines,
threats and other obfuscated attacks [12] can frequently             the unknown executable can be quarantined.
evade a single engine. We argue that the executable anal-
ysis service currently provided by host-based antivirus             This paper investigates how to design a network ser-
software can be more efficiently and effectively provided        vice for executable analysis. We construct a prototype
as an in-cloud network service. Instead of running com-          composed of a Windows-based host agent and an in-
plex analysis software on every end host, we suggest that        cloud network service. Using a recent dataset of 5066
each end host run a lightweight process to acquire exe-          unique malicious executables [7], we demonstrate how
cutables entering a system, send them to a network ser-          the system detects 98% of the samples using eight an-
vice for analysis, and then run or quarantine them based         tivirus engines and two behavioral engines in parallel,
on a threat report returned by the network service.              compared to a 54% to 86% detection rate using the lat-
   Centralizing the analysis of suspicious software as           est commercial antivirus products. We also show that the
a network service has several advantages over existing           average time to analyze a legitimate or malicious exe-

                                                             1
cutable, including network latencies, is less than a sec-         ware analysis service runs as an additional layer of pro-
ond.                                                              tection to augment existing security systems inside an or-
                                                                  ganizational network such as an enterprise.
2   A PPROACH
                                                                  3.1     Executable Identification and Acquisition
The use of traditional signature-based approaches for the
problem of detecting malicious software in the network            Malicious executables can enter an organization using a
and on the host has become increasingly difficult. The            range of different techniques. For example, mobile de-
elevating sophistication of modern malicious software             vices (e.g., a USB stick), email attachments, downloads
poses a significant challenge for any single vendor to de-        (wanted or unwanted), and vulnerable network services
velop signatures for every new piece of malicious soft-           are all common entry points. Due to the broad range
ware. Indeed, a recent Microsoft survey found more than           of entry vectors our system includes a lightweight exe-
45,000 new variants of backdoors, trojans, and bots dur-          cutable acquisition system that is run on each end host.
ing the second half of 2006 [6].                                     Just like existing antivirus software, the client module
    In response, a new generation of systems has been             runs on each host and inspects each executable. The exe-
developed to monitor the behavior of suspicious appli-            cution of any binary is trapped and diverted to a handling
cations to identify malicious binaries. Sandbox services          routine first. The handling routine begins by hashing the
such as Norman [8] and CWSandbox [3] enable the anal-             binary and checking the hash against a local and remote
ysis of suspicious binaries and produce a detailed report         whitelist and blacklist. If the hash does not match in the
of a binary’s actions in a simulated environment. In addi-        whitelist or blacklist, the entire binary is sent securely to
tion, virtual machines have been used to analyze malware          the in-cloud service for analysis.
and track malicious behavior [1, 10, 14]. While behav-               To minimize the time between when a user downloads
ioral and VM-based detection techniques improve our               an executable and receives a response from the network
ability to detect malicious software, they can be resource        service, the architecture provides a method to send a bi-
intensive and are not designed to protect every end host.         nary for analysis as soon as it is written or changed on the
    In this paper, we explore how to improve the detection        end host system (e.g., via file-copy, installation, or down-
of malicious software by providing executable analysis            load). Doing so amortizes the transmission and analysis
as an in-cloud network service. We envision a vendor-             cost over the time elapsed between binary creation and
agnostic service that is deployed in a high-speed, low-           user-initiated execution.
latency network such as inside an enterprise or by a ser-         3.2     In-Cloud Executable Analysis
vice provider. Such a centralized service would integrate
detection engines from many vendors and include an end            The second major component of the system is the net-
host component for the automated acquisition of bina-             work service responsible for malware analysis. The job
ries. The network service is similar to other in-cloud pro-       of the analysis service is to receive an executable and re-
tection mechanisms such as email [2, 4] and HTTP [11]             turn a threat report with information about whether the
filtering, except that we use a dedicated end host agent          executable is safe to run.
which enables our system to protect against executables           3.2.1    Executable Analysis
that arrive using different protocols and devices.
                                                                  Rather then choosing one single technique to detect mali-
3   A RCHITECTURE                                                 cious executables, we suggest that a network service run
                                                                  multiple engines in parallel. Unlike host-based analysis
Performing the analysis of executables using a network            systems that must meet relatively tight storage and com-
service is not a simple task. Suspicious executables must         putational constraints, a network service can easily scale
be acquired and isolated so that they can be sent to an up-       to meet the demand of multiple analysis engines.
stream analysis service. The analysis service must be ef-            While the specific backend analysis techniques used
ficient, and the execution of malicious executables must          are independent of the proposed architecture, two gen-
be prevented.                                                     eral classes seem particularly well-suited to the proposed
   To solve these problems we envision an architecture            approach.
that includes two major components. The first host-based
component acquires executables and sends them to the                • Antivirus Engines: There is a wide range of an-
network service, and the second network-based compo-                  tivirus products commercially available today, and
nent identifies malicious executables and returns a threat            as we will show, they differ significantly in their de-
report to the end host. Figure 1 shows a high level de-               tection coverage. By analyzing each candidate exe-
sign of the system architecture. It is important to point             cutable with multiple antivirus engines, the network
out that we do not see our system replacing existing an-              service is able to provide better detection of mali-
tivirus or intrusion detection solutions. Rather, our mal-            cious software.

                                                              2
Executables
          Obtained from                                  Executable
                                                                                    Executable Analysis Engines
        HTTP                                                                    • Antivirus
                          EXE                        Threat Report              • Behavioral Analysis
  FTP
     Email                                                                      • Black/White Lists
      P2P IM                                                                                  Network Cloud

                                 Figure 1: Architectural approach for in-cloud executable analysis.

  • Behavioral Analyzers: A second major method of                      When the client receives the threat report, it will either
    testing executables is behavioral analysis. A behav-             begin execution of the binary or prevent execution and
    ioral system executes suspicious executables in a                present the report to the user. This, of course, assumes
    sandboxed environment [3, 8] or virtual machine [1]              that the client module has not been compromised. Just as
    and records host state changes and network activity.             with antivirus software, we assume that integrity of the
                                                                     client is maintained as a trusted module.
   While performing analysis in parallel is far more effi-
cient then serial execution, delay can still exist between           4     I MPLEMENTATION AND E VALUATION
when an executable is submitted and when the result is
                                                                     We constructed a prototype implementation of the pro-
returned. The architecture employs a caching mechanism
                                                                     posed architecture that includes an executable acquisi-
to better deal with associated latencies.
                                                                     tion system for the Windows platform and an in-cloud
   The caching mechanism is simply a whitelist/blacklist
                                                                     network service for analysis. This section describes how
database of hashes of previously analyzed binaries. The
                                                                     we implemented the prototype and explores the detection
idea is that executables run in an organization will fre-
                                                                     capabilities and performance of the system.
quently be similar across different hosts so keeping a
cache of previously analyzed binaries can significantly              4.1     System Implementation
improve system performance. A cache hit in either the
whitelist or blacklist would therefore result in an imme-            4.1.1     Client Module
diate server response indicating whether the client should           We implemented a lightweight client module to detect
execute the binary.                                                  executables written to disk and then send them to the
                                                                     network service for analysis. The notification for file sys-
3.2.2   Executable Threat Report
                                                                     tem writes is provided by the ReadDirectoryChangesW
The results from the executable analysis engines are syn-            Win32 API call, a mechanism similar to inotify on
thesized into a threat report returned to the client over            Linux. Each file written is scanned for a valid Portable
a secure and authenticated channel. The report includes              Executable (PE) header to verify whether it is an exe-
three distinct sections.                                             cutable. The executable is then hashed using the SHA-
                                                                     1 algorithm and compared to both the local and remote
  • Execution Directive: a boolean indicating whether                whitelist and blacklist. Local whitelists are seeded with
    the executable should be run. The formula used to                hashes of the default set of executables included with
    set this value is customizable as described above but            common Microsoft Windows installations to reduce re-
    the simplest mode is to set the execution directive              mote whitelist lookups.
    to “do not run” if any detection engine classifies an               The client module also prevents the execution of bi-
    executable as malware.                                           naries that have been identified as malicious. By hooking
                                                                     the CreateProcess Win32 API call, we interpose on the
  • Family/Variant Labels: a list of family/variant la-
                                                                     creation of new processes and halt the execution of un-
    bels assigned to the malware by the different analy-
                                                                     wanted code.
    sis engines.

  • Behavioral Analysis: a list of host and network be-
                                                                     4.1.2     Network Service
    haviors observed during simulation. This may in-                 The second component of the prototype is a network ser-
    clude information about processes spawned, files                 vice responsible for analyzing executables using multiple
    and registry keys modified, network activity, and                detection engines and returning a comprehensive threat
    other state changes.                                             report on each executable. Incoming executables from

                                                                 3
AV Vendor      Version       Signatures    Detection Results              #    Detected   Antivirus Products
      Avast          4.7.1001      000740-0           84.7%                     1    86.59%     F-Secure
      ClamAV         0.90.2        3224               59.7%                     2    92.93%     Trend, Avast
      F-Prot         6.0.7.0       4.3.3              79.9%                     3    94.63%     Trend, F-Secure, Avast
      F-Secure       7.01.128      N/A                86.6%                     4    95.34%     ClamAV, Symantec, Trend, Avast
      Kaspersky      6.0.2.621     N/A                85.3%                     5    95.85%     ClamAV, Symantec, Trend, F-Secure, Avast
      McAfee         5100.0194     5027.0000          54.9%                     6    96.15%     F-Prot, ClamAV, Symantec, Trend, F-Secure, Avast
      Symantec       14.0.3.3      N/A                81.9%                     7    96.23%     Mcafee, F-Prot, ClamAV, Symantec, Trend, Kaspersky, Avast
      Trend Micro    15.00.1433    4.459.00           82.0%                     8    96.23%     all

Table 1: The antivirus protection engines, the versions used in the pro-       Table 2: Detection coverage of the malware samples using between one
totype, and the percentage of malware samples detected.                        and eight antivirus products.

end hosts are assigned to a request broker which is re-                        engines in our network service, we were able to detect
sponsible for delivering the executable to each analysis                       4875 executables as malicious, resulting in a detection
engine, collecting the results, and returning the threat re-                   rate over 96%. The dataset used in the paper only in-
port back to the client.                                                       cludes known malicious executables and so exploration
   We use eight antivirus and two behavioral engines in                        of false positives is an open research question.
our prototype. The eight antivirus engines are listed in                          While the eight antivirus engines were able to iden-
Table 1, and the behavioral analyzers include Norman                           tify 96% of the executables as malicious, 191 executa-
Sandbox Analyzer [8] and the behavioral profiling sys-                         bles were not flagged. This last 4% is arguably the most
tem described in [1]. The detection engines are run in                         difficult for current detection systems to identify. For-
parallel in virtualized environments.                                          tunately, the prototype also includes behavioral analy-
   Our prototype also includes an optimization aimed at                        sis and the capability to share information between de-
reducing the latency perceived by end users when run-                          tection engines. Behavioral reports on the 191 executa-
ning newly obtained executables. We implemented a net-                         bles revealed that 92 had similar behavior to known ma-
work stream sensor that promiscuously monitors a net-                          licious binaries already identified by the antivirus soft-
work tap (e.g., a switch span port) to acquire executables.                    ware. Such information could be used to classify those 92
By deploying such a component, executables can be ex-                          binaries as malicious resulting in a detection rate of over
tracted out of network transmissions on the fly and ana-                       98%, potentially extending the coverage to polymorphic
lyzed by the in-cloud network service before even reach-                       and 0-day malware that have evaded the antivirus en-
ing the destination end host. Clearly this approach does                       gines. Such sharing of information between detection en-
not speed up the analysis of all executables as network                        gines illustrates one interesting advantage of combining
traffic can be encrypted and the sensor currently handles                      heterogeneous detection systems into one service.
only a handful of common network protocols.
                                                                               4.2.2    Performance
4.2     Deployment and Results
                                                                               By moving analysis into the network, we incur the net-
4.2.1     Detection Coverage                                                   work latency of sending the binary to the service and
To evaluate the detection capabilities of the prototype                        the latency of using a multiple engines to analyze the
we obtained over 5000 malware executables from Ar-                             binary. In this subsection, we show that this latency
bor Network’s Arbor Malware Library (AML) [7]. The                             is small on average and is acceptable for clients con-
AML contains malware which has been captured in the                            nected to the high-speed, low-latency (
ware samples. The average size of the samples was 366                Finally, since our system integrates other detection
KB and the average analysis time per executable for the           engines, it also shares their limitations. For example,
antivirus engines was 0.48 seconds. Symantec was the              a behavioral profiler might not detect malware that de-
slowest taking 0.91 seconds per executable on average.            lays exhibiting behavior. However, since the architecture
Again, the analysis times and executable sizes indicate           is modular, improved detection engines are easily inte-
perceived execution latencies of a second or two.                 grated into the system.
   Thus far we have assumed that the executable under                We are currently working to refine our prototype and
analysis has never been observed before. Our prototype            are collaborating with different enterprises and service
includes both a local and remote cache of threat reports          providers to collect information on executable usage and
of previously analyzed executables to significantly speed         further evaluate our approach.
up execution of previously analyzed binaries. While we            ACKNOWLEDGMENTS
do not have data in this paper on the frequency of legit-
imate executables used in an organization, it would be            This work was supported in part by the Department of Home-
                                                                  land Security (DHS) under contract number NBCHC060090
logical that many employees within an organization use
                                                                  and by the National Science Foundation (NSF) under contract
a similar set of applications making caching an impor-            number CNS 0627445. We would like to thank Jose Nazario
tant performance enhancement. We were able to obtain              from Arbor Networks and Georg Wicherski from the mwcol-
data from the mwcollect Alliance [9] on the frequency             lect Alliance.
of malicious executables. Over a two month period on a
/18 network they observed 213 distinct executables over           R EFERENCES
2.5 million times. Of the 213 executables, 164 of them             [1] Michael Bailey, Jon Oberheide, Jon Andersen, Z. Morley Mao,
                                                                       Farnam Jahanian, and Jose Nazario. Automated classification and
(77%) were seen multiple times i.e., 49 were only ob-                  analysis of internet malware. To appear in Proceedings of the
served once. Given the vast number of duplicate encoun-                10th International Symposium on Recent Advances in Intrusion
tered, a blacklist cache for these 213 executables would               Detection (RAID’07).
have a cache hit rate of over 99.99%, resulting in a near-         [2] Barracuda Networks. Barracuda spam firewall. http://www.
                                                                       barracudanetworks.com, 2007.
instant response from the service.
                                                                   [3] Carsten Willems and Thorsten Holz. Cwsandbox. http://www.
                                                                       cwsandbox.org/, 2007.
5   D ISCUSSION                                                    [4] Cloudmark. Cloudmark authority anti-virus.           http://www.
This paper has presented a vision for moving from a host-              cloudmark.com, 2007.
centric antivirus paradigm to providing executable anal-           [5] Abhishek Kumar, Vern Paxson, and Nicholas Weaver. Exploit-
                                                                       ing underlying structure for detailed reconstruction of an internet-
ysis as an in-cloud network service. We constructed a                  scale event. Proceedings of the USENIX/ACM Internet Measure-
prototype and our initial results show potential for a sig-            ment Conference, October 2005.
nificant improvement in the detection of real malicious            [6] Microsoft. Microsoft security intelligence report: July-december
software. However, there are several areas that require                2006.      http://www.microsoft.com/technet/security/
                                                                       default.mspx, May 2007.
further investigation.
                                                                   [7] Arbor Networks. Arbor malware library (AML). http://www.
   First and foremost is the question of disconnected op-              arbornetworks.com, 2007.
eration. When a mobile user is not connected to the net-           [8] Norman Solutions.         Norman sandbox whitepaper.
work, has a high-latency, low-bandwidth connection, or                 http://download.norman.no/whitepapers/whitepaper_
is the victim of a denial of service attack, sending ex-               Norman_SandBox.pdf, 2003.
ecutables to the network service may not be feasible.              [9] Paul Bacher and Markus Kotter and Georg Wicherski. The mw-
                                                                       collect alliance. http://www.mwcollect.org, 2007.
While certain organizations may select a policy requir-
                                                                  [10] Honeynet Project. Know Your Enemy: Learning with VMware.
ing that users wait for network connectivity before run-               2003.
ning new applications, others may desire more flexibility.        [11] Niels Provos. Spybye. http://www.monkey.org/˜provos/
One solution is to have mobile users also run traditional              spybye, 2007.
antivirus software as a fail-over backup. Therefore, in the       [12] Paul Royal, Mitch Halpin, David Dagon, Robert Edmonds, and
disconnected state, users will have at least the same de-              Wenke Lee. PolyUnpack: Automating the Hidden-Code Extrac-
                                                                       tion of Unpack-Executing Malware. In The 22th Annual Com-
tection coverage as today.                                             puter Security Applications Conference (ACSAC 2006), Miami
   Another issue is that malicious software comes in                   Beach, FL, December 2006.
many forms. While our prototype currently operates only           [13] Symantec Corporation. Symantec security advisory (sym06-
on Win32 Portable Executables, we are working on ex-                   010).    http://www.symantec.com/avcenter/security/
                                                                       Content/2006.05.25.html, 2006.
tending it to handle external DLLs, interpreted scripting
                                                                  [14] Michael Vrable, Justin Ma, Jay Chen, David Moore, Erik Van-
languages, malicious web content, and other file types.                dekieft, Alex C. Snoeren, Geoffrey M. Voelker, and Stefan Sav-
Although our system cannot detect shellcode injected                   age. Scalability, fidelity and containment in the Potemkin virtual
into memory, such shellcode often fetches an executable                honeyfarm. In Proceedings of the 20th ACM Symposium on Op-
                                                                       erating System Principles (SOSP), Brighton, UK, October 2005.
which can be detected.

                                                              5
You can also read