Cicada: Introducing Predictive Guarantees for Cloud Networks
Katrina LaCurts*, Jeffrey C. Mogul†, Hari Balakrishnan*, Yoshio Turner‡
* MIT CSAIL, † Google, Inc., ‡ HP Labs
Abstract

In cloud-computing systems, network-bandwidth guarantees have been shown to improve predictability of application performance and cost [1, 28]. Most previous work on cloud-bandwidth guarantees has assumed that cloud tenants know what bandwidth guarantees they want [1, 17]. However, as we show in this work, application bandwidth demands can be complex and time-varying, and many tenants might lack sufficient information to request a guarantee that is well-matched to their needs, which can lead to over-provisioning (and thus reduced cost-efficiency) or under-provisioning (and thus poor user experience).

We analyze traffic traces gathered over six months from an HP Cloud Services datacenter, finding that application bandwidth consumption is both time-varying and spatially inhomogeneous. This variability makes it hard to predict requirements. To solve this problem, we develop a prediction algorithm usable by a cloud provider to suggest an appropriate bandwidth guarantee to a tenant. With tenant VM placement using these predictive guarantees, we find that the inter-rack network utilization in certain datacenter topologies can be more than doubled.

1 INTRODUCTION

This paper introduces predictive guarantees, a new abstraction for bandwidth guarantees in cloud networks. A provider of predictive guarantees observes traffic along the network paths between virtual machines (VMs), and uses those observations to predict the data rate requirements over a future time interval. The cloud provider can offer a tenant a guarantee based on this prediction.

Why should clouds offer bandwidth guarantees, and why should they use predictive guarantees in particular? Cloud computing infrastructures, especially public “Infrastructure-as-a-Service” (IaaS) clouds, such as those offered by Amazon, HP, Google, Microsoft, and others, are being used not just by small companies, but also by large enterprises. For distributed applications involving significant inter-node communication, such as in [1], [23], and [24], current cloud systems fail to offer even basic network performance guarantees; this inhibits cloud use by enterprises that must provide service-level agreements.

Others have proposed mechanisms to support cloud bandwidth guarantees (see §2); with few exceptions, these works assume that tenants know what guarantee they want, and require the tenants to explicitly specify bandwidth requirements, or to request a specific set of network resources [1, 17]. But do cloud customers really know their network requirements? Application bandwidth demands can be complex and time-varying, and not all application owners accurately know their bandwidth demands. A tenant’s lack of accurate knowledge about its future bandwidth demands can lead to over- or under-provisioning.

For many (but not all) cloud applications, future bandwidth requirements are in fact predictable. In this paper, we use network traces from HP Cloud Services [11] to demonstrate that tenant bandwidth demands can be time-varying and spatially inhomogeneous, but can also be predicted, based on automated inference from their previous history.

We argue that predictive guarantees provide a better abstraction than prior approaches, for three reasons. First, the predictive guarantee abstraction is simpler for the tenant, because the provider automatically predicts a suitable guarantee and presents it to the tenant.

Second, the predictive guarantee abstraction supports time- and space-varying demands. Prior approaches typically offer bandwidth guarantees that are static in at least one of those respects, but these approaches do not capture general cloud applications. For example, we expect temporal variation for user-facing applications with diurnally-varying workloads, and spatial variation in VM-to-VM traffic for applications such as three-tier services.

Third, the predictive guarantee abstraction easily supports fine-grained guarantees. By generating guarantees automatically, rather than requiring the tenant to specify them, we can feasibly support a different guarantee on each VM-to-VM directed path, and for relatively short
time intervals. Fine-grained guarantees are potentially more efficient than coarser-grained guarantees, because they allow the provider to pack more tenants into the same infrastructure.

Recent research has addressed these issues in part. For example, Oktopus [1] supports a limited form of spatial variation; Proteus [28] supports a limited form of temporal-variation prediction. But no prior work, to our knowledge, has offered a comprehensive framework for cloud customers and providers to agree on efficient network-bandwidth guarantees, for applications with time-varying and space-varying traffic demands.

We place predictive guarantees in the concrete context of Cicada¹, a system that implements predictive guarantees. Cicada observes a tenant’s past network usage to predict its future network usage, and acts on its predictions by offering predictive guarantees to the customer, as well as by placing (or migrating) VMs to increase utilization and improve load balancing in the provider’s network. Cicada is therefore beneficial to both the cloud provider and cloud tenants. More details about Cicada are available in [15].

¹ Thanks to Dave Levin, Cicada is an acronym: Cicada Infers Cloud Application Demands Automatically.

2 RELATED WORK

Throughout, we use the following definitions. A provider refers to an entity offering a public cloud (IaaS) service. A customer is an entity paying for use of the public cloud. We distinguish between customer and tenant, as tenant has a specific meaning in OpenStack [20]: operationally, a collection of VMs that communicate with each other. A single customer may be responsible for multiple tenants. Finally, application refers to software run by a tenant; a tenant may run multiple applications at once.

Recent research has proposed various forms of cloud network guarantees. Oktopus supports a two-stage “virtual oversubscribed cluster” (VOC) model [1], intended to match a typical application pattern in which clusters of VMs require high intra-cluster bandwidth and lower inter-cluster bandwidth. VOC is a hierarchical generalization of the hose model [5]; the standard hose model, as used in MPLS, specifies for each node its total ingress and egress bandwidths. The finer-grained pipe model specifies bandwidth values between each pair of VMs. CloudMirror allows tenants to specify a tenant abstraction graph, which reflects the structure of the application itself [17]. Cicada supports any of these models, and unlike these systems, does not require the tenant to specify network demands up front.

The Proteus system [28] profiles specific MapReduce jobs at a fine time scale, to exploit the predictable phased behavior of these jobs. It supports a “temporally interleaved virtual cluster” model, in which multiple MapReduce jobs are scheduled so that their network-intensive phases do not interfere with each other. Proteus assumes uniform all-to-all hose-model bandwidth requirements during network-intensive phases, although each such phase can run at a different predicted bandwidth. Proteus (and other related work [13, 19]) does not generalize to a broad range of enterprise applications as Cicada does.

Hajjat et al. [9] describe a technique to decide which application components to place in a cloud datacenter, for hybrid enterprises where some components remain in a private datacenter. Their technique tries to minimize the traffic between the private and cloud datacenters, and hence recognizes that inter-component traffic demands are spatially non-uniform. In contrast to Cicada, they do not consider time-varying traffic nor how to predict it, and they focus primarily on the consequences of wide-area traffic, rather than intra-datacenter traffic.

Cicada does not focus on the problem of enforcing guarantees [8, 16, 22, 25] nor the tradeoff between guarantees and fairness [21]. Though these issues would arise for a provider using Cicada, they can be addressed with any of the above techniques.

3 THE DESIGN OF CICADA

The goal of Cicada is to free tenants from choosing between under-provisioning for peak periods, or over-paying for unused bandwidth. Predictive guarantees permit a provider and tenant to agree on a guarantee that varies in time and/or space. The tenant can get the network service it needs at a good price, while the provider can avoid allocating unneeded bandwidth and can amortize its infrastructure across more tenants.

3.1 An Overview of Cicada

Cicada has several components, corresponding to the steps in Figure 1. After determining whether to admit a tenant (taking CPU, memory, and network resources into account) and making an initial placement (steps 1 and 2), Cicada measures the tenant’s traffic (step 3), and delivers a time series of traffic matrices to a logically centralized controller. The controller uses these measurements to predict future bandwidth requirements (step 4). In most cases, Cicada converts a bandwidth prediction into an offered guarantee for some future interval. Customers may choose to accept or reject Cicada’s predictive guarantees (step 5). Because Cicada collects measurement data continually, it can make new predictions and offer new guarantees throughout the lifetime of the tenant.

Cicada interacts with other aspects of the provider’s infrastructure and control system. The provider needs to rate-limit the tenant’s traffic to ensure that no tenant undermines the guarantees sold to other customers. We distinguish between guarantees and limits. If the provider’s limit is larger than the corresponding guarantee, tenants
can exploit best-effort bandwidth beyond their guarantees.

Figure 1: Cicada’s architecture. Steps shown: (1) customer makes initial request; (2) provider places tenant VMs and sets initial rate limits (hypervisors enforce rate limits and collect measurements); (3) hypervisors continually send measurements to Cicada controller; (4) Cicada predicts bandwidth for each tenant; (5a) provider may offer the customer a guarantee based on Cicada’s prediction; (5b) provider may update placement based on Cicada’s prediction.

A provider may wish to place VMs based on their associated bandwidth guarantees, to improve network utilization. We describe a VM placement method in §6. The provider could also migrate VMs to increase the number of guarantees that the network can support [6].

3.2 Assumptions

In order to make predictions, Cicada needs to gather sufficient data about a tenant’s behavior. Based on our evaluation, Cicada may need at least an hour or two of data before it can offer useful predictions.

Any shared resource that provides guarantees must include an admission control mechanism, to avoid making infeasible guarantees. We assume that Cicada will incorporate network admission control using an existing mechanism [1, 12, 14]. We also assume that the cloud provider has a method to enforce guarantees, such as [21].

3.3 Measurement Collection

Cicada collects a time series of traffic matrices for each tenant, either passively, by collecting NetFlow or sFlow data at switches within the network, or by using an agent that runs in the hypervisor of each machine. We would like to base predictions on offered load, but when the provider imposes rate limits, we risk underestimating peak loads that exceed those limits, and thus under-predicting future traffic. We can detect when a VM’s traffic is rate-limited via packet drops, so these underestimates can be detected, too, although not precisely quantified. Currently, Cicada does not account for this potential error.

3.4 Prediction Model

Some applications, such as backup or database ingestion, require bulk bandwidth, and need guarantees that the average bandwidth over a period of H hours will meet their needs (e.g., a guarantee that two VMs will be able to transfer 3.6 Gbit in an hour, but not necessarily at a constant rate of 1 Mbit/s). Other applications, such as user-facing systems, require guarantees for peak bandwidth over much shorter intervals (e.g., a guarantee that two VMs will be able to transfer at a rate of 1 Mbit/s for the next hour).

Cicada’s predictions describe the maximum bandwidth expected during any averaging interval δ during a given time interval H. If δ = H, the prediction is for the bandwidth requirement averaged over H hours, but for an interactive application, δ might be just a few milliseconds.

Note that the predictive guarantees offered by Cicada are limited by any caps set by the provider; thus, a proposed guarantee might be lower than suggested by the prediction algorithm.

A predictive guarantee entails some risk of either under-provisioning or over-provisioning, and different tenants will have different tolerances for these risks, typically expressed as a percentile (e.g., the tenant wants sufficient bandwidth for 99.99% of the 10-second intervals). Cicada’s prediction algorithm also recognizes conditions under which the prediction is unreliable, in which case Cicada does not propose a guarantee for a tenant.

3.5 Recovering from Faulty Predictions

Cicada’s prediction algorithm may make faulty predictions because of inherent limitations or insufficient prior information. Because Cicada continually collects measurements, it can detect when its current guarantee is inappropriate for the tenant’s current network load.

When Cicada detects a faulty prediction, it can take one of many actions: stick to the existing guarantee, propose a new guarantee, upgrade the tenant to a higher, more expensive guarantee, etc. How and whether to upgrade guarantees, as well as what to do if Cicada over-predicts, are pricing-policy decisions, and outside our scope. We note, however, that typical Service Level Agreements (SLAs) include penalty clauses, in which the provider agrees to remit some or all of the customer’s payment if the SLA is not met.

4 MEASUREMENT RESULTS

We designed Cicada under the assumption that the traffic from cloud tenants is predictable, but in ways that are not captured by existing models. Before building Cicada, we collected data from HP Cloud Services, to analyze the spatial and temporal variability of its tenants’ traffic.

4.1 Data

We have collected sFlow [27] data from HP Cloud Services, which we refer to as the HPCS dataset. Our dataset consists of about six months of samples from 71 Top-of-Rack (ToR) switches. Each ToR switch connects to either 48 servers via 1GbE NICs, or 16 servers via 10GbE NICs. In total, the data represents about 1360 servers, spanning several availability zones. This dataset differs from those in previous work, such as [2, 3, 7], in that it captures VM-to-VM traffic patterns.
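
As a rough illustration of the time series of traffic matrices described in §3.3, and of the difference between average and peak bandwidth in §3.4, the sketch below bins flow records into fixed-length epochs and then compares the two rates for one VM pair. The record layout, the function names, the sparse-dictionary representation, and the 300-second default epoch are all assumptions made for this example; they are not taken from Cicada's implementation. Here one epoch plays the role of δ and the whole window plays the role of H.

```python
from collections import defaultdict


def traffic_matrices(records, epoch_seconds=300):
    """Bin flow records into one sparse traffic matrix per epoch.

    records: iterable of (timestamp_seconds, src_vm, dst_vm, nbytes).
    Returns {epoch_index: {(src_vm, dst_vm): total_bytes}}.
    """
    matrices = defaultdict(lambda: defaultdict(int))
    for timestamp, src, dst, nbytes in records:
        epoch = int(timestamp // epoch_seconds)
        matrices[epoch][(src, dst)] += nbytes
    return {epoch: dict(pairs) for epoch, pairs in sorted(matrices.items())}


def average_and_peak_bps(matrices, pair, first_epoch, num_epochs, epoch_seconds=300):
    """Average rate over the whole window versus peak rate over any one epoch.

    The window (num_epochs * epoch_seconds) stands in for H, and a single
    epoch stands in for the averaging interval delta. Rates are in bits/s.
    """
    per_epoch = [matrices.get(first_epoch + k, {}).get(pair, 0)
                 for k in range(num_epochs)]
    average = 8 * sum(per_epoch) / (num_epochs * epoch_seconds)
    peak = 8 * max(per_epoch) / epoch_seconds
    return average, peak


# Hypothetical records: two samples in epoch 0, one in epoch 1.
records = [(0, "vm1", "vm2", 100), (299, "vm1", "vm2", 50), (300, "vm2", "vm1", 70)]
matrices = traffic_matrices(records)
average, peak = average_and_peak_bps(matrices, ("vm1", "vm2"), first_epoch=0, num_epochs=2)
print(matrices)
print(average, peak)
```

With these made-up records the pair vm1 to vm2 is active only in the first epoch, so its peak rate is twice its average rate over the two-epoch window; a guarantee sized for the average would under-provision the burst.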
We aggregate the data over five-minute intervals, to produce datapoints of the form ⟨timestamp, source, destination, number of bytes transferred from source to destination⟩. We keep only the datapoints where both the source and destination are private IPs of virtual machines, and thus all our datapoints represent traffic within the datacenter. Making predictions about traffic traveling outside the datacenter or between datacenters is a challenging task, which we do not address in this work.

Under the agreement by which we obtained this data, we are unable to reveal information such as the total number of tenants, the number of VMs per tenant, or the growth rates for these values.

4.2 Spatial and Temporal Variability

To quantify spatial variability, we compare the observed tenants to a static, all-to-all tenant. This tenant is the easiest to make predictions for: every intra-tenant connection sends the same amount of data at a constant rate. Let $F_{ij}$ be the fraction of this tenant’s traffic sent from $VM_i$ to $VM_j$. For this “ideal” all-to-all tenant, all of these values are equal; every VM sends the same amount of data to every other VM. As another example, consider a bimodal tenant with some pairs that never converse and others that converse equally. Let k = n/2. Each VM communicates with k VMs, and sends 1/k of its traffic to each; hence half of the F values are 1/k, and the other half are zero.

For each tenant in the HPCS dataset, we calculate the distribution of its F values, and plot the coefficient of variation (cov) in Figure 2. The median cov value is 1.732, which suggests nontrivial overall spatial variation (for contrast, the cov of our ideal tenant is zero, and the cov of our bimodal tenant is one²).

² For the bimodal tenant, $\mu = \frac{1}{2k}$ and $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{2k}\right)^2} = \frac{1}{2k}$, so the cov $\sigma/\mu$ is one.

Figure 2: Variation in the HPCS dataset. (Two CDF panels, Spatial Variation and Temporal Variation, each with a logarithmic x-axis from 0.01 to 100; legend: 1 hour, 4 hours, 24 hours.)

To quantify temporal variability, we first pick a time interval H. For each consecutive H-hour interval, we calculate the sum of the total number of bytes sent between each pair p of VMs, which gives us a distribution $T_p$ of bandwidth totals. We then compute the coefficient of variation for this distribution, $cov_p$. The temporal variation for a tenant is the weighted sum of these values. The weight for $cov_p$ is the bandwidth used by pair p, to reflect the notion that tenants where only one small flow changes over time are less temporally-variable than those where one large flow changes over time.

For each tenant in the HPCS dataset, we calculate its temporal variation value, and plot the CDF in Figure 2. As with spatial variation, most tenants have high temporal variability. This variability decreases as we increase the time interval H, but we see variability at all time scales.

These results indicate that a rigid model is insufficient for making traffic predictions. Given the high spatial cov values in Figure 2, a less strict but not entirely general model such as VOC [1] may not be sufficient either. Furthermore, the high temporal variation values indicate that static models cannot accurately represent the traffic patterns of the tenants in the HPCS dataset.

5 CICADA’S TRAFFIC PREDICTION METHOD

Much work has been done on predicting traffic matrices from noisy measurements such as link-load data [26, 29]. In these scenarios, prediction approaches such as Kalman filters and Hidden Markov Models, which try to estimate true values from noisy samples, are appropriate. Cicada, however, knows the exact traffic matrices observed in past epochs; its problem is to predict a future traffic matrix.

Cicada’s prediction algorithm adapts Herbster and Warmuth’s “tracking the best expert” idea [10], which has been successfully adapted before in wireless power-saving and energy reduction contexts [4, 18]. To predict the traffic matrix for epoch n + 1, we use all previously observed traffic matrices, $M_1, \ldots, M_n$ (below, we show that data from the distant past can be pruned away without affecting accuracy). In such a matrix, the entry in row i and column j specifies the number of bytes that VM i sent to VM j in the corresponding epoch (for predicting average bandwidth) or the maximum observed over a δ-length interval (for peak bandwidth).

The algorithm assigns each of the previous matrices a weight as part of a linear combination. The result of this combination is the prediction for $M_{n+1}$. Over time, these weights get updated; the higher the weight of a previous matrix, the more important that matrix is for prediction. In the HPCS dataset, we found that the 12 most recent hours of traffic are weighted heavily, and there is also a spike at 24 hours earlier. Weights are vanishingly small prior to this. Thus, at least in our dataset, one does not need weeks’ worth of data to make reliable predictions.

We refer the reader to [15] for a full description of our prediction algorithm, including the method for updating the weights, as well as an evaluation against a VOC-style prediction algorithm [1]. To summarize those results, Cicada’s prediction method decreases the median relative error by 90% compared to VOC when predicting either average or peak bandwidth. Cicada also makes predictions quickly. In the HPCS dataset, the mean prediction speed is under 10 msec in all but one case (under 25 msec in all cases).
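
The weighted linear combination described in §5 can be sketched as follows. This is a minimal illustration under stated assumptions, not Cicada's implementation: the function names, the list-of-lists matrix layout, and the learning rate `eta` are hypothetical. In particular, `update_weights` is a generic exponential-weights step used as a stand-in; the actual update rule, adapted from "tracking the best expert" [10], is specified in [15] and is not reproduced here.

```python
import math


def predict(history, weights):
    """Predict the next matrix as a weighted linear combination of past ones.

    history: list of n-by-n matrices (lists of lists), oldest first.
    weights: weights[k] applies to the matrix observed k + 1 epochs ago.
    """
    lags = min(len(weights), len(history))
    n = len(history[-1])
    total = sum(weights[:lags])
    prediction = [[0.0] * n for _ in range(n)]
    for k in range(lags):
        matrix = history[-1 - k]
        w = weights[k] / total  # renormalize over the lags actually available
        for i in range(n):
            for j in range(n):
                prediction[i][j] += w * matrix[i][j]
    return prediction


def update_weights(history, observed, weights, eta=1.0):
    """One generic exponential-weights step (a stand-in, NOT Cicada's rule).

    Each lag is scored by how well its matrix alone would have predicted
    `observed`; lags with larger relative error lose weight.
    """
    lags = min(len(weights), len(history))
    n = len(observed)
    scale = sum(sum(row) for row in observed) or 1.0
    updated = list(weights)
    for k in range(lags):
        matrix = history[-1 - k]
        error = sum(abs(matrix[i][j] - observed[i][j])
                    for i in range(n) for j in range(n)) / scale
        updated[k] = weights[k] * math.exp(-eta * error)
    total = sum(updated)
    return [w / total for w in updated]


# Hypothetical two-VM tenant: bytes sent per epoch, oldest matrix first.
history = [[[0, 10], [5, 0]], [[0, 20], [5, 0]]]
weights = [0.5, 0.5]
prediction = predict(history, weights)   # [[0.0, 15.0], [5.0, 0.0]]
observed = [[0, 20], [5, 0]]             # what epoch n + 1 turned out to be
new_weights = update_weights(history, observed, weights)
```

In this toy run the most recent matrix matches the observation exactly, so its weight grows relative to the older one, which loosely mirrors the paper's observation that recent hours carry most of the weight.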
Algorithm 1 Cicada’s VM placement algorithm
  P = set of VM pairs, in descending order of their bandwidth predictions
  for (src, dst) ∈ P do
    if resources aren’t available to place (src, dst) then
      revert reservation
      return False
    A = all available paths
    if src or dst has already been placed then
      restrict A to paths including the appropriate endpoint
    place (src, dst) on the path in A with the most available bandwidth

6 EVALUATION

Although Cicada’s predictive guarantees allow cloud providers to decrease wasted bandwidth, it is not clear that these providers treat wasted intra-rack and inter-rack bandwidth equally. Inter-rack bandwidth may cost more, and even if intra-rack bandwidth is free, over-allocating network resources on one rack can prevent other tenants from being placed on the same rack.

To determine whether wasted bandwidth is intra- or inter-rack, we need a VM-placement algorithm. For VOC, we use the placement algorithm detailed in [1], which tries to place clusters on the smallest subtree that will contain them. For Cicada, we developed a greedy placement algorithm, detailed in Algorithm 1. This algorithm is similar in spirit to VOC’s placement algorithm; Cicada’s algorithm tries to place the most-used VM pairs on the highest-bandwidth paths, which in a typical datacenter corresponds to placing them on the same rack, and then the same subtree. However, since Cicada uses fine-grained, pipe-model predictions, it has the potential to allocate more flexibly; VMs that do not transfer data to one another need not be placed on the same subtree, even if they belong to the same tenant.

We compare Cicada’s placement algorithm against VOC’s on a simulated physical infrastructure with 71 racks with 16 servers each, 10 VM slots per server, and (10G/$O_p$) Gbps inter-rack links, where $O_p$ is the physical oversubscription factor. For each algorithm, we select a random tenant, and use the ground-truth data to determine this tenant’s bandwidth needs for a random hour of its activity, and place its VMs. We repeat this process until 99% of the VM slots are filled. Using the ground-truth data allows us to compare the placement algorithms explicitly, without conflating this comparison with prediction errors. To get a sense of what would happen with more network-intensive tenants, we also evaluated scenarios where each VM-pair’s relative bandwidth use was multiplied by a constant bandwidth factor (1×, 25×, or 250×).

Figure 3 shows how the available inter-rack bandwidth (what remains unallocated after the VMs are placed) varies with $O_p$, the physical oversubscription factor. In all cases, Cicada’s placement algorithm leaves more inter-rack bandwidth available. When $O_p$ is greater than two, both algorithms perform comparably, since this is a constrained environment with little bandwidth available overall. However, with lower oversubscription factors, Cicada’s algorithm leaves more than twice as much bandwidth available, suggesting that it uses network resources more efficiently in this setting.

Figure 3: Inter-rack bandwidth available after placement. (x-axis: Oversubscription Factor, from 1 to 16; y-axis: Inter-rack Bandwidth Available, in GByte/hour; series: bw factor 1x, 25x, and 250x.)

Over-provisioning reduces the value of our improved placement, but it does not remove the need for better guarantees. Even on an over-provisioned network, a tenant whose guarantee is too low for its needs may suffer if its VM placement is unlucky. A “full bisection bandwidth” network is only that under optimal routing; bad routing decisions or bad placement can still waste bandwidth.

7 CONCLUSION

This paper described the rationale for predictive guarantees in cloud networks, and the design of Cicada, a system that provides this abstraction to tenants. Cicada provides fine-grained, temporally- and spatially-varying guarantees without requiring the clients to specify their demands explicitly. We outlined a prediction algorithm where the prediction for a future epoch is a weighted linear combination of past observations, with the weights updated and learned online in an automatic way. Using traces from HP Cloud Services, we showed that the fine-grained structure of predictive guarantees can be used by a cloud provider to improve network utilization in certain datacenter topologies.

ACKNOWLEDGMENTS

We are indebted to a variety of people at HP Labs and HP Cloud for help with data collection and for discussions of this work. In particular, we thank Sujata Banerjee, Ken Burden, Phil Day, Eric Hagedorn, Richard Kaufmann, Jack McCann, Gavin Pratt, Henrique Rodrigues, and Renato Santos. This work was supported in part by the National Science Foundation under grant IIS-1065219 as well as an NSF Graduate Research Fellowship.

REFERENCES

[1] Ballani, H., Costa, P., Karagiannis, T., and Rowstron, A. Towards Predictable Datacenter Networks. In SIGCOMM (2011).
[2] Benson, T., Akella, A., and Maltz, D. A. Network Traffic Characteristics of Data Centers in the Wild. In IMC (2010).
[3] Benson, T., Anand, A., Akella, A., and Zhang, M. Understanding Data Center Traffic Characteristics. In WREN (2009).
[4] Deng, S., and Balakrishnan, H. Traffic-aware Techniques to Reduce 3G/LTE Wireless Energy Consumption. In CoNEXT (2012).
[5] Duffield, N. G., Goyal, P., Greenberg, A., Mishra, P., Ramakrishnan, K. K., and van der Merwe, J. E. A Flexible Model for Resource Management in Virtual Private Networks. In SIGCOMM (1999).
[6] Erickson, D., Heller, B., Yang, S., Chu, J., Ellithorpe, J., Whyte, S., Stuart, S., McKeown, N., Parulkar, G., and Rosenblum, M. Optimizing a Virtualized Data Center. In SIGCOMM Demos (2011).
[7] Greenberg, A., Hamilton, J. R., Jain, N., Kandula, S., Kim, C., Lahiri, P., Maltz, D. A., Patel, P., and Sengupta, S. VL2: A Scalable and Flexible Data Center Network. In SIGCOMM (2009).
[8] Gui, C., Lu, G., Wang, H. J., Yang, S., Kong, C., Sun, P., Wu, W., and Zhang, Y. SecondNet: A Data Center Network Virtualization Architecture with Bandwidth Guarantees. In CoNEXT (2010).
[9] Hajjat, M., Sun, X., Sung, Y.-W. E., Maltz, D., Rao, S., Sripanidkulchai, K., and Tawarmalani, M. Cloudward Bound: Planning for Beneficial Migration of Enterprise Applications to the Cloud. In SIGCOMM (2010).
[10] Herbster, M., and Warmuth, M. K. Tracking the Best Expert. Machine Learning 32 (1998).
[11] HP Cloud. http://hpcloud.com.
[12] Jamin, S., Shenker, S. J., and Danzig, P. B. Comparison of Measurement-based Admission Control Algorithms for Controlled-load Service. In Infocom (1997).
[13] Kim, G., Park, H., Yu, J., and Lee, W. Virtual Machines Placement for Network Isolation in Clouds. In RACS (2012).
[14] Knightly, E. W., and Shroff, N. B. Admission Control for Statistical QoS: Theory and Practice. IEEE Network 13, 2 (1999).
[15] LaCurts, K., Mogul, J. C., Balakrishnan, H., and Turner, Y. Cicada: Predictive Guarantees for Cloud Network Bandwidth. Tech. Rep. MIT-CSAIL-TR-004, Massachusetts Institute of Technology, 2014.
[16] Lam, T. V., Radhakrishnan, S., Pan, R., Vahdat, A., and Varghese, G. NetShare and Stochastic NetShare: Predictable Bandwidth Allocation for Data Centers. SIGCOMM CCR 42, 3 (2012).
[17] Lee, J., Lee, M., Popa, L., Turner, Y., Banerjee, S., Sharma, P., and Stephenson, B. CloudMirror: Application-Aware Bandwidth Reservations in the Cloud. In HotCloud (2013).
[18] Monteleoni, C., Balakrishnan, H., Feamster, N., and Jaakkola, T. Managing the 802.11 Energy/Performance Tradeoff with Machine Learning. Tech. Rep. MIT-LCS-TR-971, Massachusetts Institute of Technology, 2004.
[19] Niu, D., Feng, C., and Li, B. Pricing Cloud Bandwidth Reservations Under Demand Uncertainty. In Sigmetrics (2012).
[20] OpenStack Operations Guide - Managing Projects and Users. http://docs.openstack.org/trunk/openstack-ops/content/projects_users.html.
[21] Popa, L., Kumar, G., Chowdhury, M., Krishnamurthy, A., Ratnasamy, S., and Stoica, I. FairCloud: Sharing the Network in Cloud Computing. In SIGCOMM (2012).
[22] Raghavan, B., Vishwanath, K., Ramabhadran, S., Yocum, K., and Snoeren, A. C. Cloud Control with Distributed Rate Limiting. In SIGCOMM (2007).
[23] Rasmussen, A., Conley, M., Kapoor, R., Lam, V. T., Porter, G., and Vahdat, A. Themis: An I/O-Efficient MapReduce. In SOCC (2012).
[24] Rasmussen, A., Porter, G., Conley, M., Madhyastha, H., Mysore, R. N., Pucher, A., and Vahdat, A. TritonSort: A Balanced Large-Scale Sorting System. In NSDI (2011).
[25] Rodrigues, H., Santos, J. R., Turner, Y., Soares, P., and Guedes, D. Gatekeeper: Supporting Bandwidth Guarantees for Multi-tenant Datacenter Networks. In WIOV (2011).
[26] Roughan, M., Thorup, M., and Zhang, Y. Traffic Engineering with Estimated Traffic Matrices. In IMC (2003).
[27] sFlow. http://sflow.org.
[28] Xie, D., Ding, N., Hu, Y. C., and Kompella, R. The Only Constant is Change: Incorporating Time-Varying Network Reservations in Data Centers. In SIGCOMM (2012).
[29] Zhang, Y., Roughan, M., Duffield, N., and Greenberg, A. Fast Accurate Computation of Large-Scale IP Traffic Matrices from Link Loads. In Sigmetrics (2003).