Opening Up the Sky:
     A Comparison of Performance-Enhancing Features in
                  SkyDrive and Dropbox
                                                             Herman Slatman
                                                             University of Twente
                                                     P.O. Box 217, 7500AE Enschede
                                                            The Netherlands
                                                   H.Slatman@student.utwente.nl

ABSTRACT
Cloud storage services are increasing in popularity and use a growing amount
of bandwidth on the Internet. Insight into how much traffic they generate is
needed for a number of reasons. Cloud storage providers are interested in
serving their clients efficiently and effectively, and they want to know how
their product is performing and how they can improve their service. Internet
Service Providers need an indication of the amount of traffic generated by
cloud storage. Lastly, users of cloud storage services might want to know how
their favorite service performs. At the moment not much is known about the
performance of different cloud storage providers; this paper aims at getting
a thorough understanding of those services and their impact on the Internet.
This paper focuses on Microsoft SkyDrive, as this is the second most popular
cloud storage service [1] and because it has been neatly integrated in the
Microsoft Windows operating system.

Microsoft SkyDrive will be compared to Dropbox in terms of
performance-enhancing features. As shown in [1], Dropbox storage servers are
all located in the United States, which is not an optimal solution for
clients spread around the world. Also, the way SkyDrive manages and transfers
its files will be analyzed to assess whether SkyDrive has deployed more
efficient synchronization strategies than Dropbox.

This research contributes to knowing which technologies state-of-the-art
cloud storage services have or have not deployed to increase performance, and
to gaining a thorough understanding of the performance of Microsoft SkyDrive
compared to that of Dropbox.

Keywords
Cloud Storage, Performance, SkyDrive

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise,
or republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee.
18th Twente Student Conference on IT, January 25, 2013, Enschede, The
Netherlands.
Copyright 2013, University of Twente, Faculty of Electrical Engineering,
Mathematics and Computer Science.

1. INTRODUCTION
Recent developments show an increased interest in the use of cloud storage
services. Both individuals and enterprises are increasingly making use of
cloud storage services, like Dropbox, Google Drive and Microsoft SkyDrive, to
store and share files with great ease. Those cloud storage services already
generate quite some traffic on the Internet (an educated guess¹ puts the
total amount of traffic generated by uploading files to Dropbox at about
54 Gbps), and it is to be expected that the amount of traffic due to cloud
storage services will further increase in the future. To maintain the quality
of the Internet in terms of available bandwidth and latency, predicting the
impact cloud storage services have and will have on the Internet is
important. To gain a better understanding of the impact of cloud storage
services, knowledge of the internals and the performance of those services is
necessary. Not much is known about the internals of cloud storage services,
but [1] gives great insight into the internals of Dropbox, which is shown to
be the most popular cloud storage provider.

¹ http://www.extremetech.com/computing/129183-how-big-is-the-cloud, accessed
  on 03-10-2012

The goal of this paper is to get a thorough understanding of the Microsoft
SkyDrive service, its internals and, specifically, its performance. The main
reasons Microsoft SkyDrive has been chosen as the research topic are that it
is the second largest service [1] in terms of traffic generated and that it
has the potential to grow substantially in the near future. The latter is
because SkyDrive can be accessed via the Web, client applications are
available for several operating systems, and it has been neatly integrated in
the Microsoft Windows operating system.

To say something about the performance of SkyDrive, a comparison against
Dropbox will be performed. The main research question reads as follows:

    • How does SkyDrive compare to Dropbox, in terms of the presence of
      performance-enhancing features?

The main research question is focused on generated traffic and the efficiency
with which SkyDrive handles files on its service. To answer this research
question, the research is split up in three parts, answering the following
questions:

    • How are the administration and transfer of files controlled in
      SkyDrive?
    • How are the servers in SkyDrive distributed over the world?
    • Does SkyDrive deploy specific technologies to enhance its
      performance, and how do these compare to Dropbox?

The first question was included to gain an understanding of the operation of
SkyDrive. Information gained from this was used to set up the experiments for
the two remaining research questions. Together these questions give insight
into the performance of the Microsoft SkyDrive service compared to Dropbox in
terms of features.

An overview of SkyDrive will be given in Section 2. Section 3 describes the
methodology used to conduct this research. In Section 4 technologies that
enhance the performance of cloud services are introduced. Section 5 compares
SkyDrive and Dropbox. Section 6 introduces related work. Section 7, lastly,
summarizes the conclusions of this paper.

2. A BIRD'S EYE VIEW OF SKYDRIVE
SkyDrive was initially released by Microsoft in 2007 and has since then been
known under a few different names. At the time of writing, it offers 7 GB of
storage for free to new users, whereas early users could opt in for a free
25 GB if they had used the service before the 22nd of April 2012.

Client applications are available for Windows Vista and Windows 7, which can
be used to integrate SkyDrive inside those operating systems. In Windows 8,
Microsoft's newest iteration of the operating system, SkyDrive has been
integrated natively. Client applications are also available for the OS X,
iOS, Windows Phone and Android operating systems, covering a broad spectrum
of devices. This paper focuses on the desktop client for Windows 7.

A web interface to the SkyDrive service is also available, which is built on
HTML5 technologies². Amongst other things, it supports email integration and
integration with Microsoft Office, and it features Office Web Apps, in which
users can create, view and edit documents right in the browser and store them
on SkyDrive.

² http://bit.ly/SD-Modern-Web, Introducing SkyDrive for the modern web, built
  using HTML5, accessed on 29-10-2012

Users can log in to the service with their Microsoft Account, which is used
in all other services provided by Microsoft. Files on the service can be
shared with other people that have a Microsoft Account, but it is also
possible to share files on social networks, such as Twitter, LinkedIn and
Facebook. SkyDrive maintains an Access Control List (ACL) for every file and
folder³, which is used to grant users the privileges needed to execute the
associated operations on a file. It is possible, for example, to create a URL
for a file which has the property that everyone is allowed to read the file,
but not change it. It is also possible to require users to be logged in
before access is granted.

³ http://bit.ly/Rebuilding-Permissions, Designing app-centric sharing for
  SkyDrive, accessed on 07-11-2012

On the 15th of November 2012, Microsoft introduced selective sync, enabling
users to control which files are synchronized amongst their devices. Updates
to the SkyDrive applications for Windows Phone 8 and Android were also rolled
out. According to Microsoft, on the 15th of November the amount of SkyDrive
storage had doubled since the introduction of the desktop and mobile
applications on the 22nd of April 2012.

3. METHODOLOGY
Active and passive measurements have been carried out to assess the
performance of SkyDrive and to compare it with Dropbox. These included
uploading files to the SkyDrive servers and measurements to determine the
location of the servers.

Before conducting any active or passive experiments, a lab environment
suitable for those experiments was set up. This lab environment consisted of
a host PC running Debian GNU/Linux version 6.0, kernel 2.6.32-5-amd, on which
Wireshark, a popular packet sniffer and network protocol analyzer, was
installed. Windows 7 was installed as a virtual machine. On this virtual
machine, the SkyDrive client application was installed, together with
Charles, a web debugging proxy. Charles is a shareware application that
allows for setting up a local proxy to capture, for example, all data that is
sent via SSL/TLS encrypted connections.

The setup described above allowed for capturing and analyzing all traffic
that was exchanged during the various experiments, including the encrypted
traffic.

3.1 File Administration and Transfers
Files differing in size and containing random text were uploaded to SkyDrive
to determine the way SkyDrive handles file administration and transfers. At
first these uploads were analyzed using only Wireshark, which showed that all
transfers and administration of files were carried over encrypted
connections. Charles was then used to gain a more thorough understanding of
the information sent over those encrypted connections. Hostnames used in the
service were recorded, together with the corresponding IP-addresses and the
functionality they provide.

3.2 Distribution of Servers
Active measurements were performed to assess the geographical distribution of
servers in the SkyDrive service. This was done in two consecutive steps. The
first step was to find out what hostnames the SkyDrive application in the lab
environment would connect to. Wireshark was used to analyze the relevant
Internet traffic, which showed some of the hostnames that SkyDrive connects
to. Some online investigation showed more hostnames⁴,⁵ to incorporate in this
research.

⁴ http://bit.ly/Upload-Issues-For-ISP, Microsoft Answers, accessed on
  29-10-2012
⁵ http://bit.ly/Low-Bandwidth-Areas, Microsoft Answers, accessed on
  29-10-2012

The second step involved setting up a test bed of Planet-Lab servers spread
over the world. On those machines the traceroute and dig commands were
executed against the hostnames found during the first step, to determine
whether the hostnames would always resolve to the same IP. The IP-addresses
that resulted from this step were all queried against the databases on
MaxMind.com and Route.IM to get information on their geographical location.

The results gained from querying those two websites were not taken for
granted though, as research [5] shows these GeoIP services are not always
precise, especially on the city-level. Instead, the results of the queries on
those websites have been complemented by traceroute timings, to further
establish the outcomes.
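The check from the second step of Section 3.2, whether a hostname always
resolves to the same IP-addresses regardless of the vantage point, can be
sketched in Python as follows. This is a minimal sketch under stated
assumptions, not the tooling used in this research (which relied on the dig
and traceroute commands): the function names resolve and
compare_vantage_points, the vantage-point labels and the example hostnames
are hypothetical, and the example addresses come from ranges reserved for
documentation.

```python
import socket
from typing import Dict, FrozenSet, List, Set


def resolve(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the local resolver reports for hostname.

    Meant to be run on each vantage point; it needs network access.
    """
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def compare_vantage_points(
    results: Dict[str, Dict[str, Set[str]]]
) -> Dict[str, bool]:
    """Map each hostname to True if all vantage points saw the same IP set.

    results maps a vantage-point label to {hostname: resolved addresses}.
    """
    seen: Dict[str, List[FrozenSet[str]]] = {}
    for answers in results.values():
        for hostname, addresses in answers.items():
            seen.setdefault(hostname, []).append(frozenset(addresses))
    return {host: len(set(sets)) == 1 for host, sets in seen.items()}


# Hypothetical results collected from two vantage points.
example = {
    "node-eu": {
        "a.storage.example.com": {"192.0.2.10", "192.0.2.11"},
        "b.storage.example.com": {"192.0.2.20"},
    },
    "node-us": {
        "a.storage.example.com": {"192.0.2.11", "192.0.2.10"},
        "b.storage.example.com": {"198.51.100.7"},
    },
}
consistency = compare_vantage_points(example)
```

A hostname mapped to False resolves differently depending on the vantage
point, which would hint at DNS-based load balancing or geographical
distribution and would warrant a closer look at the addresses involved.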
3.3 Comparison With Dropbox
The comparison between Dropbox and SkyDrive has been based both on a
literature survey and active measurements. The literature survey was
performed first to create an understanding of features that in general
improve the performance of cloud services. Google Scholar was used primarily
to search for relevant sources. The search started with the very generic
terms "cloud storage" and "cloud service". Then more terms were introduced in
the search queries, for example "performance" and "infrastructure".

Active measurements were then conducted to determine if the features that
were found during the literature survey are present in SkyDrive. This
involved uploading a series of different files that were carefully crafted to
ensure the features would be exercised if they were present in the service.
The files that were uploaded as part of these measurements are described in
Section 5.3 and can also be found in Table 3.

Another part of the comparison is the assessment of the popularity of
SkyDrive compared to Dropbox. This is not part of the research questions, but
was included to be able to say something about the usage of the service. The
dataset that was analyzed as part of this was produced by capturing flow data
from a building on the campus of the University of Twente, in which 982
unique IP-addresses were present. These IP-addresses are assigned statically.
The number of unique IP-addresses that connected to a storage server in the
SkyDrive service was recorded. This was compared with the number of
IP-addresses that connected to a Dropbox storage server. Also, the amount of
traffic generated in flows was captured.

4. CLOUD STORAGE TECHNOLOGIES
A literature survey was conducted to gain an understanding of what features
affect the performance of a cloud service, and more specifically, a cloud
storage service. A selection of those features has been made and they are
discussed in the following subsections.

4.1 Data Deduplication
Many users store a lot of files in the cloud nowadays. It is perfectly
possible that the same file is uploaded to a cloud storage facility by two or
more different users, or that it is stored two or more times by a single
user. This could be the case for an e-book, for example; it is then
unnecessary to save more than one copy of the e-book in the storage service.
This kind of administration is known as data deduplication, in this case,
server-side data deduplication [3], [4]. Data deduplication allows for less
Internet traffic to be generated, as files will not have to be uploaded when
they are already present on the cloud storage facility. In this paper, only
client-side data deduplication will be considered. Client-side data
deduplication can be implemented by creating a mechanism that checks if a
file is already stored on the service, and only uploads a file when it is not
already present. This saves bandwidth, as files will not be uploaded
unnecessarily.

4.2 Delta Updates
When a file is created to be stored on a cloud storage facility, it can in
general be assumed that all bytes have to be transferred over the Internet.
In general, files will change over time and those changes have to be
synchronized to the cloud storage facility. Cloud storage providers can
implement a feature called delta updates, with which it becomes possible to
upload only a chunk of data that has been changed, while leaving the
unchanged chunks of data untouched [1]. An example of an algorithm that can
be used to implement delta updates is the rsync algorithm [7]. Less Internet
traffic is generated when delta updates are implemented in a cloud storage
service, as there is no need to upload an entire file when only a small part
is changed.

4.3 Data Compression
Data compression is the act of encoding data in such a way that the encoded
data takes fewer bytes to store the same information that is present in the
original data [4]. When data compression is deployed on the client side of a
cloud storage service, files that are exchanged with the cloud storage
facility are compressed before they are sent over the Internet. This allows
for less Internet traffic to be generated as, in general, files can indeed be
compressed. RFC2616 describes the HTTP 1.1 specification [2], which includes
a section on compression of files sent via HTTP. Compression is in fact in
widespread⁶ use by websites, saving their users bandwidth and time.

⁶ http://w3techs.com/technologies/details/ce-compression/all/all

4.4 Server Distribution
In general, services on the web perform faster and more efficiently when the
client connecting to such a service is close to the server [8]. Services that
are used on a worldwide scale should therefore, ideally, deploy servers
distributed all over the world, to guarantee good performance and quick
response for all users. This is no different in cloud storage services, in
which a large amount of data has to be uploaded and downloaded, and therefore
server distribution is an important part of the performance of those
services.

4.5 Storage Protocol
The storage protocol is at the heart of a cloud storage service and can
therefore severely impact the performance of the service [4]. At a high
level, the protocols may be implemented in an Application Programming
Interface (API). Several options are available, such as Web- and File-based
APIs. Figure 1 shows a diagram with some of the available options categorized
on access method, and some technologies corresponding to those options. The
most popular APIs are REST and SOAP, which are employed by Amazon S3 and
Windows Azure for example. The APIs provide ways to connect to services via a
specific interface and specify how systems have to communicate with each
other, including how data is exchanged between each entity and how data is
saved on the cloud storage servers. Other APIs include Block-based access to
cloud storage.

Another part of the storage protocol is the transport protocol that is used
to transfer the files from a client to the storage servers. An example is of
course the TCP/IP stack of protocols, on which HTTP, the protocol that powers
the Web, is built. Dropbox, for example, uses the HTTP and HTTPS application
layer protocols to transfer its files [1]. The use of HTTP(S) introduces
round-trip times, as messages are acknowledged upon receipt. The duration of
those round-trip times also influences the performance of cloud storage
services.
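To make the three traffic-reducing features of Sections 4.1 to 4.3 more
concrete, the Python sketch below shows one possible client-side form of
each. It is an illustration under stated assumptions, not the mechanism of
any particular service: all function names are hypothetical, SHA-256 is an
arbitrary choice of hash, the block size is kept tiny for readability, and
the delta computation compares blocks at fixed offsets, which is a much
simpler scheme than the rsync algorithm [7] (rsync also detects data that has
shifted position).

```python
import hashlib
import zlib
from typing import Dict, List, Set

BLOCK_SIZE = 4  # bytes; unrealistically small, chosen for the example only


def digest(data: bytes) -> str:
    """Hex digest used to recognise identical content (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def needs_upload(data: bytes, known_digests: Set[str]) -> bool:
    """Client-side deduplication: upload only content the service lacks."""
    return digest(data) not in known_digests


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut data into consecutive blocks of at most size bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(old: bytes, new: bytes,
                   size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Simplified delta update: blocks of new that differ from old.

    Returns {block index: block data}. A real protocol would also have to
    transmit the new file length so that truncation can be applied.
    """
    old_digests = [digest(block) for block in split_blocks(old, size)]
    delta: Dict[int, bytes] = {}
    for index, block in enumerate(split_blocks(new, size)):
        if index >= len(old_digests) or digest(block) != old_digests[index]:
            delta[index] = block
    return delta


def compress(data: bytes) -> bytes:
    """Client-side compression before the data is sent over the network."""
    return zlib.compress(data)


old_version = b"aaaabbbbcccc"
new_version = b"aaaaXXXXccccdddd"
delta = changed_blocks(old_version, new_version)  # blocks 1 and 3 only
```

In this example only two of the four blocks of new_version would have to be
uploaded. Whether a service behaves like this can be observed from the
outside by comparing the size of a change with the amount of upload traffic
it causes.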
Figure 1: Cloud storage access methods showing Web-, File- and Block-based
APIs and others. Figure taken and slightly adapted from [4].

5. SKYDRIVE VS. DROPBOX
This Section compares SkyDrive and Dropbox. Subsection 5.1 describes SkyDrive
internals. In subsection 5.2 the geographical distribution of servers in
SkyDrive will be assessed. In subsection 5.3 a comparison of data
deduplication, delta updates and data compression is performed. In subsection
5.4 the popularity of both services is featured. Lastly, subsection 5.5 shows
how SkyDrive stacks up against Dropbox in a concluding summary. It also
discusses the results and introduces future work.

5.1 SkyDrive In-Depth
This section describes technical details of SkyDrive that were of interest
during the research and is in its totality an answer to the first research
question. Knowledge about the internals of SkyDrive was needed to set up
experiments for the other two research questions.

Users are identified by a 16-character identifier. This identifier is also
used for identifying every single file or folder that is stored on the
service. When used as an identifier for files and folders, a numerical suffix
is added to identify the right entity. An example is B222AADFECF84486!1514,
where the exclamation mark separates the user- and file-identifier.

The application stores a local database in which file metadata are kept. This
metadata includes filename, client-identifier, file-identifier and a
32-character hash-value. When a file is added, it is assigned a provisional
file-identifier. These look like #b18dd088-9f1f-4bb9-aba1-1206. The file is
then uploaded to the storage server and, as soon as the upload is finished,
the file is assigned a final file-identifier, which looks like the one
described in the previous paragraph.

When a file gets altered, its hash value is checked. When this value is not
the same as the one that is present in the database, the file is uploaded to
the storage server again.

Files stored using the native application on Windows are split up in blocks.
The maximum block-size is set in a configuration file (ClientPolicy.ini),
which can be found in the local application data folder. The currently
assigned block-size is 1MB. The SkyDrive application periodically checks
online whether the policies that are set in ClientPolicy.ini have to be
updated, so the block-size might be subject to changes.

The transfer of files is carried out via HTTPS. Analyzing the headers of the
packets using Charles showed that SkyDrive uses special headers that are
defined in Microsoft's Background Intelligent Transfer Service (BITS)⁷. BITS
defines new headers on top of the standard HTTP headers. In BITS a new
session is started via a Create-Session packet for every file that has to be
uploaded. Files are uploaded in Fragment packets, which contain information
on the part of the file that is being uploaded and the data itself. The
Fragment packets contain the blocks that were described in the previous
paragraph and, as such, are around 1MB in size in the SkyDrive service.
Although SkyDrive uses the BITS headers, it does not seem to run on the BITS
protocol. Connections to the storage servers use remote port 443, and data is
sent encrypted over the network. Connections to the storage server are closed
shortly after the file transfer is completed.

⁷ http://bit.ly/Microsoft-BITS, Microsoft TechNet, accessed on 14-01-2013.

A continuous connection is present whenever the SkyDrive application is
running. This connection also uses remote port 443. It periodically polls a
notification server for notifications the application has subscribed to.
These notifications include the amount of disk space used, the disk space
quota and information on files that have been uploaded or updated.

When the SkyDrive application is started, authentication is performed via
login.live.com, based on a Windows Live ID. After successfully
authenticating, the application registers itself for notifications on
act-3.blu.mesh.com. Notifications are sent by a host suffixed with
wns.windows.com. Storage operations are all performed against a host suffixed
with storage.msn.com, except in the case of storage via the web interface,
for which they are performed against hosts suffixed with storage.live.com.
Other hostnames associated with the web interface have been omitted for
brevity.

Table 1 shows the hostnames that are in use by SkyDrive, together with the
services that are provided by those hostnames.

Table 1: Hostnames and their use in SkyDrive

Hostname                          Service
login.live.com                    Authentication
*.mesh.com                        Notification subscription
*.wns.windows.com                 Notifications
skydrivesync.policies.live.net    Client Policy updates
skyapi.live.net                   API functions
ssw.live.com                      Debug/Statistics
*.storage.msn.com                 Storage
*.storage.live.com                Storage via web

5.2 Server Distribution
Table 2 shows the hostnames that are in use by SkyDrive to store files. The
client application uploads to exactly one of those hostnames; the one that is
used can change over time though, as the hostname that should be used by the
service is explicitly stated in the ClientPolicy.ini file. Every hostname is
associated with at least two distinct IP-addresses. Together with the option
to change the storage server at runtime due to a ClientPolicy.ini update,
this indicates load balancing is performed in the service. All hostnames have
been traced to the United States, using MaxMind.com data, Route.IM data and
by running traceroute. Most of them resolve to the state of Washington,
whereas two IP-addresses were traced to the state of California. The region
the hosts behind dm1.storage.msn.com originate from could not be resolved on
MaxMind.com, but
response times on Route.IM suggest that they are closer to California than
Washington.

Table 2: Hostnames, associated IP-addresses and locations for storage servers
in SkyDrive

Hostname(s)             IP address        Country   Region
by1.storage.msn.com     65.54.191.46      US        WA
by2.storage.msn.com     65.54.191.47      US        WA

blu1.storage.msn.com    65.55.195.238     US        WA
blu2.storage.msn.com    65.55.195.239     US        WA

dm1.storage.msn.com     157.55.246.46     US        -
                        157.55.246.47     US        -
                        157.55.241.174    US        -
                        157.55.241.175    US        -

sn2.storage.msn.com     207.46.0.174      US        CA
                        207.46.0.175      US        CA

As the above table shows, all storage servers are located in the United
States. This means files from all over the world need to be sent there to be
stored. As SkyDrive uses TCP at the

Stage 1 – LI6000.txt, containing 6000 paragraphs of 'Lorem Ipsum', was
uploaded. This resulted in approximately 3.7 megabytes being uploaded to the
SkyDrive storage server.

Stage 2 – The contents of LI6000.txt were copied, appended to the original
LI6000.txt and saved, basically doubling the size of the file. This resulted
in 7.5 megabytes being transferred to the SkyDrive storage server, which is
about equal to the file size of LI12000.txt.

Stage 3 – The resulting file from Stage 2 was again appended with 6000
paragraphs of 'Lorem Ipsum' and saved. The file now contains 18000
paragraphs. This resulted in 11.2 megabytes being sent to the SkyDrive
storage server, which is about equal to the file size of LI18000.txt.

Stage 4 – The last 15000 paragraphs were cut off from the Stage 3 file and
the file was saved, resulting in 1.9 megabytes being sent to the SkyDrive
storage server, which is about equal to the file size of LI3000.txt.

The above experiment shows SkyDrive does not use delta updates. The same
measurement was performed with Dropbox as the storage service.

Figure 3 shows the results of both measurements. From the figure it can be
concluded that Dropbox does indeed employ delta updates, as the amount of
upload traffic does not double when doubling the amount of data in the file,
and that SkyDrive does not employ delta updates, as every single byte is sent
when
                                                                      a file is changed.
transport layer, and closes the connection to the storage server
after a file transfer is completed, this might cripple performance
for users that are not close to the United States. This is because     Table 3: Files and their size as used during measurements
TCP employs a slow-start mechanism. Performance is affected
by the round-trip time between the client application and
                                                                       Filename          File size (Bytes)         File size Rounded
storage server.
                                                                                                                           (MB)
                                                                        LI3000.txt                 1.866.358                       1.9
5.3 Technology Comparison                                               LI6000.txt                 3.732.718                       3.7
Our experiments showed that SkyDrive does not employ data               LI9000.txt                 5.599.089                       5.6
deduplication, delta updates and data compression. The latter          LI12000.txt                 7.465.438                       7.5
can be established from inspecting Figure 2. The file sizes on         LI15000.txt                 9.331.798                       9.3
the horizontal axis correspond to the file sizes in Table 3. They      LI18000.txt                11.198.158                      11.2
all contained a specific number of paragraphs of ‘Lorem Ipsum’         LI21000.txt                13.064.518                      13.1
- text that is often used as dummy text on websites when
                                                                       LI24000.txt                14.930.878                      14.9
designing page layouts 8 -, according to the number that is
                                                                       LI27000.txt                16.797.238                      16.8
present in their filename. The reason ‘regular’ text and no
random data was inside the files, is because of the possible data      LI30000.txt                18.663.598                      18.7
compression on files in the service. When random data is
inside, the compression rate might well be 0%, which is not the
case when regular text is used. Files were built in a modular
manner to exploit the features of data deduplication and delta
updates. The graph shows a linear progress in upload traffic
when the size of the file that is being uploaded increases. The
amount of upload traffic is bigger than the file size. This
overhead contains the information needed to administer the
upload of the file. The absence of data compression can be
concluded from the fact that the amount of bytes uploaded is
bigger than the amount of bytes the files consist of. The
derivative, or direction coefficient, in Figure 2 is about equal to
1.006, whereas employment of data compression would have
shown a derivative smaller than 1.0. In Dropbox, according to
[1], data compression is present. This can also be established        Figure 2: Upload Traffic observed when uploading the
from inspecting Figure 2.                                             ‘Lorem Ipsum’ files to SkyDrive.
To discover whether SkyDrive employs delta updates an
experiment was setup that consisted of four stages:
8 http://www.lipsum.com/

The absence of client-side data deduplication in the SkyDrive
service has been established by uploading LI3000.txt to the
storage servers five times, each time to a different folder.
Analysis of the traffic generated showed that the file was
uploaded to the storage servers in its entirety each time. From
this it can be concluded that SkyDrive does not keep track of
files that are already present on the storage servers for a specific
user and so does not perform client-side data deduplication to
save upload bandwidth. Dropbox does employ client-side data
deduplication on a per-user basis. The results of this experiment
are shown in Figure 4. Note that only during the first upload the
(compressed) bytes are uploaded to Dropbox, whereas the entire
file is sent uncompressed every time to SkyDrive.
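
The choice of regular text over random data for the test files can be illustrated with a small check. The sketch below is not part of the measurements: it assumes zlib as a stand-in for whatever compression a storage client might apply, and it uses one repeated sentence as a hypothetical substitute for the actual ‘Lorem Ipsum’ files, which compresses far better than real prose would. It only shows the direction of the effect: a compressing client would upload fewer bytes than the file contains for text, but not for random data.

```python
import os
import zlib

# Hypothetical stand-in for the 'Lorem Ipsum' test files: one sentence
# repeated many times. The files used in the measurements are not
# reproduced here, so the exact ratio below is illustrative only.
SENTENCE = b"Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
regular_text = SENTENCE * 2000
random_data = os.urandom(len(regular_text))


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (zlib, default level)."""
    return len(zlib.compress(data)) / len(data)


text_ratio = compression_ratio(regular_text)
random_ratio = compression_ratio(random_data)

# A ratio well below 1.0 means compression would be visible in the upload
# traffic; a ratio around 1.0 means it would go unnoticed.
print(f"regular text: {text_ratio:.3f}")
print(f"random data:  {random_ratio:.3f}")
```

With random data the ratio stays at about 1.0, so an experiment based on random files could not distinguish a compressing service from a non-compressing one.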

Figure 3: Amount of uploaded bytes under common file
operations, e.g. appending and deleting text.
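
The difference between the two behaviours in the delta update experiment can be sketched in a few lines. The code below assumes fixed-size block hashing, which is only one simple way to implement delta updates; it is not claimed to be the mechanism Dropbox uses. The block size, the function names and the generated test content are hypothetical, and protocol overhead is ignored.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen for illustration only


def block_hashes(data: bytes) -> list[str]:
    """Hash every fixed-size block of the file."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


def bytes_to_upload(old: bytes, new: bytes, delta: bool) -> int:
    """Bytes a client would send after `old` has been changed into `new`."""
    if not delta:
        return len(new)  # the whole file is sent again
    old_hashes = block_hashes(old)
    total = 0
    for idx, digest in enumerate(block_hashes(new)):
        # Send a block only if it is new or differs from the old block
        # at the same position.
        if idx >= len(old_hashes) or old_hashes[idx] != digest:
            start = idx * BLOCK_SIZE
            total += len(new[start:start + BLOCK_SIZE])
    return total


# Hypothetical file content standing in for LI6000.txt.
base = b"".join(f"paragraph {i}\n".encode() for i in range(6000))
doubled = base + base                  # analogous to Stage 2
truncated = doubled[:len(base) // 2]   # analogous to Stage 4

full_after_append = bytes_to_upload(base, doubled, delta=False)
delta_after_append = bytes_to_upload(base, doubled, delta=True)
delta_after_cut = bytes_to_upload(doubled, truncated, delta=True)

print(full_after_append, delta_after_append, delta_after_cut)
```

Without delta updates the upload doubles along with the file, whereas with delta updates only the appended part (plus at most one partially filled block) is sent. Comparing blocks by position only works for appending and truncating; detecting insertions in the middle of a file requires a rolling-checksum approach such as rsync [7].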

                                                                       Figure 5: Number of unique IP-addresses connecting to a
                                                                       SkyDrive or Dropbox storage server during two different
                                                                       timespans

Figure 4: Upload traffic observed when uploading
LI3000.txt to five different folders.
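
The deduplication experiment can be mimicked with a small model. The sketch below assumes a client that remembers a hash of every file it has already uploaded for a user; the class and its fields are hypothetical and are not taken from either service. Only the file size of LI3000.txt comes from Table 3, and metadata traffic is ignored.

```python
import hashlib


class StorageClient:
    """Hypothetical client that can skip uploads of content it already sent."""

    def __init__(self, dedup: bool):
        self.dedup = dedup
        self.known_hashes: set[str] = set()
        self.bytes_uploaded = 0

    def upload(self, folder: str, content: bytes) -> None:
        # `folder` only mirrors the experiment (five different folders);
        # in this simplified model it does not influence deduplication.
        digest = hashlib.sha256(content).hexdigest()
        if self.dedup and digest in self.known_hashes:
            return  # content already stored for this user, nothing to send
        self.known_hashes.add(digest)
        self.bytes_uploaded += len(content)


FILE_SIZE = 1_866_358  # size of LI3000.txt in bytes, from Table 3
content = b"x" * FILE_SIZE  # placeholder content of the same size

with_dedup = StorageClient(dedup=True)
without_dedup = StorageClient(dedup=False)
for client in (with_dedup, without_dedup):
    for i in range(5):
        client.upload(f"folder{i}", content)

print(with_dedup.bytes_uploaded, without_dedup.bytes_uploaded)
```

In this model the deduplicating client sends the file once, while the other client sends it five times, which matches the pattern described for Dropbox and SkyDrive respectively.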

5.4 Service Popularity
The popularity of the SkyDrive service was measured by
monitoring the unique IP-addresses that connected to a storage
server each day, in a building on the campus of the University
of Twente. This gives a good indication of popularity, as clients
would only connect to a storage server when they upload a file.
The same was performed with Dropbox as storage service. The
top part of Figure 5 shows the measurement for the period from
the 1st of June till the 5th of July. It shows more unique IP-
addresses connecting to a Dropbox server than there are unique
IP-addresses connecting to a SkyDrive storage server. The
bottom part of Figure 5 shows the number of unique IP-
addresses that connect to a storage server in the SkyDrive and
Dropbox service in the period spanning from September the
19th till October the 22nd. The graph shows that SkyDrive sees
roughly 1/6th of the unique IP-addresses that Dropbox sees. For
SkyDrive, a decrease of about 10.7% compared to the first
timespan was observed. The number of IP-addresses connecting
to a Dropbox storage server remained fairly stable: a decrease
of 2.3% was observed.

Figure 6: Amount of traffic generated during two different
timespans
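
The popularity metric of Section 5.4 comes down to counting distinct client IP-addresses per day and per service, and comparing two timespans as a relative change. The sketch below shows one way to compute both. The flow records, their format and the dates are made up for illustration and are not the measurement data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical flow records: (day, client IP-address, storage service).
DAY_1 = date(2012, 6, 1)
DAY_2 = date(2012, 6, 2)
flows = [
    (DAY_1, "10.0.0.1", "dropbox"),
    (DAY_1, "10.0.0.1", "dropbox"),   # same client again, counted once
    (DAY_1, "10.0.0.2", "dropbox"),
    (DAY_1, "10.0.0.2", "skydrive"),
    (DAY_2, "10.0.0.3", "dropbox"),
]


def unique_clients_per_day(records):
    """Count distinct client IP-addresses per (day, service) pair."""
    seen = defaultdict(set)
    for day, client_ip, service in records:
        seen[(day, service)].add(client_ip)
    return {key: len(ips) for key, ips in seen.items()}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


counts = unique_clients_per_day(flows)
for (day, service), count in sorted(counts.items()):
    print(day, service, count)
```

A negative result of `percent_change` corresponds to a decrease between two timespans, a positive result to an increase.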
The amount of traffic generated during flows was also
measured. Figure 6 shows the sum of downloaded and
uploaded MB to SkyDrive and Dropbox storage servers. Two
different timespans were used again, and they roughly
correspond to the timespans in Figure 5. The amount of traffic
generated by interacting with SkyDrive storage servers during
the September/October timespan has increased by
approximately 178.8% compared to the measurements from
June. This includes both up- and downloaded bytes. The
amount of traffic generated by interacting with Dropbox storage
servers decreased by approximately 14.0% during that same
timespan.

5.5 Discussion
Table 4 shows the described features and briefly summarizes
the findings of Sections 5.2 and 5.3. As written before,
SkyDrive does not employ client-side data deduplication, delta
updates or data compression, as opposed to Dropbox. Reasons
for this, and these are conjectures only, could include that the
development team of SkyDrive was under the impression the
current state of the Internet provides for enough bandwidth to
handle the operation of the service in its current form. Also,
Microsoft owns an infrastructure that provides a lot of storage
space and bandwidth and can therefore offer SkyDrive in its
current form. In contrast, Dropbox has to squeeze out every
single bit of bandwidth, as it pays Amazon S3 for the storage
space it rents and for the bandwidth used to upload to these
servers. This may explain why Dropbox employs various
technologies to reduce the amount of traffic generated and
bytes stored.
Neither SkyDrive nor Dropbox employs geographical
distribution of the user’s data on a world-wide scale, as both
services store files in the United States. A reason for this could
be that Microsoft’s infrastructure is based there, and they felt no
need to distribute the data geographically. As explained, the
distance packets have to travel influences the speed with which
this happens, and thus influences the speed with which files can
be uploaded to the service.

 Table 4: Technologies and their presence in SkyDrive and
                         Dropbox

                                  Cloud Storage Provider
Technology                    SkyDrive            Dropbox
Client-Side Data Dedupl.      No                  Yes
Delta Updates                 No                  Yes
Data Compression              No                  Yes
Server Distribution           US                  US
Storage Protocol              Via HTTPS           Via HTTPS

Future work in this field could be conducted on other providers
of cloud storage, to determine whether other technologies have
been deployed to enhance performance of those services. The
usage of the web interface to SkyDrive could be investigated
also, to get a more thorough understanding of the service and
how it performs compared to Dropbox. Also, the way cloud
storage services are being utilized by clients could be
investigated, to gain a better understanding of the typical usage
of cloud storage services.

6. RELATED WORK
As written before, [1] provides for a thorough understanding of
the Dropbox service. Its performance is clearly discussed in the
paper. In this paper SkyDrive has been researched, and it is
shown to be the second most popular cloud storage provider.
Another paper on the performance of cloud storage is [3]. In
this paper Dropbox is discussed amongst three other cloud
storage providers. The performance was measured while
making and restoring an online backup. The methodology is
very similar to the one in this research, but this research uses
and examines the SkyDrive service. Also, we address some
features that enhance the performance of cloud storage
providers.
A paper in which the optimization of cloud storage systems is
discussed is [6]. This paper describes which factors influence
the performance of cloud storage systems and current issues on
existing services. These were used to understand the
performance of Microsoft SkyDrive and compare it effectively
with Dropbox.
In [4], [9] and [10] a couple of features that make cloud storage
services perform more efficiently are introduced and discussed.
These features were used in the literature survey on the
performance comparison between SkyDrive and Dropbox.

7. CONCLUSIONS
We have performed an analysis of the SkyDrive application to
gain an understanding of its internals. We have established that
the service stores files over HTTPS using headers available in
the Microsoft BITS service and maintains a local database of
files stored online. As soon as a file is changed, the file is sent
encrypted to the storage server. This answers the question how
SkyDrive administers and handles its files.
The measurements have also shown that the storage servers,
against which most traffic in the SkyDrive service is performed,
are all located in the United States. This does not differ from
Dropbox however, as neither service employs the strategy
that Content Delivery Networks employ to speed up uploads and
downloads by deploying servers close to clients. This
answers the question how the servers in the SkyDrive service
are distributed over the world.
Experiments conducted during this research have shown that
SkyDrive does not employ client-side data deduplication, data
compression or delta updates, as opposed to Dropbox. This
answers the third research question.
From those three conclusions we conclude that the Microsoft
SkyDrive service is inferior to Dropbox in terms of the presence
of performance-enhancing features. The distribution of storage
servers in SkyDrive is set up in the same way as in Dropbox.
However, none of the performance-enhancing features that are
available in Dropbox are available in SkyDrive. This results in
quite some bandwidth being squandered by SkyDrive.

8. ACKNOWLEDGEMENTS
This paper has been written as part of the ‘Broadband for All’
track of the Bachelorreferaat course at the University of
Twente, which is supervised by the ‘Design and Analysis of
Communication Systems’-group (DACS) of the University of
Twente. I would like to thank my supervisor, I. Drago, for his
continuing support and insights during my work.

9. REFERENCES
[1] Drago, I., Mellia, M., Munafò, M.M., Sperotto, A., Sadre,
    R. and Pras, A. 2012. Inside Dropbox: Understanding
    Personal Cloud Storage Services. In Proceedings of the
    12th ACM SIGCOMM Conference on Internet
    Measurement. IMC ’12. Pages 481-494. DOI=
    http://dx.doi.org/10.1145/2398776.2398827
[2] Fielding, R., et al. 1999. RFC2616 - Hypertext Transfer
    Protocol – HTTP 1.1. Available on
    http://www.ietf.org/rfc/rfc2616.txt
[3] Hu, W., Yang, T., Matthews, J.N. 2010. The good, the bad
    and the ugly of consumer cloud storage. ACM SIGOPS
    Operating Systems Review, Vol 44, Issue 3, July 2010,
    pages 110-115.
    DOI=http://dx.doi.org/10.1145/1842733.1842751
[4] Jones, M. T., 2010. Anatomy of a cloud storage
    infrastructure. IBM developerWorks. Available on
    http://www.ibm.com/developerworks/cloud/library/cl-cloudstorage/.
    Also available as PDF.
[5] Poese, I., Uhlig, S., Kaafar, M.A., Donnet, B., Gueye, B.
    2011. IP geolocation databases: unreliable? ACM
    SIGCOMM Computer Communication Review, Vol 41,
    Issue 2, April 2011, pages 53-56. DOI=
    http://dx.doi.org/10.1145/1971162.1971171
[6] Spillner, J., Müller, J., Schill, A. 2012. Creating optimal
    cloud storage systems. Future Generation Computer
    Systems. 16 June 2012.
    DOI=http://dx.doi.org/10.1016/j.future.2012.06.004
[7] Tridgell, A., Mackerras, P. 1996. The rsync algorithm.
    Joint Computer Science Technical Report Series,
    TR-CS-96-05
[8] Vakali, A., Pallis, G. 2003. Content delivery networks:
    status and trends. IEEE Internet Computing, Vol 7, Issue 6,
    Nov-Dec 2003, pages 68-74. DOI=
    http://dx.doi.org/10.1109/MIC.2003.1250586
[9] Wang, L., et al. 2010. Cloud Computing: a Perspective
    Study. New Generation Computing, Vol 28, Issue 2, April
    2010, pages 137-146.
    DOI=http://dx.doi.org/10.1007/s00354-008-0081-5
[10] Zeng, W., Zhao, Y., Ou, K., Song, W. 2009. Research on
    cloud storage architecture and key technologies.
    Proceedings ICIS ’09, pages 1044-1048. DOI=
    http://dx.doi.org/10.1145/1655925.1656114