The carbon footprint of a distributed cloud storage

 
CONTINUE READING
The carbon footprint of a distributed cloud storage

                                                                       Lorenzo Posania,b,c , Alessio Paccoiab , Marco Moschettinib
                                                                   a Laboratoire
                                                                             de Physique Statistique, École Normale Supérieure, Paris (FR)
                                                                            b Research
                                                                                   and tech development, Cubbit srl, Bologna (IT)
                                                               c to whom correspondence should be addressed, email: lorenzo.posani@cubbit.io

                                         Abstract
arXiv:1803.06973v2 [cs.DC] 10 Mar 2019

                                         The ICT (Information Communication Technologies) ecosystem is estimated to be responsible, as of today,
                                         for 10% of the total worldwide energy demand - equivalent to the combined energy production of Germany
                                         and Japan. Cloud storage, mainly operated through large and densely-packed data centers, constitutes a
                                         non-negligible part of it. However, since the cloud is a fast-inflating market and the energy-efficiency of
                                         data centers is mostly an insensitive issue for the collectivity, its carbon footprint shows no signs of slowing
                                         down. In this paper, we analyze a novel paradigm for cloud storage (implemented by cubbit.io) [1, 2], in
                                         which data are stored and distributed over a network of p2p-interacting ARM-based single-board devices. We
                                         compare Cubbit’ distributed cloud to the traditional centralized solution in terms of environmental footprint
                                         and power-usage efficiency. We demonstrate how a distributed architecture is beneficial as regards impact
                                         reduction on both data transfers and data storage. Compared to the centralized cloud, the distributed
                                         architecture of Cubbit achieves a ∼ 87% reduction of the carbon footprint for data storage and a ∼ 50%
                                         reduction for data transfers, providing an example of how a radical paradigm shift can benefit both the final
                                         consumer and society as a whole.
                                         Keywords: Carbon footprint, Cloud Storage, Distributed, Peer-to-peer

                                         1. Introduction                                                charging costs, an equivalent energy consumption of
                                                                                                        an additional operating household fridge [5]. Con-
                                            Over the last decades, the acknowledgment of cli-           trary to the electronics market, however, the foot-
                                         mate change by the general audience has driven the             print caused by our online life is much less tangible
                                         law regulations of most western countries towards              and, as a consequence, much less opposed. That,
                                         an increasing awareness of consumptions and effi-              combined with the fast-increasing trend of online
                                         ciency. As a result of this increasingly-tight poli-           presence and internet-accessing devices per capita,
                                         cies, the average per-device consumption of house-             results in an increasing and mostly uncharged en-
                                         hold appliances (fridge, cooling, etc.) has consis-            vironmental impact that shows no signs of slowing
                                         tently decreased in the last 20 years [3]. However,            down [6]. To gain an intuition of the impact of
                                         there is a mostly underestimated factor that sen-              the digital life, one has to consider that any time a
                                         sibly contributes to the environmental impact of               video is streamed from Youtube servers to an iPad,
                                         our daily lives: our use of the Information Commu-             or a photo is accessed on Google Photos or Drop-
                                         nication Technology (ICT) ecosystem or, in other               box, the whole infrastructure that separates the fi-
                                         words, our digital life. An estimation of the to-              nal user to the corporate data center, and the data
                                         tal impact of the ICT ecosystem approaches 1500                center itself, has to be powered to reliably transmit
                                         TWh of annual consumption [4, 5], which roughly                information in both directions.
                                         amounts for 10% of the world energy consumption,
                                         more than the total energy production of Germany                 The process of transmitting information can be
                                         and Japan combined.                                            orders of magnitude more demanding, in terms of
                                            The computation of the per-capita consumption               energy consumption, than storage itself, depending
                                         shows that the sole fact of owning and using a                 on the relative location of the exchanging nodes. In
                                         smartphone constitutes, without considering the                this document, we analyze, using an adaptation of
                                         Preprint submitted to -                                                                               March 12, 2019
the model of Baliga et al. [7], the energy consump-          model of [7]. The analysis relies on the definition of
tion of cloud storage services, and compare it to            the consumption per bit, which is computed by di-
an alternative setup where data is stored on peer-           viding the operating power (W) for the total trans-
to-peer low-consumption devices located in users’            fer capacity (Gb/s), resulting in a Joule/bit mea-
houses and implemented by Cubbit, a technologi-              sure, then converted in J/GB. These units are taken
cal startup focused on distributed cloud services.           from the manufacturer specs sheet, shown in table
                                                             1. These quantities are combined with a set of co-
                                                             efficients that reflect the redundancy of the packet
2. Analysis of centralized cloud consump-                    transmission, the under-operating regime of the in-
   tions                                                     frastructure and the cooling energy, as well as the
                                                             multiplicity of some devices in a single transmis-
  The energy consumption of a cloud storage ser-
                                                             sion (e.g. two ethernet switches at entry points
vice can be divided into two main factors:
                                                             plus another one inside the data center). The av-
 1. the cost of storing the data, i.e. powering and          erage distance between core routers on the network
    cooling the data center (Storage consumption)            is estimated of c.a. 800Km. For full description of
 2. the cost of sending the data from the user to            coefficients and estimations we refer to [7]
    the server and back (Transfer consumption)
   While the first can be estimated from techni-                 Etransfer
                                                                  dcenter =                                (2)
cal specifications of storage equipment, the second                                                        
                                                                    Pes     Pbg     Pg    Ppe      Pc    Pw
needs a more detailed analysis that takes into ac-           = 6× 3      +        +    +2     + 18    +4
                                                                    Ces     Cbg     Cg    Cpe      Cc    Cw
count the public internet infrastructure and the ge-
ographical distance between the user and the server.                       kJ
                                                                 ' 23.9         ,
For both these estimations we refer to the model of                        GB
Baliga et al. [7], where energy consumption is com-             where the prefactor of 6 accounts for redundancy
puted accounting for several factors, including the          (× 2), cooling and other overheads (× 1.5), and the
multiplicity of involved devices, redundancy, cool-          fact that todays network typically operate at under
ing, overbooking (see below).                                50% utilization (× 2); the addends represent, in or-
   To delineate the calculation, we start from the           der, the ethernet switch, the broadband gateway,
storage consumption, i.e. the average power, ex-             the data center gateway, the provider edge router,
pressed in W/TB, necessary to store the payload              the core network, and the relay optical fiber trans-
in hot storage. We updated the technical specifica-          mission. The detailed analysis of pre-factors can be
tions with respect to [7], as hard-disk storage capc-        found in [7]. Briefly, the factor 3 in the ethernet
ity has dramatically improved in the last years. As          switch accounts for the two routers involved in the
a model for data center rack we consider the HP              access to the public internet plus the router located
StoreOnce rack [8]. We take the full-operating es-           inside the data center; the factor 18 in the core net-
timation since it is the one available in the manu-          work accounts for an average of 9 hops (2 baseline
facturer specification sheet, and we consider full ca-       + 7 for the 800km distance between core nodes) of
pacity (no under-usage overhead) for the 12 HDD              internet packets from source to destination, times
in the rack. As an average storage per disk we con-          2 for the redundancy.
sider 8 TB, estimated from the mean of HDD di-
mensions in blackblaze report [9] (' 7.2 TB/Disk).
The consumption per TB is therfore estimated from            3. Analysis of Distributed Cloud consump-
the specs resumed in table 1, considering a factor              tions
2× for cooling [10] and a factor 2 for redundancy
[7]. This estimation gives                                      The distributed architecture of the Cubbit net-
                                                             work relies on the same public internet infrastruc-
                                                             ture delineated in the previous chapter. In the dis-
                          607 W          W                   tributed paradigm, there are two key differences
   Pstorage
    dcenter = 2 × 2 ×             ' 25.3          (1)
                        12 × 8 TB        TB                  with the server-based cloud storage:
  Similarly, we compute the transfer energy, ex-              1. The low energy consumption of ARM devices
pressed in J/GB, following the public internet                   (Marvell ESPRESSObin)
                                                         2
Equipment                   Capacity (mean)       Consumption
           Storage rack powering      HP SO 3620                  12 Disks x 8 TB       607 W (peak)
           Cubbit Cell                Marvell ESPRESSObin         /                     1 W (peak)
           Cubbit Cell                WD Blue HDD                 1.5 TB                1.55 W (peak)

                          Table 1: Equipments - power and capacity of storage equipments

                                                Equipment            Capacity      Consumption
               Data Center gateway router       Juniper MX-960       660 Gb/s      5.1 kW
               Ethernet Switch                  Cisco 6509           160 Gb/s      3.8 kW
               BNG                              Juniper E320         60 Gb/s       3.3 kW
               Provider Edge                    Cisco 12816          160 Gb/s      4.21 kW
               Core router                      Cisco CRS-1          640 Gb/s      10.9 kW
               WDM (800 km)                     Fujitsu 7700         40 Gb/s       136 W/channel

                   Table 2: Equipments - power and capacity of routing equipments. Data from [7]

 2. The geographical proximity between the user              can assume an average number of 2 packet hops in
    and his/her stored data                                  core network routers. This lowers the correspond-
We consider a network of Cubbit Cells [11], each             ing factor 18 in Eq. 4 to a factor 4, accounting for
composed by an ARM-based SBC and a HDD                       two core hops and the redundancy of the packets
(Western Digital Blue) of 1TB or 2TB (estimated              on the network (factor 2). For the same reason,
average 1.5 TB/disk). Each Cell is located in a              the 800km-relay consumption Pw is not taken into
user’s house and connected by ISP internet connec-           account. With respect to Eq. 4 we also ignore the
tion. Files on the cloud are stored with a redun-            data-center-specific terms: one ethernet switch and
dancy factor of 1.5 (Reed Solomon erasure coding             the data center gateway. However, we need to con-
with 24+12 redundancy shards [1, 12]). As done for           sider an additional BNG, since transfers are per-
the centralized cloud, we analyze the consumption            formed through p2p connections between endpoints
of both storage (W/GB), and transfer (J/GB).                 located within an ISP network. The transfer energy
   The Marvell ESPRESSObin has a single-core                 per GB is therefore computed as
peak consumption of ∼ 1W [13], while the embed-
ded WD Blue HDD has a peak consumption of 1.4                                                                
                                                                                  Pes      Pbg     Ppe     Pc
W (1 TB) and 1.7 W (2 TB). We here assume that                 Etransfer
                                                                cubbit   = 6 ×  2      + 2     + 2     + 4
half of the network is composed by 1TB devices and                                Ces      Cbg     Cpe     Cc
half by 2TB devices, giving an average storage of                                                              (4)
1.5TB for an average peak consumption of 1.55W.                                 kJ
                                                                         ' 11.9      .
The storage energy consumption of the Cubbit net-                               GB
work is therefore computed as

                     1 + 1.55 W        W                     4. Comparison between centralized cloud
   Pstorage
    cubbit = 1.5 ×              ' 2.55    .        (3)          and Cubbit distributed cloud
                       1.5 TB          TB
   In Cubbit, shards of the distributed payloads are
preferably distributed in Cubbit Cells that are lo-             The reduction of carbon footprint of Cubbit com-
cated in geographical proximity of the user, since           pared to centralized solutions can be computed by
the distribution of the shards is controlled by the          comparing the storage power and the transfer en-
AI optimization routines of a coordinator server[1].         ergy for typical use case, such as backup plans and
We therefore consider the scenario where data is             frequent access of, for example, a web-hosted video.
stored in nodes at an average distance of 80 km                 By comparing the power needed to store 1 TB on
from the user’s access point. In this scenario, we           the centralized cloud with the corresponding value
                                                         3
for the Cubbit cloud we find
                                                                                           105               DCenter emissions
                                                  W                                        104               Cubbit emissions
               storage    storage
                                                                                                             Saved emissions

                                                                 CO2 emissions (kg/year)
 ∆P storage = Pdcenter − Pcubbit  ' 22.75            , (5)
                                                  TB                                       103
which roughly corresponds to a 87% reduction of                                            102
overall storage consumption:
                                                                                           101
                 ∆P storage
                   storage =' 0.90 .
                  Pserver
                                                       (6)                                 100
  Similarly, the difference in terms of transfer en-                                       10   1
ergy per GB is
                                                                                           10   2

                                                                                                    10   3   10   2  10 1 100 101 102 103
      ∆E   transfer
                      =    transfer
                                −
                          Edcenter     transfer
                                      Ecubbit          (7)                                                        Stored cloud space (TB)
                             kJ        kWh
                      ' 12.0    = 3.33     ,
                             GB         TB                       Figure 1: Comparison between centralized (DCenter) and
                                                                 distributed (Cubbit) clouds in terms of annual carbon emis-
  which corresponds to a 50% reduction of the en-                sions per TB of stored data.
ergy needed to transfer data from the cloud to the
user, and back.
                                                                 average 10 TB of data per day (e.g. 10,000 visual-
4.1. Backup                                                      izations of 100 MB each) the saved energy per year
                                                                 would be
  A backup service hosted on the cloud is char-
acterized by large volumes that are not frequently
accessed. In the context of the carbon footprint,                                               ∆E(10 TB streaming) =
the consumption of a backup plan will, therefore,
be dominated by the storage term. If we consider                                                         10 TB × ∆P storage × 365 × 24h
a storage plan for a professional backup of 10 TB,                                                       + 365 × ∆E transfer × 10 TB
with very small daily access, we find that the total                                                     ' 14, 100 kWh ,                  (9)
energy saved in a year is
                                                                   which roughly corresponds to -7,000 kgCO2 emit-
                                                                 ted per year. Note that these computations assume
 ∆E(10 TB backup) =                                    (8)       that streaming are broadcast to a local audience.
                  storage
   10 TB × ∆P                × 365 × 24h ' 1992 kWh .            While this might be the case for university data,
                                                                 local news, or targeted marketing, it has a limited
By considering a rough factor of 0.5 KgCO2 for                   range of applicability that has to be taken into ac-
each kWh of consumed energy[14], choosing a dis-                 count.
tributed cloud over a centralized one would corre-
spond, for such a backup plan, to a reduced carbon               4.3. Large scale
emission of c.a. -1000 kgCO2/year. On a data cen-                   Finally, if we speculate about the overall data
ter scale, the reduction of kgCO2 emitted per year               volume of a global consumer cloud storage service
for a PB (1000 TB) of cloud storage would therefore              like, for example, Dropbox or Google, values rise
correspond to c.a. -100,000 kg/year.                             dramatically. Such interpolations have to be taken
                                                                 with due caution, since estimations are based on
4.2. Streaming                                                   undisclosed values. For the sake of speculation, we
  The reduction in consumed energy and, conse-                   consider an use base of c.a. 600 millions users. The
quently, in carbon emission, is significantly larger             last disclosed conversion rate from free (2 GB) to
when considering large volumes of data transfers.                premium (2 TB) is around 3%. This results in a
For example, if we consider a regional newspaper                 theoretical data volume of ca. 37.2 · 106 TB of stor-
service hosting 10 TB of data and streaming, on                  age. Considering a factor 5 due to overbooking, it
                                                             4
[4] G. International, How Clean is Your Cloud? Catalysing

                          106            Dcenter emissions                               an energy revolution, Technical Report, 2012.
                                         Cubbit emissions                            [5] M. P. Mills, The cloud begins with coal, Digital Power
                                                                                         Group (2013).
                                         Saved emissions
CO2 emissions (kg/year)

                          105                                                        [6] Cisco vni global fixed and mobile internet traffic fore-
                                                                                         casts,    https://www.cisco.com/c/en/us/solutions/
                          104                                                            service-provider/visual-networking-index-vni/
                                                                                         index.html, web page.
                                                                                     [7] J. Baliga, R. W. Ayre, K. Hinton, R. S. Tucker, Green
                          103                                                            cloud computing: Balancing energy in processing, stor-
                                                                                         age, and transport, Proceedings of the IEEE 99 (2011)
                          102                                                            149–167.
                                                                                     [8] Hpe storeonce data sheet, https://www.hpe.com/, web
                          101                                                            page.
                                                                                     [9] Blackblaze report, https://www.backblaze.com/blog/
                          100                                                            2018-hard-drive-failure-rates/, web page.
                                                                                    [10] Report     on    cooling  costs   in   data    centers,
                                10   3   10   2  10 1 100 101 102         103            https://www.electronics-cooling.com/2007/02/
                                              Transferred data (TB/day)                  in-the-data-center-power-and-cooling-costs-more-than-the-it-eq
                                                                                         web page.
                                                                                    [11] Cubbit website, https://www.cubbit.io/technology,
                                                                                         web page.
Figure 2: Comparison between centralized (DCenter) and
                                                                                    [12] M. Monti, S. Rasmussen, M. Moschettini, L. Posani, An
distributed (Cubbit) clouds in terms of annual carbon emis-
                                                                                         alternative information plan, Technical Report, Work-
sions per TB of daily streamed data.
                                                                                         ing paper Santa Fe Institute, 2017.
                                                                                    [13] Marvell espressobin specs sheet, https://www.http://
                                                                                         espressobin.net/, web page.
gives an estimation of 7.4 106 TB of effective cloud                                [14] Carbon trust energy and carbon conversions, http://
storage. We can make a conservative estimation                                           www.knowlton.org.uk/wp-content/filesEnergy, web
that each user transfers, on average, 50 MB of files                                     page.
from/to the cloud, which implies a daily transfer
volume of c.a. 190 TB. If we plug these estimations
in our model, we obtain a total saved annual en-
ergy, using a distributed architecture rather than a
centralized one, of ∼ 1.5 · 109 kWh, equivalent to
saving carbon emissions in the order of 700 millions
kgCO2 per year.

Acknowledgements

  The authors are grateful to S. Baldi (Cynny
Space srl), A. Rovai and A. Albani for valuable
comments and discussion.

Bibliography

References
 [1] L. Posani, M. Moschettini, A. Paccoia, Cubbit: a dis-
     tributed, crowd-sourced cloud storage service, Invited
     Talk at CERN Workshop on Cloud Storage Services
     (CS3 2018) (2018).
 [2] L. Posani, M. Moschettini, A. Paccoia, Cubbit, the dis-
     tributed cloud, Invited Talk at CERN, CNR Workshop
     on Cloud Storage Services (CS3 2019) (2019).
 [3] Energy      efficiency    and     energy     consump-
     tion in the household sector,            https://www.
     eea.europa.eu/data-and-maps/indicators/
     energy-efficiency-and-energy-consumption-5/
     assessment, web page.

                                                                                5
You can also read