PonD: Dynamic Creation of HTC Pool on Demand Using a Decentralized Resource Discovery System

Kyungyong Lee, University of Florida, ACIS Lab., Dept. of ECE, klee@acis.ufl.edu
David Wolinsky, Yale University, Computer Science Dept., david.wolinsky@yale.edu
Renato Figueiredo, University of Florida, ACIS Lab., Dept. of ECE, renato@acis.ufl.edu

ABSTRACT

High Throughput Computing (HTC) platforms aggregate heterogeneous resources to provide vast amounts of computing power over a long period of time. Typical HTC systems, such as Condor and BOINC, rely on central managers for resource discovery and scheduling. While this approach simplifies deployment, it requires careful system configuration and management to ensure high availability and scalability. In this paper, we present a novel approach that integrates a self-organizing P2P overlay for scalable and timely discovery of resources with unmodified client/server job scheduling middleware in order to create HTC virtual resource Pools on Demand (PonD). This approach decouples resource discovery and scheduling from job execution/monitoring — a job submission dynamically generates an HTC platform based upon resources discovered through match-making from a large "sea" of resources in the P2P overlay and forms a "PonD" capable of leveraging unmodified HTC middleware for job execution and monitoring. We show that the job scheduling time of our approach scales with O(log N), where N is the number of resources in a pool, through first-order analytical models and large-scale simulation results. To verify the practicality of PonD, we have implemented a prototype using Condor (called C-PonD), a structured P2P overlay, and a PonD creation module. Experimental results with the prototype in two WAN environments (PlanetLab and the FutureGrid cloud computing testbed) demonstrate the utility of C-PonD as an HTC approach that does not rely on a central repository for maintaining all resource information. Though the prototype is based on Condor, the decoupled nature of the system components — decentralized resource discovery, PonD creation, job execution/monitoring — is generally applicable to other grid computing middleware systems.

Categories and Subject Descriptors

C.2.4 [COMPUTER-COMMUNICATION NETWORKS]: Distributed Systems

Keywords

Resource discovery; self-configuration; P2P; virtual resources; high-throughput computing

1.  INTRODUCTION

High Throughput Computing (HTC) refers to computing environments that deliver vast amounts of processing capacity over a non-negligible period of time (e.g., days, months, years). This continuous computing throughput is a vital factor in solving advanced computational problems for scientists and researchers. Widely-used HTC middleware such as Condor [1] and BOINC [2] enable sharing of commodity resources connected over a local or wide-area network. Commonly, these systems use central managers to aggregate resource information and to schedule tasks. While this approach provides a straightforward means to discover available resources and schedule jobs, it can impose administrative overheads and scalability constraints. Namely, the central manager needs to be closely administered, and the failure of the manager node can render its managed resources unavailable for scheduling new tasks.

The growing adoption of on-demand computing-as-a-service cloud provisioning models and the large number of resources in data centers motivate the need for solutions that scale gracefully and tolerate failures, while limiting management overheads. These traits are commonly found in peer-to-peer (P2P) systems; nevertheless, the current prevailing approaches for job scheduling rely on specialized configuration of responsibilities for resources, e.g., for information collection and job match-making. With proper configuration, such approaches have been demonstrated to scale to several thousands of resources.

Scalability limitations of HTC middleware beyond current targets have been addressed in the literature. Raicu et al. [3] addressed the inherent scalability constraint in existing HTC schedulers by proposing a fast and lightweight task execution framework which demonstrated improved scalability in a pool of tens of thousands of resources processing millions of tasks. Sonmez et al. [4] measured workflow scheduling performance in multi-cluster grids in simulated and real environments. The real-environment experiment, executed on DAS-3 (a multi-cluster grid system in the Netherlands), revealed that the limited capabilities of head nodes result in overall system performance degradation as workflow size increases.

In this paper, we propose a novel HTC middleware approach, PonD, which leverages a scalable and flexible P2P decentralized resource discovery system to form a job execution Pool-on-Demand (PonD) dynamically from a large-scale [...]

2.1  Condor

[...] advertisements (ClassAds) [7] that support a flexible and expressive way to describe job characteristics and requirements as well as resource information in order to determine matches.
In Condor, nodes can play the role of central managers, worker nodes, or job submitters. Roles are determined by running the appropriate daemons (i.e., condor_startd, condor_schedd, condor_collector, and condor_negotiator), and a single node may assume multiple roles.

condor_startd: This daemon is responsible for advertising resource capabilities and policies, expressed using ClassAds, to the condor_collector daemon of the central manager, as well as for receiving and executing tasks delegated by the central manager. A node that runs this daemon is regarded as a worker node.

condor_schedd: A node that wants to submit a job runs this daemon. condor_schedd is in charge of storing user-submitted jobs in a local queue and sending resource requests to a condor_collector at a central manager node.

condor_collector: The role of this daemon is to collect resource information expressed using ClassAds and a list of condor_schedd daemons with inactive jobs in their queues. A condor_startd periodically updates its resource information ClassAds to its condor_collector. The condor_collector also retrieves a list of inactive jobs from condor_schedd daemons at the time of negotiation.

condor_negotiator: This daemon is responsible for matchmaking between inactive jobs and idle resources. At each negotiation cycle, this daemon retrieves a list of inactive jobs and resource ClassAds from the condor_collector daemon.

Usually, a Condor pool consists of a single central manager that runs both an instance of the condor_collector and condor_negotiator daemons, along with many worker nodes and submitters, which run condor_startd and condor_schedd daemons, respectively. In order to provide reliable service in case of central manager node failure, fail-over central managers can be deployed using the high availability daemon.

In order to enable cross-domain resource sharing, Condor has two extensions, Condor-Flock [8] and Condor-G [9]. With these approaches, each node maintains a pre-configured list of other pools' central managers. Each central manager node also has to keep a list of allowed remote submitters. To overcome the pre-configuration requirements of Condor flocking, Butt et al. [10] presented a self-organizing Condor flock that was built upon a structured P2P network. A locality-aware P2P pool is formed with the central managers of each Condor pool. When a central manager has available resources, it announces them to other managers via the P2P overlay. When a central manager requires additional resources, it checks announcement messages from other central managers to determine where available resources are located. While our approach shares similar features with the self-organizing Condor flock [10] (self-configuration, and dynamic resource instantiation in response to jobs in the queue), the core mechanism for discovering nodes and self-organizing pools based on unstructured P2P queries with a logarithmically increasing overhead is a key differentiating factor.

Condor Glide-in [11] also provides a way to share idle resources across different administrative domains without a pre-configured list of other pools' central managers. When jobs are waiting in a queue of the condor_schedd daemon, pilots are created to find available resources for execution. In order to find available resources, a Glidein Factory manages a list of available resources across multiple Condor pools.

Figure 2: Condor negotiation procedure. Td is the default scheduling period, and Ti is the minimum inter-scheduling time. A job submitted at t1, where t1 < Ti, has to wait Ti − t1 for matchmaking.

Though the approach of flocking provides opportunities to share resources across different domains with bounded management overheads at a central manager, the cost of sequentially traversing pools in a flock list introduces non-negligible additional scheduling latencies at scale. Condor maintains a default negotiation period, Td, and a minimum inter-negotiation time, Ti, where Td ≥ Ti, as system configurations. Upon job submission, the negotiator checks if Ti has elapsed since the last negotiation cycle. If so, it starts a new negotiation cycle immediately. Otherwise, it waits until Ti elapses. The procedure is illustrated in Figure 2. In order to analyze the effect of the minimum inter-negotiation time on the overall waiting time in the queue until matchmaking, the expected waiting time can be computed as follows:

$$E(T_{wait}) = \frac{1}{T_d} \int_0^{T_d} (T_{wait}\ \text{at time}\ t)\, dt = \frac{1}{T_d}\left(\int_0^{T_i} (T_i - t)\, dt + \int_{T_i}^{T_d} 0\, dt\right) \qquad (1)$$

$$= \frac{T_i^2}{2 \times T_d} \qquad (2)$$

Equation 1 describes the waiting time of a job upon arrival at time t, which can be explained with Figure 2. In the default Condor setup, Td is 60 seconds and Ti is 20 seconds. Thus, the expected waiting time at each central manager during a flocking procedure is 3.3 seconds. Considering the sequential traversal of central managers during flocking, the waiting time increases linearly.
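To make Equations 1 and 2 concrete, the short sketch below (Python, using the default Td = 60 s and Ti = 20 s quoted above) reproduces the 3.3-second expected wait and the linear growth under sequential flocking; the five-manager flock is an illustrative assumption, not a measurement from this paper.

```python
# Expected queueing delay from the minimum inter-negotiation time
# (Equation 2): E(T_wait) = Ti^2 / (2 * Td), for a job arriving
# uniformly at random within a negotiation period.

def expected_wait(t_i: float, t_d: float) -> float:
    return t_i ** 2 / (2 * t_d)

print(expected_wait(20, 60))       # default Condor setup: ~3.33 s
print(5 * expected_wait(20, 60))   # traversing 5 flocked managers: ~16.7 s
```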
2.2  P2P system

P2P systems can provide a scalable, self-organizing, and fault-tolerant service by eliminating the distinction between service provider and consumer. C-PonD uses a structured P2P network, Brunet [12], which implements the Kleinberg small-world network [13]. Each Brunet node maintains two types of connections: (1) a constant number of near connections to its nearest left and right neighbors on a ring, using the Euclidean distance in a 160-bit node ID space, and (2) approximately log(N) (where N is the number of nodes) far connections to random nodes on the ring, such that the routing cost is O(log(N)). The overlay uses recursive greedy routing to deliver messages.
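The sketch below illustrates the greedy routing rule just described: at each hop, a node forwards the message to whichever of its near or far connections is closest to the target ID on the ring. It is an illustration of the idea, not Brunet's actual implementation; the routing-table layout and the delivery rule at the end of the loop are simplifying assumptions.

```python
# Greedy routing on a ring-structured overlay with near and far links.
ID_SPACE = 2 ** 160  # 160-bit node ID space, as in Brunet

def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two IDs on the ring."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)

def route(start: int, target: int, tables: dict[int, list[int]]) -> list[int]:
    """Forward greedily to the connection closest to the target ID.

    With ~log(N) far connections per node, the expected path length is
    O(log N) hops. `tables` maps a node ID to its connection list.
    """
    path = [start]
    while path[-1] != target:
        nxt = min(tables[path[-1]], key=lambda n: ring_distance(n, target))
        if ring_distance(nxt, target) >= ring_distance(path[-1], target):
            break  # no neighbor is closer: deliver to the closest node found
        path.append(nxt)
    return path
```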
2.3  Tree-based Multicast on P2P

By leveraging existing connections in a structured P2P system, Vishnevsky et al. [14] presented a means to create an efficient distribution of messages in Chord and Pastry using a self-organizing tree. Each node recursively partitions its responsible multicast region by using the routing information at each node. DeeToo [15] presents a similar tree creation method on a small-world style P2P network, which is [...]
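The recursive partitioning just described can be sketched as follows: a node delivers the message locally, splits its responsible ID region among its connections that fall inside that region, and each child recurses on its sub-region. This is a minimal illustration under assumed data structures, not the code of [14] or [15].

```python
# Tree-based multicast by recursive region partitioning on a ring overlay.
# Coverage of the whole region relies on each node having a connection to
# its immediate ring successor (the near connections of Section 2.2).

def multicast(node: int, region: tuple[int, int], tables: dict, deliver) -> None:
    """Deliver a message to every node in the half-open ID region [lo, hi)."""
    deliver(node)
    lo, hi = region
    # connections of `node` that fall inside its responsible region
    children = sorted(n for n in tables[node] if lo <= n < hi and n != node)
    bounds = children + [hi]
    for i, child in enumerate(children):
        # each child takes over the sub-region from itself to the next child
        multicast(child, (child, bounds[i + 1]), tables, deliver)
```

Because each message travels only between directly connected neighbors, query distribution and result aggregation over this tree traverse N − 1 edges in total, which is the property used in the bandwidth analysis of Section 4.1.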
Figure 3: [...] discovery. NodeID:Memory Size (underlined) is the current resource status of each node, and NodeID:Memory Size (italic) is an aggregation result. Node1 (the root node) initiates a resource discovery query by specifying a requirement: e.g., a node's memory should be larger than 1GB. Nodes satisfying the requirement are ordered based on memory size, and the two with the largest sizes are returned. The narrow line shows query propagation using a self-organizing multicast tree. The Aggregation module orders child nodes' results and returns a list of the required number of nodes (thick line).
3.1.3  First-Fit Mode

In order to maximize the efficiency of aggregation and to discover the list of nodes that maximizes rank, a parent node in a recursive-partitioning tree waits until all child nodes' aggregated results are returned. However, if it is acceptable to execute tasks on resources that satisfy the requirement but might not be the best-ranked ones, the query response time can be improved by executing a resource discovery query in First-Fit mode. With First-Fit, an intermediate node in a tree can return an aggregated result to its parent node as soon as the number of discovered nodes is larger than the number of requested nodes, as sketched below.
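A minimal sketch of the two aggregation policies at an interior tree node; the result representation ((rank, node_id) pairs) and the function names are our illustrative assumptions, not C-PonD's API.

```python
# Aggregation at an interior node of the recursive-partitioning tree.
# Default mode: wait for all children, then return the k best-ranked nodes.
# First-Fit mode: return as soon as k requirement-satisfying nodes are known.

def aggregate(child_results, k: int, first_fit: bool = False):
    """child_results yields lists of (rank, node_id) pairs as children reply."""
    merged = []
    for result in child_results:
        merged.extend(result)
        if first_fit and len(merged) >= k:
            return merged[:k]  # early response: requirement met, rank ignored
    merged.sort(key=lambda r: r[0], reverse=True)  # order by rank criteria
    return merged[:k]
```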
                                                                    are connected through a structured P2P network, and every
3.1.4  Redundant Topology to Improve Fault-Tolerance

While a tree architecture provides a dynamic and scalable basis for parallel query distribution and result aggregation, it might suffer from result incompleteness due to internal node failures during an aggregation task. For instance, in Figure 3, the failure of Node 3 in the middle of query processing would result in the loss of the query results for nodes 6, 7, and 8. In order to reduce the effect of internal node failures on the completeness of the query result, we propagate a query concurrently through a redundant tree topology.

One such method is to execute parallel requests in opposite directions in the overlay. In the recursive-partitioning tree discussed in Section 2.3, a node is responsible for a region in its clockwise direction; node n2 is assigned a region [n2, n3), where n2 < n3. In a counter-clockwise direction tree, node n2 is allocated a region (n1, n2]. This approach can compensate for missing results in one tree with results from the other, as long as the missing region is not assigned to failed nodes in both trees.
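In code, the redundancy amounts to running the same query over both trees and taking the union of the returned matches; a region is lost only if its aggregation path fails in both directions. A minimal sketch, with `run_tree_query` standing in for the tree multicast of Section 2.3:

```python
# Redundant resource discovery over two opposite-direction trees.

def redundant_query(run_tree_query, requirement) -> set:
    """run_tree_query(direction, requirement) -> set of matching node IDs."""
    clockwise = run_tree_query("clockwise", requirement)
    counter = run_tree_query("counter-clockwise", requirement)
    # the union compensates for a region lost to a failed interior node
    # in one tree, unless the same region is also lost in the other
    return clockwise | counter
```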
                                                                     scheduled jobs are executed using condor starter of worker
3.2  PonD self-configuration

After acquiring a list of available resources from the resource discovery module, invitations are sent to those nodes to join a PonD of the job submitter. As invited nodes join a PonD, the deployed HTC middleware takes on the roles of task execution and monitoring. In C-PonD, which is an implementation of PonD with Condor, we leverage the system configuration "CONDOR_HOST" to create/join a PonD dynamically.

Condor job scheduling is performed at the condor_negotiator daemon with available resource information and inactive job lists that are fetched at the time of negotiation from the condor_collector daemon. Generally, the two daemons run at the central manager node, while worker nodes (condor_startd daemon) and job submit nodes (condor_schedd daemon) identify the central manager using a Condor configuration value, "CONDOR_HOST", which can be set independently for daemons on the same resource. The "CONDOR_HOST" configuration can be changed at run-time by using Condor commands (e.g., condor_config_val and condor_reconfig).

Figure 4: Steps to create a C-PonD through dynamic Condor re-configuration.

Figure 4 shows the overall procedure of creating a PonD with Condor for job execution. In the figure, all resources are connected through a structured P2P network, and every node runs the condor_negotiator, condor_collector, condor_startd, and condor_schedd daemons locally. We will show how Node A creates a PonD for a job execution.

◦ Upon detecting a job submission, Node A sends a resource discovery query with a job requirement, rank criteria, and the number of required nodes, which are derived from the ClassAds of the local submission queue in the condor_schedd.
◦ Upon query completion, Node A sends a PonD join invitation to the nodes discovered by the query. When receiving the invitation, nodes that accept it set the "CONDOR_HOST" value of the condor_startd daemon to the address of Node A (see the sketch after this list).
◦ Nodes send an ACK message to confirm joining the PonD.
◦ When a PonD is created, Node A becomes the central manager of the PonD, and the condor_negotiator daemon of Node A schedules inactive jobs to available resources. The scheduled jobs are executed using the condor_starter of worker nodes and the condor_shadow daemon of Node A.
◦ Upon job completion, Node A sends a PonD destroy message to the nodes in the PonD. When receiving the message, nodes set the "CONDOR_HOST" value back to their own address.
◦ Each node sends an ACK message to Node A when leaving the PonD.

Note that every node in the resource pool can create a PonD. It is also possible for a node to discover available resources for other PonDs in case the central manager of a PonD is not capable of executing a resource discovery query.
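The join/leave mechanics reduce to a run-time configuration change followed by a reconfiguration, using the standard condor_config_val and condor_reconfig commands named above. The wrapper below is a minimal sketch; the exact flags C-PonD uses and the overlay addresses are illustrative assumptions.

```python
# Run-time re-pointing of the local condor_startd at a new central manager.
import subprocess

def set_condor_host(address: str) -> None:
    # -rset changes the configuration at run-time (no restart needed) ...
    subprocess.check_call(
        ["condor_config_val", "-rset", f"CONDOR_HOST = {address}"])
    # ... and condor_reconfig tells the daemons to re-read their config
    subprocess.check_call(["condor_reconfig"])

# Joining Node A's PonD, then reverting when the PonD is destroyed
# (addresses are hypothetical virtual-network IPs):
# set_condor_host("10.128.0.7")   # Node A, on the IPOP overlay
# set_condor_host("10.128.0.42")  # this node's own address, on PonD destroy
```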
3.2.1  Leveraging virtual networking

C-PonD is designed to provide an infrastructure that supports scaling to a large number of geographically distributed resources in a wide area network — a common environment in Desktop Grids. Due to the constraints of Network Address Translation (NAT) and firewalls, however, supporting direct communication between peers for PonD invitation/join and job execution/monitoring using Condor daemons is one of the essential features for a successful real-world deployment. In order to address this requirement, we adopt IP-over-P2P (IPOP) [26, 27] to build a virtual networking overlay that enables routing IP messages atop a structured P2P network. This approach has benefits with respect to self-management, resilience to failures, and ease of use from the user's perspective. The practicality of this approach is well demonstrated by GridAppliance [28], which the authors operated over several years in a WAN environment.

These network challenges present a problem for systems deriving from Condor Glide-in or flocking, which need publicly available IP addresses on a pilot node, worker/submitter, or central manager node. C-PonD's use of virtual networking allows direct messaging between peers after an IP-layer virtual overlay network has been established.
                                                                  making allows us to inherit most of characteristics of Clas-
3.2.2  Multiple Distinct Jobs Submission

In the case that multiple jobs with distinct requirements are submitted concurrently by a condor_schedd, discovered resources need to be distinguishable by the initiating query. Let us assume that two distinct jobs, Ji and Jk, are submitted with requirements Ri and Rk, and the lists of discovered resources are Ni and Nk, respectively. If Ri ⊂ Rk holds, jobs in Ji might suffer from a dearth of resources when jobs in Jk are scheduled to resources in Ni; Ri ⊂ Rk means that jobs in Jk can be executed on resources in Ni, but there is no guarantee that jobs in Ji can be executed on nodes in Nk. In order to deal with this issue, we dynamically add a ClassAds attribute to the condor_startd with the ID of the job that matched during resource discovery. Jobs matching the ID will be prioritized to run on that resource.
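A minimal sketch of that tagging step: after discovery, each matched startd advertises the matching job's ID, and a rank expression prefers jobs carrying that ID. The attribute name MatchedJobId and the dictionary representation of a ClassAd are our illustrative assumptions, not C-PonD's actual names.

```python
# Tag a discovered resource with the job that matched it, and prefer
# that job during matchmaking (Section 3.2.2).

def tag_resource(startd_ad: dict, job_id: str) -> None:
    """ClassAds attribute added dynamically to the discovered startd."""
    startd_ad["MatchedJobId"] = job_id

def rank(startd_ad: dict, job_id: str) -> int:
    """Jobs whose ID matches the tag win ties for this resource."""
    return 1 if startd_ad.get("MatchedJobId") == job_id else 0
```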
                                                                  tives. We use first-order analytical models to determine re-
3.2.3  Minimum Inter-Scheduling Time

Equation 2 describes the effect of the minimum inter-scheduling time on the expected queueing time of a job before scheduling. Unlike in a typical Condor environment, where one negotiator may deal with multiple job submitters, a negotiator in a C-PonD handles a single job submitter, so we can decrease the minimum inter-scheduling time significantly without imposing large processing overheads on the condor_negotiator daemon.
                                                                  requirements after matchmaking. For instance, in Figure 5,
3.3  Discussion

Overheads: While Condor provides a scalable service by deploying multiple managers and fault tolerance with duplicated fail-over servers, the administrative overhead to configure and manage these servers is non-negligible and can incur significant communication costs at scale. In contrast, C-PonD self-configures manager nodes on demand, and because of its ability to reuse unmodified middleware, it can be extended to discover and self-configure nodes for high-availability fail-over services. Though a PonD is still a centralized architecture with a central manager node (a PonD creator) and multiple worker nodes, failures are isolated: a crash of a central manager node does not keep other nodes from creating other PonDs. Worker nodes that are registered at a PonD periodically check the status of the central manager of the PonD. If the central manager is detected to be offline or inactive, they withdraw from the PonD.

Middleware reuse: C-PonD uses unmodified Condor binary files. This is made possible by the use of a decentralized resource discovery module and run-time configuration scripts. By using Condor intact, we reduce the possibility of bugs in scheduling and job execution/monitoring, due in large part to the over 20 years of use and development of Condor. This feature differentiates C-PonD from many other decentralized HTC middleware systems which try to build a system from scratch. Reusing the ClassAds module for matchmaking allows us to inherit most of the characteristics of ClassAds, such as regular expression matching and dynamic resource ranking mechanisms. Requiring no source code modification also implies that our approach can be applied to other HTC schedulers (e.g., XtremWeb [29], Globus toolkit, and PBS).

Resource fair sharing: In order to guarantee fair resource sharing amongst nodes, a manager node in Condor records job priority and user priority. Based on these priorities, different users can get different levels of service when there is resource contention. While there is no central point that keeps global information across all nodes in C-PonD, we can leverage the DHT, a distributed shared storage in a P2P system, by mapping a job submitter ID to a DHT key.

Figure 5: Based on the event at t2 (e.g., attribute value change, node crash), the scheduling result at t3 might be stale.

4.  EVALUATION

In this section, we evaluate PonD from different perspectives. We use first-order analytical models to determine resource discovery latency and bandwidth costs. From there, we perform a simulation-based quantitative analysis of C-PonD in terms of the scalability of scheduling throughput with respect to the number of resources, and the matchmaking result staleness under various attribute dynamics and node uptimes. By staleness, we check whether scheduled resources still satisfy query requirements after matchmaking. Due to the discrepancy between the timestamp of resource information and matchmaking, scheduled nodes might no longer satisfy requirements after matchmaking. For instance, in Figure 5, a scheduling is completed at t3 with resource information that was updated at t1, where t1 < t3. If an event happens at t2 that changes the status of the scheduled nodes, such as an attribute value change or a crash, the scheduling result that was produced with resource information from t1 might not satisfy the job requirements at t3; we define this result discrepancy as stale. We complete the evaluation by examining the performance of the C-PonD prototype deployed on actual wide-area network testbeds.

4.1  System analysis

Scheduling Latency: The scheduling time of C-PonD with respect to the number of resources (N) is primarily determined by the latency of a resource discovery query. A recursive partitioning tree is known to have O(log N) tree depth [15], which determines the query latency as the number of nodes increases. Note that the query latency can be decreased using the First-Fit mode described in Section 3.1.3. After discovering available resources, pool invitations are sent in parallel (with O(log N) routing cost),
and a PonD is created as worker nodes join with a cost of O(1). Thus, the asymptotic complexity of the time to create a C-PonD is O(log N).

Condor performs matchmaking sequentially node by node, which results in a linearly increasing pattern with respect to the number of resources. Though the scheduling latency might not be a dominant factor in a pool with a modest number of resources considering the overall job execution time (such as within a C-PonD), the linearly increasing pattern can become a non-negligible overhead as the pool size increases.

If we consider Condor flocking in a large-scale environment, the matchmaking time at each central manager might not be the prevailing factor for scheduling latency, because we can assume that the number of resources will be evenly distributed amongst flocking servers. However, as Equation 2 implies, the scheduling time can be affected by the minimum inter-scheduling time while traversing different central managers sequentially. Thus, we can expect that the scheduling time increases with O(Sf), where Sf is the number of flocking servers to traverse.

Other than the number of resources, the scheduling latency depends on the number of jobs for matchmaking. Condor has an auto-clustering feature to improve job scheduling throughput. At the time of scheduling, the condor_schedd daemon checks SIGNIFICANT_ATTRIBUTES (SA), which is a subset of the attributes of condor_startd and condor_schedd. Job ClassAds whose values of SA are the same are grouped in a cluster, and they are scheduled at once, which improves the scheduling throughput [30]. Auto-clustering can also be applied to C-PonD to decrease the number of resource discovery queries.

Network Bandwidth Overheads: C-PonD incurs bandwidth consumption during resource discovery query processing, PonD join invitation messages, and condor_startd ClassAds updates while a PonD is active. In order to understand query overheads, we assume that the number of resources is N, the size of the default arguments (e.g., task name, query range) is SD, the size of a query requirement is SQ, the P2P address size is SA, and the number of required resources is k. The bandwidth consumption between a child and parent node is (SD + SQ + SA × k) for a query distribution and a result aggregation. Using a recursive-partitioning tree, a message is routed only to 1-hop neighbors, so the total query bandwidth usage is (N − 1) × (SD + SQ + SA × k). In order to analyze per-edge bandwidth consumption, dividing the total bandwidth consumption by the number of edges gives the constant factor (SD + SQ + SA × k). In other words, (N − 1) edges are traversed in a query, and each edge corresponds to two messages (send/receive), thus the average per-edge bandwidth consumption follows O(SD + SQ + SA × k). In the typical case where k is a constant, the average per-edge bandwidth consumption is O(1). In a tree, nodes located closer to the root node are likely to have more child nodes. Based on the underlying P2P topology of C-PonD, the number of neighboring connections of a node is bounded by the log of the total number of nodes. Accordingly, in the worst case, a node is responsible for O(log N) edges.

Unlike C-PonD, Condor incurs inactive-job fetching overheads during the negotiation phase and periodic condor_startd ClassAds updates regardless of job submission status. Assuming that the pool size is N, Sc is the size of a condor_startd ClassAd (usually several KBytes in a typical Condor setting), and Tu is the update period (300 seconds in the default setting), the update bandwidth is N × Sc × (1/Tu) per second at the central manager node. As Tu gets longer, the overhead of the central manager decreases, but the longer update period can result in stale scheduling results in case dynamic attributes are specified as requirements.
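A back-of-envelope comparison of these two costs; Sc and Tu below are the typical values quoted above, while the query-message sizes and k are illustrative assumptions:

```python
# Condor: sustained ClassAds update bandwidth at the central manager,
# N * Sc * (1/Tu) bytes per second.
N = 1_000_000                     # pool size
Sc, Tu = 4 * 1024, 300            # ~4 KB ClassAd, 300 s default update period
print(f"Condor manager inflow: {N * Sc / Tu / 1e6:.1f} MB/s, continuously")

# C-PonD: per-query cost over the discovery tree, paid only on submission.
SD, SQ, SA, k = 64, 256, 20, 100  # assumed message sizes (bytes); k resources
per_edge = SD + SQ + SA * k       # O(1) per edge when k is constant
total = (N - 1) * per_edge        # (N - 1) edges traversed per query
print(f"C-PonD query: {per_edge} B/edge, {total / 1e9:.2f} GB pool-wide")
```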
Matchmaking Result Staleness: Condor relies on periodically updated resource information for matchmaking, therefore the scheduling result correctness depends on the dynamics of attributes and the information update period. In addition, the varying uptime of worker nodes can also influence the result correctness. Condor adopts soft state to keep the condor_startd ClassAds of available resources, and a resource that has not updated its ClassAds during the past 15 minutes (default configuration) is removed from the worker node list, which can also result in stale matchmaking results during the soft-state time span.

In C-PonD, the matchmaking is performed using local ClassAds, so the only component that influences result staleness is the query result transmission time. We compare the matchmaking result staleness of Condor and C-PonD through simulations under a controlled environment with synthetic attribute dynamics and resource uptime.

Figure 6: Scheduling latency growth of Condor-flock, C-PonD, and Condor at various pool sizes. "sim" shows a simulation result, and "thr" shows a theoretical analysis. A bottom-right inner graph shows the logarithmic pattern of C-PonD resource discovery time.

4.2  Simulations

In order to evaluate C-PonD in a large-scale simulated environment, we implemented an event-driven simulation framework using C++ and the Standard Template Library (STL), aimed at minimizing the memory footprint in order to perform simulations of millions of nodes (source code: https://github.com/kyungyonglee/PondSim). We used Archer [31] in order to run simulations on distributed computing resources efficiently. After a resource pool is formed with a given number of resources, nodes remain in the pool until a simulation finishes.

4.2.1  Evaluation on Scheduling Scalability

Figure 6 presents the scheduling latency of Condor and Condor with flocking, and the resource discovery time of C-PonD, on the Y axis, and the number of resources in a pool on the X axis (both represented in log scale). The dotted lines show the expected latency based on the analysis, and the solid lines show the response time from simulations. Because
there currently is no environment capable of precise simulation of Condor at the scale of millions of nodes, and because the latency of each method is dependent on constants as [...]
formed a simulation in a pool of 106 worker nodes setting                                                                                                                                              attribute value change rate is the same as Condor ClassAds
edge latencies between 50 and 300 msec uniformly with the                                                                                                                                              update period (300 seconds).
different value of worker node’s join/leave event rate λ −
  1
     , 1 , 1 , 1 , 1 per second. For a Condor pool, we
300 600 900 1200 1500                                                                                                                                                                                  4.3              Deployment on FutureGrid
adopted the default system configuration value. For C-
                                                                                                                                                                                                          In order to demonstrate the correctness of the implemen-
PonD, we assume the node join and departure are handled
                                                                                                                                                                                                       tation and the feasibility of C-PonD approach as a real-world
via the underlying P2P network topology. Figure 7a shows
                                                                                                                                                                                                       HTC middleware in a wide area network environment, we
the fraction of non-stale scheduling results according to the
                                                                                                                                                                                                       conducted experiments on a cloud computing infrastructure,
different average uptime. The left-most solid-bar shows the
                                                                                                                                                                                                       FutureGrid [6]. A total of 240 Virtual Machine (VM) in-
simulated value of C-PonD, the dotted grid bar shows the
                                                                                                                                                                                                       stances were deployed across USA using a VM image that
simulation result of Condor, and the diagonal bar shows an
                                                                                                                                                                                                       has C-PonD modules installed. Specifically, we setup 80
expected value calculated based on Equation 4 for Condor.
                                                                                                                                                                                                       VMs at the University of Chicago, 80 VMs at Texas Ad-
We can observe that C-PonD has almost no impact for the
                                                                                                                                                                                                       vanced Computing Center (TACC), 70 VMs at San Diego
given worker nodes uptime, because the only factor that can
                                                                                                                                                                                                       Supercomputer Center (SDSC), and 10 VMs at the Univer-
influence the result staleness is the aggregated result prop-
                                                                                                                                                                                                       sity of Florida. Each VM instance was assigned a single-core
agation time that is of the order of tens of seconds in a 106
                                                                                                                                                                                                       CPU and 1 GB of RAM.
pool. In case of Condor, the staleness of the scheduling re-
                                                                                                                                                                                                          In order to evaluate the system in a realistic setup created
sult has a non-negligible impact based on the dynamics of
                                                                                                                                                                                                       a synthetic job submission scenario by referencing DAS-2
worker nodes. For instance, 20% of scheduling results are
                                                                                                                                                                                                       trace from the grid workload archive [34]. From the trace, we
stale when the average uptime of a node is 25 minutes. The
                                                                                                                                                                                                       extracted memory usage information, the number of inde-
results also claim the correctness of the analytical model of
                                                                                                                                                                                                       pendent concurrent execution for each job, and the running
the scheduling result non-staleness in Equation 4 - the sim-
                                                                                                                                                                                                       time of each job. An extra resource attribute named ZipfAt-
ilar value of Condor(sim) and Condor(thr).
                                                                                                                                                                                                       tribute was added to resource ClassAds to allow the experi-
    Next, we present the effect of attribute value dynamics
                                                                                                                                                                                                       ments to vary the likelihood of resource discovery queries to
to the scheduling result. In C-PonD, the matchmaking is
                                                                                                                                                                                                       find available resources. During the experiment, each sub-
performed with resource information that is available from
                                                                                                                                                                                                       mitted job selects an integer target value from one to ten
the local condor startd daemon. Otherwise, in Condor, the
                                                                                                                                                                                                       following a Zipf-distribution as a requirement. Each worker
matchmaking result might be as stale as the information
                                                                                                                                                                                                       node also selects ZipfAttribute value at the C-PonD initial-
update period (300 seconds in the default Condor configu-
ration). In order to model scheduling with a requirement having non-uniform likelihoods of resource matching, we add a synthetic ClassAd attribute, ZipfAttribute, whose value follows a Zipf distribution. A worker node selects an integer value from one to ten with a skew of 1.0, which places the heavy portion of the distribution at one (35% of nodes). During the simulation, a node changes its ZipfAttribute value at a rate λ following an exponential distribution. The scheduling staleness is measured as the fraction of nodes that still satisfy the requirement after the match-making. We do not show a detailed analysis for this model because of its similarity to Equation 3.
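To make this synthetic-attribute model concrete, the sketch below reproduces it under stated assumptions: values one to ten are drawn from a truncated Zipf distribution with skew 1.0 (placing roughly 35% of the mass at one, as above), each node re-draws its value after an exponentially distributed delay, and a match is counted as stale if the value changes before the scheduling result is used. The function and parameter names are ours, not those of the actual simulator.

    import random

    SKEW = 1.0                       # Zipf skew; ~35% of the mass falls on value 1
    VALUES = list(range(1, 11))      # ZipfAttribute domain: integers 1..10
    weights = [v ** -SKEW for v in VALUES]

    def sample_zipf():
        # Draw a ZipfAttribute value from the truncated Zipf distribution.
        return random.choices(VALUES, weights=weights, k=1)[0]

    def stale_fraction(num_nodes, mean_change_time, scheduling_delay, target):
        # Fraction of matched nodes whose attribute changes before the
        # scheduling result is used, i.e., the scheduling staleness.
        matched = stale = 0
        for _ in range(num_nodes):
            if sample_zipf() != target:
                continue                                   # node does not match
            matched += 1
            next_change = random.expovariate(1.0 / mean_change_time)
            if next_change < scheduling_delay:
                stale += 1                                 # changed before use
        return stale / matched if matched else 0.0

    for mean_t in (60, 300, 1800):                         # seconds between changes
        print(mean_t, stale_fraction(10000, mean_t, 30, 1))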
   The vertical axis of Figure 7b shows the fraction of non-stale scheduling results, and the horizontal axis shows the mean time between attribute change events. As shown in the figure, Condor is affected more than C-PonD as attribute values change more dynamically; in the case of Condor, about 20% of scheduling results are stale at the shortest mean attribute-change times evaluated.
…ization time.
   During the match-making process, nodes that satisfy the requirements extracted from the DAS-2 trace file and whose ZipfAttribute equals the selected Zipf value are returned. In a real grid computing scenario, finding nodes with a high ZipfAttribute value can be thought of as searching for the best machines, those with higher capabilities than the other machines in a resource pool. We varied the average job inter-arrival time, which follows an exponential distribution, to observe the system's performance under different system utilizations. The experiment was conducted for two hours per job submission rate. A job composed of multiple independent concurrent tasks is submitted at an arbitrary node while keeping the overall job submission rate of the system.
   Figure 8a shows the average job waiting time of C-PonD on the primary vertical axis under the different job submission rates shown on the horizontal axis. The job waiting time is the time elapsed from job submission until job execution begins. The secondary vertical axis shows the system utilization, which is calculated as

\[ \text{SystemUtilization} = \frac{\sum \text{JobRunningTime}}{\text{ExperimentTime} \times \text{NumberOfResources}} \]
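As an illustrative instance of this formula (the numbers are ours, not measured values from the experiment): if 1,000 tasks each ran for 300 seconds during a two-hour (7,200-second) experiment over 240 resources, the utilization would be \( \frac{1000 \times 300}{7200 \times 240} \approx 0.17 \), i.e., about 17%.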
[Figure 9: The number of available resources and running-job status snapshots of a C-PonD execution on 440 PlanetLab nodes during 65 hours of experiment. In the Req. 2, 3, and 4 regions, jobs run on America, Europe, and Asia nodes, respectively.]
   As the job submission rate increases, the system utilization and job waiting time also increase due to resource contention. To clarify why the waiting time grows with the job submission rate, we present the average, minimum, and maximum query latencies for the different job submission rates in Figure 8b. There we can observe that the resource discovery query latency has no strong correlation with the job submission rate: regardless of the contention, it took less than a second to traverse 240 resources distributed across a WAN. We can also observe occasional longer query latencies, which may be caused by internal lagging nodes in a tree during the aggregation task; these could be improved by heuristics that deal with abnormalities in the tree, such as the First-Fit and redundant-topology techniques of Sections 3.1.3 and 3.1.4 (which were not applied in this experiment).
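The query pattern behind these latency numbers is a scatter/gather over a tree: the query is forwarded down a distribution tree and matches are aggregated on the way back up, so latency grows with tree depth, O(log N) for a balanced tree, rather than with the number of resources. The sketch below is a minimal, single-process illustration of this pattern; PonD's actual tree is self-organizing over a P2P overlay and is not binary, and the class and function names are ours.

    import random

    class Node:
        # One resource in a (binary, for simplicity) query-distribution tree.
        def __init__(self, attrs):
            self.attrs = attrs
            self.children = []

        def query(self, predicate):
            # Scatter the query to the children, gather matching
            # ClassAd-like records on the way back up the tree.
            matches = [self.attrs] if predicate(self.attrs) else []
            for child in self.children:
                matches.extend(child.query(predicate))
            return matches

    def build_tree(num_nodes):
        nodes = [Node({"ZipfAttribute": random.randint(1, 10)})
                 for _ in range(num_nodes)]
        for i, node in enumerate(nodes):
            node.children = nodes[2 * i + 1 : 2 * i + 3]   # balanced binary tree
        return nodes[0]

    root = build_tree(240)                                 # pool size as in Figure 8b
    hits = root.query(lambda a: a["ZipfAttribute"] >= 8)
    print(len(hits), "matching resources discovered")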
   Figure 8c shows the average job waiting time of C-PonD for different target ZipfAttribute values under different job submission rates. The horizontal axis shows the target Zipf value at submission, and the vertical axis shows the corresponding waiting time. As the figure shows, the waiting time grows as the number of requirement-satisfying resources shrinks (i.e., for higher ZipfAttribute values). This can be explained as follows: a job is composed of a large number of independent concurrent tasks that can run on different machines, so when a job with a high-ZipfAttribute requirement is submitted, resource contention is likely to arise among the multiple independent tasks of the job.
   These experiments validate the functionality of C-PonD in a real-world WAN environment while showing good query response times regardless of the job submission rate and resource utilization.
4.4   C-PonD running on PlanetLab
   In this experiment, we observe the dynamic behavior of C-PonD creation through resource status snapshots taken over a 65-hour period while running C-PonD on 440 PlanetLab nodes. After installing the C-PonD module on each PlanetLab node, we add geographic coordinate information, extracted from each node's IP address, as condor startd ClassAd attributes. We leverage this attribute to build query requirements. In an HTC pool deployed over a widely distributed environment, we can explicitly specify the region of job execution using a geospatial-aware query based on a latitude/longitude range. This is a useful scenario when communication cost is a substantial factor in job performance. The flexibility of adding new attributes demonstrates an advantage of C-PonD over other decentralized resource discovery systems.
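As an illustration of such a geospatial-aware query (the Latitude and Longitude attribute names and the coordinate bounds are hypothetical choices of ours; only the STARTD_ATTRS publishing mechanism is standard Condor), a worker could advertise its coordinates through its configuration, and a submit file could then pin execution to a region:

    # Worker-side condor_config fragment: publish the node's coordinates
    # (illustrative values) in the startd ClassAd.
    Latitude  = 29.6
    Longitude = -82.3
    STARTD_ATTRS = $(STARTD_ATTRS) Latitude Longitude

    # Submit-file requirement restricting execution to, roughly, Europe.
    Requirements = (Latitude > 35 && Latitude < 70 && Longitude > -10 && Longitude < 40)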
   In Figure 9, in the Req.1 region of the horizontal axis, jobs are submitted with a requirement that ZipfAttribute be one of the values from one to ten. In the Req.2 region, jobs are submitted with a requirement that worker nodes be in America; in the Req.3 region, the requirement is worker nodes in Europe; and in the Req.4 region, worker nodes in Asia. The vertical axis shows the number of available resources on the America (top horizontal line), Europe (middle line), and Asia (bottom line) continents. The number of running jobs at each point in time is shown as the vertical bar, whose value can be read on the secondary vertical axis.
   We controlled the job submission rate so as not to overwhelm the PlanetLab nodes, as they are shared resources not dedicated to a single user. Based on observations with varying requirements, C-PonD showed flawless operation for PonD creation and job execution.
   From the figure, we can observe that the number of running jobs at a given time is larger than the number of claimed machines (equal to the total number of nodes minus the number of available nodes). The reason is as follows: if a worker node has a multi-core CPU, the condor startd daemon recognizes each core as a distinct worker node, and those distinct worker nodes execute jobs independently. Thus, a node with multiple cores can run multiple jobs at a time.
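This one-slot-per-core behavior corresponds to Condor's default slot configuration; the sketch below makes the equivalent settings explicit using standard condor_config macros (we did not override the defaults in this experiment, so this is for reference only):

    # condor_config sketch (assumed defaults made explicit): advertise one
    # single-core slot per detected core, so an N-core machine appears to
    # the pool as N independent worker nodes.
    SLOT_TYPE_1      = cpus=1
    NUM_SLOTS_TYPE_1 = $(DETECTED_CORES)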
5.   RELATED WORK
   Decentralized resource discovery methods have been proposed from various angles. In structured P2P networks, SWORD [19] and Mercury [21] enhance DHTs to support range queries by mapping attribute names and values to DHT keys. Kim et al. [20], Squid [22], and Andrzejak and Xu [35] discuss resource-locating methods on multi-dimensional P2P networks. Kim et al. [20] map each resource attribute to one dimension of CAN [36], and a requirement-conforming zone is created from the query requirement. Squid [22] leverages a Space Filling Curve (SFC) to convert a multi-dimensional space into a one-dimensional ring space, where the locality-preserving property of the SFC allows range queries; however, the loss of locality preservation beyond about five dimensions limits Squid's scalability. Andrzejak and Xu [35] likewise convert multi-dimensional spaces into a one-dimensional ring space and leverage the correlation between the dimensions and attribute values for match-making; this algorithm has a scalability limitation for large numbers of attributes. Similar to our work, Armada [18] uses a tree structure for match-making by assigning an object ID based on an attribute value, with a partition tree constructed based on the proximity of the object IDs. Compared with these works, the resource discovery module of PonD has advantages in its rich query processing capacity and its appropriateness for dynamic attributes.
   In unstructured networks, Iamnitchi and Foster [37] propose flooding-based query distribution for match-making on a P2P network, with a query-forwarding optimization that leverages previous query results and the similarity between queries. Due to the characteristics of flooding, duplicated messages hurt query efficiency. In [38], super-peers keep the resource status of worker nodes and, to minimize the information-update overhead, a threshold decides whether updated resource information is distributed. Fischer et al. [39] use a hybrid of flooding and structured, ID-based propagation. Unlike the resource discovery module of PonD, these methods do not guarantee that a resource discovery query will be resolved even when a node satisfying the job requirements exists, and their periodic resource information updates can also produce stale results.
   MyGrid [40] provides a software abstraction that allows a user to run tasks on resources available to that user regardless of the scheduler middleware. Building on MyGrid, the OurGrid [41] toolkit provides a mechanism for users to gain access to computing resources in other administrative domains while guaranteeing fair resource sharing among resources connected through a P2P network.
   BonjourGrid [42] and PastryGrid [23] are decentralized grid computing middlewares. For resource discovery, BonjourGrid [42] uses a publish/subscribe multicast mechanism; without a structured multicast mechanism, resource discovery results can flood the query-initiating node, and its reliance on IP-layer multicast gives no guarantee that it works in WAN environments. We avoid this issue by leveraging a P2P-based virtual overlay network. PastryGrid [23] is built on a structured P2P network, Pastry [43], and performs resource discovery by sequentially traversing nodes until the required number of nodes is discovered. Although the authors did not measure the efficiency of this resource discovery method, the cost of sequential traversal (O(N), where N is the number of query-satisfying nodes) is much higher than that of our parallel resource discovery mechanism, which is O(log N).
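As an illustrative comparison (our numbers, not measurements reported in [23]): if all 240 nodes of our WAN experiment had to be examined, a sequential traversal would take on the order of 240 sequential hops, while a balanced distribution tree reaches every node within about \( \lceil \log_2 240 \rceil = 8 \) levels of parallel forwarding.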
   MyCluster [44] presents a method to submit and run jobs across different administrative cluster domains. In their approach, a proxy is used to reserve a specific number of CPUs in other clusters for some period of time; the proxy is also responsible for setting up the task execution environment across the different clusters in a user-transparent way. Due to the time-bounded resource reservation mechanism, resources can be left underutilized after a job completes.
   Celaya and Arronategui [33] propose a decentralized scheduler for HTC using a statically built tree for resource information aggregation and resource discovery. A parent node keeps aggregated resource information for the sub-tree rooted at itself, and this information is leveraged for scheduling. Because resource information is updated only periodically, the scheduling result can be misleading, and the system supports a limited set of attributes for resource discovery (e.g., memory and disk space). Without a solid mechanism for handling internal node failures in the statically built tree, the system cannot guarantee reliable service.

6.   CONCLUSIONS
   This paper presents and evaluates PonD, a novel approach for a scalable, self-organizing, and fault-tolerant HTC service. The system combines a P2P overlay for resource discovery across a loosely-coupled resource "sea" in order to create a small "pond" of resources that uses unmodified HTC middleware for job execution and monitoring. Our approach removes the need for a central server, or set of servers, that monitors the state of all resources in an entire pool. A decentralized resource discovery module provides a mechanism to discover the nodes in the pool that satisfy job requirements by leveraging a self-organizing tree for query distribution and result aggregation. Through first-order performance analysis and simulations, we compared our approach with an existing HTC system, Condor. In terms of scalability, PonD's job execution pool creation time grew logarithmically as the number of resources increased. The evaluation of scheduling-result correctness under a dynamic environment demonstrates the robustness of our approach to dynamic attribute value changes and worker node churn. To demonstrate the validity of the proposed approach, we deployed and evaluated a prototype implementation of C-PonD, built from unmodified Condor binaries and run-time configuration, on two real-world testbeds, PlanetLab and FutureGrid. Although the C-PonD implementation is centered on Condor, the clean separation of the decentralized resource discovery and PonD creation modules from the HTC middleware makes this approach generalizable to other HTC middleware, and further differentiates it from other decentralized HTC platforms that try to cover all phases of operation (e.g., resource discovery, scheduling, job execution, and monitoring).

7.   ACKNOWLEDGMENTS
   We would like to thank our shepherds Douglas Thain and Thilo Kielmann, the anonymous reviewers, and Alain Roy for their insightful comments and feedback. This work is sponsored by National Science Foundation (NSF) awards 0751112 and 0910812.

8.   REFERENCES
 [1] Jim Basney and Miron Livny. Deploying a High Throughput Computing Cluster. Prentice Hall, 1999.
 [2] David P. Anderson. BOINC: A system for public-resource computing and storage. In Fifth IEEE/ACM International Workshop on Grid Computing, 2004.
 [3] I. Raicu, Y. Zhao, C. Dumitrescu, I. Foster, and M. Wilde. Falkon: A fast and light-weight task execution framework. In Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, 2007.
 [4] O. Sonmez, N. Yigitbasi, S. Abrishami, A. Iosup, and D. Epema. Performance analysis of dynamic workflow scheduling in multicluster grids. In HPDC '10, 2010.
 [5] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson, M. Wawrzoniak, and M. Bowman. PlanetLab: An overlay testbed for broad-coverage services. SIGCOMM Comput. Commun. Rev., 2003.
 [6] Gregor von Laszewski et al. Design of the FutureGrid experiment management framework. In GCE2010 at SC10, New Orleans, 11/2011. In press.
 [7] R. Raman, M. Livny, and M. Solomon. Matchmaking: Distributed resource management for high throughput computing. In 7th HPDC, 1998.
 [8] D. H. J. Epema, M. Livny, R. van Dantzig, X. Evers, and J. Pruyne. A worldwide flock of Condors: Load sharing among workstation clusters. Future Generation Computer Systems, 12:53-65, 1996.
 [9] J. Frey, T. Tannenbaum, M. Livny, I. Foster, and S. Tuecke. Condor-G: A computation management agent for multi-institutional grids. In HPDC, 2001.
[10] Ali R. Butt, Rongmei Zhang, and Y. Charlie Hu. A self-organizing flock of Condors. J. Parallel Distrib. Comput., 66:145-161, January 2006.
[11] I. Sfiligoi. glideinWMS - a generic pilot-based workload management system. Journal of Physics: Conference Series, 119(6):062044, July 2008.
[12] P. Oscar Boykin et al. A symphony conducted by Brunet, 2007.
[13] J. M. Kleinberg. Navigation in a small world. Nature, 406, August 2000.
[14] V. Vishnevsky, A. Safonov, M. Yakimov, E. Shim, and A. D. Gelman. Scalable blind search and broadcasting over distributed hash tables. Comput. Commun., 2008.
[15] T. Choi and P. O. Boykin. Deetoo: Scalable unstructured search built on a structured overlay. In 7th Workshop on Hot Topics in P2P Systems, 2010.
[16] A. I. T. Rowstron, A. Kermarrec, M. Castro, and P. Druschel. Scribe: The design of a large-scale event notification infrastructure. In Proceedings of the Third International COST264 Workshop on Networked Group Communication, 2001.
[17] J. Kim, B. Bhattacharjee, P. J. Keleher, and A. Sussman. Matching jobs to resources in distributed desktop grid environments. 2006.
[18] D. Li, J. Cao, X. Lu, and K. C. C. Chen. Efficient range query processing in peer-to-peer systems. IEEE Trans. on Knowl. and Data Eng., 2009.
[19] J. Albrecht, D. Oppenheimer, A. Vahdat, and D. A. Patterson. Design and implementation trade-offs for wide-area resource discovery. ACM Trans. Internet Technol., 2008.
[20] Jik-Soo Kim, Peter Keleher, Michael Marsh, Bobby Bhattacharjee, and Alan Sussman. Using content-addressable networks for load balancing in desktop grids. In HPDC, 2007.
[21] A. R. Bharambe, M. Agrawal, and S. Seshan. Mercury: Supporting scalable multi-attribute range queries. SIGCOMM Comput. Commun. Rev., 2004.
[22] C. Schmidt and M. Parashar. Squid: Enabling search in DHT-based systems. J. Parallel Distrib. Comput., 2008.
[23] H. Abbes, C. Cérin, and M. Jemni. A decentralized and fault-tolerant desktop grid system for distributed applications. Concurrency and Computation: Practice and Experience, 2010.
[24] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 2008.
[25] K. Lee et al. Parallel processing framework on a P2P system using map and reduce primitives. In 8th Workshop on Hot Topics in P2P Systems, 2011.
[26] A. Ganguly, A. Agrawal, P. Boykin, and R. Figueiredo. IP over P2P: Enabling self-configuring virtual IP networks for grid computing. In IPDPS, 2006.
[27] David Isaac Wolinsky et al. On the design of scalable, self-configuring virtual networks. In Proceedings of Supercomputing, SC '09, 2009.
[28] David Isaac Wolinsky and Renato Figueiredo. Experiences with self-organizing, decentralized grids using the Grid Appliance. In HPDC '11. ACM, 2011.
[29] G. Fedak, C. Germain, V. Neri, and F. Cappello. XtremWeb: A generic global computing system. In CCGRID '01, 2001.
[30] Condor auto clustering. https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=AutoclustingAndSignificantAttributes.
[31] Renato J. Figueiredo et al. Archer: A community distributed computing infrastructure for computer architecture research and education. In Collaborative Computing, 2009.
[32] D. Bradley, T. St Clair, M. Farrellee, Z. Guo, M. Livny, I. Sfiligoi, and T. Tannenbaum. An update on the scalability limits of the Condor batch system. Journal of Physics: Conference Series, 331(6):062002, 2011.
[33] J. Celaya and U. Arronategui. A highly scalable decentralized scheduler of tasks with deadlines. In Grid Computing (GRID), 2011.
[34] A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, and D. H. J. Epema. The Grid Workloads Archive. Future Gener. Comput. Syst., 2008.
[35] A. Andrzejak and Z. Xu. Scalable, efficient range queries for grid information services. In P2P '02, 2002.
[36] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Schenker. A scalable content-addressable network. In SIGCOMM, 2001.
[37] A. Iamnitchi and I. T. Foster. On fully decentralized resource discovery in grid environments. In GRID, 2001.
[38] Shun K. K. and Jogesh K. M. Resource discovery and scheduling in unstructured peer-to-peer desktop grids. In Parallel Processing Workshops, 2010.
[39] T. Fischer, S. Fudeus, and P. Merz. A middleware for job distribution in peer-to-peer networks. In Applied Parallel Computing: State of the Art in Scientific Computing, 2007.
[40] Lauro B. C., Loreno F., Eliane A., Gustavo M., Roberta C., Walfredo C., and Daniel F. MyGrid: A complete solution for running bag-of-tasks applications. In Proc. of the SBRC, 2004.
[41] N. Andrade, W. Cirne, F. Brasileiro, and P. Roisenberg. OurGrid: An approach to easily assemble grids with equitable resource sharing. In Proceedings of the 9th Workshop on Job Scheduling Strategies for Parallel Processing, 2003.
[42] H. Abbes, C. Cérin, and M. Jemni. BonjourGrid: Orchestration of multi-instances of grid middlewares on institutional desktop grids. In Parallel and Distributed Processing Symposium, 2009.
[43] Antony Rowstron and Peter Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems, 2001.
[44] E. Walker, J. P. Gardner, V. Litvin, and E. L. Turner. Creating personal adaptive clusters for managing scientific jobs in a distributed computing environment. In Challenges of Large Applications in Distributed Environments, 2006.