CDAC Scientific Cloud: On Demand Provisioning of Resources for Scientific Applications

Payal Saluja, Prahlada Rao B.B., Ankit Mittal and Rameez Ahmad
SSDH, Centre for Development of Advanced Computing
C-DAC Knowledge Park, Bangalore, Karnataka, India
Abstract - Scientific applications have special requirements: the availability of massive computational power for performing large-scale experiments, and huge storage capacity to store terabyte or petabyte ranges of output. A scientific cloud provides scientists with computational, storage and network resources, with an inbuilt capability of utilizing the infrastructure. Scientific applications can be dynamically provisioned with the required cloud solutions, tailored to the application needs. The Centre for Development of Advanced Computing (CDAC), under the Department of IT, is the pioneer in HPC in India, with ~70 TF of compute power. In this paper, the authors discuss the need for and benefits of a scientific cloud, and explain the model, architecture and components of the CDAC scientific cloud. CDAC HPC resources can be provisioned on demand to the scientific research community and released when they are no longer required. For Indian researchers and scientists, the CDAC scientific cloud model will provide convenient access to reliable, high performance clusters and storage, without the need to purchase and maintain sophisticated hardware.

Keywords: HPC, HPC as a Service, Map Reduce, Cloud Vault

1    Introduction

High Performance Computing (HPC) allows scientists and engineers to solve complex science, engineering and business problems using applications that require high bandwidth, low latency networking, and very high compute and storage capabilities. Scientists in high-energy physics [13], astronomy [14], climate modeling [15], chemoinformatics [16] and other scientific fields require massive computing power to run experiments and huge data centers to store data. Typically, scientists and engineers must wait in long queues to access shared clusters or acquire expensive hardware systems.

Cloud computing [17] is a model for on-demand access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, services, and software) that can be easily provisioned as and when needed. Cloud computing aggregates resources to gain efficient resource utilization and allows scientists to scale up to solve larger science problems. It also enables the system software to be configured as needed for individual application requirements. For research groups, cloud computing will provide convenient access to reliable, high performance clusters and storage, without the need to purchase and maintain sophisticated hardware. Pete Beckman, director of Argonne's Leadership Computing Facility, has said that "Cloud computing has the potential to accelerate discoveries and enhance collaborations in everything from providing optimized computing environments for scientific applications to analyzing data from climate research, while conserving energy and lowering operational costs". However, HPC on demand faces various challenges [29], such as performance, power consumption and collaborative work environments. In this paper, we present the concept of scientific clouds, HPC as a Service, and their benefits to the scientific research community. The authors also propose a prototype for the CDAC scientific cloud that will provide the following offerings:

I. Infrastructure as a Service (IaaS) [18], by providing traditional MPI-enabled HPC clusters with a parallel file system such as GlusterFS [19], and by provisioning Hadoop [20] clusters with map reduce [21], supported by the Hadoop Distributed File System (HDFS) [20].

II. Storage as a Service (StaaS) [22], to provide petabytes of data storage to the scientific communities.

The rest of this paper is organized as follows: Section 2 describes the concept of HPC as a Service, the challenges of HPC on the cloud, and how cloud computing benefits the scientific community. Section 3 discusses other scientific cloud projects, their objectives, and related work. Section 4 details the CDAC scientific cloud and its offerings. Section 5 presents the proposed model and architecture for the CDAC scientific cloud. Section 6 describes the applications that will be enabled on the CDAC scientific cloud. Section 7 concludes and outlines future plans.
2    HPC as a Service on Cloud

Bringing HPC facilities to the cloud will provision scientists and researchers with a crucial set of resources and enable them to solve large-scale, data-intensive, advanced computation problems on research topics across the disciplinary spectrum. HPC as a Service is the on-demand provisioning of a high-performance, scalable HPC environment with high-density compute nodes and huge storage on high performance interconnects such as InfiniBand [4] and Myrinet [5]. HPC as a Service is provisioned to meet HPC application demands, whether for one server (a virtual machine) or a large cluster (a virtual cluster). A virtual cluster is a collection of virtual machines configured to interact with each other as a traditional Linux cluster. A scientific cloud, or HPC as a Service, enables greater systems flexibility, eliminates the need for dedicated hardware resources per application, and helps researchers cope with exploding volumes of data that must be analyzed to yield meaningful results. It also simplifies usage models and enables dynamic allocation per given task.

[6] described a demonstration of a low-order coupled atmosphere-ocean simulation running in parallel on an EC2 system, highlighting the significant way in which cloud computing could impact traditional HPC computing paradigms. The results show that performance is below the level seen at dedicated supercomputer centers, but comparable with low-cost cluster systems. It also concluded that it is possible to envisage cloud systems more closely targeted at HPC applications, featuring a specialized interconnect such as Myrinet or InfiniBand.
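The paper does not prescribe a programming interface for the MPI-enabled virtual clusters. As a minimal sketch, assuming the mpi4py bindings are installed on the provisioned virtual machines (an illustrative choice, not something the paper specifies), the following shows the kind of tightly coupled job such a virtual cluster is meant to host:

```python
# Minimal MPI sketch (assumes mpi4py on the virtual cluster); illustrates
# a tightly coupled job exercising the low-latency interconnect.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's id within the virtual cluster
size = comm.Get_size()   # total number of MPI processes

# Each process computes a partial sum; reduce() combines them on rank 0.
partial = sum(range(rank * 1000, (rank + 1) * 1000))
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print(f"Sum over {size} processes: {total}")
```

Such a script would be launched across the allocated nodes with a standard MPI launcher, e.g. `mpirun -np 16 python sum.py`.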
Scientific cloud benefits to the scientists and research community:

•  Dynamic provisioning of HPC clusters: Access to on-demand cloud resources enables automatic provisioning of additional resources from the HPC service to process peak application workloads, reducing the need to provision data center capacity according to peak demand. Scientists thus benefit from the ability to scale the computing infrastructure up and down according to application requirements and user budgets.

•  Virtual ownership of resources: Virtual ownership of cloud resources reduces uncertainty concerning access to those resources when they are needed.

•  Ease of deployment and access: The use of virtual machine images offers the ability to package the exact OS, libraries, patches, and application codes together for deployment. Scientists can have easy access to large distributed infrastructures and completely customize their execution environment, providing the perfect setup for their experiments.

•  Reduction in overall job execution time: Jobs will be scheduled using intelligent, data-aware job scheduling algorithms.

Figure 1 depicts the layered architecture of the scientific cloud. The lowest layer of the stack is the physical resources (compute, storage and network), connected through a high speed link. The first software layer above the physical hardware is the host operating system. Since the scientific cloud will cater to HPC applications, the performance of such applications on this infrastructure is of prime importance. Hence, a Type 1 (bare-metal) hypervisor, which runs directly on the host's hardware to control the hardware and manage the guest operating systems, should be preferred for virtualization.

[Figure 1: Scientific Cloud Architecture. A layered stack: physical compute, network and storage over Ethernet and InfiniBand (10-20 Gbps) interconnects; operating system/hypervisors; the cloud middleware software stack (resource provisioning, scheduling, file system, monitoring); SaaS, PaaS and IaaS (clusters - MPI & MR, storage); user interfaces (APIs, web interface, mobile interface, portals, workflows & PSE); and scientific applications (bioinformatics, climate modeling). Two cross-cutting layers span the stack: authentication & security, and management (SLA & policy management, accounting, metering & billing).]

A guest operating system runs at another level above the hypervisor. The hypervisor controls the host processor and resources, allocating what is needed to each operating system in turn and ensuring that the guest operating systems (the virtual machines) cannot disrupt each other. The virtualized resources include the basic cloud computing services such as processing power, storage, and network. The cloud middleware software stack is the key component that handles resource provisioning and scheduling, volume management, and system monitoring for all the higher-level components and services.
Cloud management is a crucial component, as it monitors and manages all cloud resources at the physical and virtual levels. The management components that will be part of the scientific cloud are: resource inventory search; hardware monitoring and management; storage maps and reports; alerts and notifications with automated rectification; accounting and billing (to recover costs, with capacity planning to ensure that consumer demands will be met); and policy and SLA management (Service Level Agreement management, to ensure that the terms of service agreed to by the provider and consumer are adhered to, with reporting for administrators).

3    Science Cloud Projects

The following are some of the science cloud projects that have been executed in the direction of achieving HPC as a Service:

3.1    Cumulus

Cumulus [2] is a project to build a scientific cloud for a data center. It is a storage cloud system that adapts existing storage implementations to provide efficient upload/download interfaces compatible with S3. It provides features such as quota support, fair sharing among clients, and an easy-to-use, easy-to-install approach for maintenance. The most important feature of Cumulus is its well-articulated back-end extensibility module, which allows storage providers to configure Cumulus with existing systems such as GPFS [9], PVFS [10], and HDFS [11], in order to provide the desired reliability, availability or performance trade-offs. Cumulus is part of the open source Nimbus toolkit [12]. It is implemented in the Python programming language as a REST service, and the Cumulus API is a set of Python objects responsible for handling specific user requests.
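Because Cumulus exposes an S3-compatible interface, any standard S3 client can talk to it. As a sketch, the following uses the boto3 library against a hypothetical Cumulus endpoint; the endpoint URL, credentials and bucket name are illustrative placeholders, not values from the paper:

```python
# Sketch: accessing an S3-compatible store (such as Cumulus) with boto3.
# Endpoint URL, credentials and bucket name below are hypothetical.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://cumulus.example.org:8888",  # assumed service endpoint
    aws_access_key_id="USER_KEY",
    aws_secret_access_key="USER_SECRET",
)

s3.create_bucket(Bucket="climate-runs")               # container for outputs
s3.upload_file("run42.nc", "climate-runs", "run42.nc")

# List what is stored so far.
for obj in s3.list_objects_v2(Bucket="climate-runs").get("Contents", []):
    print(obj["Key"], obj["Size"])
```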
3.2    OpenCirrus

Open Cirrus [3] is a testbed: a collection of federated data centers for open-source systems and services research. It is designed to support research into the design, provisioning, and management of services at a global, multi-datacenter scale, and to encourage research into all aspects of service and datacenter management.

3.3    GridGain

GridGain [30] is Java-based open source middleware for real-time big data processing and analytics that scales up from one server to thousands of machines. It enables the development of compute- and data-intensive high performance distributed applications. Applications developed with GridGain can scale on any infrastructure, from a single Android device to a large cloud. GridGain provides two major functionalities: Compute Grids and In-Memory Data Grids.

3.4    StratusLab

StratusLab [31] is developing a complete, open-source cloud distribution that allows grid and non-grid resource centers to offer and exploit an Infrastructure as a Service cloud. It essentially enhances grid infrastructure with virtualization and cloud technologies, and is particularly focused on enhancing distributed computing infrastructures such as the European Grid Infrastructure (EGI).

Each of the above projects focuses either on provisioning data centers on the cloud or compute power on the cloud. Amazon Web Services alone provisions the various services required for a variety of HPC applications, such as Amazon Elastic Compute Cloud (EC2), Amazon Elastic MapReduce (EMR), and Amazon Simple Storage Service (S3) [32]. The CDAC Scientific Cloud is an effort to provide compute and storage services for the HPC community, along with software technologies like map reduce, MPI and mobile applications, that will accelerate discoveries and enhance collaborations in science.

4    CDAC Scientific Cloud (CSC)

C-DAC [7] is the pioneer in HPC in India, and its HPC facilities on the cloud can be linked by the 1 Gbps National Knowledge Network (NKN) [8], developed by NIC. The bandwidth offered by NKN will facilitate rapid transfer of data between geographically dispersed clouds and enable scientists to use available computing resources regardless of location. In addition, the CDAC Scientific Cloud will provide data storage resources to address the challenge of analyzing the massive amounts of data being produced by scientific applications and instruments. Storage as a Service is of particular importance to scientific research, where the volume of data produced by one community can reach the scale of terabytes per day. CDAC will make the scientific cloud storage available to science communities by aggregating a set of storage servers, and will make use of advanced technologies to provide fast random access storage to support more data-intensive problems. The test bed will be a mix of virtual clusters and storage options: traditional HPC clusters, Hadoop clusters, distributed and global disk storage, and archival storage. The system provides both a high-bandwidth, low-latency InfiniBand network and a commodity Gigabit Ethernet network. This configuration is different from a typical cloud infrastructure, but is more suitable for the needs of scientific applications.
Using CDAC Scientific Cloud instances, users can expedite their HPC workloads on elastic resources as needed. Users can choose Cluster Compute or Cluster Hadoop instances within a full-bisection, high bandwidth network for tightly-coupled and IO-intensive workloads, or scale out across thousands of cores for throughput-oriented applications. This will let scientists focus on running their applications and crunching or analyzing the data they generate, without having to worry about time-consuming set-up, management or tuning of the clusters or storage capacity on which they sit. Users will be able to run HPC applications on these instances, including molecular modeling, genome sequencing and analysis, and numerical modeling, across many industries including biopharma, oil and gas, financial services and manufacturing. In addition, academic researchers will be able to perform research in physics, chemistry, biology, computer science, and materials science.

The following will be the supported features of CDAC High Performance Computing as a Service (HPCaaS):

•  Dynamic provisioning of clusters: on-demand provisioning of MPI and map reduce clusters to support compute-intensive and data-intensive applications.

•  On-demand dynamic provisioning of storage volumes: dynamic provisioning of clusters and storage will be handled by the Cloud Resource Broker (CRB), or cloud metascheduler.

•  Security: simple, secure and quick access to HPC clusters.

•  Provisioning of customized libraries, software, workflows, etc. on HPC clusters as per application requirements: users will be provided with an option of selecting specific MPI versions or compiler versions to satisfy application requirements.

•  Performance: to reduce hypervisor overhead, a type-1 hypervisor will be used. The distributed locations will be connected with a 1 Gbps link, and within a site, nodes will be connected with an InfiniBand interconnect to reduce latencies. VM allocation to form a cluster will be done by the cloud scheduler based on nearness to storage nodes, to minimize data movement on the cloud.

The following are the services that will be provisioned on the CDAC Scientific Cloud.

4.1    Infrastructure as a Service (IaaS)

C-DAC has HPC facilities at various CDAC locations, including Bangalore, Pune, Chennai, and Hyderabad, with approximately 70 TF in total. Figure 2 depicts the prototype model for dynamic provisioning of computational resources when requested by the user. Users will access the CDAC scientific cloud services through a cloud portal. First-time users will have to register with their required details, including the kind of applications they want to run on the cluster. Based on the application type specified by the user, resources will be allocated by the cloud broker and the cluster instance will be created on the fly. The user is then immediately informed, online and through mail, of their login credentials and the IP address for ssh access to the compute cluster. The allocation of the cluster and its nodes (master and worker nodes) will depend upon the CPU, memory and IO requirements of the application. Applications that need more data processing and less communication will be provided with a best-suited map reduce cluster, while applications that are more compute- and memory-intensive will be provisioned with MPI-enabled clusters with a parallel IO facility.

[Figure 2: Infrastructure as a Service (IaaS). A user request from the cloud portal passes through the security module to the cloud resource broker and scheduler, which provisions a virtual cluster (MPI/map reduce) from physical compute resource pools running hypervisors. The pools are connected over an InfiniBand/Ethernet interconnect to storage nodes, an image repository, and a parallel file system (GlusterFS); the user is given ssh access to the provisioned cluster.]
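The paper states the broker's placement policy only in prose (data-heavy, communication-light jobs go to map reduce clusters; compute- and memory-intensive jobs go to MPI clusters with parallel IO). A minimal sketch of that rule follows; the profile fields and thresholds are hypothetical, purely for illustration:

```python
# Hypothetical sketch of the Cloud Resource Broker's cluster-type choice.
# Fields and thresholds are illustrative, not from the paper.
from dataclasses import dataclass

@dataclass
class ApplicationProfile:
    data_gb: float          # volume of input data to process
    comm_intensity: float   # 0.0 (embarrassingly parallel) .. 1.0 (tightly coupled)
    memory_gb_per_task: float

def choose_cluster(app: ApplicationProfile) -> str:
    # Lots of data but little inter-process communication: map reduce fits.
    if app.data_gb > 500 and app.comm_intensity < 0.3:
        return "hadoop-mapreduce"
    # Tightly coupled or memory-hungry work: MPI cluster with parallel IO.
    return "mpi-parallel-io"

print(choose_cluster(ApplicationProfile(2000, 0.1, 2)))   # hadoop-mapreduce
print(choose_cluster(ApplicationProfile(50, 0.8, 32)))    # mpi-parallel-io
```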
4.2    Storage as a Service (StaaS)

Storage as a Service is the supply of data storage capacity over the Internet. In the context of the scientific cloud, StaaS provisions petabytes of data storage to the scientific communities. CDAC's Cloud Vault, based on the OpenStack Swift object storage software, will provide scientists and research partners with a convenient and affordable way to store, share, and archive data, including extremely large data sets. CDAC Cloud Vault is an object-based storage system, and multiple interface methods make it easy to use for the average user. It also provides a flexible, configurable, and expandable solution to meet the needs of more demanding applications. Files (also known as objects) are written to multiple physical storage arrays simultaneously, ensuring that at least two verified copies exist on different servers at all times.

Figure 3 depicts the flow of Storage as a Service (StaaS). The user registers by providing the required details and the required amount of storage. After the user's request is validated and approved, the user is sent the access details of the storage through email. The various interfaces through which a user can access Cloud Vault are as follows:

4.2.1    Web Interface

The web interface will allow access to Cloud Vault files through a browser. Users will be able to list and create containers, upload/download files, and delete files using this interface. There is no need to install any client to access cloud files.

4.2.2    Desktop GUI Application

Cloud Vault files will be accessible using the open source desktop application Cyberduck, an FTP-like standalone GUI application for accessing files. It supports file/directory listing, upload, download, synchronization, editing, etc. Cyberduck is available for Mac and Windows systems.

4.2.3    Command Line

Command line access will allow access to Cloud Vault files from the UNIX shell. The client scripts need to be installed on the user's machine or laptop.

4.2.4    Mobile Interface

Cloud Vault will also be accessible from mobile devices, using a mobile application for the basic file operations such as list, upload, download, and synchronize. There will also be a facility to auto-synchronize a user's mobile with their Cloud Vault files, so that a mobile backup can be kept on Cloud Vault.

4.2.5    APIs

Files of any size can be stored in the Cloud Vault, from small personal document collections to multi-terabyte backup sets, routed directly to the cloud using the Rackspace or S3 API in applications.

[Figure 3: Storage as a Service (StaaS). Flow: (1) the user sends a registration request through the Cloud Vault web interface; (2) credentials are sent via email; (3) the user logs in with the credentials; (4)-(5) the user accesses Cloud Vault files through the web interface, the mobile interface, or the desktop GUI application, all backed by CDAC CloudVault storage.]
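Since Cloud Vault is built on OpenStack Swift, its native API can be exercised from applications with the python-swiftclient library. The following is an illustrative sketch only: the auth URL, account, user and key are hypothetical placeholders, not values from the paper:

```python
# Sketch: basic Cloud Vault (OpenStack Swift) operations via python-swiftclient.
# Auth URL and credentials below are hypothetical placeholders.
from swiftclient import client as swift

conn = swift.Connection(
    authurl="http://cloudvault.example.org:8080/auth/v1.0",  # assumed endpoint
    user="account:researcher",
    key="SECRET_KEY",
)

conn.put_container("genome-data")                     # create a container
with open("reads.fastq", "rb") as f:
    conn.put_object("genome-data", "reads.fastq", contents=f)

# List objects in the container, then fetch one back.
_, objects = conn.get_container("genome-data")
print([o["name"] for o in objects])
_, body = conn.get_object("genome-data", "reads.fastq")
```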

5    CSC Architecture and Components

Figure 4 depicts the components of the CDAC scientific cloud, which are as follows:

5.1    Hypervisor

A hypervisor, also known as a virtual machine manager/monitor (VMM), is computer hardware platform virtualization software that allows several operating systems to share a single hardware host. The hypervisor controls the host processor and resources so that the systems/virtual machines are unable to disrupt each other. As virtualization adds overhead to cluster performance, we choose type-1 (bare-metal) hypervisors, which run directly on the host's hardware to control the hardware and manage the guest operating systems. Examples of type-1 hypervisors include Citrix XenServer [24], VMware ESX/ESXi [25], and the Microsoft Hyper-V hypervisor. The CDAC scientific cloud will use the Xen hypervisor.
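The paper does not prescribe a management interface for Xen. One common, purely illustrative option is the libvirt Python bindings, sketched below; the connection URI is the standard libvirt URI for a local Xen host, and the output shown depends on whatever domains happen to be defined:

```python
# Sketch: inspecting virtual machines on a Xen host via the libvirt Python
# bindings (one common way to drive a type-1 hypervisor; the paper itself
# does not specify a management API).
import libvirt

conn = libvirt.open("xen:///system")   # standard libvirt URI for local Xen
try:
    for dom in conn.listAllDomains():
        state, maxmem, mem, ncpu, cputime = dom.info()
        print(f"{dom.name()}: {ncpu} vCPUs, {mem // 1024} MB")
finally:
    conn.close()
```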
5.2    Cloud Middleware

Cloud middleware, or a cloud OS, is the software stack for provisioning large networks of virtual machines on demand. It also handles the scalability and reliability of the resources provided to users. Various open source and commercial cloud middleware are available, such as Nimbus [12], OpenNebula [26], vCenter [27], and Eucalyptus [28].

5.3    Cloud Resource Broker

The cloud resource broker and metascheduler is a common gateway for provisioning access to HPC resources, such as compute clusters and storage, on the cloud. It is an intelligent scheduler that provisions the best pool of available resources to users via policy-based decisions. The components that build up a cloud resource broker are as follows:
5.3.1    Resource Discovery

Resource discovery finds the available resources based on the kind of user application, which may be compute-intensive, data-intensive or memory-intensive.

5.3.2    Policy-based Resource Selection

Resource selection and provisioning will be done considering aspects such as load balancing, resource utilization, and power awareness.

5.3.3    Data-aware Job Scheduling

Data-aware scheduling enables computation to be done nearest to the location of the data. In this case, the cloud resource broker will talk to the cloud file system components to find the nearest storage nodes where the data resides, as sketched below.
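The paper describes data-aware scheduling only at the level of intent. A minimal sketch of one plausible placement rule, choosing the compute node that already holds the most replicas of the job's input, follows; the data structures and the block map are hypothetical (in practice they would come from HDFS or GlusterFS metadata):

```python
# Hypothetical sketch of data-aware placement: pick the compute node that
# already holds the most replicas of the job's input files, to minimize
# data movement. The replica map is hard-coded here for illustration.
from collections import Counter

def place_job(replicas: dict[str, list[str]], job_files: list[str]) -> str:
    """replicas maps file name -> nodes holding a copy of that file."""
    tally = Counter()
    for f in job_files:
        tally.update(replicas.get(f, []))
    if not tally:
        return "any-free-node"        # no locality information available
    node, _ = tally.most_common(1)[0]  # node with the most local replicas
    return node

replicas = {
    "sfm/jan.nc": ["node-03", "node-07"],
    "sfm/feb.nc": ["node-07", "node-12"],
}
print(place_job(replicas, ["sfm/jan.nc", "sfm/feb.nc"]))  # node-07
```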
5.4    Cloud Management and Monitoring

The cloud infrastructure monitoring and management tool is the control point for the virtual environment in the cloud. It will provide a single point of access for administrators to monitor and manage the cloud resources. The following features will be supported:

•  Resource inventory search: an inventory including virtual machines, hosts, data stores, and networks, at the administrator's fingertips from anywhere.

•  Hardware monitoring and management.

•  Storage maps and reports: provide storage usage, connectivity and configuration. Customizable topology views give visibility into the storage infrastructure and assist in diagnosing and troubleshooting storage issues.

•  Alerts and notifications with automated rectification.

•  Utilization, performance and energy consumption trends.

•  Accounting and billing (to recover costs, with capacity planning to ensure that consumer demands will be met), and policy and SLA management.

5.5    Cloud Portal

5.5.1    Portal for IaaS Provisioning and Problem Solving Environments (PSE)

The scientific cloud portal will be the access point for users requesting and accessing the on-demand HPC clusters. There will also be customized PSEs for the bioinformatics and climate modeling domains, providing the complete environment and workflow for domain-specific applications.

5.5.2    Portal for Storage as a Service

The portal for Storage as a Service will be the access point for cloud storage, through which a user can register and request the required amount of storage. Users will also be allowed to request expansion of their allocated storage on the fly.

[Figure 4: Components of CDAC Scientific Cloud (CSC). Map reduce and MPI-enabled virtual clusters (Hadoop data nodes over HDFS/GlusterFS storage, with GlusterFS mounted over InfiniBand) run over an Ethernet/InfiniBand interconnect. Virtual machines are managed by OpenStack Nova cloud middleware over hypervisors; a cloud broker/metascheduler with automated dynamic provisioning scripts (in-house development) drives provisioning. A cloud management and monitoring tool oversees the infrastructure. Cloud Vault Storage as a Service runs on OpenStack Swift over Ethernet-attached storage nodes, fronted by the cloud and Cloud Vault portals. Components are a mix of in-house development and open source tools.]

6    Target Applications on CDAC Scientific Cloud

On-demand cloud computing can add a new dimension to HPC, in which virtualized resources can be sequestered, in a form customized to target a specific application requirement, at any point in time. [6] described the feasibility of running coupled atmosphere-ocean climate models on an EC2 computing cloud and found that the performance is below the level seen at dedicated clusters. However, cloud systems that feature a specialized
interconnect such as Myrinet or InfiniBand and support MPI or map reduce are more closely targeted at HPC applications. [23] states that the life sciences are very good candidates for map reduce on the cloud, including sequence assembly and the use of BLAST and similar algorithms for sequence alignment. On the other hand, partial differential equation solvers, particle dynamics and linear algebra require the full MPI model for high performance parallel implementation on the cloud. The two application domains identified as pilot applications for the CDAC scientific cloud are bioinformatics applications such as BLAST, and climate modeling applications such as the Seasonal Forecast Model (SFM). SFM is an atmosphere general circulation model used for predicting the Indian summer monsoon rainfall in advance of a season. It involves a single operation applied to multiple data sets, which makes it a suitable case for using map reduce in this particular application.
7    Conclusions and Future Plans

Scientific applications require the availability of massive compute and storage resources. Cloud computing can be of great help in the on-demand provisioning of HPC resources, and applications can scale up heavily using HPC as a Service on the cloud. However, the performance-related challenges have to be addressed by fine-tuning the cloud middleware stack and the software libraries. The proposed model of the CDAC scientific cloud is an attempt to address the requirements and challenges of HPC as a Service on the cloud. Currently, the test bed setup is in progress; in future we plan to develop the cloud system software components, including the cloud resource broker and metascheduler, the management and monitoring tools, and the portal and PSEs.

8    References

[1] K. Keahey, R. Figueiredo, J. Fortes, T. Freeman, M. Tsugawa, "Science Clouds: Early Experiences in Cloud Computing for Scientific Applications", University of Chicago and University of Florida.
[2] John Bresnahan, David LaBissoniere, Cumulus, http://www.nimbusproject.org/files/bresnahan_sciencecloud2011.pdf
[3] Roy Campbell, Indranil Gupta, et al., "Open Cirrus Cloud Computing Testbed: Federated Data Centers for Open Source Systems and Services Research".
[4] http://en.wikipedia.org/wiki/InfiniBand
[5] http://en.wikipedia.org/wiki/Myrinet
[6] Constantinos Evangelinos and Chris N. Hill, "Cloud Computing for Parallel Scientific HPC Applications: Feasibility of Running Coupled Atmosphere-Ocean Climate Models on Amazon's EC2", CCA-08, Chicago.
[7] www.cdac.in
[8] www.nkn.in
[9] http://www.darwinproject.mit.edu/wiki/images/2/2e/Gpfs_overview.pdf
[10] Philip H. Carns, Walter B. Ligon III, Robert B. Ross, Rajeev Thakur, "PVFS: A Parallel File System for Linux Clusters", in Proc. of the Extreme Linux Track: 4th Annual Linux Showcase and Conference, October 2000.
[11] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, "The Hadoop Distributed File System", http://moodle.openfmi.net/file.php/331/lectures/lecture_4/The_Hadoop_Distributed_File_System.pdf
[12] The Nimbus Toolkit, www.nimbusproject.org
[13] http://www.nersc.gov/assets/HPC-Requirements-for-Science/Spentz.pdf
[14] http://www.stfc.ac.uk/resources/pdf/ctreport.pdf
[15] http://www.mmm.ucar.edu/events/indo_us/PDFs/0630_SKDash_HPC-USA-final.pdf
[16] http://www.daylight.com/cheminformatics/casestudies/infinity.html
[17] http://www.gartner.com/it-glossary/cloud-computing/
[18] http://searchcloudcomputing.techtarget.com/definition/Infrastructure-as-a-Service-IaaS
[19] http://www.gluster.org/about/
[20] Hadoop, http://en.wikipedia.org/wiki/Apache_Hadoop
[21] Map Reduce, http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
[22] http://searchstorage.techtarget.com/definition/Storage-as-a-Service-SaaS
[23] http://grids.ucs.indiana.edu/ptliupages/publications/CloudsandMR.pdf
[24] Citrix XenServer, http://www.citrix.com/English/ps2/products/product.asp?contentID=683148
[25] VMware ESXi, http://www.vmware.com/files/pdf/VMware-ESX-and-VMware-ESXi-DS-EN.pdf
[26] OpenNebula, http://opennebula.org/
[27] vCenter, http://www.vmware.com/products/vcenter-server/overview.html
[28] Eucalyptus, http://www.eucalyptus.com/
[29] http://www.penguincomputing.com/files/whitepapers/PODWhitePaper.pdf
[30] http://www.gridgain.com/features/
[31] http://stratuslab.eu/doku.php/start
[32] http://aws.amazon.com/hpc-applications/