DataGrid WP1 Massimo Sgaravatto INFN Padova

WP1 (Grid Workload Management)
•   The objective of the first DataGrid work package is (according to the
    project "Technical Annex"):

       To define and implement a suitable
       architecture for distributed scheduling
       and resource management on a GRID
       environment

•   This includes, in particular, the design and development of a grid
    scheduler, or Resource Broker, that is useful from the perspective
    of the DataGrid applications
WP1 Achievements (after the first year)

•   Analysis and evaluation of existing projects,
    software, and technologies
•   Analysis of user requirements from the
    applications
•   Definition and implementation of the first
    Workload Management System
    •   First available Resource Broker ("super-scheduler")
        able to take into account the data access
        requirements that are typical of the DataGrid
        applications
WP1 Components
•   UI - User Interface
    •   Lightweight component for access to the workload management
        system
•   RB - Resource Broker
    •   Core WP1 component able to find the “best” resource matching the
        user requirements
•   JSS - Job Submission Services
    •   Reliable job management operations
•   II - Information Index
    •   Caching index to the information space directly connected to the
        RB
•   LB - Logging and Bookkeeping
    •   Repository for events occurring in the lifespan of a job
WP1 Components
•   UI (User Interface)
    •   Lightweight component for access to the
        workload management system
    •   Ability to submit a job to the DataGrid testbed
        from any user machine; the job is described in
        a Job Description Language (JDL) based on
        Condor ClassAds
UI commands
•   dg-job-submit
    •   To submit a job on the Grid
•   dg-job-get-output
    •   To retrieve the job output files (OutputSandbox)
•   dg-job-list-match
    •   Returns the list of resources fulfilling job requirements
•   dg-job-cancel
    •   To cancel one or more submitted jobs
•   dg-job-status
    •   To get the job status
•   dg-job-get-logging-info
    •   To get logging info
WP1 Components
•   RB (Resource Broker)
    •   Responsible for choosing the “best” resources to which jobs are
        submitted, based on the constraints specified in the JDL and on the
        characteristics and status of resources (published in the Grid
        Information Service and Replica Catalog)
    •   The strategy used for this first project release is to send the
        job to an appropriate CE (Computing Element):
         •   where the submitting user has proper authorization
         •   that matches the characteristics specified in the JDL (architecture,
             computing power, application environment, etc.)
         •   where the specified input data (and possibly the chosen output SE) are
             determined to be "close enough" by the appropriate resource
             administrators
    •   Matchmaking is performed using the Condor ClassAds library
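
The three selection criteria above, followed by a ranking step, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ComputingElement` class, its fields, the `select_ce` function and the sample CE and SE names are all hypothetical, and the real RB evaluates ClassAd expressions through the Condor ClassAds library rather than hard-coded comparisons. The ranking by free CPUs mirrors a JDL line such as `Rank = other.FreeCPUs;`.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    """Hypothetical, simplified view of what a CE publishes."""
    name: str
    architecture: str
    op_sys: str
    free_cpus: int
    authorized_users: set = field(default_factory=set)
    close_ses: set = field(default_factory=set)  # SEs declared "close" by the site admins


def select_ce(ces, user, architecture, op_sys, input_ses):
    """Return the best matching CE, or None if no CE matches."""
    candidates = [
        ce for ce in ces
        if user in ce.authorized_users        # 1. user is authorized on the CE
        and ce.architecture == architecture   # 2. CE matches the job requirements
        and ce.op_sys == op_sys
        and ce.close_ses & input_ses          # 3. input data is held on a "close" SE
    ]
    # Ranking step: prefer the candidate with the most free CPUs.
    return max(candidates, key=lambda ce: ce.free_cpus, default=None)


ces = [
    ComputingElement("ce-a", "INTEL", "LINUX Red Hat 6.2", 4, {"alice"}, {"se-1"}),
    ComputingElement("ce-b", "INTEL", "LINUX Red Hat 6.2", 12, {"alice"}, {"se-1"}),
    ComputingElement("ce-c", "INTEL", "LINUX Red Hat 6.2", 30, {"bob"}, {"se-1"}),
    ComputingElement("ce-d", "INTEL", "LINUX Red Hat 6.2", 50, {"alice"}, {"se-2"}),
]

best = select_ce(ces, "alice", "INTEL", "LINUX Red Hat 6.2", {"se-1"})
print(best.name)  # ce-b: ce-c fails authorization, ce-d is not close to the data
```

Filtering before ranking means that a CE with many free CPUs is never chosen if the user cannot run there or the data is too far away.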
WP1 Components
•   JSS (Job Submission Service)
    •   Responsible for job management operations
        (issued on request of the RB) and for keeping track
        of submitted jobs
    •   Wrapper of Condor-G
•   II (Information Index)
    •   First filter to the Grid Information Service
    •   Specific application of Globus GIIS
•   LB (Logging & Bookkeeping)
    •   Job status information
         •   “State machine” view of each job
    •   Push model
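
The LB idea of a per-job “state machine” fed by pushed events can be sketched as follows. The state names, the transition table, and the `Bookkeeping` class are hypothetical and chosen only for illustration; they are not the states or the API of the actual LB service.

```python
# Hypothetical job states and allowed transitions (illustration only).
TRANSITIONS = {
    "SUBMITTED": {"SCHEDULED", "ABORTED"},
    "SCHEDULED": {"RUNNING", "ABORTED"},
    "RUNNING": {"DONE", "ABORTED"},
    "DONE": set(),
    "ABORTED": set(),
}


class Bookkeeping:
    """Keeps the event history and the current state of each job."""

    def __init__(self):
        self.events = {}  # job id -> list of (source component, new state)
        self.state = {}   # job id -> current state

    def register(self, job_id):
        """Record a newly submitted job."""
        self.events[job_id] = [("UI", "SUBMITTED")]
        self.state[job_id] = "SUBMITTED"

    def push(self, job_id, source, new_state):
        """Called by a component to push an event; rejects illegal transitions."""
        current = self.state[job_id]
        if new_state not in TRANSITIONS[current]:
            raise ValueError(f"illegal transition {current} -> {new_state}")
        self.events[job_id].append((source, new_state))
        self.state[job_id] = new_state


lb = Bookkeeping()
lb.register("job-1")
lb.push("job-1", "RB", "SCHEDULED")
lb.push("job-1", "JSS", "RUNNING")
lb.push("job-1", "JSS", "DONE")
print(lb.state["job-1"])        # DONE
print(len(lb.events["job-1"]))  # 4
```

In a push model the components report events as they happen, so a status query only has to read the stored state instead of polling each component.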
WP1 deployment
[Figure: deployment diagram. Elements shown: submitting machine (UI), which can submit to multiple RBs; “Community” RB or “Personal” RB (RB-JSS); LB server (one for each RB); II (one for each RB); RC; CEs, each a queue of an LRMS (LSF, PBS); SEs.]
Job submission scenario

    dg-job-submit myjob.jdl

myjob.jdl:

    Executable = "$(CMS)/exe/sum.exe";
    InputData = "LF:testbed0-00019";
    ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
    DataAccessProtocol = "gridftp";
    InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
    OutputSandbox = {"sim.err", "test.out", "sim.log"};
    Requirements = other.Architecture == "INTEL" && other.OpSys == "LINUX Red Hat 6.2";
    Rank = other.FreeCPUs;
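
To make the structure of such a JDL file concrete, the sketch below splits a subset of the attributes shown above into name and value pairs. It is a deliberately naive reader written for illustration: the `read_jdl` function is hypothetical, it is not the ClassAd parser used by the WP1 software, and it assumes that no attribute value contains a semicolon.

```python
def read_jdl(text):
    """Split 'Name = value;' statements into a dict of raw value strings."""
    attributes = {}
    for statement in text.split(";"):
        statement = statement.strip()
        if not statement:
            continue
        # Split on the first '=' only, so '==' inside an expression is preserved.
        name, _, value = statement.partition("=")
        attributes[name.strip()] = value.strip()
    return attributes


JDL = '''
Executable = "$(CMS)/exe/sum.exe";
InputData = "LF:testbed0-00019";
DataAccessProtocol = "gridftp";
Requirements = other.Architecture == "INTEL" && other.OpSys == "LINUX Red Hat 6.2";
Rank = other.FreeCPUs;
'''

job = read_jdl(JDL)
print(job["Rank"])          # other.FreeCPUs
print(job["Requirements"])
```

`Requirements` and `Rank` are expressions over the attributes of a candidate resource (`other`): the first must evaluate to true for a resource to match, and the second orders the matching resources.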
WP1 Y2 plans
•   Support for automatic proxy renewal (1.2: March
    2002)
    •   Interim (working!) solution by March 2002
    •   "Cleaner" solution later, when/if our GRAM patches
        (necessary to forward the "fresh" proxy to the jobmanager)
        are merged into the standard Globus distribution
•   Provision of APIs for the applications (1.3: May 2002)
•   Ability to submit MPI jobs (1.3: May 2002)
    •   Starting by considering MPI jobs within a single CE
WP1 Y2 plans
•   Use of WP3 R-GMA for L&B services
    •   Tests to be done by March 2002
    •   Date for actual integration can’t be foreseen now
•   Support for interactive jobs (1.4: July 2002)
    •   Jobs running on some CE worker node where a
        channel to the submitting (UI) node is available
        for the standard streams (PROOF-like applications)
•   Support for job dependencies (1.4: July 2002)
    •   Integration of Condor DAGMan
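
What "job dependencies" means in practice can be illustrated with a small ordering example. The sketch uses Python's standard `graphlib` module (Python 3.9 or later), not Condor DAGMan, and the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key maps to the set of jobs it must wait for.
dependencies = {
    "simulate": set(),
    "reconstruct": {"simulate"},
    "analyse": {"reconstruct"},
}

# A valid submission order never starts a job before its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['simulate', 'reconstruct', 'analyse']
```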
WP1 Y2 plans
•   Grid Accounting (2.0: September 2002)
    •   Economy-based model
•   GUI (1.4: July 2002)
•   Advance reservation APIs (September 2002)
    •   Collaboration with GARA efforts
•   Support for job partitioning and "trivial" job
    checkpointing (2.0: September 2002)
•   Integration of WP2 “query optimization”
    (based on network information and driving
    data replication)
Other info
•   http://www.infn.it/workload-grid
•   WP1 document: “WP1 – WMS Software -
    Administrator and User Guide”