School of Engineering and Computer Science

     eResearch at VUW:
the eScience Consultant’s Tale

            Kevin M. Buckley
   School of Engineering and Computer Science

         Victoria University of Wellington
                   New Zealand

         Kevin.Buckley@ecs.vuw.ac.nz
School of Engineering and Computer Science
eResearch at VUW: The eScience Consultant’s Tale

                                                          The Abstract

In 2008, Victoria University of Wellington finally filled the post of eScience Consultant, although back then the post had
been drawn up with the title of eResearch Programmer, not that either name really gives much insight into the range of
facilitation that the role has provided.

This tale will highlight recent eResearch activity at the capital city university, from the viewpoint of its eScience
Consultant and in light of the experiences of the last six years.

                                                             Outline
    Cycle-stealing Grids
    MWA Activity
    Sci Fac HPC Facility
    Loss of BeSTGRID

HTML and PDF renditions of the slides should always be available here:

                                      http://www.ecs.vuw.ac.nz/~kevin/Conferences/NZeResSymp14
School of Engineering and Computer Science
eRA09: Kevin Buckley: A Grid-Based Facility for Large-Scale Cross-Correlation of Continuous Seismic Data

                    And finally ... who is supposed to do this "no-boundary" science ?

                                                                Domain scientists ?

           Might need to brush up on their BPEL, SCUFL, WSDL, MPI and DRMAA

                                                 Workflow/Grid computer scientists ?

           Might need to read up on SEED, SAC, QuakeML

                                                                           Phew !

                  Might still be a few jobs for people who can straddle the boundaries
School of Engineering and Computer Science
eResearch at VUW: The eScience Consultant’s Tale

                    Couple of thoughts on where eResearch in NZ might be heading

I nearly wasn’t able to give this talk because, on coming to submit my two-paragraph, 76-word,
468-byte, plain-text abstract, Submission.txt, I was informed (in red text!):
     You have tried to upload a file type which is not permitted or file is too large.
      Please try again using a standard document file type
Fortunately, renaming Submission.txt to Submission.txt.doc sidestepped whatever standard
document file type checking there was.
                                                            But really?!

Similarly, I recently tried to contribute to the conversation around the likely state of eResearch,
in New Zealand, in 2020, on the back of someone "not seemingly getting it", however,

    despite having a long-standing identity at the eresearch.org.nz web presence,
    as well as having an NZ-federated, Shibbolised, identity at my own institution
                           it’s been suggested that I need yet another identity (via LinkedIn) to contribute!?
Made me wonder if NZ eResearch, having already seemingly progressed way beyond a
      "plain text" future by 2014, would even need a Tuakiri in 2020 ?
And if every "professional" in NZ was coerced onto social media, would NZ need any institutions ?!
School of Engineering and Computer Science
eResearch at VUW: The eScience Consultant’s Tale

                                                   Cycle-stealing Grids

    ECS Grid: 300-ish Eng and Comp Sci UNIX workstations - SGE
    SCS Grid: 900/600/49 VUW public lab Windows machines - Condor

Both of these grids are operated by the School of Engineering and Computer Science, not that
much "operating" is actually needed, given that they make use of existing resources. Early usage
of the SCS Grid was UNIX/Cygwin, but a couple of more recent projects have been Windows-native:

           The Zoological Society of London’s Colony program (genotype sibship and parentage)
           Compiled MATLAB simulations of historic stock pricing strategies

Recently there have been two "threats" to the continued operation of the SCS Grid as a research
resource that can be used without jumping through (too) many hoops:

    Virtual packages
    Powersaving Initiatives
School of Engineering and Computer Science
eResearch at VUW: The eScience Consultant’s Tale

                                                   Cycle-stealing Grids: Threats

Virtual packaging supposedly removes the need to install software onto machines; however, any
software deployed that way becomes invisible to the grid.

The solution to this has been to continue to use the virtual packaging "infrastructure", but to install
the packages the grid uses physically, below C:\Grid\, thereby installing them twice

VUW’s central IT facilitator was recently asked to look at power saving across the "enterprise"

"Solutions" considered didn’t consider any effects to the SCS Grid but identified a commercial
 product that would do "what was required"

During a trial, a user of the SCS Grid became a "concerned" user of the SCS Grid and so,
somewhat belatedly, a proper investigation into how the free-software scheduler and the
commercial product play together was started

Let’s take a look at what VUW’s central IT facilitator found, once they started looking
School of Engineering and Computer Science
eResearch at VUW: The eScience Consultant’s Tale

                                             Cycle-stealing Grids vs Power Saving

    Condor could already do everything that’s needed in terms of powering machines down
      Hardly a surprise - once/if you stop to think about it (a configuration sketch follows at the end of this slide)

In order to schedule jobs into idle cycles, across machines, a scheduler needs to know how
resources are being used; so, by running such a grid, you already know which machines are,
and/or have been, idle and so could power them down, instead of accepting jobs

    Some infrastructural changes are needed to allow the free software to power machines up
      Bad! The grid makes use of spare cycles within existing infrastructure: can’t dictate it

    Some infrastructural changes are needed to allow the commercial product to power machines up
      Good! VUW’s IT purchasers also operate the infrastructure, so can just change it to fit.
      Better still? Wouldn’t you know it, these are the same changes Condor would need!

It’s possible to save power as a by-product of deploying free cycle-stealing grid software
      and yet VUW already was, before it started looking around for something to purchase
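
By way of illustration, here is a minimal sketch, assuming a stock HTCondor installation, of the
kind of configuration that lets Condor itself do the powering down (and, via condor_rooster, the
waking up); the thresholds and the "S5" power state here are made-up values, not VUW's settings:

    ## Execute-node side: re-evaluate every 5 minutes and power off a machine
    ## that has sat Unclaimed, with an idle keyboard, for over an hour
    HIBERNATE_CHECK_INTERVAL = 300
    HIBERNATE = ifThenElse( (State == "Unclaimed") && (KeyboardIdle > 3600), "S5", "NONE" )

    ## Central-manager side: run condor_rooster, so hibernating machines can be
    ## woken again (Wake-on-LAN being the infrastructural change referred to above)
    DAEMON_LIST = $(DAEMON_LIST) ROOSTER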
School of Engineering and Computer Science
eResearch at VUW: The eScience Consultant’s Tale

                                                   MWA-related activity

The Murchison Widefield Array (MWA) is one of the precursor projects to the Square Kilometre Array (SKA).

VUW is very fortunate to have Melanie Johnston-Hollitt within the SKA project, as well as having
her current research group, based within VUW’s Science Faculty, working on data from the MWA
programme, a key component of which, a 24-node IBM iDataPlex system, is Melanie’s

Melanie even "rescued for NZ" VUW’s initial MWA-related hardware, after it ended up in WA.

(Take away: You don’t mess with Melanie!)

Not content with just working on the MWA data, Melanie has also initiated the deployment of a
New Zealand "data node", running the NGAS platform, mirroring data from WA, along with MIT
and RRI in India, using hardware from SGI, who put MWA/SKA-related kit into iVEC.

Data node currently comprises an SGI IS5000 "tray", housing 96TB. (MWA data slated for ~160TB)

(Take away: SGI technical staff come highly recommended: their marketing department, and their support portal, may
need some work!)
School of Engineering and Computer Science
eResearch at VUW: The eScience Consultant’s Tale

                                                   MWA-related activity 2

The Twin 2U chassis has given us two separate nodes: one inside VUW and one outside

In order to avoid data bottlenecks at VUW’s edge, REANNZ helped Melanie facilitate what’s now,
 in effect, a 10GbE "Science DMZ", i.e., one avoiding VUW’s centrally facilitated IT infrastructure (not
 currently able to offer 10GbE), which, in its 1GbE days, we at VUW, and folk at UoA/CeR/NeSI,
 had "maxed out" in some GridFTP testing.

(Take away: Mellanox IB adaptor firmware can be flashed to 10GbE - thanks: toddc@sgi.com )

When I say "facilitate" above, I’m leaving a lot unsaid, not least some NZ eRes Symp 12 leftovers

Despite its rather fortuitous birth, this 10GbE capability became, again somewhat serendipitously,
extremely useful, in this last year, for Nevil Brownlee’s group, up at UoA, wanting a platform for
network profiling research, although, when the two technical groups came to use the resources,
we found that UoA’s central IT people had taken away their old "research DMZ" capability, which
stalled the proto-collaboration whilst the UoA end got back up to speed!

(Take away: Don’t ever let your central IT people anywhere near your research kit!)
School of Engineering and Computer Science
eResearch at VUW: The eScience Consultant’s Tale

                                                   Science Faculty HPC Facility

VUW’s Science Faculty only got an HPC Facility after a lecturer moved from Massey and, because
of the then-existing BeSTGRID community, was able to contact VUW’s eScience Consultant to ask
what resources VUW had. On discovering that VUW had next to nothing, bar a grid of desktop PCs
running Windows, he decided not to keep his head down at his new home, but to ask for some HPC kit
      (Doppler effect studies using the sirens of the paramedics heading to/from VUW)
leading to the Dean of the Science Faculty showing some vision, and seeing his Faculty obtain:

25 computers, 52 CPUs, 624 cores, 1920 GB RAM, IB interconnect
    6-off, SGI C2112, 2x12-core AMD Opteron 6174, 64GB RAM (4-node units)
    1-off, SGI H2106, 4x12-core AMD Opteron 6174, 512GB RAM
    RHEL5 OS hosting an SGE local resource manager (note: no vendor appl. stack)
Since added:
           2-off, SGI C2112 2x16-core AMD Opteron 6174, 64GB RAM (4-node units)
           1-off, SGI ISS3500, which houses around 30TB storage
           OS upgrade to CentOS6
    So, currently at: 784 cores with access to 64GB, and the 0.5TB node.
School of Engineering and Computer Science
eResearch at VUW: The eScience Consultant’s Tale

                                                   Science Faculty HPC Facility: 2

The main users of the facility have been computational chemistry research groups within VUW’s
School of Chemistry and Physical Sciences (SCPS), one led by that lecturer who moved
from Massey, Matthias Lein, and, more recently, one led by Nicola Gaston, who moved into SCPS
from IRL, but brought her MacDiarmid Institute PI funding with her.

Their research centres on Gaussian and VASP, respectively, although Nicola has also been
looking at deploying Crystal, but found the code’s Italian authors operate a "code of silence"

The large-memory node, originally slated for use by VUW’s School of Biological Sciences (SBS),
now gets only sporadic use by SBS researchers, for BLAST searches, as a result of the researcher
driving its acquisition leaving for ANU, although a recently completed PhD project did use it for
protein-docking studies with RosettaLigand, albeit without a large memory footprint.

In terms of "really testing the beast" we’ve had to rely on a School of Engineering and Computer
Science project which used COMSOL to study far-field superlens effects, and which touched 360 GB.

Large usage has also come from our Faculty of Architecture and Design, where a combination
of GenOpt and EnergyPlus is being used for optimisation studies of the energy performance of buildings.
School of Engineering and Computer Science
eResearch at VUW: The eScience Consultant’s Tale

                                                   Science Faculty HPC Facility: 3

So, the Facility has seen quite a range of projects and disciplines; here are some
of the issues that people seem to have when using it:

    Transition from PC to HPC
      From as simple as not really "getting" directories or submission scripts (a sketch of one follows
      at the end of this slide) to treating the shared resource as their own and writing job-scheduling
      "daemons" (read: script-kiddy, self-spawning Python scripts)

    Knowledge transfer within research groups/communities
      New users in a research group look to the resource facilitator, not to their group, for initial help

      Equally, code authors are happy to share their code, but not to correspond about it

      So, unless you can match the development environment, you may not be able to run
       "open source" codes
School of Engineering and Computer Science
eResearch at VUW: The eScience Consultant’s Tale

                                               Two simple things people don’t get

Here’s a directory structure that helps people run Condor Grid programs, and a schematic of
a cross-correlation:

|
+--Colony2
|     +--- colony2s.exe impi.dll
|            libguide40.dll   libiomp5md.dll
|            simu2.exe
|
+--RunSet01
      +---- submit.cmd
      |
      +--logs
      |
      +--0
      |   +-- INPUT3.PAR
      |
      +--1
      |   +-- INPUT3.PAR
      |
      +--2
      |   +-- INPUT3.PAR
...
      |
      +--999
          +-- INPUT3.PAR

      A    B    C    D    E
           1    2    3    4   A
                5    6    7   B
                     8    9   C
                         10   D
                              E

These make use of %CONDOR_PROCESS% and initialdir = $(Process), and of $SGE_TASK_ID, respectively
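
On the Condor side, a minimal sketch of the sort of submit.cmd that drives the RunSet01 layout
above might look like this; initialdir = $(Process) is the only construct taken from the slide, and
the log/output file names are made up for the example:

    # Submit one Colony2 job per numbered directory: job N runs in directory N
    # and picks up the INPUT3.PAR sitting there
    universe    = vanilla
    # executable path assumed relative to where condor_submit is run (RunSet01)
    executable  = ..\Colony2\colony2s.exe
    initialdir  = $(Process)
    log         = run.log
    output      = run.out
    error       = run.err
    queue 1000

On the SGE side, the analogous trick is an array job (qsub -t 1-10 for the ten cross-correlation
pairs in the schematic), with each task using $SGE_TASK_ID to select its pair or working directory.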
School of Engineering and Computer Science
eResearch at VUW: The eScience Consultant’s Tale

                                                   Loss of BeSTGRID

I’ve already touched upon the fact that a couple of folk from UoA/CeR/NeSI and from VUW had
to be talking, over dinner up at eResearch Australasia 2013, in Brisbane, in order for some
serendipitous collaboration around 10GbE networking between two sites down here in
South Eastern Australasia to get off the ground.

On Monday, I discovered that Landcare have been operating exactly the same piece of
new hardware that I’ve recently deployed - but which of us knew?

Yesterday it was suggested that SECS at VUW aren’t teaching UG students anything about
Grid/HPC concepts - we do

I also learned that a VUW researcher, running codes at a NeSI site, whose PI has
 students running the same codes on the facility I look after, has had that code profiled
 for them by NeSI - again, who knew?

It’s my belief that, "back in the day", I, and others, would not have needed to go to a once-a-year
 Symposium to find out what users of eResearch at their institution were doing, or to go
 over "the ditch" to hear about NZ activities.
School of Engineering and Computer Science
eResearch at VUW: The eScience Consultant’s Tale

                                                   Loss of BeSTGRID 2

Yes, BeSTGRID was very much a "Best Efforts" institution, however it gave New Zealand
eResearch a level of collegiality that seems to have been lost, now that it’s gone.

    Once a month, interested parties from tertiary education and CRI-land would get together
      within a video conference and, whilst it was often the same people speaking, there
      was a conversation that got fed back into institutions, even if some of those institutions
      probably wished that the whole eResearch thing would go away and leave them to get
      on without any of this collaborative stuff

    Similarly, those interested parties were often "go to" folk for institutional users who wanted
      to make use of the very national level resources that their institutions were trying to ignore

    It might have been possible to have drawn a comparison between those, informal, lines of
      communication and the kind of eResearch support provision that we saw "across the ditch"

    There was a lot of information around joining into national-level collaborative efforts
      hosted on the collaboratively editable website technical.bestgrid.org
School of Engineering and Computer Science

                                   Colophon

Slides for this talk have been created and delivered using MagicPoint

                          http://member.wide.ad.jp/wg/mgp/

An X11-based presentation tool which has the slide sources in plain text
and which also provides for the creation of an HTML slide-set.
You can also read