How to run GENESIS on Fugaku - C. Kobayashi Riken Center for Computational Science

How to run GENESIS on Fugaku
               C. Kobayashi

    Riken Center for Computational Science
                  2021/01/18

Contents
•   Purpose of this meeting
•   Get source code of GENESIS 2.0beta
•   Install & Run
•   Get good performance
•   Troubleshooting

Purpose of this meeting
• This talk covers how to use GENESIS on Fugaku.
• Basic usage of GENESIS is out of scope.
• For basic usage, please check Usage/Tutorial & Samples on
  the GENESIS website and in the manual.
    https://www.r-ccs.riken.jp/labs/cbrt/

Section 1

Get GENESIS 2.0 beta code

What is GENESIS 2.0beta?
Reference: Jung et al., J. Comp. Chem., https://doi.org/10.1002/jcc.26450
GENESIS 2.0beta has the following features:
1. Selection of the nonbonded kernel code suited to the architecture and the number of MPI processes
2. MD integrator with a large time step (multiple time step, MTS)
   (Jung et al., submitted)
Available functions: SPDYN with all-atom force fields (CHARMM/AMBER)
• MD, minimization
• Simulations with replicas: REMD, REUS, gREST, GaMD, string method
Functions unavailable in SPDYN:
• Leapfrog integrator
• Langevin thermostat
• FEP (please use the 1.5.1 code)
How to get GENESIS 2.0 beta code (1)
1.   Click the ‘Download’ tab on the GENESIS site.
2.   Go to the GENESIS 2.0beta page on the GENESIS site.
3.   Go to the GitHub site:
     https://github.com/genesis-release-r-ccs/genesis-2.0

[Screenshot: GitHub site]

How to get GENESIS 2.0 beta code (2)
A. Download the code on Fugaku directly (recommended).

[Screenshot: click the copy button on the GitHub page; the repository URL is copied.]

On Fugaku, run:
    % git clone https://github.com/genesis-release-r-ccs/genesis-2.0.git

 Attention! It is strictly forbidden to put your private key on Fugaku.
 Do not use git via ssh on Fugaku, i.e. do not run:
    % git clone git@github.com:genesis-release-r-ccs/genesis-2.0.git
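
If you already have the SSH form of the URL at hand, it can be rewritten to the HTTPS form before cloning. The following is only a sketch: to_https_url is a hypothetical helper name, not part of git, GENESIS, or the Fugaku tools.

```shell
#!/bin/sh
# Hypothetical helper: rewrite an SSH-style GitHub URL to its HTTPS form,
# so that no private key is needed on Fugaku.
to_https_url() {
    case "$1" in
        git@github.com:*)
            # Strip the "git@github.com:" prefix and prepend the HTTPS host.
            printf 'https://github.com/%s\n' "${1#git@github.com:}" ;;
        *)
            # Already HTTPS (or another host): leave unchanged.
            printf '%s\n' "$1" ;;
    esac
}

to_https_url "git@github.com:genesis-release-r-ccs/genesis-2.0.git"
# prints https://github.com/genesis-release-r-ccs/genesis-2.0.git
```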
How to get GENESIS 2.0 beta code (2)
B. Quick download (zip file)

[Screenshot: click the download button on the GitHub page.]

Section 2

Install & Run

GENESIS 2.0beta source-tree
In the top directory:
  • Manual and brief user guide (PDF files)
  • Source code of GENESIS 2.0beta (only lib and spdyn)
  • Compile test
  • …
  • Quick installation guide (ASCII text file)
  • …
  • File for generating configure

Compile GENESIS on Fugaku
If you got the source code with the ‘git’ command
    % git clone https://github.com/genesis-release-r-ccs/genesis-2.0.git
    % cd genesis-2.0

If you got the zip file from GitHub
    % unzip genesis-2.0-master.zip
    % cd genesis-2.0-master

Compile
    % autoreconf
    % ./configure --enable-single --host=Fugaku
    % make
    % make install

Instead of --enable-single, you can choose one of the following options:
    --enable-mixed
    --enable-double (default)
Please check doc/GENESIS-2.0.pdf (section 2.2.3).
Compile test of GENESIS on Fugaku
Run the test
 % cd ../tests/regression_test
 (create the job script regression.sh)
 % pjsub regression.sh
regression.sh (the options in the script are explained on the following slides)
 #!/bin/sh
 #PJM -L "rscgrp=eap-small"
 #PJM -L "rscunit=rscunit_ft01"
 #PJM -L "node=2"
 #PJM --mpi "proc=8"
 #PJM -L "elapse=00:30:00"
 #PJM -j
 #PJM -S
 module switch lang/tcsds-1.2.28a
 export OMP_NUM_THREADS=12
 export PLE_MPI_STD_EMPTYFILE=off
 bindir=PLEASE_INSERT_GENEIS_PATH
 python2 ./test.py "mpiexec ${bindir}/spdyn " fugaku > regression.log
Job script of GENESIS
An example job script:
 #!/bin/bash
 #PJM -L "rscgrp=eap-small"
 #PJM -L "rscunit=rscunit_ft01"
 #PJM -L "node=16"
 #PJM --mpi "proc=128"
 #PJM -L "elapse=20:00"
 #PJM -j
 #PJM -S

 pdir=PLEASE_INSERT_GENEIS_PATH
 module switch lang/tcsds-1.2.28a
 export OMP_NUM_THREADS=6
 export PLE_MPI_STD_EMPTYFILE=off
 mpiexec -stdout run_fep1.out $pdir/spdyn input/run_fep1.inp

Explanation of the options:
 rscgrp                      Resource group (can be changed) (required)
 rscunit                     Resource unit (can be changed) (required)
 node                        Number of nodes (required)
 proc                        Number of MPI processes (required)
 elapse                      Elapsed-time limit (required)
 -S                          Output statistics information (optional, but strongly recommended)
 module switch               Sets the development environment (can be changed)
 PLE_MPI_STD_EMPTYFILE=off   Disables empty stdout files for each process
 -stdout                     stdout file name (default is (jobscript).(job ID).out.(0 (stdout) or 1 (stderr)).(process ID))
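
The values of node, proc, and OMP_NUM_THREADS in the two scripts are consistent with filling every core of a node. The sketch below reproduces that arithmetic, assuming 48 compute cores per node (4 CMGs with 12 cores each); omp_threads is a hypothetical helper name, not a Fugaku or GENESIS command.

```shell
#!/bin/sh
# Hypothetical helper: OpenMP threads per MPI process so that every core is used.
# Assumption: 48 compute cores per node (4 CMGs x 12 cores).
omp_threads() {   # args: number_of_nodes number_of_MPI_processes
    nodes=$1; procs=$2; cores_per_node=48
    if [ "$procs" -lt "$nodes" ] || [ $((procs % nodes)) -ne 0 ]; then
        echo "proc must be a positive multiple of node" >&2
        return 1
    fi
    per_node=$((procs / nodes))          # MPI processes on each node
    if [ $((cores_per_node % per_node)) -ne 0 ]; then
        echo "processes per node must divide $cores_per_node" >&2
        return 1
    fi
    echo $((cores_per_node / per_node))
}

omp_threads 16 128   # the job script above: prints 6
omp_threads 2 8      # regression.sh: prints 12
```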
Useful commands on Fugaku
How to confirm your currently loaded environment
   % module list
How to list the available environments
   % module avail
   Details in “Use and job execution”, Section 4.6

How to check the CPU time of your group
   % accountj -h -E -g group_name
     Please check the [SUBTHEME_PERIOD] section.
How to check disk usage (you & group)
   % accountd
   Details in “support tools user guide”, Sections 3.1.7 & 3.1.8

Section 3

Get good performance

How to get good performance on Fugaku
   The following conditions must be met:
   1.    Use suitable parameter sets. (You can check them on the
         GENESIS benchmark pages.)
   2.    Run the calculation properly (number of MPI processes/OMP threads, compiler version, …).
         At this moment, the development environments are still under construction.

To check whether a calculation is proper, benchmarking is important!
How to get benchmark
• Run simulations with different numbers of nodes, MPI processes,
  and OMP threads, and check the scalability.
• 5000 to 10000 steps are enough in most cases.
• In the GENESIS log, check the “dynamics” time rather than “total time”.
  “dynamics” is the time for the main MD loop. Please use it to estimate your
  total simulation time.
[STEP6] Deallocate Arrays

Output_Time> Averaged timer profile (Min, Max)
  total time      =     104.674
    setup         =      24.235
    dynamics      =      80.439
      energy      =      62.515
      integrator  =       8.233
      pairlist    =       3.957 (       3.454,   4.242)
  energy
    bond          =       0.120 (       0.003,    0.315)
    angle         =       0.361 (       0.017,    0.844)
    dihedral      =       1.177 (       0.047,    2.678)
    nonbond       =      50.372 (      47.778,   51.623)
      pme real    =      41.143 (      36.091,   43.540)
      pme recip   =       9.215 (       8.017,   11.668)
(skip)
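
The estimate from the “dynamics” time can be sketched as a linear scaling. The function names and the step counts below (10000 benchmark steps, 1000000 production steps) are illustrative assumptions; only the 80.439 s value comes from the profile above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of GENESIS.

# Print the "dynamics" time (seconds) found in a GENESIS log file.
dynamics_time() {
    awk '$1 == "dynamics" && $2 == "=" { print $3; exit }' "$1"
}

# Scale the benchmark time linearly to a longer run.
estimate_seconds() {   # args: dynamics_seconds benchmark_steps production_steps
    awk -v t="$1" -v n="$2" -v m="$3" 'BEGIN { printf "%.1f\n", t / n * m }'
}

# Demo with a few lines copied from the timer profile.
log=$(mktemp)
printf '%s\n' \
    '  total time      =     104.674' \
    '    setup         =      24.235' \
    '    dynamics      =      80.439' \
    '      energy      =      62.515' > "$log"

t=$(dynamics_time "$log")              # 80.439
estimate_seconds "$t" 10000 1000000    # prints 8043.9 (seconds)
rm -f "$log"
```

The setup time is deliberately left out of the scaling because it does not grow with the number of steps.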

Why is my simulation so slow?
Please check the following points before consulting someone.
    1. Check the parameters in your control file and the simulation conditions.
         •   Please compare your parameters and performance with the benchmark site.
         •   Did you compile GENESIS in the ‘recommended’ way?
             (Please do not set FCFLAGS or CCFLAGS yourself.)
    2.   Run benchmarks with different sets of MPI/OMP cores.
         •   Each Fugaku node has 4 CMGs (core memory groups) with 12 cores each, so using
             more than 12 OMP threads is not effective.
         •   The suitable MPI/OMP ratio depends on the simulation size and the number of
             nodes. (In general, fewer OMP threads are preferred with fewer nodes.)
    3.   Find which part is the bottleneck.
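
The candidate MPI/OMP sets for step 2 can be listed mechanically. This is a sketch that applies only the rule stated above (at most 12 OMP threads) and assumes that all 48 cores of each node are used; list_mpi_omp_sets is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: list node/proc/OMP_NUM_THREADS sets that use all
# 48 cores of every node with at most 12 OpenMP threads per process.
list_mpi_omp_sets() {   # arg: number of nodes
    nodes=$1; cores=48; omp=12
    while [ "$omp" -ge 1 ]; do
        if [ $((cores % omp)) -eq 0 ]; then
            per_node=$((cores / omp))   # MPI processes per node
            echo "node=$nodes proc=$((nodes * per_node)) OMP_NUM_THREADS=$omp"
        fi
        omp=$((omp - 1))
    done
}

list_mpi_omp_sets 16
# first line printed: node=16 proc=64 OMP_NUM_THREADS=12
```

Each printed set is only a candidate for benchmarking; which one is fastest depends on the system size and node count.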

How to find the bottleneck
[STEP6] Deallocate Arrays

Output_Time> Averaged timer profile (Min, Max)
  total time      =     104.674
    setup         =      24.235
    dynamics      =      80.439
      energy      =      62.515
      integrator  =       8.233
      pairlist    =       3.957 (       3.454,   4.242)
  energy
    bond          =       0.120 (       0.003,    0.315)
    angle         =       0.361 (       0.017,    0.844)
    dihedral      =       1.177 (       0.047,    2.678)
    nonbond       =      50.372 (      47.778,   51.623)
      pme real    =      41.143 (      36.091,   43.540)
      pme recip   =       9.215 (       8.017,   11.668)
    solvation     =       0.000 (       0.000,    0.000)
      polar       =       0.000 (       0.000,    0.000)
      non-polar   =       0.000 (       0.000,    0.000)
    restraint     =       0.000 (       0.000,    0.000)
    qmmm          =       0.000 (       0.000,    0.000)
  integrator
    constraint    =       1.884 (       1.613,   2.082)
    update        =       2.317 (       2.180,   2.454)
    comm_coord    =       0.994 (       0.697,   1.534)
    comm_force    =       2.955 (       1.450,   4.636)
    comm_migrate  =       0.115 (       0.070,   0.168)
    (the comm_* rows are the communication timers)

Check points:
  1. Is “energy” the bottleneck?
     Yes → Real or Recip? → Are the differences among processes (Min, Max) large or small?
     No  → Constraint or communication? → Are the differences among processes (Min, Max) large or small?
  2. Do the bottlenecks differ between sets of MPI/OMP cores?
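
The check points can be walked through with a small script. This is a sketch under stated assumptions: “energy is the bottleneck” is taken to mean that energy exceeds half of the dynamics time (the 0.5 threshold is not from the slides), and classify_bottleneck is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: follow the check points on a GENESIS timer profile.
# Assumption: energy counts as the bottleneck when it exceeds 50% of "dynamics".
classify_bottleneck() {   # arg: log file containing the timer profile
    awk '
        $1 == "dynamics"   && $2 == "=" { dyn = $3 }
        $1 == "energy"     && $2 == "=" { ene = $3 }
        $1 == "pme" && $2 == "real"     { real  = $4 }
        $1 == "pme" && $2 == "recip"    { recip = $4 }
        $1 == "constraint"              { cons = $3 }
        $1 ~ /^comm_/                   { comm += $3 }   # sum of communication timers
        END {
            if (dyn > 0 && ene / dyn > 0.5)
                print "energy: " (real >= recip ? "pme real" : "pme recip")
            else
                print "integrator: " (cons >= comm ? "constraint" : "communication")
        }' "$1"
}

# Demo with rows copied from the profile.
log=$(mktemp)
printf '%s\n' \
    '    dynamics      =      80.439' \
    '      energy      =      62.515' \
    '      pme real    =      41.143 (      36.091,   43.540)' \
    '      pme recip   =       9.215 (       8.017,   11.668)' \
    '    constraint    =       1.884 (       1.613,   2.082)' \
    '    comm_coord    =       0.994 (       0.697,   1.534)' \
    '    comm_force    =       2.955 (       1.450,   4.636)' \
    '    comm_migrate  =       0.115 (       0.070,   0.168)' > "$log"
classify_bottleneck "$log"   # prints: energy: pme real
rm -f "$log"
```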
Other check points
•   Please run the benchmark for a few settings two or three times each.
•   If the differences in execution time among processes (the Min and Max values in the timer profile) are too large (> 3 times),
    the machine may have hardware or network trouble.
•   If the simulation is too slow although the times in the log file are not, the
    HDD or network may have trouble.

•   Performance drops when you set a small rstout_period (< 1000).

•   If Real >> Recip, “respa (elec_long_period > 1)” may not be efficient.
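
The “> 3 times” check on the Min and Max columns can be automated roughly as follows. The sketch checks every row that has a (Min, Max) pair, since which timers matter most is a judgment call; flag_imbalance is a hypothetical name, and rows with very small times (such as bond or angle) may be flagged without indicating a real problem.

```shell
#!/bin/sh
# Hypothetical helper: report timers whose Max/Min ratio exceeds a threshold.
flag_imbalance() {   # args: log_file [ratio_threshold, default 3]
    awk -v thr="${2:-3}" '
        /=.*\(.*,.*\)/ {
            # Timer name: everything before "=", without leading spaces.
            name = $0; sub(/[ \t]*=.*/, "", name); sub(/^[ \t]+/, "", name)
            # Min and Max: the two numbers inside the parentheses.
            pair = $0; sub(/.*\(/, "", pair); sub(/\).*/, "", pair)
            split(pair, v, ",")
            min = v[1] + 0; max = v[2] + 0
            if (min > 0 && max / min > thr)
                printf "%s: max/min = %.1f\n", name, max / min
        }' "$1"
}

# Demo with rows copied from the timer profile.
log=$(mktemp)
printf '%s\n' \
    '    nonbond       =      50.372 (      47.778,   51.623)' \
    '      pme real    =      41.143 (      36.091,   43.540)' \
    '      pme recip   =       9.215 (       8.017,   11.668)' \
    '    comm_coord    =       0.994 (       0.697,   1.534)' \
    '    comm_force    =       2.955 (       1.450,   4.636)' > "$log"
flag_imbalance "$log"   # prints: comm_force: max/min = 3.2
rm -f "$log"
```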

Section 4

Troubleshooting

When you run into trouble with GENESIS
Please do the following before consulting the developers.
    1.   Read your log files carefully and find out which part of the
         calculation/compilation failed.
         A) Compile: the configure log (‘config.log’) and compiler messages
         B) Calculation: outputs, script.$(jobid).out, and script.$(jobid).stats
    2.   Check the parameters in your control file and the simulation conditions related
         to your error logs.
    3.   You do not need to read the source code.
Other check points (1)
✓   Did you check the GENESIS website and manual carefully?
    Your parameters and/or usage may not be allowed.
✓   Are you using recent source code?
    Your problem might already be fixed in the latest version.
✓   Are you using Fugaku properly?
    You can check the usage on the portal site. (English documents are available.)
✓   Did you select the proper development environment and binary?
    In many cases, an old binary does not work in a newer environment.
    The Fugaku administrators suggest recompiling your code when the environment is
    updated.
Other check points (2)
✓   Is the memory usage less than 28 GiB?
    Each Fugaku node has 32 GiB of memory; however, only ~28 GiB can be used for calculation.
    Please check your memory usage (MAX MEMORY SIZE (USE)) in the ‘stats’ file.
✓   Did the job finish within the time limit given as “elapse=” in the script?
✓   Did you set the correct node shape ("node=lxnxm", where l, n, m are numbers) for the
    current development environment (in particular, when using multiple replicas)?
    The node shape is changed frequently.
    Please check the current node shape on the portal site.
✓   Please try the job again.
    Fugaku is still under development. Jobs sometimes fail for unknown reasons.
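
The memory check is a simple comparison against 28 GiB. The sketch below assumes you have already read the peak value from the ‘stats’ file and converted it to bytes; the exact layout and unit of that file are not shown here, and fits_in_node_memory is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: does a peak memory value (in bytes) fit in ~28 GiB?
# Assumption: the caller supplies the value in bytes.
fits_in_node_memory() {   # arg: peak memory in bytes
    limit=$((28 * 1024 * 1024 * 1024))   # 28 GiB = 30064771072 bytes
    [ "$1" -le "$limit" ]
}

if fits_in_node_memory 12884901888; then   # 12 GiB
    echo "ok"
else
    echo "too large"
fi
# prints: ok
```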
How to contact us
About GENESIS: GENESIS forum (we have two BBS rooms: English & Japanese)

About Fugaku (for users): HPCI: helpdesk_at_hpci-office.jp
                           Others: r-ccs-ungi-support_at_riken.jp
Questions at the user briefing (held every month) are welcome.