HIGH PERFORMANCE COMPUTING for SCIENCE & ENGINEERING (HPCSE) I - HS 2021 TUTORIAL 01: BASICS

Page created by Jeff Schneider
 

Ermioni Papadopoulou

Computational Science and Engineering Lab, ETH Zürich
24.09.2021
Outline

          I. Course organization: Piazza / Moodle / website
          II. Git
          III. Makefiles
          IV. Euler Cluster: usage
          V. C++ refresher
I. Course Intro & Organization
TEACHING ASSISTANTS
 ‣ Ermioni    : permioni@ethz.ch
 ‣ Pascal     : webepasc@ethz.ch
 ‣ Daniel     : wadaniel@ethz.ch
 ‣ Johan      : johan.lokna@math.ethz.ch
 ‣ Pascal AdM : pascalau@student.ethz.ch
 ‣ Man Hin    : chengman@student.ethz.ch

ASK QUESTIONS!
 ➡ Email us for administrative issues
 ➡ All other questions/inquiries: PIAZZA forum:
    https://piazza.com/class/kqteyavwboq4ad
 ➡ Also, TA office hours will be posted soon!

HOMEWORKS
 ‣ Not obligatory, but they can add a BONUS to your final grade:
    final grade = max{exam grade, (20 % hw grade + 80 % exam grade)}
 ‣ Submit homeworks 1-6 via the Moodle ETHZ app:
    https://moodle-app2.let.ethz.ch/course/view.php?id=15876
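
As a quick illustration of the grading rule above, here is a minimal C++ sketch. The function name final_grade and its parameters are illustrative assumptions, not part of the course material. It shows that the homework grade can only raise the final grade, never lower it.

```cpp
#include <algorithm>

// Sketch of the grading rule from the slide (names are hypothetical):
// final grade = max{exam grade, 20 % hw grade + 80 % exam grade}
double final_grade(double exam, double hw) {
    const double weighted = 0.2 * hw + 0.8 * exam;  // homework counts 20 %
    return std::max(exam, weighted);                // bonus only: never below the exam grade
}
```

For example, with an exam grade of 5.0, a homework grade of 6.0 gives a final grade of 5.2, while a homework grade of 4.0 leaves the final grade at 5.0.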
II. Git repository
COURSE REPOSITORY
 https://gitlab.ethz.ch/hpcseI_hs21/lecture

How does Git work? [figure]

Why Git?
 ✴ One of the most popular version control systems (VCS) today!
 ✴ You can save your work and collaborate safely, remotely and concurrently, with multiple users
 ✴ ETH provides access to gitlab.ethz.ch to all ETH members
 ✴ Available on ETH clusters & Linux environments

HOW TO USE Git
 https://git-scm.com
 https://git-scm.com/doc

HOW TO INSTALL Git
 https://github.com/git-guides/install-git

Get the course repository once, then fetch later updates:
    git clone https://gitlab.ethz.ch/hpcseI_hs21/lecture
    git pull
III. Makefiles
Makefile 1 for factorial.cpp

    # compiler
    CC=g++
    # CFLAGS: compiler directives/options
    # (possibly also LDFLAGS: list of link directives)
    CFLAGS=-O3 -Wall -Wextra -Wpedantic

    # targets (left of ':') and their dependencies (right of ':')
    all: factorial

    # Convert .cpp code to object code
    #   -c: generate object file
    #   -o: put the compilation in the file factorial.o
    factorial.o: factorial.cpp
            $(CC) -c -o factorial.o factorial.cpp $(CFLAGS)

    # Link the object code together into an executable
    factorial: factorial.o
            $(CC) -o factorial factorial.o

    clean:
            rm -f *.o *~ factorial

    .PHONY: all
    .PHONY: clean

Each compile command line must begin with a TAB, not spaces!
COMMON MAKEFILE CFLAGS
- -O3 : optimize the generated code (-O0: do not optimize)
- -g : compile with debug information
- -std=c++11 : use the C++11 language standard
- -I$(DIR) : search $(DIR) for auxiliary header files, in addition to /usr/include
- -Wall : enable the common compiler warnings
- -Wextra : enable extra warnings
- -Wpedantic : warn about code that relies on non-standard language extensions

USAGE
- make <target> : execute a specific target
- make clean : remove all object and executable files
- make : execute the first target in the file

LDFLAGS EXAMPLES
- LDFLAGS=-lm : link the C math library
- LDFLAGS=-L../lib : -L adds a library search directory other than /usr/lib
III. Makefiles
Makefile 2 for factorial.cpp, where factorial(int n) is a function that calculates the factorial of an integer number. The directory also contains a header file, factorial.hpp, with the declaration of factorial(int n).

    CC=g++
    CFLAGS=-O3 -Wall -Wextra -Wpedantic
    OBJ=factorial.o
    # dependency files: all include files
    DEPS=factorial.hpp

    all: factorial

    %.o: %.cpp $(DEPS)
            $(CC) -c -o $@ $< $(CFLAGS)

    factorial: $(OBJ)
            $(CC) -o $@ $^ $(CFLAGS)

    clean:
            rm -f *.o *~ factorial

    .PHONY: all
    .PHONY: clean

Pattern rule (%.o: %.cpp $(DEPS)):
- The .o file depends upon the .cpp version of the file and the .hpp (DEPS) files
- To generate the .o file, compile the .cpp file using the $(CC) compiler
- -o $@ : put the compilation in the file named on the left of ':', i.e. %.o
- $< : first item in the dependencies list

General compilation rule (factorial: $(OBJ)):
- $@ $^ : macros for the left and right sides of ':', i.e. here $@=factorial and $^=$(OBJ)

MAKEFILES TUTORIALS
  https://www.gnu.org/software/make/manual/make.html#Makefiles
  https://cs.colby.edu/maxwell/courses/tutorials/maketutor/
  https://web.stanford.edu/class/archive/cs/cs107/cs107.1194/resources/make
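
To make the file layout behind Makefile 2 concrete, here is a minimal sketch of what factorial.hpp and factorial.cpp could contain. It is shown as one listing, with comments marking where the two files would be split. The include guard name, the return type and the loop are assumptions made for illustration; the slides only state that factorial.hpp declares factorial(int n). A main function, which factorial.cpp also needs so that the executable can be linked, is left out.

```cpp
// ---- factorial.hpp: declaration only (assumed contents) ----
#ifndef FACTORIAL_HPP
#define FACTORIAL_HPP

// Returns n! for n >= 0. The 64-bit result overflows for n > 20.
unsigned long long factorial(int n);

#endif // FACTORIAL_HPP

// ---- factorial.cpp: would start with  #include "factorial.hpp" ----
unsigned long long factorial(int n) {
    unsigned long long product = 1;   // covers 0! = 1 and 1! = 1
    for (int k = 2; k <= n; ++k) {
        product *= k;                 // multiply in the next factor
    }
    return product;
}
```

With this layout, editing factorial.hpp causes make to rebuild factorial.o, because the header is listed in DEPS.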
IV. Euler cluster : usage
DETAILS
https://scicomp.ethz.ch/wiki/Euler

Euler
 • Cluster: network of nodes
   • Distributed memory
   • MPI
 • Node: multiple processors
   • Shared memory
   • C++ threads, OpenMP
 • CPU core
   • C++, SIMD
 • GPUs (Leonhard)
   • CUDA

[Figure: example node with 12 Opteron 2435 cores in two groups of six. Each core has its own L1 and L2 cache; each group shares an L3 cache, an SRI / crossbar, 2x 64bit memory controllers (DDR2 800, 10.66GB/s) and HyperTransport links (HT, 9.6GB/s). PCIe bridges attach the Tesla T10 GPUs.]

Details
 - Effective bandwidth with 12 cores: 20GB/s (STREAM Benchmark)
IV. Euler cluster : usage
DOCUMENTATION
https://scicomp.ethz.ch/wiki/User_documentation

WORKFLOW
  PC --(ssh euler)--> login node --(bsub)--> queue --> cluster nodes
  On the login node:
   • module avail, module list, module load gcc
   • compile and link
   • check output

STEPS
  1. Connect to a login node of Euler
  2. Develop your code (copy files or edit)
  3. Compile your program
  4. Submit a job / run your program on compute nodes
  5. Check your job (status and output)
IV. Euler cluster : usage
1. Connect
   • ssh -Y <username>@euler.ethz.ch
     • -Y: enables trusted X11 forwarding
     • gives access to one of the Euler login nodes
   • To log in without typing your password every time, set up SSH keys:
     https://scicomp.ethz.ch/wiki/Accessing_the_clusters#SSH
2. Develop
   • Copy your files to Euler, e.g.
     • scp code.tar.gz <username>@euler.ethz.ch:code.tar.gz
     • scp -r <directory> <username>@euler.ethz.ch:/cluster/scratch/<username>/
     • better: use Git!
   • Use a text editor to write/modify your code, e.g. vi(m), emacs, nano
   • OR mount your /scratch folder locally
     • sshfs <username>@euler.ethz.ch:/cluster/scratch/<username>/ <local mount point>
     • https://www.digitalocean.com/community/tutorials/how-to-use-sshfs-to-mount-remote-file-systems-over-ssh
   (Names in <...> are placeholders for your own values.)
IV. Euler cluster : usage
3. Compile
   • You need the appropriate programming tools and libraries to compile your code
   • Just load the environment modules you need, e.g.
     • module load gcc            (newer version of gcc)
     • module load open_mpi       (MPI library)
     • module list                (shows loaded modules)
     • module avail               (what is available)
     • module unload gcc          (unloads a module)
   • Compile your code and produce the executable, e.g.
     • g++ cputest.cpp -o cputest
   • Use Makefiles!
IV. Euler cluster : usage
4. Submit your job
   • Login nodes are only for development
   • The program must run on a compute node
   • To do that, you must use the bsub command:
       bsub -W 00:10 -n 1 ./cputest
     (-W: run time limit in hh:mm, -n: number of cores)
   • You can submit script files too: bsub < job.sh
   • You can also get an interactive shell on a compute node:
       bsub -W 00:10 -n 1 -Is bash

5. Check your job
   • Useful commands
     • bqueues: displays information about queues
     • bjobs: displays information about jobs (bjobs -l -u <username>)
     • bkill <jobid>: kills a job
   • Output files
     • lsf.o<jobid>: created in your working directory when the job finishes
       • includes information about your job (statistics, etc.) and the messages the
         program prints (standard output)
V. C++ refresher
We want to calculate and print the factorials of all numbers up to a non-negative integer N.
The factorial of each number n ≤ N is calculated as:
    n! = n ⋅ (n − 1) ⋅ (n − 2) ⋯ 3 ⋅ 2 ⋅ 1   and   0! = 1
                                                            int product = 1
                                                            for(int n = 0; n
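
The snippet above breaks off inside the loop header. As a hedged sketch of how such a running-product loop could be completed (not necessarily the version shown in the tutorial), the listing below builds one output line per n from 0 to N. The function names factorial_table and print_factorials, the output format and the use of unsigned long long instead of int are assumptions made for illustration.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Builds one line "n! = value" for every n in 0..N (hypothetical helper).
// A running product is reused, so each factorial costs one multiplication.
std::string factorial_table(int N) {
    std::ostringstream out;
    unsigned long long product = 1;      // 0! = 1
    for (int n = 0; n <= N; ++n) {
        if (n > 0) {
            product *= n;                // n! = n * (n - 1)!
        }
        out << n << "! = " << product << '\n';
    }
    return out.str();
}

// Prints the table to standard output.
void print_factorials(int N) {
    std::cout << factorial_table(N);
}
```

Calling print_factorials(3) from main prints the four lines "0! = 1", "1! = 1", "2! = 2" and "3! = 6". With a 64-bit product the results are exact only up to N = 20; larger values overflow.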