


Winter term 2020/2021

Parallel Programming with OpenMP and MPI


Dr. Georg Hager
Erlangen Regional Computing Center (RRZE) at Friedrich-Alexander-Universität Erlangen-Nürnberg
Institute of Physics, Universität Greifswald

Lecture 1: Preliminaries (kick-off meeting)


Audience and contact
▪ Audience
▪ Physics, theoretical chemistry, computer science, applied math, materials science, “computational XYZ”
▪ Everyone who
▪ Needs more computing power than what a laptop/PC can provide
▪ Wants to learn about parallel programming from desktop to supercomputers
▪ Lecturer
▪ Georg Hager georg.hager@uni-greifswald.de
▪ Associate lecturer at the University of Greifswald, Institute of Physics
▪ PhD 2005, Habilitation 2014 (both in Greifswald)
▪ Contact: Preferably use the Moodle forum
▪ Moodle course: http://tiny.cc/ParProg20

Parallel Programming 2020 2020-10-13 2


Course format
▪ Online lecture
▪ 2 hours (90 minutes) per week
▪ Lecture video published in Moodle every Monday

▪ Exercises
▪ One exercise sheet every week
▪ Solutions are discussed in the Q&A session (no submission necessary)

▪ Online Q&A session (via BBB) with discussion of exercises
▪ Tuesday 3 p.m.

▪ All material (slides, videos, exercises) available at http://tiny.cc/ParProg20


Course prerequisites
▪ Lecture:
▪ Some C, C++, or Fortran programming
▪ Examples are in (simple) C or Fortran

▪ Exercises:
▪ Linux command line (including remote access via SSH)
▪ Recommended Windows tool: MobaXTerm (https://mobaxterm.mobatek.net/)
▪ Using a compiler on the command line
▪ You will get accounts for accessing the HPC clusters at RRZE (FAU Erlangen-Nürnberg)

▪ Linux tutorial for n00bs: https://ryanstutorials.net/linuxtutorial/


Supporting material
▪ G. Hager and G. Wellein:
Introduction to High Performance Computing
for Scientists and Engineers.
CRC Computational Science Series, 2010.
ISBN 978-1439811924
▪ Documentation:
▪ https://www.openmp.org
▪ https://www.mpi-forum.org

▪ The biggest supercomputers and more useful HPC-related information:
▪ https://www.top500.org/



Outline of lecture
▪ Basics of parallel computer architecture
▪ Basics of parallel computing
▪ Introduction to shared-memory programming with OpenMP
▪ OpenMP performance issues
▪ Introduction to the Message Passing Interface (MPI)
▪ Advanced MPI
▪ MPI performance issues
▪ Hybrid MPI+OpenMP programming

▪ Goal: A good grasp of the potential and the performance issues of parallel computing in computational science



Supercomputing
HPC applications

▪ What are supercomputers good for?
▪ Weather and climate prediction
▪ Drug design
▪ Simulation of biochemical reactions
▪ Processing and analysis of measurement data
▪ Properties of condensed matter
▪ Fundamental interactions and structure of matter
▪ Fluid simulations, structural analysis, fluid-structure interaction
▪ Mechanical properties of materials
▪ Rendering of 3D images and movies
▪ Simulation of nuclear explosions
▪ Medical image reconstruction
▪ …



HPC algorithms
▪ Whatever the application, there’s usually a numerical algorithm behind it
▪ Computational science → many standard algorithms
▪ “Seven dwarfs”
1. Dense linear algebra
2. Sparse linear algebra
3. Spectral methods
4. N-body methods
5. Structured grids
6. Unstructured grids
7. Monte Carlo methods
See also: The Landscape of Parallel Computing Research: A View from Berkeley, Chapter 3
Parallel computing
Task: Map a numerical algorithm to the hardware of a parallel computer

$v_i = \sum_{j=1}^{N} A_{ij}\, b_j$ ???

Goal: Execute the task as fast and efficiently as possible
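As a minimal sketch (not taken from the lecture), the dense matrix-vector product above can be written in C with the matrix stored row-major in a flat array; the function name `matvec` and this storage layout are assumptions. The `#pragma omp parallel for` line shows one possible mapping to the hardware: threads share the rows $i$, which needs no synchronization because each $v_i$ is computed independently. Without OpenMP enabled (e.g. compiling without `-fopenmp`), the pragma is ignored and the loop runs serially with the same result.

```c
#include <stddef.h>

/* v = A * b for an n x n matrix A stored row-major in a flat array
   (hypothetical layout chosen for this sketch). */
void matvec(size_t n, const double *A, const double *b, double *v)
{
    /* Rows are independent, so they can be distributed among threads.
       The pragma only takes effect when compiled with OpenMP enabled. */
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (size_t j = 0; j < n; ++j)
            sum += A[i * n + j] * b[j];
        v[i] = sum;
    }
}
```

Distributing rows is only one option; how well it performs depends on how the data fits the memory hierarchy described next.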


Parallelism in modern computers
▪ Core: registers, execution units, L1 and L2 caches
▪ Chip (up to 64 cores): cores share the L3 cache
▪ Node: 2 sockets + memory + I/O (possibly multiple chips per socket)
▪ Supercomputer: many nodes, high-performance network, storage
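Even a single node exposes several of these levels. As a small illustration beyond the slide, the number of processors online on the machine you are logged into can be queried in C with `sysconf(_SC_NPROCESSORS_ONLN)`. Note that this constant is a widely supported extension (glibc, macOS, BSDs) rather than strict POSIX, and that with SMT (hyper-threading) enabled it counts hardware threads, not physical cores; the function name `online_cpus` is hypothetical.

```c
#define _DEFAULT_SOURCE   /* expose sysconf constants on glibc */
#include <unistd.h>

/* Number of processors currently online, or 1 if the query fails. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```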
The Top500 list
▪ Survey of the 500 most powerful supercomputers
▪ http://www.top500.org
▪ How is performance ranked?
▪ Solve a large dense system of linear equations: $Ax = b$ (“LINPACK”)
▪ Maximum performance achieved with 64-bit floating-point numbers: $R_\mathrm{max}$
▪ Published twice a year (ISC in Germany, SC in the USA)
▪ First list: 1993 (#1: CM-5 / 1,024 procs.): 60 Gflop/s
▪ June 2020 (#1: Fugaku / 7.3 million procs.): 415.5 Pflop/s
▪ Performance increase: 79% p.a. from 1993 to 2020
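As a quick consistency check (arithmetic only), the annual growth factor implied by the two #1 systems over the 27 years from 1993 to 2020 is

$$\left(\frac{415.5\ \text{Pflop/s}}{60\ \text{Gflop/s}}\right)^{1/27} \approx \left(6.9 \times 10^{6}\right)^{1/27} \approx 1.79$$

i.e. about 79% per year, or a doubling time of roughly $\ln 2 / \ln 1.79 \approx 1.2$ years.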


What is “performance”?

$$P = \frac{\text{Work}}{\text{Time}}$$

▪ Work: performance metric, e.g. “flops” (+ - * /), lattice site updates, iterations, “solving the problem”...
▪ Time: “wall-clock time”
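A minimal sketch of how $P = \text{Work}/\text{Time}$ is measured in practice, using flops as the work metric. The kernel is a hypothetical example (a “vector triad” `a[i] = b[i] + c[i] * d[i]`, 2 flops per iteration), and wall-clock time is taken from the POSIX `clock_gettime(CLOCK_MONOTONIC, ...)` timer; all function names are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#include <stddef.h>
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock. */
double wall_time(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Vector triad a[i] = b[i] + c[i] * d[i]: one add and one multiply
   per iteration. Returns the flops executed (the "work"). */
double triad(size_t n, double *a, const double *b,
             const double *c, const double *d)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + c[i] * d[i];
    return 2.0 * (double)n;
}

/* Performance P = work / time, here in flop/s. */
double performance(double work, double seconds)
{
    return work / seconds;
}
```

Typical use: `double t0 = wall_time(); double w = triad(n, a, b, c, d); double p = performance(w, wall_time() - t0);`. For short loops the runtime is close to the timer resolution, so in practice the kernel is repeated until the measured time is long enough.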



The flop is quite popular…
▪ Flop == floating-point operation (add, subtract, multiply, divide)
▪ Flop/s == “how many flops can be done per second?”
▪ How many flops can a machine do at most (“peak performance”)?
▪ Depends on the precision of the operands (double, single, half precision)
▪ Divides are slow and are thus usually neglected
▪ Some double-precision peak numbers to get you oriented…
▪ Top500 range (June 2020): 2.6 Pflop/s … 514 Pflop/s
▪ Modern multicore server CPU (AMD EPYC 7742, “Rome”): 2.3 Tflop/s
▪ Your PC: 100 … 500 Gflop/s (+ GPU 0.5 … 10 Tflop/s)
▪ Your cellphone: 5 … 50 Gflop/s
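Peak performance is usually computed as a product of hardware parameters. A minimal sketch, assuming the common model peak = cores × clock frequency × flops per cycle and core: for the AMD EPYC 7742 (64 cores, 2.25 GHz base clock, two 256-bit FMA units per core, i.e. 16 double-precision flops per cycle) this reproduces the 2.3 Tflop/s quoted above. The function name is hypothetical, and the actual clock under heavy vector load may differ from the base clock.

```c
/* Theoretical peak in Gflop/s:
   cores * clock [GHz] * flops per cycle and core. */
double peak_gflops(int cores, double clock_ghz, int flops_per_cycle)
{
    return (double)cores * clock_ghz * (double)flops_per_cycle;
}
```

For example, `peak_gflops(64, 2.25, 16)` returns 2304.0, i.e. about 2.3 Tflop/s.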
Supercomputing in Germany
▪ Jülich Supercomputing Centre (JSC): JUWELS (9.9 Pflop/s)
▪ Leibniz Supercomputing Centre (LRZ): SuperMUC-NG (26.8 Pflop/s)
▪ HLRS: Hawk (26 Pflop/s)
▪ RRZE: 0.5 Pflop/s
▪ Further sites: Hannover, Berlin



RRZE “Meggie” cluster (you will get access to this!)
▪ 728 compute nodes (14,560 cores)
▪ 2× Intel Xeon E5-2630 v4 (Broadwell), 2.2 GHz, 10 cores each
▪ 20 cores/node
▪ 64 GB main memory per node
▪ No local disks
▪ Peak performance: $R_\mathrm{peak} = 0.5$ Pflop/s
▪ #346 in the TOP500 (Nov. 2016)
▪ $R_\mathrm{max} = 0.48$ Pflop/s
▪ Price tag: 2.5 million €
▪ Power consumption: 120 kW – 210 kW (depending on workload)
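The peak value above can be reproduced as cores × clock × flops per cycle, assuming 16 double-precision flops per cycle and core for Broadwell (two AVX2 FMA units × 4 lanes × 2 flops per FMA). The ratio to $R_\mathrm{max}$ shows how close LINPACK gets to peak on this machine:

$$R_\mathrm{peak} = 728 \times 20 \times 2.2\,\text{GHz} \times 16\,\frac{\text{flops}}{\text{cycle}} \approx 512.5\,\text{Tflop/s} \approx 0.5\,\text{Pflop/s}$$

$$\frac{R_\mathrm{max}}{R_\mathrm{peak}} \approx \frac{0.48}{0.51} \approx 0.94$$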
Power consumption of RRZE HPC systems (last 7 days)



Power consumption of supercomputers
▪ Cost of electrical energy (example FAU): 20 ct/kWh
▪ 1 MW of power costs 1.8 million € per year
→ over a system's lifetime, the cost of electrical power ≈ the investment sum
▪ This does not include the cost of cooling (may be 5% … 150% of the electrical power)
▪ ≈ 1000 €/a of electricity for a typical server
▪ Other countries have different boundary conditions
▪ US: 7 ct/kWh for industrial customers (2019)
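The 1.8 million € figure is simple arithmetic: power × hours per year × tariff. A minimal sketch, assuming a constant average power draw and a 365-day year; the function name is hypothetical.

```c
/* Annual electricity cost in euros for a constant power draw.
   power_kw: average power in kW; price_eur_per_kwh: tariff in €/kWh. */
double annual_energy_cost_eur(double power_kw, double price_eur_per_kwh)
{
    const double hours_per_year = 24.0 * 365.0;  /* 8760 h, ignoring leap years */
    return power_kw * hours_per_year * price_eur_per_kwh;
}
```

`annual_energy_cost_eur(1000.0, 0.20)` yields about 1.75 million €, matching the 1.8 million € per MW and year quoted above. At the upper end of Meggie's power range (210 kW), the same tariff gives roughly 0.37 million € per year, before cooling.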
Take-home messages
▪ Supercomputers are parallel computers
▪ No parallelism → no performance
▪ It’s your task to write parallel code (or use parallel programs that someone else wrote)
▪ Even your desktop PC is a parallel computer nowadays
▪ Supercomputers are expensive
▪ … to buy
▪ … and to run, so their efficient use is paramount
▪ → learn how to write efficient parallel programs

