Introduction To HPC: Content and Definitions




Jan Thorbecke, Section of Applied Geophysics

Delft University of Technology



Motivation and Goal
Get familiar with hardware building blocks, how they operate, and
how to make use of them in writing efficient programs.

Understand the function of an Operating System (Linux).

Participants should be able to write, compile and run numerical code efficiently on standard hardware.

Start thinking about how to solve problems in parallel (try to avoid thinking in sequential loops).

During the course, keep in mind a numerical problem that you would like to implement or improve.

Learning Outcomes
Understand the architecture of modern CPUs and how this
architecture influences the way programs should be written.

Optimize all stages of the process: programming, compilation, program start-up, running of the program by the OS, and execution of (parallel) instructions by the CPU, up to writing output to disk.

Write numerical software that exploits the memory hierarchy of a CPU to obtain code with close to optimal performance.

Analyze an existing program for OpenMP and MPI parallelization possibilities.

Evaluate the possibilities of accelerators to speed up computational work.
Organization

Dates: 7, 12, 21 May and 11, 18 June 2015

Time: 13:30-17:00

Points: 2

Examination: attendance, exercises and discussions during the course. For MSc students, a multiple-choice exam with discussion afterwards.

Schedule and rooms

CITG Building:
Stevinweg 1
2628 CN Delft

7 May room 2.02
12 May room 2.99
21 May room 2.02
11 June room 2.02
18 June room 2.02

What is HPC

High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. Today, computer systems in the teraflops region are counted as HPC computers. (Wikipedia)

A list of the most powerful high performance computers can be found on the
TOP500 list. The TOP500 list ranks the world's 500 fastest high performance
computers as measured by the HPL benchmark.

For this course: scientific programs written to make optimal use of computational hardware.

Four Components Computer System
[Figure: layered view of a computer system. Users 1, 2, 3, ..., n at the top; below them system and application programs (IO, fdelmodc, compiler, text, ...); then the operating system; hardware at the bottom.]

Computer System Structure
A computer system can be divided into four components:

Hardware provides basic computing resources
CPU, memory, I/O devices

Operating system
Controls and coordinates use of hardware among various applications and users

Application programs solve the computing problems of users
Word processors, compilers, web browsers, database systems, games

Users
People, machines, other computers

Contents
Introduction (1)
Hardware (1,2)
OS (2)
Programming (2/3)
Optimization (3)
Multi-processor hardware (3/4)
Writing Parallel programs (4/5)
Parallel programming + wrap-up and discussion (5)

Part 1
Components of a computer system
addressing
memory hardware
Memory Hierarchy
types of cache mappings
cache levels
virtual memory, TLB
Processors
out of order
pipelining
branches
multi-core cache-coherency
SSE2/3
Modern processors
Sandy Bridge, Power7, Interlagos, GPU
Future trends in hardware

Part 2
Operating System

IO management

Processes
IPC, RPC

Virtual memory management


allocation
dynamic, static
page replacement policies

File systems
storage, RAID

Linux, Unix, OSX, XP

Part 3
User environment
login, shells

Compiling and linking


compile stages
Makefiles

Number representations
IEEE 754 standard

Programming languages
C, Fortran
profiling and debugging
numerical libraries

Part 4
Computer Performance
Amdahl's law
standard benchmarks: SPEC, Linpack
Optimization
cache blocking
loops
examples

Part 5
Programming for parallel systems

Multi-processor hardware
Classification
HPC hardware

Programming for parallel systems


OpenMP
MPI
examples

Finish the last parts of the course

Followed by questions, discussion and evaluation of exercises

The Exercises

The exercises are designed to help you become familiar with concepts and ideas. If you already have enough experience with some parts of an exercise, you can skip those parts.

The exercises are simple and straightforward: no programming skills are needed, and they consist of compiling and running small programs. Every exercise comes with a step-by-step procedure to follow.

The exercises are hopefully fun to do; they are not required to pass the course, but they are highly recommended.

The slides

There are quite a few of them...

Not all of them will be discussed

Sometimes techniques are explained in two different ways: one will be used in the lectures, the other can be used to understand the concepts better.

At the moment there are no lecture notes, and the detailed slides try to compensate for that.

Definitions and Terms

bit = binary digit, with states 0 or 1
byte = 8 bits
word = 4 or 8 bytes (the width of a register)
Kbit = 1000 bits, used for communication speeds
KiB = 1024 (2^10) bytes, used for memory and storage
MiB = 1048576 (2^20) bytes
flops = floating point operations per second
CPU = Central Processing Unit
FPU = Floating Point Unit
ALU = Arithmetic Logic Unit
AGU = Address Generation Unit

http://physics.nist.gov/cuu/Units/binary.html

Definitions

Latency: the time delay between the moment something is initiated and the moment one of its effects begins or becomes detectable. The word derives from the fact that during the period of latency the effects of an action are latent, meaning "potential" or "not yet observed".

Bandwidth: a data transmission rate; the maximum amount of information (bits/second) that can be transmitted along a channel.

flops: number of floating point operations (additions, multiplications) per second.

Clock

Most modern CPUs are driven by a clock.

The CPU consists internally of logic and memory (flip-flops).

When the clock signal arrives, the flip-flops take their new values and the logic then requires a period of time to decode the new values.

Then the next clock pulse arrives and the flip-flops again take their new values, and so on.

Feedback

Positive and negative feedback about the course is highly appreciated.

You can always send me an e-mail or come to my desk in room 3.12 (CITG Building, on Tuesdays and Thursdays only).

Book References
Computer Architecture: A Quantitative Approach (5th edition)
John L. Hennessy, David A. Patterson, 2011, Morgan Kaufmann

The C Programming Language (2nd edition)
Brian W. Kernighan, Dennis M. Ritchie, 1988, Prentice Hall

Modern Fortran Explained (Numerical Mathematics and Scientific Computation)
Michael Metcalf, John Reid, Malcolm Cohen, 2011, Oxford University Press

Fortran 90 Programming
T.M.R. Ellis, Ivor R. Phillips, Thomas M. Lahey, 1994, Addison-Wesley

Parallel Programming with MPI
Peter Pacheco, 1996, Morgan Kaufmann

References on internet

http://arstechnica.com/index.ars

http://www.tomshardware.co.uk/

http://insidehpc.com/

http://www.gpgpu.org/

http://www.theregister.co.uk/

wikipedia.org
