





# PDC HPC Summer Course DD3258/DD2358 2018 7.5 ECTS

Attendance, two weeks: lectures and labs. Get the lab attendance sheet signed!

**Project**, finished Fall '18:

Grade: Grad.: P, F; Undergrad.: E ... A

**Support:** Lab assistant, Project advisor, Examiner

### **Project:**

- For some application and HPC architecture of your choice:
  - Develop an efficient program for a non-trivial problem.
  - Demonstrate and report how efficient it is.

Expected work on the project is **3 weeks**, *incl.* report writing. *Deadline for reports: Nov 9, 2018.*

The project is *not* about:

- Substantial development of new code.
- Scientific results obtained with the code.

So:

Prioritize measurements and analysis/interpretation! Demonstrate the use of tools (profiling, ...) and a simple performance model.

NO TIME for development of significant new code.

#### **Examples:**

- Parallelize a code you know and/or work with; choose an interesting part.
- Write a simple code for a key algorithm of a bigger solution process.
- Write a simple code for a simple problem.

## Now, during lab afternoons

- Discuss with instructors and course participants; form groups of size G.
- Define the project and choose a tutor: Michael, Thor, Roman, Stefano, ...
- Write a very short synopsis and check it with your supervisor!
- Submit the synopsis to *summer-info@pdc.kth.se* before the end of the course.

## Later

- Start the work *ASAP*.
- Finish the work; get in touch with your tutor!
- Submit the report to your *tutor*. The report will be graded and sent back with comments; you may have to complete some parts and hand it in again.

We need your email and paper mail address!

- KTH students: LADOK
- Other students: a certificate will be sent to you.

1. Develop an initial version of the program.
2. Develop an approximate performance model (a theoretical prediction): time = f(problem size N, #processors P, problem partitioning parameters, ...). Try to assess the *communication* and *computation* times separately.
3. *Measure* performance, e.g. t = f(N, P, ...), for different problem sizes if relevant. In the table, x = wall-clock time from start to finish (*not* CPU time), ...

| Size \ #proc | 1 | 2 | 4 | n |
|--------------|---|---|---|---|
| $N_1$        | x | x | x | x |
| $N_2$        | x | x | x | x |
| $N_M$        | x | x | x | x |

4. If suitable, plot "speedup" and/or "efficiency", MFLOPS?, ...

5. Make several measurements to discover variations; discuss sources of variability (interactive nodes, dedicated, ...).
6. Compare with the prediction and interpret: why these numbers?
7. Identify "bottlenecks" with profiling tools; find a remedy and make changes.
8. Check the improvement by measurements.
9. Write a report with a description of the problem, the *algorithm*, and design decisions, plus pertinent graphs of measurements and profiling, "before and after".
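As a minimal sketch of the workflow above, the following Python snippet shows a toy performance model, a repeated wall-clock measurement, and the speedup and efficiency formulas. The model form (perfectly divided work plus a log2(P) communication term), the cost constants `t_op` and `t_msg`, and all function names are illustrative assumptions, not part of the course material; your own model should follow from your algorithm and machine.

```python
import math
import time

def predicted_time(n, p, t_op=1e-9, t_msg=1e-6):
    """Toy model: perfectly divided O(n) work plus a log2(p) communication term.

    t_op and t_msg are assumed per-operation and per-message costs in seconds.
    """
    computation = t_op * n / p
    communication = t_msg * math.log2(p) if p > 1 else 0.0
    return computation + communication

def wall_time(func, *args, repeats=5):
    """Run func several times and return all wall-clock timings (not CPU time)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return timings

def speedup(t1, tp):
    """Speedup on p processes: time on one process divided by time on p."""
    return t1 / tp

def efficiency(t1, tp, p):
    """Parallel efficiency: speedup divided by the number of processes."""
    return speedup(t1, tp) / p

# Model predictions across process counts for one problem size.
n = 10**8
t1 = predicted_time(n, 1)
for p in (1, 2, 4, 8):
    tp = predicted_time(n, p)
    print(f"P={p}: predicted t={tp:.4f} s, "
          f"speedup={speedup(t1, tp):.2f}, efficiency={efficiency(t1, tp, p):.2f}")

# Repeated timings of a stand-in serial kernel expose run-to-run variation.
timings = wall_time(sum, range(100_000))
print(f"min={min(timings):.6f} s, max={max(timings):.6f} s")
```

In a real project the measured timings for each (N, P) pair would replace the model values in the speedup and efficiency calculation, and the two would be plotted side by side for comparison.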

| Single processor performance                         | Multi-processor performance |
|------------------------------------------------------|-----------------------------|
| Algorithm                                            | Algorithm: communication!   |
| BLAS etc. library                                    | Latency vs. bandwidth       |
| Memory hierarchy: disk, main memory, cache, register | # messages vs. size         |
| Organization of loops                                | Problem partitioning        |
| Data layout (cache misses)                           | Load balancing              |
| Index strides (cache misses)                         |                             |
| "Unrolling"                                          |                             |
| Compiler directives ("-O2")                          |                             |

















# PDC's Mission

#### Research



Conduct world-class research and education in parallel and distributed computing methodologies and tools as part of CSC's HPCViz department

PDC Center for High Performance Computing

#### **Infrastructure (PDC-HPC)**

Operation of a world-class ICT infrastructure for Swedish research, including HPC and data services, with associated user support and training


# PDC HPC Infrastructure

| System | System/Processor                                          | TPP (TF) | Cores  |
|--------|-----------------------------------------------------------|----------|--------|
| Beskow | Cray XC40, Intel Haswell                                  | 2,430    | 67,456 |
| Tegner | SuperMicro, Intel Ivy Bridge & Haswell, Nvidia K420 & K80 | 65 + GPU |        |











































