CS 698L: Programming for Performance

Units: 3-0-0-9
Course objective

To obtain good performance, one needs to write parallel programs that are both correct and scalable, using programming language abstractions such as threads. In addition, the developer needs to be aware of, and make use of, architecture-specific features of modern processors such as the memory hierarchy and support for vectorization. In this course, we will discuss programming language abstractions and architecture-aware development techniques that help in writing scalable parallel programs.

This course will primarily involve programming assignments that apply the concepts learnt in class and help students appreciate the challenges in extracting performance.

Prerequisites
  1. Exposure to CS210 (Computer Organization), CS330 (Operating Systems), and CS422 (Computer Architecture) is desirable.
  2. Programming maturity (primarily C/C++/Java) is desirable.
Topics
  1. Understanding performance: performance models, Amdahl's law
  2. Architecture basics: pipelined and out-of-order (OOO) execution
  3. Cache coherence (optional)
  4. Hardware performance counters with PAPI
  5. Memory hierarchy: caches, exploiting spatial and temporal locality
  6. Data locality and cache miss analysis
  7. Loop and data transformations
  8. Shared-memory programming and Pthreads
  9. OpenMP
  10. Shared-memory synchronization
  11. Vectors and vectorization
  12. GPGPU programming: GPU architecture, CUDA
  13. Optimistic parallelization
  14. Memory consistency models
  15. Distributed-memory machines and MPI
  16. Transactional memory
References
  1. M. Herlihy and N. Shavit. The Art of Multiprocessor Programming.
  2. P. S. Pacheco. An Introduction to Parallel Programming.
  3. J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach.
  4. D. Culler and J. P. Singh, with A. Gupta. Parallel Computer Architecture: A Hardware/Software Approach.
  5. A. Grama, A. Gupta, G. Karypis, and V. Kumar. Introduction to Parallel Computing.