CS 698L: Programming for Performance
To obtain good performance, one needs to write correct yet scalable parallel programs using programming-language abstractions such as threads. In addition, the developer needs to be aware of and exploit architecture-specific features of modern processors, such as the memory hierarchy and vectorization. In this course, we will discuss programming-language abstractions and architecture-aware development to help write scalable parallel programs.
This course will primarily involve programming assignments that apply the concepts learned in class and highlight the challenges in extracting performance.
Prerequisites
- Exposure to CS210 (Computer Organization), CS330 (Operating Systems), and CS422 (Computer Architecture) is desirable.
- Programming maturity (primarily C/C++/Java) is desirable.
Topics (tentative)
- Understanding performance: performance models, Amdahl's law
- Architecture basics: pipelined and out-of-order (OOO) execution
- Cache coherence (optional)
- PAPI counters
- Memory hierarchy: caches, exploiting spatial and temporal locality
- Data locality and cache miss analysis
- Loop and data transformations
- Shared-memory programming and Pthreads
- Shared-memory synchronization
- Vectors and vectorization
- GPGPU programming: GPU architecture, CUDA
- Optimistic parallelization
- Memory consistency models
- Distributed-memory machines and MPI
- Transactional memory
References
- M. Herlihy and N. Shavit. The Art of Multiprocessor Programming.
- P. S. Pacheco. An Introduction to Parallel Programming.
- J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach.
- D. Culler and J. P. Singh, with A. Gupta. Parallel Computer Architecture: A Hardware/Software Approach.
- A. Grama, A. Gupta, G. Karypis, and V. Kumar. Introduction to Parallel Computing.