Programming for Performance
CSE 698L, IIT Kanpur, Semester 2019-2020-I
swarnendu AT cse.iitk.ac.in
Wed/Fri 9:00-10:15 AM in KD 102
Tue/Thu 4-5 PM in KD 302
To obtain good performance, one needs to write correct and scalable programs using
programming language abstractions like threads. In addition, the developer needs to be aware
of and utilize many architecture-specific features of modern processors, such as the memory
hierarchy and vectorization. In this course, we will discuss concepts that impact
performance on both single-core and multicore systems. We will also discuss programming
language abstractions like OpenMP.
This course will involve both pen-and-paper and programming assignments.
Should I register?
The intended audience is senior UG and PG students. I expect this course will be useful
to students who work in, or are interested in, the general area of systems and
development. Remember that this is NOT a programming hacks course.
Feel free to discuss with me if you are unsure about the scope.
- Exposure to the following courses (or equivalent) is desirable: CS210 (Computer
Organization), CS330 (Operating Systems), and CS422 (Computer Architecture).
- Good knowledge of C/C++ (primary) and Java is desirable.
Course Policies and Syllabus
- Remember that clarifying every question and doubt is important. There is nothing like a
trivial or a
- Please be on time to class; it is distracting for everyone when students keep coming
in late.
- Please try to avoid using laptops and/or mobile devices in class; they are distracting for
the instructor and for other students who want to concentrate.
- Submitting your assignments late will mean losing points automatically. You will lose 10%
for each day that you miss, for up to three days. You might be excused only under extreme
circumstances, but let me know at the
The following is a tentative list of topics that we will cover during the
course. We might add new, drop existing, or reorder topics depending on progress and class
- Architecture Basics
- Pipelining, out-of-order (OOO) execution, superscalar, VLIW
- Caches and cache coherence
- Parallel Computing Platforms
- Single Processor Performance Issues
- Cache and data locality analysis
- Data dependences and fine-grained parallelism
- Loop and data transformations
- PAPI counters
- Parallel Programming Models
- Shared-memory parallel programming (Pthreads, OpenMP, TBB)
- GPGPU architecture and CUDA programming
The course may also involve reading related research papers.
I am open to constructive feedback about the course content and presentation.
I will distribute unofficial feedback forms a few weeks into the semester, and you will have
the choice of remaining anonymous.
You may discuss concepts with classmates, but all assignments must be your own or your
team's work when teamwork is permitted.
You may not search online for existing solutions related to the assignments, even as a
Students caught cheating or plagiarizing will automatically fail the course and will be
reported to the
- Class/Piazza participation - 5%
- Assignments - 40%
- Mid semester exam - 20%
- End semester exam - 35%
We will use Piazza as the discussion forum. Please register for
the course on Piazza.
We will use Canvas for course submissions.
| 31/07, 02/08, 07/08
||Course Overview, Write
|| Course Overview
Cache Miss Analysis
| 14/08, 16/08, 21/08
AK 5.2-5.4, 5.7.2, 5.9, 6.2.1, 6.2.2, 6.2.5, 6.3.1-6.3.4
AP 4.1, 4.2, 4.5, 5.1-5.6
||Parallel Architectures and
LLNL Parallel Computing Tutorial
PP Chapter 4
LLNL Pthreads Tutorial
OSTEP Thread API,
| 30/08, 04/09, 06/09
| 11/09, 13/09, 25/09, 27/09
PP Chapter 5
OpenMP Application Programming Interface v4.5
||Parallel Programming Patterns
MRR Chapters 2.5, 3.2, 3.3, 3.5
| 23/10, 25/10
TBB Getting Started Tutorial
TBB Chapters 2, 3, 5, 6, 7, 9
| 26/10, 30/10, 01/11, 06/11,
||GPU Architecture and CUDA
NVIDIA CUDA C Programming Guide
NVIDIA CUDA C Best Practices Guide
KH Chapters 3, 4, 5, and 6
| 08/11, 13/11, 15/11
||Concurrent Data Structures
MP Chapter 9
I have listed (NOT in any particular order) a few popular references. You are
NOT REQUIRED to buy these books for this course, but they are a good read.
In addition, we will read and discuss related materials and research papers which we will
announce in class.
- [CSAPP] Computer Systems: A Programmer's Perspective - R. Bryant and D. O'Hallaron
- [DRAG] Compilers: Principles, Techniques, and Tools - A. Aho, M. Lam, R. Sethi, and
- [AK] Optimizing Compilers for Modern Architectures - R. Allen and K. Kennedy
- [PP] An Introduction to Parallel Programming - Peter S. Pacheco
- [AP] Automatic Parallelization: An Overview of Fundamental Compiler Techniques -
Samuel P. Midkiff
- [OSTEP] Operating Systems: Three Easy Pieces - R. Arpaci-Dusseau and A.
- [MRR] Structured Parallel Programming: Patterns for Efficient Computation - M.
McCool, A. D. Robison, and J. Reinders
- [TBB] Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor
Parallelism - J. Reinders
- [KH] Programming Massively Parallel Processors: A Hands-on Approach - David B. Kirk
and Wen-mei W. Hwu
- [MP] The Art of Multiprocessor Programming - Maurice Herlihy and Nir Shavit