Programming for Performance

CSE 698L, IIT Kanpur, Semester 2019-2020-I

Instructor Information

Name: Swarnendu Biswas
Email: swarnendu AT
Class hours: Wed, Fri 9:00-10:15 AM in KD 102
Office hours: Tue, Thur 4-5 PM in KD 302

TA Information

Name: Amit Kumar
Name: Mohit Malhotra
Name: Sri Divya Yaddanapudi

Course Description

To obtain good performance, one needs to write correct and scalable programs using programming language abstractions such as threads. The developer also needs to be aware of, and make use of, architecture-specific features of modern hardware, such as the memory hierarchy and vectorization. In this course, we will discuss concepts that affect performance on both single-core and multicore systems, along with programming abstractions such as OpenMP.

This course will involve both pen-and-paper and programming assignments.

  • Exposure to the following courses (or equivalent) is desirable: CS210 (Computer Organization), CS330 (Operating Systems), and CS422 (Computer Architecture).
  • Good knowledge of C/C++ (primary) and Java is desirable.

Should I register? The intended audience is senior UG and PG students. I expect this course to be useful to students who work in, or are interested in, the general area of systems and large-scale software development. Remember that this is NOT a programming hacks course.

Feel free to talk to me if you are unsure about the scope.


Course Policies and Syllabus



The following is a tentative list of topics that we will cover during the course. We might add new topics, drop existing ones, or reorder them depending on progress and class feedback.

The course may also involve reading related research papers.


I am open to constructive feedback about the course content and presentation. I will distribute unofficial feedback forms a few weeks into the semester, and you will have the choice of remaining anonymous.

Academic Integrity

Evaluation Scheme





  • 31/07, 02/08, 07/08: Course Overview, Write Cache-Friendly Code (Slides: Course Overview; Write Cache-Friendly Code). Supplementary reading: CSAPP 6.2-6.6; DRAG 11.2; Cache Miss Analysis Example.
  • 09/08: Dependence Analysis (Slides). Supplementary reading: AK 2.2.
  • 14/08, 16/08, 21/08: Loop Transformations (Slides). Supplementary reading: AK 5.2-5.4, 5.7.2, 5.9, 6.2.1, 6.2.2, 6.2.5, 6.3.1-6.3.4; AP 4.1, 4.2, 4.5, 5.1-5.6.
  • 23/08: Parallel Architectures and Programming Models (Slides). Supplementary reading: LLNL Parallel Computing Tutorial.
  • 28/08: POSIX Threads (Slides). Supplementary reading: PP Chapter 4; LLNL Pthreads Tutorial; OSTEP Thread API, Condition Variables.
  • 30/08, 04/09, 06/09: Vectorization (Slides).
  • 11/09, 13/09, 25/09, 27/09: OpenMP (Slides). Supplementary reading: PP Chapter 5; OpenMP Application Programming Interface v4.5.
  • 16/10: Parallel Programming Patterns (Slides). Supplementary reading: MRR Chapters 2.5, 3.2, 3.3, 3.5.
  • 23/10, 25/10: Intel TBB (Slides). Supplementary reading: TBB Getting Started Tutorial; TBB Chapters 2, 3, 5, 6, 7, 9.
  • 26/10, 30/10, 01/11, 06/11, 08/11: GPU Architecture and CUDA Programming (Slides). Supplementary reading: NVIDIA CUDA C Programming Guide; NVIDIA CUDA C Best Practices Guide; KH Chapters 3, 4, 5, and 6.
  • 08/11, 13/11, 15/11: Concurrent Data Structures (Slides). Supplementary reading: MP Chapter 9.


I have listed (NOT in any particular order) a few popular references. You are NOT REQUIRED to buy these books for this course, but they are a good read. In addition, we will read and discuss related materials and research papers which we will announce in class.