Programming for Performance

CS 610, Semester 2022-2023-I, IIT Kanpur

Class hours: Mon and Thurs 9:00-10:15 AM in KD 101

Office hours: Mon and Thurs 10:30-11:30 AM in KD 302


Instructor Information

Name: Swarnendu Biswas
Email: swarnendu AT cse.iitk.ac.in

TA Information

Name | Email (AT cse.iitk.ac.in)
Abhinav Kuruma | abhinav
Abhishek Revskar | abhishekdr
Akash Panzade | akashp
Arun KP | kparun
Ashutosh Patel | ashutoshp
Ayush Singh | ayushs
Suvam Basak | suvambasak

Course Description

To obtain good performance, one needs to write correct and scalable parallel programs using programming language abstractions like threads. In addition, the developer needs to be aware of and exploit architecture-specific features like vectorization to extract the full performance potential. This course combines programming language abstractions with architecture-aware development to teach how to write scalable parallel programs. This is not a “programming tips and tricks” course.

There will be five or more assignments that apply the concepts learned in class and illustrate the challenges in extracting performance.

Prerequisites
  • Exposure to the following courses (or equivalent) is desirable: CS220 (Computer Organization), CS330 (Operating Systems), and CS422 (Computer Architecture).
  • Programming maturity with popular programming languages like C, C++, and Java.



Course Syllabus and Policies

Syllabus

The course will primarily focus on the following topics.

We may add new, drop existing, or reorder topics depending on progress and class feedback. The course may also involve reading and critiquing related research papers.

Policies

Feedback

I am open to constructive feedback about the course content and presentation. Feel free to provide suggestions for improvements.


Academic Integrity


Evaluation Scheme

Class participation/quizzes 5%
Assignments 40%
Midsem 25%
Endsem 30%

Resources

Date | Topic | Resources | Recommended Reading
01/08 | Course Overview | First Course Handout; Course Overview Slides | -
01/08 | Compiler Challenges for Parallel Architectures | Slides | AK Chap 1
04/08, 08/08 | Write Cache-Friendly Code | Slides | Cache Miss Analysis Example; CSAPP Chap 6; HP APP B; DRAG 11.1, 11.2
11/08, 18/08 | Dependence Analysis | Slides | DRAG 11.3, 11.4, 11.6; AK Chap 2
- | POSIX Threads | Slides | PP Chapter 4 (IITK has subscribed to the ebook); OSTEP Thread API, Condition Variables; LLNL Pthreads Tutorial
22/08, 25/08, 29/08 | Loop Transformations | Slides | AK 5.2-5.4, 5.7.2, 5.9, 6.2.1, 6.2.2, 6.2.5, 6.3.1-6.3.4; AP 4.1, 4.2, 4.5, 5.1-5.6; HP 4.5
01/09, 05/09, 08/09 | Vectorization | Slides | Guide for Intel Compilers; Cornell Virtual Workshop on Vectorization
08/09, 12/09, 15/09, 13/10 | OpenMP | Slides | PP Chapter 5 (IITK has subscribed to the ebook); OpenMP Application Programming Interface v5; LLNL OpenMP Tutorial
16/10 | Memory Models and OpenMP | Slides | MCM Chapters 1-5 (IITK has subscribed to the ebook)
17/10, 20/10, 22/10 | Intel TBB | Slides | TBB Chapters 2, 3, 5, 6, 7, 9; TBB Tutorial (legacy); oneTBB Documentation
27/10, 29/10, 31/10, 03/11, 05/11, 07/11 | GPU Architecture and CUDA Programming | Slides | KH Chapters 1-5 (IITK has subscribed to the ebook); NVIDIA CUDA C Programming Guide; NVIDIA CUDA C Best Practices Guide
10/11, 14/11 | False Sharing | Slides | MCM Chapters 2, 6, 8 (IITK has subscribed to the ebook)


References

I have listed a few popular references (NOT in any particular order). We may also read and discuss related materials and research papers, which will be announced in class.