CS698C: Topics in CSE: Sketching and Sampling for Big Data Analysis

Today, there is a tremendous interest in processing “Big Data” due to the ubiquity of large data sets being generated in production systems. This has led to an interest in designing algorithms and systems for problems with very large inputs. The problems range from matrix problems and numerical linear algebra, optimization problems and graph problems.

This course will highlight the recent advances in algorithms for numerical linear algebra and optimization that have come from the technique of linear sketching, whereby given a matrix, one first compresses it to a much smaller matrix by pre-multiplying it by a (usually) random matrix with certain properties. Much of the expensive computation can then be performed on the smaller matrix, thereby accelerating the solution for the original problem. This technique has led to the fastest known algorithms for fundamental problems in this area. We will consider least squares as well as l1-regression problems, low rank approximation, and many variants of these problems, such as those in distributed environments. We will also discuss connections of these methods with and using graph sparsifiers. Some of this work is partially covered in the monograph by Prof. David Woodruff: “Sketching as a Tool for Numerical Linear Algebra”,  Foundations and Trends in Theoretical Computer Science, vol 10, issue 1-2, pp. 1-157, 2014, publisher: NOW Publishing.

Prof. David Woodruff will be giving a GIAN course on this topic from Jan 2-6. Each day will have between 3-4 hours of lecture, except perhaps the final day. All students who wish to take the course will be enormously benefitted by attending Prof. Woodruff's lectures. It is mandatory for those registering in the course to also attend the GIAN lectures.