Credit: 3-0-0-0 (9)
(A) Objectives
Today, applications such as machine learning and numerical linear algebra work with massive data. In this course we will cover algorithmic techniques, models, and lower bounds for handling such data. A common theme is the use of randomized methods, such as sketching and sampling, to provide dimensionality reduction. In the context of optimization problems, this leads to faster algorithms; we will see examples in the form of least squares regression and low rank approximation of matrices and tensors, as well as robust variants of these problems. In the context of distributed algorithms, dimensionality reduction leads to communication-efficient protocols, while in the context of data stream algorithms, it leads to memory-efficient algorithms. We will study some of the above problems, such as low rank approximation, in these models, but also consider a variety of classical streaming problems such as counting distinct elements, finding frequent items, and estimating norms. Finally, we will study lower bound methods in these models showing that many of the algorithms we cover are optimal or near-optimal. Such methods are often based on communication complexity and information-theoretic arguments.
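As a small taste of the sketch-and-solve paradigm mentioned above, the following is a minimal NumPy sketch (all sizes and names here are illustrative, not from the course materials) of least squares regression with a dense Gaussian sketching matrix: a tall n × d problem is compressed to an m × d problem with m ≪ n, and the solution of the compressed problem is a near-optimal solution of the original. The course covers faster sketches (e.g., subsampled randomized Hadamard transforms and CountSketch) with rigorous guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 5000, 20, 400            # tall problem, sketch size m << n

# A synthetic noisy regression instance
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Exact least squares solution for comparison
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Gaussian sketch S: with m = O(d / eps^2), S is a subspace embedding,
# so ||S(Ax - b)|| approximates ||Ax - b|| for all x simultaneously
S = rng.standard_normal((m, n)) / np.sqrt(m)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

res_exact = np.linalg.norm(A @ x_exact - b)
res_sketch = np.linalg.norm(A @ x_sketch - b)
print(res_sketch / res_exact)      # close to 1: near-optimal residual
```

The sketched problem has only m rows, so solving it is much cheaper than the original; the Gaussian sketch is the simplest subspace embedding to analyze, though not the fastest to apply.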
(B) Contents
| SNo | Broad Title | Topics | No. of Lectures |
|---|---|---|---|
| 1 | Sketching and Dimensionality Reduction | Estimating F_0 and l_0 (number of distinct items in a data stream), estimating l_2 and the Johnson-Lindenstrauss Lemma (Fast J-L, Sparse J-L), estimating l_p using p-stable distributions for p ∈ (0,2), estimating l_p for p > 2 and closely related problems | 10 |
| 2 | Lower Bounds | Lower bounds for F_0, Johnson-Lindenstrauss lower bound, information theory, Indexing, lower bounds for estimating norms in the streaming model | 6 |
| 3 | Approximate Linear Algebra via Dimensionality Reduction | Least Squares Regression, Subspace Embeddings, epsilon-Net Argument, Matrix Chernoff Bound, Affine Embeddings, Approximate Matrix Product, Low Rank Approximation, Sketching-based Preconditioning, Leverage Score Sampling, Distributed Low Rank Approximation, weighted low rank approximation, M-Estimators | 16-18 |
| 4 | Compressive Sensing | Compressive sensing, RIP, L1 minimization, Sparse recovery using sparse matrices, RIP1 | 4 |
| 5 | Sparse Fourier | Sparse Fourier Transform | 3 |
| 6 | Computational Geometry and Graphs | Streaming algorithms for geometric problems, streaming algorithms for certain graph problems | 4 |
| | Total | | 43-45 |
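The first row of the table opens with estimating the number of distinct items (F_0) in a stream. One classical estimator, sketched below under illustrative parameters (the function name and defaults are mine, not the course's), is the k-minimum-values (KMV) idea: hash every item into [0, 1), keep only the k smallest distinct hash values, and estimate F_0 from how tightly they crowd toward 0.

```python
import hashlib
import heapq

def f0_estimate(stream, k=256):
    """Estimate the number of distinct items (F_0) in a stream with a
    k-minimum-values (KMV) sketch: hash each item to [0, 1), retain the k
    smallest distinct hash values, and return (k - 1) / v_k where v_k is
    the k-th smallest hash seen."""
    kept = []        # max-heap (via negation) holding the k smallest hashes
    members = set()  # the same hashes, for O(1) duplicate detection
    for item in stream:
        h = int(hashlib.sha256(repr(item).encode()).hexdigest(), 16) / 2**256
        if h in members:
            continue                      # duplicate item: ignore
        if len(kept) < k:
            heapq.heappush(kept, -h)
            members.add(h)
        elif h < -kept[0]:                # smaller than current k-th minimum
            evicted = -heapq.heappushpop(kept, -h)
            members.discard(evicted)
            members.add(h)
    if len(kept) < k:
        return len(kept)                  # fewer than k distinct items: exact
    return (k - 1) / (-kept[0])           # standard KMV estimator
```

The sketch uses O(k) space regardless of the stream length, and the relative error concentrates around 1/√k (about 6% for k = 256); the course develops such estimators with full analyses and matching lower bounds.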
(C) Pre-requisites, if any (examples: a- PSO201A, or b- PSO201A or equivalent):
Solid understanding of Probability (CS203M, MSO201 or equivalent) and Linear Algebra (MTH201A or equivalent).
(D) Short summary for including in the Courses of Study Booklet
This is a mathematically rigorous course on developing a variety of algorithms used for processing massive data, based on random sampling and sketching, dimensionality reduction, compressive sensing, and the sparse Fourier transform. Topics: estimating F_0 and l_0 (the number of distinct items in a data stream), estimating l_2 and the Johnson-Lindenstrauss Lemma, estimating l_p using p-stable distributions for p ∈ (0,2), estimating l_p for p > 2 and closely related problems. Johnson-Lindenstrauss Lemma: Fast J-L, Sparse J-L; applications to numerical linear algebra: linear regression and subspace embeddings, affine embeddings, approximate matrix product, low rank approximation, leverage score sampling. Compressive sensing: RIP, l_1 minimization, sparse recovery using sparse matrices, sparse Fourier transform, streaming algorithms for geometric and graph problems. Lower bounds for F_0, the Johnson-Lindenstrauss lower bound, information theory, Indexing, lower bounds for estimating norms in the streaming model.
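The l_p estimation via p-stable distributions mentioned above can be illustrated for p = 1 in a few lines of NumPy (the parameters below are illustrative). The key fact is 1-stability of the Cauchy distribution: if the rows of the sketching matrix have i.i.d. standard Cauchy entries, each coordinate of the sketch is distributed as ||x||_1 times a standard Cauchy, so the median of absolute values is an estimator of ||x||_1 (the median of |Cauchy| is exactly 1).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10_000, 400                 # ambient dimension, sketch size

# A test vector with roughly 5% nonzero coordinates
x = rng.standard_normal(n) * (rng.random(n) < 0.05)

# 1-stable (Cauchy) sketch: each y_i = <c_i, x> ~ ||x||_1 * Cauchy(0, 1)
C = rng.standard_cauchy((m, n))
y = C @ x

l1_est = np.median(np.abs(y))      # median(|Cauchy|) = 1, so this ~ ||x||_1
l1_true = np.linalg.norm(x, 1)
print(l1_est / l1_true)            # close to 1
```

Note that the mean of |y| would be useless here, since the Cauchy distribution has no finite mean; the median makes the estimator robust, a theme that recurs in the streaming algorithms covered in the course.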
Main reference: Sketching as a Tool for Numerical Linear Algebra by David Woodruff, arXiv:1411.4357.
Other useful resources: Materials from the following related courses overlap significantly and may be useful in various parts of the course: