Probabilistic Machine Learning
CS772A/CS698X
Winter 2016

Instructor: Piyush Rai (office: KD-319, email: piyush AT cse DOT iitk DOT ac DOT in)
Office Hours: Thur 10-11am (or by appointment)
Q/A and announcements: Piazza (please register)
Class Schedule: Mon/Wed 5:00-6:30pm
Location: KD-101
TAs: Milan Someswar (milansom AT cse), Priya Saraf (priyas AT cse), Vinit Tiwari (vinitt AT iitk)
TA Office Hours: Milan (Thur 3-4pm, RM 403D), Priya (Wed 4-5pm, RM 302), Vinit (Mon 3-4pm, RM 505)


Background and Course Description

This course will look at machine learning from the viewpoint of modeling data as samples from an underlying (unknown) probability distribution. Machine learning problems then boil down to inferring the parameters and other latent variables that define the probability model, and using these to make predictions/decisions from the data. The probabilistic view is particularly useful for (1) realistically modeling and capturing diverse data types, characteristics, and peculiarities via appropriately chosen probability distributions, and (2) encoding prior assumptions about the model via prior distributions over the parameters/latent variables (also see this recent Nature article, which discusses these and many other benefits of the probabilistic viewpoint). This course will introduce basic (and some advanced) topics in probabilistic machine learning, covering (1) common parameter estimation methods for probabilistic models; (2) probabilistic formulations of popular machine learning problems such as regression, classification, clustering, dimensionality reduction, matrix factorization, and learning from sequential data (e.g., time series); (3) Bayesian modeling and approximate Bayesian inference; (4) deep learning; and (5) assorted other topics. At various points during the course, we will also look at how the probabilistic modeling paradigm connects naturally to the other dominant paradigm, which turns machine learning problems into optimization problems, and examine the strengths and weaknesses of both paradigms as well as the many ways they complement each other.
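As a small illustration of the modeling recipe described above (this is a hypothetical sketch, not part of the course materials): suppose we model coin flips as draws from a Bernoulli distribution with unknown parameter theta. Parameter estimation then amounts to maximizing the likelihood, and the Bayesian treatment places a prior on theta; both can be written in a few lines. The prior pseudo-counts below are arbitrary choices for illustration.

```python
import random

# Hypothetical example: model coin flips as draws from Bernoulli(theta),
# where theta (the probability of heads) is the unknown model parameter.
random.seed(0)
true_theta = 0.7
data = [1 if random.random() < true_theta else 0 for _ in range(1000)]

# Maximum-likelihood estimate: the sample mean maximizes the Bernoulli likelihood.
theta_mle = sum(data) / len(data)

# Bayesian treatment: place a conjugate Beta(a, b) prior on theta; the
# posterior is Beta(a + #heads, b + #tails), whose mean shrinks the MLE
# toward the prior mean.
a, b = 2.0, 2.0  # prior pseudo-counts (an assumed choice, for illustration)
heads, tails = sum(data), len(data) - sum(data)
theta_post_mean = (a + heads) / (a + b + heads + tails)

print(theta_mle, theta_post_mean)
```

With 1000 observations the likelihood dominates, so the posterior mean and the MLE nearly coincide; with little data the prior would pull the estimate toward 0.5, which is exactly the kind of prior-knowledge encoding the paragraph above refers to.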

Syllabus

Refer to the tentative class schedule for the list of topics.

Books

This course will take a probabilistic view of machine learning, and the following book may be used as a reference: Pattern Recognition and Machine Learning (PRML) by Chris Bishop. In addition, we will have slides/notes based on the lectures, and other material available online. For reference, some other recommended books are:

- Machine Learning: A Probabilistic Perspective (MLPP) by Kevin Murphy.
- Bayesian Reasoning and Machine Learning by David Barber. Also freely available online as PDF.
- Computer Vision: Models, Learning, and Inference by Simon J.D. Prince. Also freely available online as PDF.

If you don't have any prior exposure to machine learning, the following book is highly recommended (mostly non-probabilistic view): A Course in Machine Learning by Hal Daumé III.

Grading

There will be 3 homework assignments (total 30%), a mid-term (20%), a final exam (20%), and a course project (30%).

Schedule (Tentative)

Date | Topic | Readings/References | Deadlines | Slides/Notes

Probabilistic Machine Learning

Dec 30 | Introduction to machine learning and probabilistic modeling | Review on prob/stats and linear algebra, [JM15], [Z15] | | slides (4-up print)
Jan 4 | Probability refresher, properties of the Gaussian distribution | PRML: Chap. 1, Section 1.2 (up to 1.2.2); Chap. 2 up to Section 2.3.3; Appendix B; Review on prob/stats and linear algebra | | slides (4-up print)
Jan 11 | Basics of parameter estimation in probabilistic models | Parameter estimation for text analysis (only up to Section 3), [PP08] (Matrix Cookbook) | | slides (4-up print)
Jan 13 | Regression: Probabilistic Linear Regression | MLPP (Murphy): Sections 7.1-7.3, 7.6 (7.6.1, 7.6.2) | | slides (4-up print)
Jan 18 | Classification: Probabilistic Linear Classification (Logistic Regression) | MLPP (Murphy): Sections 8.1-8.3.4, 8.3.6 | | slides (4-up print)
Jan 20 | Exponential Family and Generalized Linear Models | [J03] | | slides (4-up print)
Jan 25 | Clustering and Density Estimation: K-means and Gaussian Mixture Models | PRML: Chapter 9 (up to Section 9.3.2) | Project proposals due | slides (4-up print)
Jan 27 | Expectation Maximization | PRML: Chapter 9 (Sections 9.3 and 9.4; may skip 9.3.3 and 9.3.4); optional reading: [NH99] | | slides (4-up print)
Feb 1 | Expectation Maximization (contd.) | PRML: Chapter 12 (Sections 12.1 and 12.2) | | slides (4-up print)
Feb 3 | Probabilistic PCA and Factor Analysis; Mixtures of PPCA/Mixtures of FA | PRML: Chapter 12 (Sections 12.1 and 12.2); optional readings: [TB99], [GH97], [CG15], [B09], [IR10] | | slides (4-up print)
Feb 8 | Probabilistic Matrix Factorization | [SM07], [K09] | | slides (4-up print)
Feb 10 | Gaussian Processes for Nonlinear Regression and Nonlinear Dimensionality Reduction | MLPP (Murphy): Sections 15.1-15.2, 15.5 | | slides (4-up print)

Approximate Bayesian Inference

Feb 22 | Sampling-based Inference: Monte Carlo, Rejection Sampling, Importance Sampling | PRML: Chapter 11 (up to Section 11.1); optional reading: [ADDJ03] | | slides (4-up print)
Feb 24 | Sampling-based Inference: Markov Chain Monte Carlo, Gibbs Sampling | PRML: Chapter 11 (Sections 11.2 and 11.3); optional reading: [ADDJ03] | | slides (4-up print)
Feb 29 | Sampling-based Inference: Some Examples - GMM, Matrix Factorization, and LDA (Topic Models) | MLPP (Murphy): Sections 24.2.3 and 24.2.3.1, [SM08], [GS04] | | slides (4-up print)
Mar 2 | Variational Bayesian (VB) Inference: Introduction and Mean-Field Approximations | PRML: Chapter 10 (up to Section 10.1); also recommended: [BKM16] | | slides (4-up print)
Mar 7 | Properties of VB, More Examples, and Expectation Propagation | PRML: Chapter 10 (Sections 10.2-10.4, 10.6, 10.7); also recommended: [BKM16] | | slides (4-up print)

Assorted Topics in PML

Mar 9 | Sparse Linear Models | MLPP (Murphy): Sections 13.1-13.2, 13.3 (only up to 13.3.1), 13.4.4; optional reading: [T01] | | slides (4-up print)
Mar 13 | State Space Models and Linear Dynamical Systems | PRML: Chapter 13 | | slides (4-up print)
Mar 14 | Structured Prediction: Conditional Random Fields | MLPP (Murphy): Section 19.6 | Mid-sem project report due | slides (4-up print)
Mar 16 | Latent Dirichlet Allocation and Topic Models | Recommended: the LDA paper | | slides (4-up print)
Mar 30 | Deep Probabilistic Models (1) | Optional reading: Representation Learning: A Review and New Perspectives | | slides (4-up print)
Apr 4 | Deep Probabilistic Models (2) | Optional reading: Representation Learning: A Review and New Perspectives | | slides (4-up print)
Apr 6 | Nonparametric Bayesian Models for Latent Class and Latent Feature Learning | Recommended: Indian Buffet Process: An Introduction and Review; optional: Dirichlet Process | | slides (4-up print)
Apr 11 | Inference and Optimization via Message Passing | Recommended: A tutorial paper; also see Factor Graphs and the Sum-Product Algorithm | | slides (4-up print)
Apr 13 | Overview of other recent advances; Course Summary and Perspectives | | | slides (4-up print)

Suggested/Further Readings

- [PP08] The Matrix Cookbook (a very handy reference for matrix algebra and calculus)
- [JM15] Machine learning: Trends, perspectives, and prospects: Michael Jordan and Tom Mitchell (Science article)
- [Z15] Probabilistic machine learning and artificial intelligence: Zoubin Ghahramani (Nature article)
- [GR13] A nice roadmap to *learning* about Bayesian Learning
- [J03] The Exponential Family and Generalized Linear Models: Michael I. Jordan (chapter from an unpublished book)
- [NH99] A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants: Radford Neal and Geoff Hinton
- [CG15] Linear Dimensionality Reduction: Survey, Insights, and Generalizations: John Cunningham and Zoubin Ghahramani
- [B09] Dimension Reduction: A Guided Tour: Christopher J. C. Burges
- [IR10] Practical Approaches to Principal Component Analysis in the Presence of Missing Values: Alexander Ilin and Tapani Raiko
- [TB99] Mixtures of Probabilistic Principal Component Analysers: Michael E. Tipping and Christopher M. Bishop
- [GH97] The EM Algorithm for Mixtures of Factor Analyzers: Zoubin Ghahramani and Geoff Hinton
- [GJ95] Supervised learning from incomplete data via an EM approach: Zoubin Ghahramani and Michael I. Jordan
- [SM07] Probabilistic Matrix Factorization: Ruslan Salakhutdinov and Andriy Mnih
- [K09] Matrix Factorization Techniques for Recommender Systems: Yehuda Koren, Robert Bell and Chris Volinsky
- [ADDJ03] An Introduction to MCMC for Machine Learning: Christophe Andrieu, Nando De Freitas, Arnaud Doucet, and Michael I. Jordan
- [SM08] Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo: Ruslan Salakhutdinov and Andriy Mnih
- [GS04] Finding Scientific Topics: Thomas L. Griffiths and Mark Steyvers
- [BKM16] Variational Inference: A Review for Statisticians: David Blei, Alp Kucukelbir, and Jon McAuliffe
- [AJA16] Patterns of Scalable Bayesian Inference: Elaine Angelino, Matthew James Johnson, and Ryan P. Adams
- [T01] Sparse Bayesian Learning and the Relevance Vector Machine: Michael E. Tipping

Useful Links and Software

- Scikit-Learn: Machine Learning in Python
- Weka: Machine Learning and Data Mining in Java
- Stan (Probabilistic Programming)

Course Policies

Anti-cheating policy