This course takes a Bayesian statistical modeling approach to machine learning. Key benefits of the Bayesian approach include: quantifying uncertainty in parameters and predictions through posterior probability distributions; incorporating prior knowledge in a principled way; learning model hyperparameters and the right model size/complexity automatically from data; and supporting online learning in a natural way. We will cover the foundations of Bayesian modeling, especially in the context of machine learning, and, through various case studies and running examples, look at how to set up a machine learning problem as a Bayesian model and how to design sampling/optimization techniques for computationally scalable inference in these models.

Instructor's consent. Note, however, that this course will make extensive use of concepts from probability, statistics, and optimization, so a solid background in these topics, as well as in introductory machine learning, is essential. Students are also expected to be comfortable programming in MATLAB/Python.

(Tentative break-up) There will be 5 homework assignments (30% total), which may include a programming component, a midterm exam (20%), a final exam (25%), and a course project (25%).

A tentative set of topics to be covered in this course includes:

- Ba(ye)sics: Parameter estimation in probabilistic and Bayesian models, common probability distributions and their properties, conjugate priors and closed-form Bayesian posterior updates
- Bayesian linear regression and classification, hyperparameter estimation
- Exponential family and its role in probabilistic inference
- Approximate Bayesian inference
  - Sampling-based methods: Markov chain Monte Carlo (MCMC)
  - Optimization-based methods: Variational Bayes (VB), Expectation Propagation (EP)
  - "Likelihood-free" methods
- Online Bayesian inference for large-scale learning (stochastic MCMC, stochastic VB, stochastic EP)
- Bayesian kernel methods and Gaussian Processes
- Bayesian model comparison
- Nonparametric Bayesian modeling
- Bayesian Deep Learning
- Bayesian Optimization
- Bayesian Reinforcement Learning
- Probabilistic Programming
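As a small taste of the conjugate-prior updates listed above, here is a minimal sketch (in Python, one of the course's languages) of the classic Beta-Bernoulli case; the prior parameters and data below are purely illustrative, not taken from the course material:

```python
# Conjugate Beta-Bernoulli update: a Beta(a, b) prior on a coin's
# bias theta, combined with k heads observed in n flips, yields a
# Beta(a + k, b + n - k) posterior in closed form.

def beta_bernoulli_update(a, b, heads, flips):
    """Return the posterior Beta parameters after observing the data."""
    return a + heads, b + (flips - heads)

# Illustrative numbers: a uniform Beta(1, 1) prior and 7 heads
# out of 10 flips.
a_post, b_post = beta_bernoulli_update(1.0, 1.0, 7, 10)
posterior_mean = a_post / (a_post + b_post)  # E[theta | data] = 8/12
print(a_post, b_post, posterior_mean)
```

The same add-the-sufficient-statistics pattern recurs for every conjugate likelihood-prior pair covered in the course.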

The above topics will be treated through several case studies and running examples, including generalized linear models, finite/infinite mixture models, finite/infinite latent factor models, matrix factorization of real/discrete/count data, sparse linear models, linear Gaussian models, linear dynamical systems and time-series models, topic models for text data, etc.
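To illustrate the flavor of these case studies, here is a minimal sketch of one-dimensional Bayesian linear regression, where a Gaussian prior on the slope combines with a Gaussian likelihood to give a closed-form Gaussian posterior (all numbers and variable names below are illustrative assumptions):

```python
# Model: y_i = w * x_i + noise, noise ~ N(0, sigma2),
# with a Gaussian prior w ~ N(0, tau2). Since prior and likelihood
# are both Gaussian, the posterior over w is Gaussian with
# closed-form mean and variance.

def posterior_w(xs, ys, sigma2, tau2):
    """Return the posterior mean and variance of the slope w."""
    precision = 1.0 / tau2 + sum(x * x for x in xs) / sigma2
    mean = (sum(x * y for x, y in zip(xs, ys)) / sigma2) / precision
    return mean, 1.0 / precision

# Noise-free illustrative data around a true slope of 2, so the
# numbers are easy to check by hand.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
mean, var = posterior_w(xs, ys, sigma2=1.0, tau2=10.0)
print(mean, var)
```

Note how the posterior mean is shrunk slightly toward the prior mean of zero (it lands just below the least-squares estimate of 2), and how the posterior variance shrinks as more data accumulate.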

We will primarily use lecture notes/slides from this class. In addition, we will refer to monographs and research papers (from top Machine Learning conferences and journals) for some of the topics. Some recommended, although not required, books are:

- Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2007.
- Kevin Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
- Carl Rasmussen and Chris Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
- David MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge Univ. Press, 2003.
- David Barber. Bayesian Reasoning and Machine Learning. Cambridge Univ. Press, 2012.
- Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. Bayesian Data Analysis. Chapman & Hall/CRC, 2013.