CS 772: Probabilistic Machine Learning


Familiarity with the basic concepts of linear algebra, statistics, and probability is expected. Prior coursework on these topics would be ideal, but we will nevertheless cover some of the essential concepts at the beginning of the course. Some prior familiarity with machine learning (e.g., via a course like CSE-771) will be helpful but is not required. Some assignments may have a programming component, so students should be comfortable with basic programming in MATLAB or Octave (other programming languages, such as Python, may be allowed with the instructor's permission).



Estimating the parameters of the underlying model that is assumed to have generated the data is central to any machine learning problem. Probabilistic modeling offers principled and rigorous ways to model data of diverse types, characteristics, and peculiarities, along with algorithms to uncover the model parameters and make inferences and predictions about the data. This course will expose students to the basic concepts and algorithms used in probabilistic modeling of data, and we will gradually work our way up to using these as building blocks for solving more complex machine learning problems. At various points during the course, we will also look at how the probabilistic modeling paradigm naturally connects to the other dominant paradigm, which treats machine learning problems as optimization problems. We will examine the strengths and weaknesses of both paradigms and how they complement each other in many ways. A rough outline of the course is given below.

  1. Introduction to probabilistic modeling of data
  2. Basic methods for parameter estimation in probabilistic models: MLE and MAP estimation
  3. Common probability distributions, conjugate priors and exponential family
  4. Introduction to Bayesian learning
  5. Case studies: Bayesian linear regression and classification, sparse linear models
  6. Latent variable models for clustering: mixture models
  7. Latent variable models for dimensionality reduction: factor analysis, probabilistic PCA and matrix factorization
  8. Latent variable models for modeling sequence and time-series data: hidden Markov models, linear dynamical systems
  9. Latent variable models for structured prediction
  10. Learning and inference in probabilistic graphical models
  11. Approximate Bayesian inference (MCMC, Variational Bayes, Expectation Propagation)
  12. Online approximate Bayesian inference
  13. Bayesian learning with kernels: Gaussian Processes (or “Bayesian SVM”)
  14. Topic models
  15. Deep learning
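
As a small illustration of outline item 2 and of the link between the probabilistic and optimization views described above, the following is a minimal sketch rather than course material. It assumes a coin-flip (Bernoulli) model with a Beta(2, 2) prior; the function names, the prior, and the data (9 heads in 10 flips) are all hypothetical choices made for this example. It computes the MLE and the MAP estimate in closed form, then recovers the MAP estimate again by numerically minimizing the negative log posterior, which has the form of a loss plus a regularizer.

```python
import math


def bernoulli_mle(heads, n):
    """MLE of the heads probability: the observed fraction of heads."""
    return heads / n


def bernoulli_map(heads, n, a=2.0, b=2.0):
    """MAP estimate under an assumed Beta(a, b) prior (requires a, b > 1).

    This is the mode of the Beta(heads + a, n - heads + b) posterior.
    """
    return (heads + a - 1.0) / (n + a + b - 2.0)


def neg_log_posterior(theta, heads, n, a=2.0, b=2.0):
    """Negative log posterior up to an additive constant.

    The first term is the negative log-likelihood (the "loss"); the second
    is the negative log prior, which plays the role of a regularizer.
    """
    loss = -(heads * math.log(theta) + (n - heads) * math.log(1.0 - theta))
    regularizer = -((a - 1.0) * math.log(theta) + (b - 1.0) * math.log(1.0 - theta))
    return loss + regularizer


def minimize_on_grid(f, steps=9999):
    """Crude optimizer: evaluate f on an even grid strictly inside (0, 1)."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=f)


# Hypothetical data: 9 heads in 10 flips.
heads, n = 9, 10
theta_mle = bernoulli_mle(heads, n)
theta_map = bernoulli_map(heads, n)
theta_opt = minimize_on_grid(lambda t: neg_log_posterior(t, heads, n))

print(f"MLE: {theta_mle:.4f}")
print(f"MAP (closed form): {theta_map:.4f}")
print(f"MAP (numerical minimization): {theta_opt:.4f}")
```

With so few observations, the prior pulls the MAP estimate away from the MLE and toward 0.5, and the grid minimizer agrees with the closed-form MAP estimate up to the grid resolution. The grid search is used only to keep the sketch dependency-free; any one-dimensional optimizer would serve the same purpose.
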
Books and References:

There will not be a dedicated textbook for this course. Instead, we will use lecture slides, online notes and monographs, tutorials, and papers for the topics covered. Some recommended (although not required) books are:

  1. Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2007.
  2. Kevin Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
  3. David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, 2012.
  4. Simon J. D. Prince, Computer Vision: Models, Learning, and Inference, Cambridge University Press, 2012.