Title: Weakly Supervised Dynamic Models for Facial Analysis in Videos

Abstract: Previous successful approaches for facial expression classification in videos have relied primarily on global pooling. Such methods often assume the presence of a single uniform event spanning the sequence and discard local temporal information. They are therefore suboptimal for learning prediction models in (i) the weakly supervised setting and (ii) unsegmented videos containing multiple events. We tackle these challenges by explicitly modeling both weak labels and the dynamic nature of facial expressions. I will first briefly discuss our work on weakly supervised learning for pain classification in videos [2, 3]. This framework combines a multiple-segment representation of videos with the Multiple Instance Learning (MIL) framework to both classify and localize the target expression in a video. I will also discuss qualitative and quantitative results highlighting the advantages of this method. Despite its advantages, MIL identifies only a single discriminative event and therefore fails to capture any dynamical patterns in facial expressions. In our second work, we propose a generalization of MIL referred to as the Latent Ordinal Model (LOMo) [1]. This approach is based on a novel latent structured SVM (LSVM) formulation that jointly learns the sub-events and a prior on their ordering. LOMo differs from previous LSVM models in using a 'loosely structured' formulation, and we propose an effective SGD-based hinge loss minimization objective to solve it. We also show that LOMo achieves consistent improvements over relevant competitive baselines on four challenging facial analysis tasks. In combination with complementary features, our method reports state-of-the-art results on these datasets. I will close the talk with a discussion of our current work on extending LOMo to unconstrained human action recognition in videos. (Short illustrative sketches of both models are included after the bio.)

References:
[1] Sikka, K., Sharma, G., and Bartlett, M. (2016). LOMo: Latent Ordinal Model for Facial Analysis in Videos. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Sikka, K., Dhall, A., and Bartlett, M. (2014). Weakly Supervised Pain Localization and Classification with Multiple Segment Learning. The Best of Face and Gesture 2013, Image and Vision Computing (IVC).
[3] Sikka, K., Dhall, A., and Bartlett, M. (2013). Weakly Supervised Pain Localization using Multiple Instance Learning. IEEE International Conference on Automatic Face and Gesture Recognition (AFGR).

Bio: Karan Sikka completed his bachelor's degree at the Indian Institute of Technology Guwahati in 2010 and is currently a final-year PhD student at the University of California San Diego, working with Dr. Marian Bartlett. His research focuses on building robust machine learning models for classifying facial expressions in videos, in particular on using weakly supervised learning approaches to tackle the underlying challenges in recognizing natural expressions. His work on using Multiple Instance Learning for pain classification received a Best Student Paper Honorable Mention Award. His team also placed second in the first Facial Expression in the Wild Challenge in 2013 and was awarded the Best Paper Award. His long-term research plan is to use effective and richer machine learning models to understand human behavior and to reveal interesting aspects of it using these models.
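
A minimal sketch of the multiple-segment MIL idea behind [2, 3], not the authors' exact formulation: a video is treated as a bag of temporal segments, the bag score is the maximum over per-segment scores, and the argmax localizes the discriminative segment. The linear scoring function, feature dimensions, and all names here are illustrative assumptions.

```python
import numpy as np

def segment_scores(w, b, segments):
    """Linear score for each temporal segment; rows of `segments` are
    per-segment feature vectors (e.g., pooled frame descriptors)."""
    return segments @ w + b

def mil_predict(w, b, video_segments):
    """MIL bag prediction: the video is labeled by its highest scoring
    segment, and the argmax localizes the target expression in time."""
    scores = segment_scores(w, b, video_segments)
    k = int(np.argmax(scores))
    return scores[k], k  # (bag score, index of discriminative segment)

# Toy usage with random data (purely illustrative).
rng = np.random.default_rng(0)
w, b = rng.normal(size=64), 0.0
video = rng.normal(size=(12, 64))  # 12 temporal segments, 64-d features
score, where = mil_predict(w, b, video)
print(f"bag score {score:.2f}, discriminative segment {where}")
```

This max-pooling over instances is what lets a weak video-level label drive both classification and localization, and it is also why plain MIL commits to a single discriminative event.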
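Similarly, a minimal sketch of the latent ordinal inference in LOMo [1], under stated assumptions: K sub-event templates are assigned to temporally ordered frames, and inference maximizes the summed template responses over ordered assignments. The brute-force search is for clarity only and is not the paper's algorithm, which additionally learns a prior on the sub-event ordering.

```python
import numpy as np
from itertools import combinations

def lomo_inference(templates, frames):
    """Best temporally ordered assignment of K sub-event templates to
    frames. Brute force over ordered index tuples is exponential and is
    used here only to make the ordinal constraint explicit."""
    resp = frames @ templates.T                # (T, K) per-frame responses
    best_score, best_assign = -np.inf, None
    for idx in combinations(range(len(frames)), len(templates)):
        # combinations() yields strictly increasing indices, which
        # enforces the ordinal constraint t_1 < t_2 < ... < t_K.
        s = sum(resp[t, k] for k, t in enumerate(idx))
        if s > best_score:
            best_score, best_assign = s, idx
    return best_score, best_assign

# Toy usage: 3 sub-events over a 10-frame video (illustrative only).
rng = np.random.default_rng(1)
templates = rng.normal(size=(3, 32))
frames = rng.normal(size=(10, 32))
print(lomo_inference(templates, frames))
```

Training would then plug this max score into a hinge loss and update the templates by SGD, in the spirit of the latent structured SVM objective described in the abstract.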