
Seminar 10 : Applications of Diverse Classifier Ensembles
Speaker : Krithika Venkataramani
Date : 06:00 pm, Monday, April 19th, 2010
Venue : CS103

Abstract: Diversity in classifier ensembles improves classification accuracy. Depending on the diversity in the classifier outputs, a particular method of combining the ensemble is optimal. Alternately, ensembles with a desired diversity can be generated, depending on the data distribution. Applications of diverse classifier ensembles are shown, including face and fingerprint verification and digital watermark detection. The types of classifiers that help produce different kinds of diversity are discussed, and different ensemble generation strategies are tried for each application's data distribution. The possible types of diversity in an ensemble are also discussed for a given data distribution.

References :


Seminar 9 : Classifier ensemble design for improved classification
Speaker : Krithika Venkataramani
Date : 06:00 pm, Wednesday, April 14th, 2010
Venue : CS103

Abstract: Classifier ensembles are useful when the data distributions are complex and single classifiers do not provide sufficient classification accuracy. This talk explores the factors that affect classifier ensemble accuracy and discusses aspects of designing optimal classifier ensembles. Classifier ensembles that provide diverse information classify better. The combination strategy affects the ensemble's accuracy, and the diversity in the ensemble determines the best combination method. The underlying data distribution plays a major role in overall classification accuracy. It is assumed that each classifier in the ensemble is of the same type, but with a different parameter or training set; this type is termed the base classifier. Effective classifier ensembles take the underlying data distribution into account in the generation process, along with the choice of base classifier and combination strategy. Applications are shown on different data.
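As a toy illustration of the combination step (the AND/OR/majority decision-fusion rules of references [1]-[3] below, shown here without the statistical-dependence analysis that determines which rule is best), assuming each base classifier emits a binary accept/reject decision:

```python
import numpy as np

def fuse(decisions, rule="majority"):
    """Combine binary decisions (n_classifiers x n_samples) from an ensemble.

    AND favours few false accepts, OR favours few false rejects, and
    majority is a compromise; which rule is optimal depends on the
    statistical dependence between the classifiers' outputs.
    """
    d = np.asarray(decisions)
    if rule == "and":
        return d.all(axis=0)
    if rule == "or":
        return d.any(axis=0)
    if rule == "majority":
        return d.sum(axis=0) > d.shape[0] / 2
    raise ValueError(f"unknown rule: {rule}")

# Three classifiers voting on four samples
votes = [[1, 0, 1, 1],
         [1, 1, 0, 1],
         [0, 0, 1, 1]]
print(fuse(votes, "and"))       # [False False False  True]
print(fuse(votes, "or"))        # [ True  True  True  True]
print(fuse(votes, "majority"))  # [ True False  True  True]
```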

References :
  1. K. Venkataramani and B.V.K. Vijaya Kumar, Role of statistical dependence between classifier scores in determining the best decision fusion rule for improved biometric verification, Int. Workshop on Multimedia Content Representation, Classification and Security (MRCS), Sep. 2006.
  2. K. Venkataramani and B.V.K. Vijaya Kumar, OR rule fusion of conditionally dependent correlation filter based classifiers for improved biometric verification, Proc. SPIE, Vol. 6245, Orlando, FL, April 2006.
  3. K. Venkataramani and B.V.K. Vijaya Kumar, Conditionally-dependent classifier fusion using AND rule for improved biometric verification, Int.Conf. on Advances in Pattern Recognition (ICAPR) 2005, LNCS 3687, August 2005, pp. 277-286.


Seminar 8 : On learning the names of colours
Speaker : Harish Karnick
Date : 06:30 pm, Monday, April 5th, 2010
Venue : CS102

Abstract: We make a case for treating language as a complex adaptive system. The hypothesis we argue for is: simple learning biases or general rules in the human cognitive architecture when iteratively applied in a population with communicative intent can give rise to stable patterns (properties) in the system used for communication - namely language. This process has been called cultural adaptation.

We will support the hypothesis through multiple case studies. Today we consider colour naming. Data for colour terms have been collected for many languages of the world (WCS and Berlin & Kay). A careful statistical analysis best supports the existence of universal colour naming terms that cluster strongly in the colour space across languages. In parallel, computational models for colour naming that use only some known properties of the human colour perception system reproduce almost exactly the results of the statistical analysis. This remarkable agreement provides interesting clues to how stable patterns seem to emerge from simple biases.

References :


Seminar 7 : Visual Perception Using Deep Learning and Energy-Based Models
Speaker : Nitish Srivastava
Date : 06:00 pm, Friday, March 26th, 2010
Venue : CS102

Abstract: Traditional architectures for object recognition preprocess raw input through a hand-crafted feature extractor (SIFT, SURF, etc.). This transformed input is then fed to a generic trainable classifier, constituting a "shallow" architecture. However, there is considerable theoretical and empirical evidence that complex tasks, such as invariant object recognition in vision, require "deep" architectures composed of multiple layers of trainable non-linear modules. The Deep Learning Problem refers to the difficulty of training such deep architectures.

Each layer of the deep architecture is composed of an encoder, which computes a feature vector from the input, and a decoder, which reconstructs the input from the features. A large number of such layers can be stacked and trained sequentially, thereby learning a deep hierarchy of features with increasing levels of abstraction. The training of each layer can be seen as shaping an energy landscape with low valleys around the training samples and high plateaus everywhere else. Forming these high plateaus constitutes the so-called Partition Function problem.

A particular class of methods for deep energy-based unsupervised learning will be described that solves the Partition Function problem by imposing sparsity constraints on the features. The method can learn multiple levels of sparse and overcomplete representations of data. When applied to natural image patches, the method produces hierarchies of filters similar to those found in the mammalian visual cortex.
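A toy sketch of one encoder/decoder layer with an L1 sparsity penalty on the code. This is illustrative only: it uses a linear encoder and plain gradient descent on a single input, not the actual architecture or training procedure from the talk, but it shows the energy (reconstruction error plus sparsity) being pushed down around a training sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# One layer: h = We @ x (code), x_hat = Wd @ h (reconstruction).
# The reconstruction error acts as the energy; the L1 penalty keeps the
# code sparse, which limits the volume of low-energy regions and thereby
# sidesteps the partition function problem.
d_in, d_code = 16, 32            # overcomplete: more code units than inputs
We = rng.normal(0, 0.1, (d_code, d_in))
Wd = rng.normal(0, 0.1, (d_in, d_code))
lam, lr = 0.1, 0.01

def step(x):
    global We, Wd
    h = We @ x                               # encoder
    r = x - Wd @ h                           # reconstruction residual
    energy = r @ r + lam * np.abs(h).sum()   # energy + sparsity penalty
    g_h = -2 * Wd.T @ r + lam * np.sign(h)   # gradient w.r.t. the code
    Wd -= lr * np.outer(-2 * r, h)           # decoder gradient step
    We -= lr * np.outer(g_h, x)              # encoder gradient step
    return energy

x = rng.normal(size=d_in)
energies = [step(x) for _ in range(200)]
print(f"energy: {energies[0]:.3f} -> {energies[-1]:.3f}")
```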

References :
  1. Talk Slides (from Yann LeCun's talk at MSR Winter School on Machine Learning and Computer Vision), available here.
  2. Additional list of references on Deep Learning, available here.
  3. Yoshua Bengio and Yann LeCun, Scaling learning algorithms towards AI, in Bottou, L. and Chapelle, O. and DeCoste, D. and Weston, J. (Eds), Large-Scale Kernel Machines, MIT Press, 2007.
  4. Yann LeCun, Sumit Chopra, Raia Hadsell, Marc'Aurelio Ranzato and Fu-Jie Huang, A Tutorial on Energy-Based Learning, in Bakir, G. and Hofman, T. and Schölkopf, B. and Smola, A. and Taskar, B. (Eds), Predicting Structured Data, MIT Press, 2006.


Seminar 6 : Recent advances in applying machine learning to computer vision
Speaker : Ashish Agrawal
Date : 06:00 pm, Saturday, March 20th, 2010
Venue : CS102

Abstract: The field of computer vision has witnessed significant growth in the recent past, with strong industrial and business applications of research. It has benefited significantly from careful and selective application of many machine learning techniques. This talk focuses on the advances and discussions from Microsoft Research's Winter Camp (website here).

I will introduce the major problems researchers are trying to solve in computer vision - segmentation, tracking, identification, etc. I will then talk briefly about Support Vector Machines, kernel learning and graphical models, and how they are used to solve some of these problems.

I will discuss the two major camps of thought on approaches to solving vision problems, and whether highly engineered features help in classification problems. If time permits, I will talk about two interesting research problems that personally excite me: clustering massive image sets, and understanding the biological processes of vision.

References :
  1. Talk Slides, available here.


Seminar 5 : The Netflix challenge - Part I
Speaker : Harish Karnick
Date : 05:00 pm, Saturday, February 27th, 2010
Venue : CS102

Abstract: Netflix is an online movie (CD/DVD) rental and video streaming company. It encourages its users to rate the movies they see on a 1-to-5 star scale. It uses a proprietary recommendation system, Cinematch, which combines users' rating scores with movie-similarity calculations to recommend movies to clients. The Netflix challenge, issued in Oct. 2006, was to improve the RMSE (root mean square error) of Cinematch by 10%. Many teams and individuals participated, and in July 2009 two teams submitted solutions (20 minutes apart) that achieved the required 10% improvement. The winner was announced after validation in Sept. 2009, and the prize of $1 million was awarded in Dec. 2009. In a series of talks we will trace the technical history of how the problem was finally solved. The effort has led to many technical innovations, and several papers have been published by the two teams that finally crossed the 10% improvement barrier. The problem has given a fillip to the area of recommendation systems, and more specifically to collaborative filtering techniques, which are gaining traction due to the large amounts of individual preference data becoming available at social networking sites.
In the first talk we will describe the problem, the data set, and the initial attempts at a solution.
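A minimal sketch of the kind of initial attempt the talk covers: a baseline predictor built from the global mean plus per-user and per-movie offsets, on a tiny made-up ratings table (the data and numbers below are illustrative, not Netflix data).

```python
import numpy as np

# Ratings as (user, movie, stars); a toy stand-in for the real dataset.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 4), (2, 2, 2)]
n_users, n_movies = 3, 3

mu = np.mean([r for _, _, r in ratings])   # global average rating

# Per-user and per-movie offsets from the global mean: generous raters and
# well-liked movies get positive biases, harsh raters and weak movies negative.
user_bias = np.zeros(n_users)
movie_bias = np.zeros(n_movies)
for u in range(n_users):
    user_bias[u] = np.mean([r for uu, _, r in ratings if uu == u]) - mu
for m in range(n_movies):
    movie_bias[m] = np.mean([r for _, mm, r in ratings if mm == m]) - mu

def predict(u, m):
    """Baseline prediction: global mean plus user and movie offsets."""
    return mu + user_bias[u] + movie_bias[m]

rmse = np.sqrt(np.mean([(predict(u, m) - r) ** 2 for u, m, r in ratings]))
print(f"baseline RMSE on the toy data: {rmse:.3f}")
```

Collaborative filtering methods then model the residuals this baseline leaves behind, e.g. via neighbourhood similarities or matrix factorization.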

References :


Seminar 4 : Manifold Learning and Random Projections
Speaker : Aman Dhesi
Date : 06:00 pm, Saturday, January 30th, 2010
Venue : CS102

Abstract: This talk will serve as an introduction to the field of Manifold Learning. Often in practice it is found that machine learning algorithms are faced with data that is superficially high-dimensional, but is actually confined to some low-dimensional object - a manifold - embedded in the high-dimensional space. In manifold learning, one is interested in two things -
  1. Exploiting this intrinsic low-dimensionality in data to speed up tasks like clustering and classification which suffer from a curse of dimensionality.
  2. Learning an explicit low dimensional representation of the data for visualization or as a preprocessing step for other algorithms.
We will first talk about the ISOMAP algorithm - a very popular heuristic manifold learning technique that finds a low-dimensional embedding of a given set of high-dimensional points. Despite its empirical success, there is very little theoretical understanding of ISOMAP, and indeed of manifold learning algorithms in general. We will then talk about some open problems which have hindered the development of a better theoretical understanding of manifold learning. Next, we will talk about the random projection method and extend the classical Johnson-Lindenstrauss lemma to manifolds. This is one approach to understanding the "embeddability" of a manifold, though it has its own shortcomings.
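A quick numerical illustration of the classical Johnson-Lindenstrauss setting (for point sets, not the manifold extension discussed in the talk): a random Gaussian projection approximately preserves pairwise distances.

```python
import numpy as np

rng = np.random.default_rng(42)

# 200 points in 2,000 dimensions, projected to k dimensions by a random
# Gaussian matrix scaled by 1/sqrt(k). The Johnson-Lindenstrauss lemma
# guarantees all pairwise distances are preserved up to a (1 +/- eps)
# factor with high probability once k = O(log n / eps^2), independent of
# the original dimension.
n, D, k = 200, 2_000, 400
X = rng.normal(size=(n, D))
P = rng.normal(size=(D, k)) / np.sqrt(k)
Y = X @ P  # the projection is a single matrix multiply

# Distance between two points before and after projection.
d_before = np.linalg.norm(X[0] - X[1])
d_after = np.linalg.norm(Y[0] - Y[1])
print(f"distance distortion: {d_after / d_before:.3f}")  # close to 1.0
```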

Time permitting, we will also talk about a recently proposed data structure - the Random-Projection tree, a simple variant of the kd-tree that adapts to intrinsic dimensionality in data.

References :
  1. Talk Slides, available here.
  2. Richard G. Baraniuk and Michael B. Wakin, Random projections of smooth manifolds, FoCM, 2009.
  3. Chinmay Hegde, Michael B. Wakin, and Richard G. Baraniuk, Random projections for manifold learning, NIPS, 2007.
  4. Sanjoy Dasgupta and Yoav Freund, Random projection trees and low dimensional manifolds, STOC, 2008.
  5. Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, and Nakul Verma, Learning the structure of manifolds using random projections, NIPS, 2007.
  6. J. B. Tenenbaum, V. de Silva, and J. C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science, Dec. 2000.
  7. Sanjoy Dasgupta and Anupam Gupta, An elementary proof of a theorem of Johnson and Lindenstrauss, Random Structures and Algorithms, 22(1), pages 60-65, 2003.
  8. Kenneth L. Clarkson, Tighter bounds for random projections of manifolds, SoCG, 2008.
  9. Lee-Ad Gottlieb and Robert Krauthgamer, A nonlinear approach to dimensionality reduction, arXiv preprint, available here, 2009.


Seminar 3 : Learning Convex Bodies is Hard
Speaker : Purushottam Kar
Date : 06:00 pm, Saturday, January 30th, 2010
Venue : CS102

Abstract: We will look at a specific learning problem - namely, that of learning convex sets in Euclidean spaces. The problem and its variants have generated a lot of research over the past decades, and we will briefly state these results before moving on to describe a recent result on learning convex bodies. The result is cute and uses ideas from the theory of error-correcting codes. The discussion will follow [2] fairly closely.

References :
  1. Talk Slides, available here.
  2. Navin Goyal and Luis Rademacher, Learning Convex Bodies is Hard, COLT, 2009.


Seminar 2 : An Introduction to Computational Learning Theory
Speaker : Purushottam Kar
Date : 06:00 pm, Saturday, January 23rd, 2010
Venue : CS102

Abstract: This talk will begin our foray into Computational Learning Theory, starting with a gentle introduction - more specifically, to the PAC learning model of Valiant. The discussion will move on to some classical results in this model, both positive and negative. In particular, we shall explore the characterization of the sample complexity of a proper learning problem in terms of the Vapnik-Chervonenkis dimension of the target concept class.
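As a toy illustration of the VC dimension (my example, not from the talk), consider the concept class of closed intervals on the real line: it shatters any two points but no three, so its VC dimension is 2, and the PAC sample complexity of learning it scales accordingly.

```python
from itertools import product

# An interval [a, b] classifies x as 1 iff a <= x <= b. A point set is
# shattered if every 0/1 labeling is realised by some interval.
def interval_labelings(points):
    """All distinct labelings of `points` achievable by closed intervals."""
    cuts = sorted(points)
    # Candidate endpoints: below the min, midpoints between neighbours,
    # above the max, and the points themselves; these suffice.
    cand = ([cuts[0] - 1]
            + [(u + v) / 2 for u, v in zip(cuts, cuts[1:])]
            + [cuts[-1] + 1] + cuts)
    labelings = set()
    for a in cand:
        for b in cand:  # a > b gives the empty interval (all-zero labeling)
            labelings.add(tuple(int(a <= x <= b) for x in points))
    return labelings

def shattered(points):
    return len(interval_labelings(points)) == 2 ** len(points)

print(shattered([1.0, 2.0]))        # True: intervals shatter any 2 points
print(shattered([1.0, 2.0, 3.0]))   # False: labeling (1, 0, 1) is unrealisable
```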

References :
  1. Talk Slides, available here.
  2. Michael J. Kearns and Umesh V. Vazirani, An Introduction to Computational Learning Theory, The MIT Press, 1994.


Seminar 1 : When does the Metropolis algorithm succeed ?
Speaker : Swagato Sanyal
Date : 10:30 am, Saturday, January 16th, 2010
Venue : CS102

Abstract: The talk will center around the Metropolis algorithm (MA), an easy-to-implement randomized search heuristic. It searches a space containing a huge number of points, each with an associated cost, looking for a point of minimum cost. There is also a symmetric relation, the neighborhood, defined on the space. At each step, the algorithm selects a neighbor of the current point according to a probability distribution over its neighbors. The distribution is such that a point which is 'better' than the current one is more likely to become the new current point than a point which is worse. The probability of accepting a worsening move depends on a parameter called the temperature, whose choice turns out to be extremely crucial. In this work, our aim is, given a combinatorial optimization problem, to characterize classes of inputs for which the Metropolis algorithm works efficiently.
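A minimal sketch of the acceptance rule described above, on a toy one-dimensional cost landscape (the cost function, neighborhood and temperature here are illustrative choices, not from the talk):

```python
import math
import random

random.seed(1)

def f(x):
    return (x - 70) ** 2  # toy cost landscape with its minimum at x = 70

# Metropolis search over the integers 0..99, where neighbors differ by
# +/- 1. Lower temperature T makes the walk greedier; higher T accepts
# more cost-increasing ("worsening") moves.
def metropolis(steps, T):
    x = 0
    for _ in range(steps):
        y = max(0, min(99, x + random.choice((-1, 1))))  # pick a neighbor
        delta = f(y) - f(x)
        # Always accept improvements; accept a worsening move with
        # probability exp(-delta / T).
        if delta <= 0 or random.random() < math.exp(-delta / T):
            x = y
    return x

print(metropolis(steps=5000, T=5.0))  # typically lands near 70
```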

There are two formalizations of efficient behavior in the literature, which turn out to be equivalent. As a result, we get two characterizations of 'success', both in terms of the Markov chains underlying a run of MA. One characterization relates success to the so-called rapid mixing of the underlying Markov chain. We also work out a few known results using our characterizations to illustrate their use.

References :
  1. M. Jerrum and G. B. Sorkin, Simulated annealing for graph bisection, FOCS, pages 94-103, 1993.
  2. N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, Equations of state calculations by fast computing machines, Journal of Chemical Physics, 21: pages 1087-1092, 1953.
  3. G. Sasaki, The effect of the density of states on the Metropolis algorithm, Information Processing Letters, 37(5): pages 158-163, 1991.
  4. G. H. Sasaki and B. Hajek, The time complexity of maximum matching by simulated annealing, Journal of the ACM (JACM), 35(2): pages 387-403, 1988.
  5. A. Sinclair, Algorithms for Random Generation and Counting: A Markov Chain Approach, Birkhauser Verlag, 1993.
  6. I. Wegener, Metropolis versus simulated annealing and the black-box-complexity of optimization problems, COMPSTAT, pages 517-527. Physica-Verlag HD, 2008.
