CS499: B Tech Project
2009-10

Student Details: T Sudhamsh Goutham (Y6501)

Advisor: Prof. Amitabha Mukerjee

Title:
Grounded Label Learning in Telugu from Multimodal Input

Abstract:
This work presents a method for learning grounded labels in the Telugu language from multimodal input. The focus is on learning labels for actions. Semantic information obtained from a video is used to learn schemas for binary actions (actions involving two objects) such as chase, come closer and move away. A computational model of dynamic visual attention is used to find the most salient object in each frame. Feature vectors are computed from the relative position and relative velocity of the salient objects over a set of frames. The frames are then clustered using an unsupervised clustering method, the Merge Neural Gas algorithm, and the resulting clusters are identified as action schemas for different verbs. Audio commentaries, collected from people while they were shown the video, are typed in word-separated form. By associating the words with the frames during which they were uttered, the labels for these clusters are learnt. The labels are grounded because they are associated with action schemas derived from the video.
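
As an illustration of the feature computation described above, the following Python sketch builds one feature vector per frame from the tracked positions of two objects. The function name, the 4-dimensional layout (relative position followed by relative velocity) and the dt parameter are assumptions made for this example; the project's own Java code and the exact feature definition in the report may differ.

```python
def feature_vectors(track_a, track_b, dt=1.0):
    """Return one (rel_x, rel_y, rel_vx, rel_vy) tuple per frame.

    track_a, track_b: lists of (x, y) positions, one entry per frame.
    dt: assumed time between consecutive frames.
    The first frame is skipped because it has no previous position
    from which to estimate a velocity.
    """
    if len(track_a) != len(track_b):
        raise ValueError("both tracks must cover the same frames")

    vectors = []
    for t in range(1, len(track_a)):
        ax, ay = track_a[t]
        bx, by = track_b[t]
        pax, pay = track_a[t - 1]
        pbx, pby = track_b[t - 1]

        # Position of object B relative to object A.
        rel_x, rel_y = bx - ax, by - ay

        # Finite-difference velocity of each object.
        vax, vay = (ax - pax) / dt, (ay - pay) / dt
        vbx, vby = (bx - pbx) / dt, (by - pby) / dt

        # Velocity of B relative to A.
        rel_vx, rel_vy = vbx - vax, vby - vay

        vectors.append((rel_x, rel_y, rel_vx, rel_vy))
    return vectors
```

In this sketch, a relative velocity pointing against the relative position means the two objects are getting closer, which is the kind of cue that could separate come closer from move away.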

Description:
The project consists of five main stages:

1) Identify the objects in focus in each frame. For each frame, we take a range of +/- 10 frames around it and find the objects in focus within that range. This work is done manually by inspecting each frame. When two objects are both in focus for a period of time, we compute the feature vectors of those frames for that pair of objects.

2) Cluster the feature vectors using the Merge Neural Gas (MNG) algorithm. The MNG code consists of two main programs: mng.c, which creates a network from the input data, and mngclassify.c, which classifies the input data based on a network. We then manually combine the winner neurons into clusters.

3) Classify each frame as MA (move away), CC (come closer) or Ch (chase) by taking commentaries from three people. Where there is ambiguity, the frames are re-examined manually. In this way each frame is identified as one of MA/CC/Ch. This information is then used to identify the clusters obtained in the previous stage.

4) Collect audio commentaries. We collected 8 commentaries in Telugu and typed them in word-separated form. Some changes were then made, such as converting nouns and pronouns to canonical forms and simplifying dialectal variants.

5) Compute the frequencies of the words and, from them, a conditional probability measure of each word being the label for a cluster.
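
The association step in stage 5 can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: it estimates P(cluster | word) from the number of frames in which a word was being uttered while the frame belonged to a given cluster. The function name, the input layout and the choice of this particular conditional probability are hypothetical; the measure actually used is defined in the project report and implemented in the project's Java code.

```python
from collections import Counter, defaultdict


def word_cluster_association(frame_cluster, utterances):
    """Estimate P(cluster | word) from word/frame co-occurrence counts.

    frame_cluster: dict mapping frame number -> cluster id.
    utterances: list of (word, first_frame, last_frame) tuples giving
        the frames during which each word was uttered (inclusive).
    Returns a dict: word -> {cluster id -> probability}.
    """
    # counts[word][cluster] = number of frames in which the word was
    # uttered while the frame belonged to that cluster.
    counts = defaultdict(Counter)
    for word, first, last in utterances:
        for frame in range(first, last + 1):
            cluster = frame_cluster.get(frame)
            if cluster is not None:  # ignore frames with no cluster
                counts[word][cluster] += 1

    association = {}
    for word, per_cluster in counts.items():
        total = sum(per_cluster.values())
        association[word] = {c: n / total for c, n in per_cluster.items()}
    return association


def best_label(association, cluster):
    """Return the word with the highest P(cluster | word), or None."""
    scored = [(probs.get(cluster, 0.0), word)
              for word, probs in association.items()]
    scored = [s for s in scored if s[0] > 0.0]
    return max(scored)[1] if scored else None
```

A word that occurs in every cluster, such as a function word, gets its probability mass spread across clusters under this measure, while a word uttered mostly during one kind of action scores highly for that cluster.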

For a detailed description of the method and the results obtained, please see the attached project report. For further information, please send me a mail.

Links:

  1. Final Project Report
  2. Sem 1 Project Report
  3. Proposal
  4. Data for Feature Vector Computation and Data of the Clusters Obtained
  5. Java Code for Feature Vector Computation and Association Measure Computation
  6. C Code for the Merge Neural Gas Algorithm
  7. Sample Audio Commentaries
  8. A Video Depicting the Commentary Collection Process