amitabha mukerjee

 

professor
department of computer science and engineering
indian institute of technology, kanpur
kanpur- 208 016, india.
amit [at] iitk.ac.in
 


Research: Grounded Cognitive Grammar Acquisition

I work at the intersection of Computer Vision and Natural Language. I am particularly interested in discovering structures of the world via perception, and mapping these to language.

The symbol for "in". At its semantic pole is the image schema, with two arguments and a distribution that must be matched by the visual angle. The system has learned the association of this schema with the linguistic unit "in" from co-occurring language.

For example, one of the perceptual signatures available to an infant is the visual angle subtended by an object on the retina. If one clusters the visual angles for various landmarks, a stable pattern arises when the angle reaches 360 degrees: when we are fully inside a space such as a room, the angle does not change with our local motions. We show that computationally such a pattern is discovered naturally as a stable cluster. We argue that such a cluster may provide an initial characterization of containment.

However, mapping such a perceptual structure to language is problematic. Different languages carve up the space of spatial relations in different ways. Also, with increasing exposure to language, the mental model of a lexical item (the image schema) itself changes. One of the objectives of this work is to trace this ontogenetic evolution in the image schema of containment relations.

In this work, we demonstrate the process via a computational simulation based on a simple video. We consider how an early learner may acquire the perceptual notion of full containment, and how she may learn to map it to the words of different languages.

Initial image schema: Suppose the agent is considering two-object interactions between a trajector (tr) and a landmark (lm). By observing the distribution of the visual angle presented by the lm at the tr, we discover that in situations of full containment the visual angle of the lm at the tr is very close to 360 degrees. This may be thought of as a computational model of an initial image schema for containment.
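As an illustration only, the visual angle of a landmark at a trajector can be sketched as below. This is not the simulation code used in the work: the rectangular landmark, the boundary sampling, the helper names (`boundary_points`, `visual_angle`, `is_in`) and the 350-degree threshold are all assumptions made for the example. The angle is taken as 360 degrees minus the largest angular gap between sampled boundary points as seen from the trajector.

```python
import math

def boundary_points(x0, y0, x1, y1, n=200):
    """Sample points along the boundary of an axis-aligned rectangle (hypothetical landmark)."""
    pts = []
    for i in range(n):
        t = i / n
        pts.append((x0 + t * (x1 - x0), y0))   # bottom edge
        pts.append((x1, y0 + t * (y1 - y0)))   # right edge
        pts.append((x1 - t * (x1 - x0), y1))   # top edge
        pts.append((x0, y1 - t * (y1 - y0)))   # left edge
    return pts

def visual_angle(tr, lm_points):
    """Angle in degrees subtended by the sampled landmark boundary at point tr."""
    dirs = sorted(math.atan2(py - tr[1], px - tr[0]) for px, py in lm_points)
    gaps = [b - a for a, b in zip(dirs, dirs[1:])]
    gaps.append(dirs[0] + 2 * math.pi - dirs[-1])  # gap across the -pi/+pi wrap
    return math.degrees(2 * math.pi - max(gaps))

def is_in(tr, lm_points, threshold=350.0):
    """Assumed discriminator: full containment when the angle is close to 360."""
    return visual_angle(tr, lm_points) >= threshold

box = boundary_points(0, 0, 10, 10)
print(visual_angle((5, 5), box))    # trajector inside the box: close to 360
print(visual_angle((20, 5), box))   # trajector outside: a much smaller angle
```

A coarser sampling of the boundary lowers the angle measured near a wall, so the threshold and the sampling density would need to be chosen together.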

Symbolic unit discovery: The system now has a fragile perceptual schema and needs to discover what to call it. For this, it considers co-occurring commentaries by a number of adults. All lexical items uttered in sentences co-occurring with containment situations are candidates for association. We use a mutual information measure for association, and discover that "in", "into" and "inside" are the words most strongly associated with full containment. Thus we learn a symbol (in the sense of cognitive grammar) with both phonological and semantic poles.
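A minimal sketch of such an association measure follows. The exact measure used in the work is not specified here, so the example assumes one plausible choice: the positive-association term of mutual information, p(w,c) log2[p(w,c) / (p(w) p(c))], computed over utterances. The commentary sentences are invented toy data, and `association_scores` is a hypothetical helper name.

```python
import math
from collections import Counter

def association_scores(commentary):
    """Score each word by p(w,c) * log2(p(w,c) / (p(w) * p(c))),
    where c is 'the scene shows containment' (assumed measure)."""
    n = len(commentary)
    p_c = sum(1 for _, contained in commentary if contained) / n
    word_count, joint_count = Counter(), Counter()
    for sentence, contained in commentary:
        for w in set(sentence.lower().split()):  # presence per utterance
            word_count[w] += 1
            if contained:
                joint_count[w] += 1
    scores = {}
    for w, cnt in word_count.items():
        p_w, p_wc = cnt / n, joint_count[w] / n
        scores[w] = p_wc * math.log2(p_wc / (p_w * p_c)) if p_wc > 0 else 0.0
    return scores

# Toy commentary (invented): (utterance, scene shows full containment?)
commentary = [
    ("the circle moves into the box", True),
    ("the big square is in the box", True),
    ("the circle is in the box", True),
    ("the small square moves inside the box", True),
    ("the big square pushes the box", False),
    ("the circle moves away from the box", False),
    ("the small square is near the box", False),
    ("the big square moves toward the circle", False),
]

scores = association_scores(commentary)
top3 = sorted(scores, key=scores.get, reverse=True)[:3]
print(top3)
```

On this toy data the three highest-scoring words are "in", "into" and "inside", while words that occur equally often with and without containment (such as "the") score zero.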

The discovered symbol has two argument slots for {tr, lm}, and a function that discriminates whether a given spatial situation between them is [IN] or not. Conversely, on hearing the word "in", the schema can also generate a visualization, by imagining a landmark that presents a visual angle close to 360 degrees at the trajector.

Phrase structure discovered from an untagged corpus, using the ADIOS algorithm [Solan/Edelman:2002].

Symbolic composition: symbolization for "in the box", with one argument slot free for the trajector.

Discovering syntactic constructions (Symbolic composition): Subsequently, the system starts to look at the patterns of tokens that appear in the same commentary. Several constructions are found to co-occur frequently with containment situations, e.g. "the {circle|big square} moves into the box". These constructions are then associated with containment.

We can now recognize synonyms using this construction. For example, if someone says "the big block goes into the box", then by comparing with the visual image we can recognize "big block" as a synonym for "big square". At this stage, some pronominal anaphora are also discovered as polysemies ("it", "them", "each other").

Meaning enrichment: Once such language constructions are known, we can identify these in novel text, without co-occurring perceptual input. This is how most words are learned.

For example, the construction "in the X" has X as the container. From the large Brown corpus, we now find a number of tokens appearing in this position, which tells us that these tokens are acting as containers. However, by looking at the object classes these tokens come from, we can identify that many of them do not involve direct spatial containment. Hence, we gradually extend the meaning to include conventionalized metaphorical extensions; the semantic pole is enriched with containers such as time ("in the nineties") or group ("in the team").
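The slot-filling step can be sketched as follows. This is a simplified illustration, not the procedure used in the work: it matches only the single word directly after "in the" (so "in the big box" would yield "big"), it runs on an invented sample text rather than the Brown corpus, and the `classes` lexicon mapping tokens to object classes is a hypothetical stand-in for whatever class information is available.

```python
import re
from collections import Counter, defaultdict

IN_THE_X = re.compile(r"\bin the (\w+)", re.IGNORECASE)

def container_fillers(text):
    """Count the tokens filling the X slot of the construction 'in the X'."""
    return Counter(m.lower() for m in IN_THE_X.findall(text))

def group_by_class(fillers, classes):
    """Group slot fillers by object class; unlisted tokens fall under 'unknown'."""
    grouped = defaultdict(list)
    for token in fillers:
        grouped[classes.get(token, "unknown")].append(token)
    return dict(grouped)

# Invented sample text and hypothetical class lexicon, for illustration only.
text = ("She put the ball in the box. In the nineties he played in the team. "
        "The cat slept in the box.")
classes = {"box": "physical object", "nineties": "time", "team": "group"}

fillers = container_fillers(text)
print(fillers)
print(group_by_class(fillers, classes))
```

Fillers whose class is not a physical object are the candidates for metaphorical extensions of the container role.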

This work has two ramifications for building AI systems. First, scaling up to human-like capabilities may not be possible by hand-coding the knowledge: unsupervised approaches for learning schemas such as this one may be more scalable, though they require a great many situations with co-occurring language. One of our goals should be to develop large corpora with such data.

An associated problem I am looking at involves discovering design symbols by exploring design spaces, along with models for learning design expertise, especially the types of tacit knowledge that go into formulating design decisions.

Select Publications: Attention and Symbol Emergence

A sample frame from a 2D video (from Tversky group, Stanford U.). The parallel commentary for this scene may say: The big square is pushing the small square.

 

Other interests: Hands-on learning in schools

 



Update: Oct 2009