Storytelling Science

Learning to Talk

Amitabha Mukerjee


Imagine you are a child, and that you are observing some monkeys with your parents. Your parents point out things: "See that little monkey holding onto its mother". Because you have encountered some of these words in other circumstances (e.g. the sound "hold on"), you are able to apply their meaning here, and also to broaden your understanding of what these words mean.

While all of us agree that this is how we learn meanings, if I ask people about words and meanings, most people talk of dictionaries. But having one word explained using other words ends up in a cycle - A is explained using B, which is eventually explained using A. In second language teaching, this is precisely what does not work. Our understanding of language must be grounded in our perceptions. Indian philosophers such as Bhartrihari (7th century), after much debate on word meanings, developed the notion that the meaning of a word does not reside in the word itself - it is the set of all sentences in which the word can be meaningfully used. This is a surprisingly modern concept, and is reflected in the work of Western philosophers such as Carnap (1928) and Quine (1990). The thinking today has gone further: many people feel that having a body is essential to learning language.

Big square, Small square, and Circle are actors in this movie created by psychologists. This is a scene which people often describe as "The large square is hitting the small square". Other scenes may be described as "The large square corners the circle and is threatening her."

Meaning and the Senses

As another example, take the words "small" and "large". Clearly they represent different sizes. Now consider the expressions "small dog" and "large dog" - they should simply change the sizes of the dogs, right? But in repeated tests, when asked "What is the colour of the dog?" most people see the "large dog" as a dark colour such as black or brown, whereas the "small dog" is more often white. Clearly, this information correlates with our experiences of dogs.

Verbs such as "chase" or "sleep" involve more complex structures - why is "Green ideas sleep furiously" not meaningful? When is a monkey "chasing" another, versus "following" it? Such notions are difficult to acquire, and to understand a word such as "chase", it helps to have experienced chasing, which requires you to have a body.

Can a Circle be Female?

For years, psychologists have been developing simple videos with squares and circles moving around, but when these are shown to people, they describe them in terms of elaborate intentions ("the square and the circle are lovers"), even ascribing different genders to certain shapes. A lot of visual meaning is hidden in these cleverly constructed videos.

But how can computers acquire language? Most programs that attempt to interpret human language do so based on elaborate rules of grammar, but the concept of meaning is often rudimentary. It is increasingly being realized in the Artificial Intelligence community that meanings can only be learned as a function of the body, in terms of one's perceptions and actions.

In recent work at IIT Kanpur, such videos from psychology are being used to learn the event structures of simple verbs such as "chase", "hit", "hide" etc. The computer correlates the actions in the movie with verbal descriptions by humans, to come up with models of these actions. Later on, it can be shown new movies, and is able to recognize these actions. It is hoped that in this way, computers will slowly be able to build up models of meaning that are grounded in seeing.
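To give a flavour of what "correlating actions with descriptions" might involve, here is a toy sketch in Python. It is not the method used at IIT Kanpur; it assumes each shape's path is a list of (x, y) positions, one per frame, and that humans have labelled a few clips with a verb. The function names, the two features ("pursuit" and "flight") and the labels "chase" and "approach" are all made up for illustration. The program averages the features seen with each verb, then describes a new clip using the verb whose average is closest.

```python
import math

def pair_features(traj_a, traj_b):
    """Return (pursuit, flight) for shape A relative to shape B.

    pursuit: mean cosine between A's velocity and the direction from A to B.
    flight:  mean cosine between B's velocity and the same direction.
    A stationary shape contributes a cosine of 0 for that frame.
    """
    pursuit, flight, frames = 0.0, 0.0, 0
    for t in range(min(len(traj_a), len(traj_b)) - 1):
        ax, ay = traj_a[t]
        bx, by = traj_b[t]
        dx, dy = bx - ax, by - ay
        dist = math.hypot(dx, dy)
        if dist == 0:
            continue  # shapes overlap: direction from A to B is undefined
        va = (traj_a[t + 1][0] - ax, traj_a[t + 1][1] - ay)
        vb = (traj_b[t + 1][0] - bx, traj_b[t + 1][1] - by)
        speed_a, speed_b = math.hypot(*va), math.hypot(*vb)
        if speed_a > 0:
            pursuit += (va[0] * dx + va[1] * dy) / (speed_a * dist)
        if speed_b > 0:
            flight += (vb[0] * dx + vb[1] * dy) / (speed_b * dist)
        frames += 1
    if frames == 0:
        return (0.0, 0.0)
    return (pursuit / frames, flight / frames)

def learn_verb_models(labelled_clips):
    """labelled_clips: list of (verb, traj_a, traj_b). Returns verb -> mean features."""
    sums = {}
    for verb, traj_a, traj_b in labelled_clips:
        p, f = pair_features(traj_a, traj_b)
        sp, sf, n = sums.get(verb, (0.0, 0.0, 0))
        sums[verb] = (sp + p, sf + f, n + 1)
    return {verb: (sp / n, sf / n) for verb, (sp, sf, n) in sums.items()}

def describe(models, traj_a, traj_b):
    """Pick the verb whose learned feature average is nearest to this clip."""
    feats = pair_features(traj_a, traj_b)
    return min(models, key=lambda verb: math.dist(models[verb], feats))

# Toy "movies" with human labels (hypothetical data).
training = [
    ("chase", [(0, 0), (1, 0), (2, 0), (3, 0)], [(2, 0), (3, 0), (4, 0), (5, 0)]),
    ("approach", [(0, 0), (1, 0), (2, 0), (3, 0)], [(6, 0), (6, 0), (6, 0), (6, 0)]),
]
models = learn_verb_models(training)

# A new movie: A runs diagonally after B, which runs away.
new_a = [(0, 0), (1, 1), (2, 2)]
new_b = [(3, 3), (4, 4), (5, 5)]
print(describe(models, new_a, new_b))
```

Two numbers per pair of shapes are far too few to capture verbs like "hide", and a real system would need many more clips and richer features; the point is only that the "meaning" of a verb here is built from what was seen, not from other words.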

But humans use many notions that are not so simple to capture. For example, in one scene, the big square is chasing the little square and the circle. The computer detects this, but since the circle is running slightly ahead of the little square, it also says "the little square is chasing the circle". Humans would never say this, because for them, the big event that they focus their attention on is the chase by the big square. Understanding such aspects requires building complex models of attention, based on the fact that our statements must be compact.

At the end of this, will computers understand language? It is quite likely that they would be able to act appropriately in many situations with language, so to some extent, the answer is "yes". But will they do it like humans do? Until they have a body like humans, the answer is, "Quite certainly not!"

  • Amitabha Mukerjee works on building artificial brains at IIT Kanpur.