PROJECT PROPOSAL

COURSE: ME768
ARTIFICIAL INTELLIGENCE IN ENGINEERING
INSTRUCTOR: Dr. AMITABHA MUKHERJEE

ARTICULATED AGENT MOTIONS BASED ON NL INPUT

Kesari Anandsudhakar
Rajesh Rajasekar
Vikrant Kumar
email: { ands, rajraj, vikrantk }@iitk.ac.in

Motivation

The Virtual Director project at IIT Kanpur aims to build a virtual environment and to produce actions in it according to a natural language story. This project fits into that larger picture at the stage after the natural language has been parsed for grammatical and semantic information and the spatial data has been defuzzified. At this point, the input takes the form of spatial descriptors and action directives, which are at a level of complexity below natural language but above the most primitive of directives.

Objectives:

Given spatial descriptors and action directives of the kind described above, to produce a graphical rendering of the scene and of the articulated agent motions within it, with appropriate automatic camera control.
Examples

The woman on the left narrated a joke animatedly.
The fat man and the other woman listened.
The other woman was amused by the joke, but the fat man wasn't.

For this short segment of a story, a sequence of images like the following may be produced.


Past Work

Work has been done in this field at the Indian Institute of Technology, Kanpur by Dr. Amitabha Mukherjee as part of the Virtual Director project.

The aspects of anthropometrics and modelling are discussed by Norman Badler and Stephen Smoliar [Badler/Smoliar:1979]. Physical aspects such as the dynamics of complex objects are studied by Hoffman and Hopcroft [Hoffman/Hopcroft:1987]; these techniques will be used to perform character animation.

Norman Badler has developed Jack, a basic human body animation system [Badler]. An extension of the basic Jack system by Geib, Levison, and Moore [Geib/Levison/Moore:1994] is SodaJack, which simulates a soda bar operator; action planning in the context of searching for and manipulating objects is discussed there.

Salesin et al. have created a precise language for the otherwise fuzzy process of defining shots and other basic elements of cinematography [Salesin/et.al:1996]. Camera control that amplifies the impact of the story can be based on this convention.

Maron and Lozano-Perez explain the method of using visibility graphs to perform path planning in a domain with polygonal obstacles [Lozano/Lozano-Perez:1996].
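
As an illustration of the technique, here is a rough Python sketch of visibility-graph path planning; the function names and the simplifications (simple polygonal obstacles, a path assumed to exist) are ours and are not taken from the cited paper.

import heapq
from itertools import combinations

def orient(u, v, w):
    # sign of the cross product (v - u) x (w - u)
    d = (v[0]-u[0])*(w[1]-u[1]) - (v[1]-u[1])*(w[0]-u[0])
    return (d > 0) - (d < 0)

def segments_cross(p, q, a, b):
    # proper crossing test via orientation signs
    return (orient(p, q, a) != orient(p, q, b) and
            orient(a, b, p) != orient(a, b, q))

def inside(pt, poly):
    # ray-casting point-in-polygon test
    x, y = pt
    hit = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i+1) % len(poly)]
        if (y1 > y) != (y2 > y) and x < x1 + (y-y1)*(x2-x1)/(y2-y1):
            hit = not hit
    return hit

def is_edge(p, q, poly):
    n = len(poly)
    return any({poly[i], poly[(i+1) % n]} == {p, q} for i in range(n))

def visible(p, q, obstacles):
    # p sees q if pq properly crosses no obstacle edge and does not
    # run through an obstacle's interior (checked at the midpoint)
    mid = ((p[0]+q[0])/2, (p[1]+q[1])/2)
    for poly in obstacles:
        if not is_edge(p, q, poly) and inside(mid, poly):
            return False
        for i in range(len(poly)):
            a, b = poly[i], poly[(i+1) % len(poly)]
            if not ({p, q} & {a, b}) and segments_cross(p, q, a, b):
                return False
    return True

def shortest_path(start, goal, obstacles):
    # nodes: start, goal, and every obstacle vertex
    nodes = list({start, goal} | {v for poly in obstacles for v in poly})
    dist = lambda u, v: ((u[0]-v[0])**2 + (u[1]-v[1])**2) ** 0.5
    graph = {v: [] for v in nodes}
    for u, v in combinations(nodes, 2):
        if visible(u, v, obstacles):
            graph[u].append((v, dist(u, v)))
            graph[v].append((u, dist(u, v)))
    # Dijkstra over the visibility graph
    best, prev, heap = {start: 0.0}, {}, [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > best.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            if d + w < best.get(v, float("inf")):
                best[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# a single square obstacle between the two endpoints
square = [(1.0, -1.0), (3.0, -1.0), (3.0, 1.0), (1.0, 1.0)]
print(shortest_path((0.0, 0.0), (4.0, 0.0), [square]))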



Sample Input-Output

The input to the program will be a description of the scene, followed by a transcript of the interactions between the various components of the scene.
In the example below, a man is trying to play ball with his girlfriend, but the robot seems to charm her more. Here is a sample of what the input may look like:
Sky.color=blue
Grass.color=green
//1 create the object instances
tree=new Tree
bench=new Bench
woman=new Woman
wall=new Wall
robot=new Robot
ball=new Ball
man=new Man
//2 set each object's characteristics
tree.location=(-10,50)
bench.location=(0,20)
bench.color=red
woman.posture=sit
woman.location=bench
woman.wear=new Shirt(red)
woman.wear=new Skirt(blue)
wall.location=(-8,40)
wall.length=5
wall.orientation=45
robot.location=(3,18)
ball.color=red
ball.container=wall.top
ball.location=(-7,41)
man.location=(-9,50)
man.wear=new Shirt(blue)
man.wear=new Trouser(black)
//3 action directives
man.goto=ball
man.pickup=ball
man.goto=woman
man.give.object=ball
man.give.dest=woman
woman.accept.src=man.give
woman.wait=man.giving
man.give.go
woman.accept.go
woman.throw.object=ball
woman.throw.direction=robot
woman.throw.go
robot.wait=ball.land
robot.goto=ball
robot.pickup=ball
robot.throw.object=ball
robot.throw.direction=woman
man.goto=ball
man.pickup=ball
man.goto=woman
man.give.object=ball
man.give.dest=woman
woman.accept.src=man.give
woman.wait=man.giving
man.give.go
woman.accept.go
end
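
As a rough illustration, the following Python sketch decomposes such input lines into (object, property path, value) triples; the triple representation is our own assumption for illustration, not a settled design.

import re

ASSIGN = re.compile(r"(\w+)((?:\.\w+)*)=(.+)")   # e.g. man.give.dest=woman

def parse(line):
    line = line.split("//")[0].strip()           # drop the //n markers
    if not line or line == "end":
        return None
    if m := ASSIGN.fullmatch(line):
        obj, path, value = m.groups()
        return obj, path.strip(".").split(".") if path else [], value
    # bare directives such as 'man.give.go'
    obj, *path = line.split(".")
    return obj, path, None

print(parse("tree=new Tree"))                # ('tree', [], 'new Tree')
print(parse("woman.throw.direction=robot"))  # ('woman', ['throw', 'direction'], 'robot')
print(parse("man.give.go"))                  # ('man', ['give', 'go'], None)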

In this domain, Sky and Grass are global objects whose characteristics are defined in the input. Between points 1 and 2, instances of classes such as Man, Tree, and Ball are created. Between points 2 and 3, each object's characteristics are set. Uninitialized properties of objects default to preset values; this presetting may be absolute or in terms of other properties. At this stage, the output will be a rendering of the scene with all the objects in their initial configurations.
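
A minimal Python sketch of how such defaulting might be realized; the class layout and the particular preset values below are our illustrative assumptions.

class SceneObject:
    # absolute defaults; a callable default is computed from the
    # object itself, i.e. a preset "in terms of other properties"
    defaults = {"color": "grey", "location": (0, 0)}

    def __init__(self, **props):
        self.props = dict(props)

    def get(self, name):
        value = self.props.get(name, type(self).defaults.get(name))
        return value(self) if callable(value) else value

class Ball(SceneObject):
    defaults = dict(SceneObject.defaults,
                    radius=0.5,
                    # relative preset: rest height equals the radius
                    height=lambda self: self.get("radius"))

ball = Ball(color="red", location=(-7, 41))
print(ball.get("color"))    # 'red', set explicitly in the input
print(ball.get("height"))   # 0.5, derived from the radius preset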

Beyond point 3, the actions of the various objects are specified. 'goto', 'pickup', and so on are simple actions in which only one effector and one affected object play a part. An action like 'give' may have a corresponding 'accept'; the two constitute a composite process which needs to be coordinated temporally. This is achieved through the use of the 'wait' construct. For example, when a Man 'gives' a Ball to a Woman, he generates a 'giving' message, so issuing a 'woman.wait=man.giving' instruction makes woman, the instance of Woman, wait until she receives a 'giving' message from man. This appears quite analogous to the role played by body language in the interaction between human beings.
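
A minimal sketch of this synchronization, assuming a shared message board through which agents post and check signals; this construction is ours, for illustration only.

class MessageBoard:
    def __init__(self):
        self.posted = set()
    def post(self, sender, signal):
        self.posted.add((sender, signal))
    def check(self, sender, signal):
        return (sender, signal) in self.posted

class Agent:
    def __init__(self, name, board):
        self.name, self.board, self.blocked_on = name, board, None
    def wait(self, sender, signal):          # e.g. woman.wait=man.giving
        self.blocked_on = (sender, signal)
    def ready(self):
        return self.blocked_on is None or self.board.check(*self.blocked_on)
    def give(self, obj, dest):
        # announce the action so that a waiting agent can proceed
        self.board.post(self.name, "giving")
        print(f"{self.name} gives {obj} to {dest.name}")

board = MessageBoard()
man, woman = Agent("man", board), Agent("woman", board)
woman.wait("man", "giving")
print(woman.ready())      # False: no 'giving' message posted yet
man.give("ball", woman)   # man.give.go
print(woman.ready())      # True: woman may now execute her 'accept'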

Whenever an action is executed, the objects involved in it are known. This information is processed by the Camera Control module: the objects give the position of, and the area covered by, the event, and the type of the event ('goto', 'give', etc.) is used to decide on an appropriate camera idiom.
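
A hedged sketch of this decision: the idiom names echo the cinematography conventions cited earlier, but the particular mapping and the framing numbers are placeholder assumptions of ours.

def camera_setup(event, actors):
    # actors: list of (name, (x, y)) pairs involved in the event
    xs = [p[0] for _, p in actors]
    ys = [p[1] for _, p in actors]
    center = (sum(xs) / len(xs), sum(ys) / len(ys))
    extent = max(max(xs) - min(xs), max(ys) - min(ys), 1.0)
    idiom = {"goto":  "tracking",     # follow the moving agent
             "give":  "external",     # two-shot over the giver's shoulder
             "throw": "apex"}.get(event, "establishing")
    # pull the camera back in proportion to the area the event covers
    return {"idiom": idiom, "look_at": center, "distance": 3.0 * extent}

print(camera_setup("give", [("man", (0, 20)), ("woman", (0, 21))]))
# {'idiom': 'external', 'look_at': (0.0, 20.5), 'distance': 3.0}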

The output will be a graphical rendering of the scene and the events therein, similar to the previously illustrated sequence of images. Presently, the chosen output format is VRML, which provides a platform-independent and compact representation of the scene and enables easy transmission over networks.
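
For instance, a Ball could be emitted as a standard VRML 2.0 node, as in the Python sketch below; the mapping from our objects onto VRML nodes (and the ground-plane-to-z convention) is an assumption for illustration.

def ball_to_vrml(color, location):
    r, g, b = color                  # RGB components in [0, 1]
    x, y = location                  # our 2D ground plane maps to x/z
    return (f"Transform {{\n"
            f"  translation {x} 0 {y}\n"
            f"  children Shape {{\n"
            f"    appearance Appearance {{\n"
            f"      material Material {{ diffuseColor {r} {g} {b} }}\n"
            f"    }}\n"
            f"    geometry Sphere {{ radius 0.5 }}\n"
            f"  }}\n"
            f"}}\n")

with open("scene.wrl", "w") as f:
    f.write("#VRML V2.0 utf8\n")                # standard VRML97 header
    f.write(ball_to_vrml((1, 0, 0), (-7, 41)))  # the red ball at (-7, 41)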


Online Links

Human modelling at the University of Pennsylvania: http://www.cis.upenn.edu/~hms/home.html
The JACK project: http://www.cis.upenn.edu/~hms/jack.html
References to Virtual Human works: http://www.pasociety.org/perfanim
Ken Perlin's IMPROV: http://www.mrl.nyu.edu/perlin

Bibliography

@Article{Badler/Smoliar:1979,
  author=       { Badler, Norman I. and Smoliar, Stephen W.},
  year=         { 1979},
  institution=  { U. of Pennsylvania; National U. of Singapore},
  title=        { Digital Representations of Human Movement},
  journal=      { ACM Computing Surveys},
  month=        { march},
  volume=       { 11},
  email=        { badler@central.cis.upenn.edu; Smoliar@ISS.nus.sg},
  annote=       {
The techniques of representing a human being as a computer-generated 
graphic entity are discussed. A brief description of the Labanotation 
used to precisely quantify body postures is given. It is emphasised 
that although direct representation in a 2D space is possible, in the 
general case the better approach is to construct a 3D model and then 
project it to a plane. Three aspects are discussed:
1 Representation of the human body
   this is done by the following methods:
    stick figures : the simplest and most unimpressive
    surface models: excellent results, but slightly unrealistic and 
                    with a fair share of blemishes
    volume models : the human body is decomposed into primitive solids
                    such as cylinders, spheres, and ellipsoids.
2 Representation of movement
  Given a model for a human body, the process of animating it involves 
  producing a succession of frames, each slightly different from the 
  previous. This is attained by key frames, where a set of important 
  frames is provided and the intermediate parts are interpolated; by 
  movement functions, where Labanotation is used to choreograph the 
  motion; and by simulation, where the mechanics of the human body are
  also encapsulated in the model.

3 Finally, an architecture for such a system is discussed. 
-k.anandsudhakar feb/2k }

}
@Misc{Geib/Levison/Moore:1994,
  author=       { Geib, Christopher and Levison, Libby and
                  Moore, Michael B.},
  year=         { 1994},
  institution=  { U. of Pennsylvania},
  title=        { SodaJack: An Architecture for Agents that Search for 
                  and Manipulate Objects},
  month=        { january},
  email=        { (geib,libby,mmoore)@linc.cis.upenn.edu},
  annote=       {
This paper deals with the problem of an agent whose aim is to search 
for objects and undertake manipulation tasks in an environment. The 
authors have implemented the approach in a system called SODAJACK, 
which does the animation. The agent receives as input high-level 
commands like "fetch the scoop", and the system has to figure out the 
exact low-level actions to do the job. This involves knowledge of the 
possible locations of the scoop, planning a route to them, exploring 
them, and finally the act of lifting the scoop up. 
The system has been divided into a hierarchy of three planners that 
respond to the input goal and give as output the action outline to 
achieve the goal. The task division is as follows:
1. search planner: converts the goals into a plan to search.
2. object-specific planner: relates each search plan produced by the 
search planner to a particular object and undertakes feasibility 
tests for the action plans generated.
3. hierarchical planner (ItPlans): supervises the other two and 
delegates control first to the search planner to get a plan and then 
to the object-specific planner to make it specific.
-Vikrant Kumar 11/02/2000 }

}
@Misc{Salesin/et.al:1996,
  author=       { Christianson, David B. and Anderson, Sean E. and
                  He, Li-wei and Salesin, David H. and 
                  Weld, Daniel S. and Cohen, Michael F.},
  year=         { 1996},
  institution=  { U. of Washington; Stanford; Microsoft Research,
                  Redmond},
  title=        { Declarative Camera Control for Automatic
                  Cinematography},
  email=        { (dbc1,lhe,salesin,weld)@cs.washington.edu;
                  seander@stanford.edu; mcohen@microsoft.com},
  annote=       {
Programmers have long neglected cinematographic principles in 
computer animation. The authors try to fill this gap by making the 
rules of cinematic storytelling lend themselves easily to programming; 
for this purpose they have formalized the rules into a Declarative 
Camera Control Language (DCCL). This is very useful, as it allows 
programs to present a dramatic point of view aesthetically.
The authors first introduce the language of the cinema, such as the 
breakup of a film into scenes and shots, shots being the smallest 
unit. Another element is the placement of the camera, which depending 
on the scene can be apex, internal, external, or parallel. 
Cinematographers have identified certain fields of view for shots 
which give pleasing results, and there are certain constraints on a 
shot which should be satisfied, like parallel editing and breaking 
movement. Next comes the concept of idioms, which is the way 
cinematographers describe stock situations in a film; DCCL is an 
attempt to formalize such idioms. 
DCCL is composed of four basic components: fragments, views, 
placements, and movement endpoints. A fragment is the time interval 
during which the camera performs a simple motion; a simple shot may 
comprise one or more fragments.

Next they define the Camera Placement System (CPS). The CPS is a 
three-stage pipeline consisting of 
1. the sequence planner
2. the compiler
3. the heuristic evaluator.
The basic aim of the CPS is to give the camera positions based on 
the positions of the various interacting entities given as input. 
The authors have implemented this approach in a video game.
The authors succeed in highlighting the importance of using 
cinematographic techniques in computer animations so as to make the 
experience more enriching.             -Vikrant Kumar  11/02/2000 }

}
@Article{Lozano/Lozano-Perez:1996,
  author=       { Maron, Oded and Lozano-Perez, Tomas},
  year=         { 1996},
  institution=  { MIT},
  title=        { Visible Decomposition: Real Time Path Planning in 
                  Large Planar Environments},
  journal=      { AI Memo},
  month=        { january},
  www=          { ftp://ftp.ai.mit.edu/pub/users/oded\
                  /papers/planning.ps.Z},
  email=        { oded@ai.mit.edu; tlp@ai.mit.edu},
  annote=       {
This paper deals with the use of visibility graphs to do motion 
planning. -Rajesh Rajasekar 2/2000}

}
@Article{Hoffman/Hopcroft:1987,
  author=       { Hoffman, Christoph M. and Hopcroft, John E.},
  year=         { 1987},
  institution=  { Purdue-cs; Cornell-cs},
  title=        { Simulation of Physical Systems from Geometric Models},
  journal=      { IEEE J. of Robotics and Automation},
  month=        { june},
  volume=       { RA-3},
  annote=       {
The mechanics of simulation are discussed. -k. anandsudhakar feb/2k}

}

Kesari Anandsudhakar, Vikrant Kumar, Rajesh Rajasekar at IITK