FINAL PROJECT REPORT

VIRTUAL MODEL OF A DANCER CONTROLLED BY VISION

Contents:-

  • Motivation
  • Past Work
  • Methodology
  • Results
  • Conclusion
  • Bibliography
  • Implementation
    By:

    Soumyadeep Paul (96283)

    2nd Year B.Tech., Dept. of CSE, IIT Kanpur


                                                      MOTIVATION          

       The basic aim of this project is to show how hand gestures can be captured using a single camera and modelled so as to control a graphics character. Its motive, therefore, is to build a program that tracks hand motion in real time and uses the output obtained to control a graphics character. The goal of such systems is to remove the human-machine interface boundary and to bring human-machine interaction closer to human-human interaction. In the future, such interfaces will become increasingly important as computers are called on to mediate in teleconferencing and to process images and videos in which gestures convey significant information. Moreover, such a method could also be used in human metamorphosis systems, in which anyone can change (metamorphose) his/her form into another character.


                                                PAST WORK

        Dr. Jun Ohya, Kazuyuki Ebihara, Jun Kurumisawa and Ryohei Nakatsu of the ATR Media Integration and Communications Research Laboratories recently constructed a Virtual Kabuki Theater to show how the body movements and face images of a person, detected in real time, can be used to govern the movements of a Kabuki actor's model. Theirs was a first step towards the development of human metamorphosis systems. Though this project draws its inspiration from their work, the methodology used here for detecting hand movements is very different.

        Work is also going on in the MIT Media Lab towards the development of a virtual personalised aerobics trainer. The user's aerobic movements are captured using a camera, and feedback is given to the user according to the quality of his/her performance. The motive of creating such a system is to allow users to create and personalise their aerobics training sessions according to their needs and desires.

       For more information:-

  • Virtual Kabuki Theater - only a summary could be found there.
  • Virtual Personalized Aerobics Trainer:- a great site to visit; they have all the information on their web pages. (MIT Media Lab)

                                              METHODOLOGY

       The methodology I used for real-time hand tracking is the following:-

    Image sequences of the user moving his hands are captured, and the image processing described below is applied to each image.

    BINARIZING THE IMAGE - The images are binarized using an appropriate threshold value. This removes a lot of background noise.
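    As an illustration, here is a minimal sketch of the binarization step in plain C, under my own assumptions about the buffer layout (an 8-bit greyscale image stored row by row); the actual project code uses MIL routines instead, and the threshold T is best chosen empirically for the lighting conditions.

        #include <stddef.h>

        /* Set every pixel above the threshold T to foreground (255) and
           everything else to background (0). */
        void binarize(unsigned char *img, size_t w, size_t h, unsigned char T)
        {
            for (size_t i = 0; i < w * h; i++)
                img[i] = (img[i] > T) ? 255 : 0;
        }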

    SMOOTHING AND OPENING - The binarized image is smoothed using a standard Gaussian convolution matrix. Opening is a morphological operation on the 3x3 neighbourhood of each pixel: erosion followed by dilation, which removes small isolated specks of noise. Though these convolution operations effectively remove a lot of noise, they are time-consuming. For many backgrounds (like the black background I am using) they are not necessary, and hence I am leaving this option to the user.
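    The opening operation can be sketched in plain C as follows, again under my own assumptions about the image layout; erosion and dilation are written as one routine since they differ only in the test applied to the 3x3 neighbourhood.

        #include <stdlib.h>

        /* One pass of a 3x3 morphological operation on a 0/255 image. For
           dilation a pixel becomes foreground if any neighbour is
           foreground; for erosion a pixel stays foreground only if no
           neighbour is background. The one-pixel border is skipped for
           brevity. */
        static void morph3x3(const unsigned char *src, unsigned char *dst,
                             int w, int h, int dilating)
        {
            for (int y = 1; y < h - 1; y++)
                for (int x = 1; x < w - 1; x++) {
                    int hit = 0;   /* any neighbour matching the trigger value? */
                    for (int dy = -1; dy <= 1; dy++)
                        for (int dx = -1; dx <= 1; dx++)
                            if (src[(y + dy) * w + x + dx] == (dilating ? 255 : 0))
                                hit = 1;
                    dst[y * w + x] = dilating ? (hit ? 255 : 0) : (hit ? 0 : 255);
                }
        }

        /* Opening = erosion followed by dilation. The image border is
           left as it was. */
        void open_image(unsigned char *img, int w, int h)
        {
            unsigned char *tmp = calloc((size_t)w * h, 1);
            if (!tmp) return;
            morph3x3(img, tmp, w, h, 0);   /* erode  */
            morph3x3(tmp, img, w, h, 1);   /* dilate */
            free(tmp);
        }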

    CALIBRATION:-

    Initial pose of the user

    In the beginning, the user is made to stand with his arms stretched wide apart. This position is used for calibration, i.e. finding the user's shoulder tip positions, arm length and forearm length. I do this in the following way. Scanning inwards from the two extreme ends of the image, I locate the first few high-intensity pixels. Since the background noise has been eliminated and the user is standing with his arms wide apart, these pixels correspond to the user's wrists, which gives me the x coordinates of the wrists. The x coordinates are then used to locate the y coordinates as follows: consider a narrow vertical strip centred at each such x coordinate; the high-intensity pixels within this strip give the y coordinate of the wrist. The shoulder strip is located using statistical data about human body proportions.

    The arm length is then found from the above data, and the calibration is complete.
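    The calibration scan can be sketched as follows; the strip width, the image layout and the function names are my own illustrative assumptions, and the shoulder placement from statistical body proportions is omitted.

        #define STRIP 5

        /* Return the x of the first column (scanning from x0 towards x1 in
           steps of dx) that contains a foreground pixel, or -1 if none. */
        static int first_column(const unsigned char *img, int w, int h,
                                int x0, int x1, int dx)
        {
            for (int x = x0; x != x1; x += dx)
                for (int y = 0; y < h; y++)
                    if (img[y * w + x]) return x;
            return -1;
        }

        /* Mean row of the foreground pixels in a strip of width STRIP
           centred at x; -1 if the strip is empty. */
        static int strip_mean_y(const unsigned char *img, int w, int h, int x)
        {
            long sum = 0, n = 0;
            for (int y = 0; y < h; y++)
                for (int s = -STRIP / 2; s <= STRIP / 2; s++) {
                    int xs = x + s;
                    if (xs >= 0 && xs < w && img[y * w + xs]) { sum += y; n++; }
                }
            return n ? (int)(sum / n) : -1;
        }

        /* Locate both wrists in the calibration pose. */
        void calibrate(const unsigned char *img, int w, int h,
                       int *lwx, int *lwy, int *rwx, int *rwy)
        {
            *lwx = first_column(img, w, h, 0, w, 1);        /* scan from left  */
            *rwx = first_column(img, w, h, w - 1, -1, -1);  /* scan from right */
            *lwy = strip_mean_y(img, w, h, *lwx);
            *rwy = strip_mean_y(img, w, h, *rwx);
        }

    Once the shoulder tips have been placed from the statistical strip, the arm length follows as the distance between shoulder and wrist.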

    TRACKING THE HAND:-

    Hand tracking is done in the following way. The extreme tip is located using the above procedure; it corresponds either to the user's wrist or to the elbow. Since I already know the arm length and forearm length from calibration and statistical data, I calculate the distance of the located tip from the shoulder and compare it with the upper-arm length. Two cases can arise:-

    Case 1:  The distance of the located tip is greater than the upper-arm length. In this case the located tip is the wrist. I use geometry to locate the elbow joint (a sketch follows below), and some image processing resolves any ambiguity that may arise.

    Case 2:  The distance of the located tip is less than the upper-arm length. In this case the located tip is the elbow joint, which also means that the arm is folded. The wrist is then found by searching both above and below the line joining the elbow and the shoulder tip. This gives the wrist position unambiguously, since I am not allowing occlusion.

    After locating the wrist and the elbow joint, the data is fed to the graphics part. The following is an example image.
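    The geometry of Case 1 amounts to intersecting two circles: one of radius equal to the upper-arm length about the shoulder, and one of radius equal to the forearm length about the wrist. A sketch in C, with illustrative names of my own, is given below; the two candidate intersections are exactly the ambiguity that the image processing step resolves.

        #include <math.h>

        typedef struct { double x, y; } Pt;

        /* Given shoulder S, detected tip T, upper-arm length u and forearm
           length f, classify the tip and, in the wrist case, compute the two
           candidate elbow positions. Returns 1 if the tip is the wrist,
           0 if it is the elbow itself (the folded-arm case). */
        int locate_elbow(Pt S, Pt T, double u, double f, Pt *e1, Pt *e2)
        {
            double dx = T.x - S.x, dy = T.y - S.y;
            double d  = sqrt(dx * dx + dy * dy);

            if (d <= u) return 0;               /* Case 2: tip is the elbow */

            /* Case 1: tip is the wrist; standard two-circle intersection. */
            double a  = (u * u - f * f + d * d) / (2.0 * d);
            double h2 = u * u - a * a;
            double h  = h2 > 0.0 ? sqrt(h2) : 0.0;  /* fully stretched arm -> h = 0 */
            double mx = S.x + a * dx / d, my = S.y + a * dy / d;

            e1->x = mx - h * dy / d;  e1->y = my + h * dx / d;
            e2->x = mx + h * dy / d;  e2->y = my - h * dx / d;
            return 1;
        }

    Of the two candidates e1 and e2, the one lying on foreground pixels in the binarized image is taken as the true elbow.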

    GRAPHICS OUTPUT

       I had initially planned to give a much better graphics output than the present one, but due to implementation problems (a shortage of memory on the host system) I had to discard the idea. My initial plan was to do the animation using dancer images (or drawings) with the arms in various positions. But this had a problem: since the number of such images is limited, the continuous data output from the image processing part would have had to be discretised. In the present scheme this problem does not arise, since the images are constructed directly from the output of the image processing part.
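    For concreteness, here is a minimal sketch of how such an arm could be drawn with the Borland graphics (BGI) routines mentioned under the implementation details; the function name and coordinates are illustrative, not taken from the project code.

        #include <graphics.h>
        #include <conio.h>

        /* Redraw the arm each frame as two line segments, shoulder-to-elbow
           and elbow-to-wrist, from the joint coordinates produced by the
           image processing part. */
        void draw_arm(int sx, int sy, int ex, int ey, int wx, int wy)
        {
            setcolor(WHITE);
            line(sx, sy, ex, ey);   /* upper arm          */
            line(ex, ey, wx, wy);   /* forearm            */
            circle(ex, ey, 3);      /* elbow joint marker */
        }

        int main(void)
        {
            int gd = DETECT, gm;
            initgraph(&gd, &gm, "");                /* BGI driver path left empty */
            cleardevice();
            draw_arm(160, 100, 200, 150, 250, 140); /* example coordinates */
            getch();
            closegraph();
            return 0;
        }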


                                              RESULTS

       The results obtained after binarizing the images and applying the convolution matrices are good. The following is an example image.

      If the user initially stands in the correct pose with his arms wide apart, then the shoulder tip data obtained is accurate, thus validating the statistical data.

      The following image shows the result obtained at run time:-

    The vertical lines show the detected x coordinates and the horizontal lines the detected y coordinates.

      One basic problem with such systems is the inverse relationship that exists between accuracy and speed. One could use a heuristic that obtains the coordinates more accurately, but with a visible reduction in speed. Since speed is of the utmost importance in real-time systems, one should try to strike a balance between accuracy and speed.


                                               CONCLUSION

     It is possible to achieve real-time tracking of the human hand even without requiring the user to wear gloves or dark clothes.

     The limitations of the present work are:-

  • Requires a black background.
  • Does not work when the user is wearing very dark clothes.
  • Works only under certain specific lighting conditions.
  • I have written the code for the dark-clothes case too, but due to time limitations I was not able to remove all the errors from it.


      A good future work on this topic would be to track body movements too; this would lead to more realistic output images (dance sequences). Another extension could be to make the graphics more realistic. The following is one way this could be done:-

       Capture some images of the dancer (or any other character, for that matter) and break each image up into fragments. These fragments could then be used to construct images according to the output obtained from the image processing part. I have already written a program which automatically detects the user's shoulder and elbow joints and breaks the captured image up into corresponding fragments. The following are some outputs of the program.

      More such images could be captured with the arms at different angles, say 45 degrees and 135 degrees with respect to the vertical. The fragments could then be joined to obtain various arm positions. This was just a crude example of how one could attain more realistic imagery.
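    The fragment-cutting step itself can be sketched as below, under my own assumptions about the buffer layout (the actual frag.c works on MIL buffers): each fragment is simply a rectangular sub-image copied out around a detected limb segment.

        #include <stdlib.h>
        #include <string.h>

        /* Copy the fw x fh rectangle with top-left corner (x0, y0) out of
           the image into a freshly allocated fragment buffer, which the
           caller must free. Returns NULL if the rectangle falls outside
           the image or allocation fails. */
        unsigned char *cut_fragment(const unsigned char *img, int w, int h,
                                    int x0, int y0, int fw, int fh)
        {
            if (x0 < 0 || y0 < 0 || x0 + fw > w || y0 + fh > h) return NULL;
            unsigned char *frag = malloc((size_t)fw * fh);
            if (!frag) return NULL;
            for (int y = 0; y < fh; y++)
                memcpy(frag + y * fw, img + (y0 + y) * w + x0, (size_t)fw);
            return frag;
        }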


                                              BIBLIOGRAPHY

    1) Jun Ohya et al., "Virtual Kabuki Theater: Towards the Realisation of Human Metamorphosis Systems", Proc. IEEE International Workshop on Robot and Human Communication, 1996.

    2) James M. Rehg and Takeo Kanade, "DigitEyes: Vision-Based Human Hand Tracking", Proc. European Conference on Computer Vision, Stockholm, Sweden, 1994.

    3) Thad Starner and Alex Pentland, "Real-Time American Sign Language from Video Using Hidden Markov Models", MIT Media Lab Technical Report 375, 1995.

    LINKS

  • Virtual Kabuki dancer controlled by Vision ( ATR , Dept : 1 )
  • MIT Media Lab Vision and Modelling Group
  • A Virtual Personal Aerobics Instructor
  • The Computer Vision Homepage

                              IMPLEMENTATION DETAILS

       The source code is written in the C language. The image processing part is done using the Matrox Imaging Library (MIL), which allows the capture of image sequences from a camera. The code can be compiled only on machines which support MIL (in this case, the image processing computer in the Robotics Lab, Mechanical Engineering Department, IIT Kanpur). The source code is in the file e:\user\spaul\ai\demo\demo.c; it can be compiled by going to that directory and typing mcl demo. The graphics is done using the Borland C++ graphics library. The program for fragmenting the images is in the file e:\users\spaul\ai\demo\frag.c; it can be compiled by typing mcl frag.

    EXTENSIONS

       The source code is sufficiently commented. Any additions can be made by writing separate functions and making the appropriate changes in main; the comments should help the user understand main easily.

    ----------The End ---------