Artificial Intelligence ME 768 Jan-Apr 2000 
CONTROL OF A ROBOT BY VOICE INPUT


Submitted by

Gatram Pradeep (97131)
Shalabh Gupta (97319)

IIT Kanpur : February 2000



INTRODUCTION

        Suppose you want to control a menu-driven system. What is its most striking property?

        The first thought that comes to mind is that the range of inputs in a menu-driven system is limited: by using a menu, all we are doing is restricting the input domain. This characteristic is very useful when implementing menus in stand-alone systems. Think, for example, of the pine mail menu or a washing machine's control panel. How many distinct commands do they really require?
 

MOTIVATION

        Last year we both participated in robocarromines (Techkriti '99). We were using switches to control the various motions of our robots. Then Shalabh participated in sumo fighting (Yantriki '99) using a fully autonomous version of the robot NINJA. We felt the next logical step was to either go for some sort of wireless control mechanism or design a voice based control system. We decided to go for the latter.

        A dancing-robot competition is also being organized by the Ingenuity cell at Techkriti-Millennium, in which the robots have to dance to the tune of the music being played. This event got us thinking about the concept of a voice-controlled robot.

        We are not aiming to build software that can recognize a large vocabulary. Our basic idea is to develop a menu-driven control for our robot, where the menu is voice driven. The ability to recognize a few words is enough for such jobs. A person interacting with such a system would not need to use his hands for routine tasks, which is what we wish to achieve. This leads us to our main task in the project.
 

THE TASK

       What we are aiming at is to control the robot Ninja using voice commands.
       Ninja can perform these basic tasks:

  1.  move forward
  2.  move back
  3.  turn right
  4.  turn left
  5.  hit
  6.  stop (stops the current job)
        This can be considered a small menu of six commands, so software that can recognize the six commands and distinguish them from one another will do the job. Thus, software needs to be developed that takes voice data as input and outputs the matched command.
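The six-command menu amounts to a simple lookup from the recognized word to an action code. A minimal sketch in Python follows; the single-byte codes sent to Ninja's circuit are purely hypothetical, since the actual interface is part of the hardware design:

```python
# Hypothetical mapping from recognized words to byte codes for the
# on-board circuit; the codes themselves are assumptions, not the
# actual protocol used by Ninja.
COMMANDS = {
    "forward": b"F",
    "back":    b"B",
    "right":   b"R",
    "left":    b"L",
    "hit":     b"H",
    "stop":    b"S",
}

def dispatch(word):
    """Return the code for a recognized word; fall back to stop if unknown."""
    return COMMANDS.get(word, b"S")
```

Falling back to the stop code on an unrecognized word is a safe default for a moving robot.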
 

SAMPLE INPUT OUTPUT
 
 
  INPUT (speaker speaks)      OUTPUT (robot does)
  ----------------------      -------------------
  forward                     moves forward
  back                        moves back
  right                       turns right
  left                        turns left
  hit                         hits the coin
  stop                        stops the current task


PAST WORK

        A lot of work has been done earlier in the field of isolated word recognition. Using a traditional recognizer, an accuracy of around 60% had previously been obtained on both a 156-town-name task and an 1108-road-name task. Techniques presented in [Azzopardi/Semnani_et_al:1998] have resulted in an accuracy of 90% for an automated corporate directory system with 120,000 entries.

        Speaker-independent speech recognition technology that can be embedded on a single DSP chip has been developed by [Hoshimi/Yamada_et_al:1998], as an input method for the rapidly spreading small portable information devices and for advanced robotics applications. When their newly proposed noise-robustness method was tested on a 100-word isolated-word vocabulary spoken by 50 subjects, a recognition accuracy of 94.7% was obtained under various noisy environments.

        Software engineering for research and development in signal processing is by no means unimportant. [Dutoit/Shroeter:1998] developed a programming paradigm that allows software components to be advantageously combined with one another, in a way that recalls hardware plug-and-play, without the need for complex schedulers to control data flows.

        Earlier similar work in a limited input domain was done using wireless control, e.g. remote control of electrical switches (this is currently one of the Ingenuity problems). We read a newspaper report about a year ago (The Hindu, Thursday Science & Technology section) about such a project. A suggested application was for hospitalized patients, who usually depend on someone else to switch the lights, fan, etc. on and off. But what if the patient's hands are injured? Clearly, a voice-based system ought to be used in such a case.
 

METHODOLOGY

        We capture voice data from the microphone using a sound card and store it in an array. The array is passed to a function that extracts words (i.e. spoken words are kept and quiet periods are discarded). Each word is then sent to a function that extracts frequency as a function of time; this is the frequency vector of the spoken word. This vector is compared with the reference vectors using the standard inner product of two vectors. The best-matching reference vector (i.e. the one whose inner product is greater than the other five) determines the command, which is fed to Ninja. The electronic circuit mounted on Ninja then interprets the command and moves the robot accordingly.
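The pipeline above can be sketched in Python. This is a minimal illustration, not our actual implementation: the frame size, the energy threshold, and the spectral-binning scheme used for the frequency vector are all assumptions made for the sketch.

```python
import numpy as np

def extract_words(samples, rate=8000, frame_ms=20, threshold=0.02):
    """Split a recording into words by dropping quiet frames.

    A frame counts as voiced when its RMS energy exceeds `threshold`
    (0.02 is an assumed value; it must be tuned to the microphone).
    Consecutive voiced frames are merged into one word.
    """
    frame = int(rate * frame_ms / 1000)
    words, current = [], []
    for i in range(0, len(samples) - frame + 1, frame):
        chunk = samples[i:i + frame]
        if np.sqrt(np.mean(chunk ** 2)) > threshold:
            current.extend(chunk)
        elif current:                       # quiet frame ends the word
            words.append(np.array(current))
            current = []
    if current:
        words.append(np.array(current))
    return words

def frequency_vector(word, n_bins=64):
    """Crude spectral fingerprint: magnitude spectrum folded into n_bins."""
    spectrum = np.abs(np.fft.rfft(word))
    bins = np.array_split(spectrum, n_bins)
    vec = np.array([b.mean() for b in bins])
    return vec / np.linalg.norm(vec)        # unit length for inner products

def classify(vec, references):
    """Pick the command whose reference vector has the largest inner product."""
    return max(references, key=lambda cmd: np.dot(vec, references[cmd]))
```

Normalizing each vector to unit length makes the inner product a pure similarity measure, so louder utterances do not score higher than quieter ones.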
 

RESULTS

 

CONCLUSIONS
        In this project we obtained a user-dependent isolated-word recognition system with a recognition accuracy of about 85% on six words. The accuracy can be improved, and the system extended to more words, if the noise conditions during training are improved.
       We were getting a peak SNR (signal-to-noise ratio) of about 20 dB, whereas under the best conditions an SNR of up to 35 dB can be obtained at an 8 kHz sampling rate. Also, the microphone we used did not filter out the bursts of air produced when speaking, which added a lot of noise to the input voice signal.
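The SNR figures above can be checked with a one-line calculation. The sketch below is a generic power-ratio definition of SNR in decibels, assuming separate signal and noise recordings are available; a 20 dB SNR corresponds to the signal carrying 100 times the power of the noise.

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    p_sig = np.mean(np.asarray(signal, dtype=float) ** 2)
    p_noise = np.mean(np.asarray(noise, dtype=float) ** 2)
    return 10 * np.log10(p_sig / p_noise)
```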
       However, the technique discussed here cannot be used for a speaker-independent word recognition system. The frequency scales, speaking speed, and concentration of signal power on different syllables vary widely from speaker to speaker (as depicted by the variations in the frequency and amplitude graphs for the same words in the methodology section). Thus, for speaker-independent systems, a better approach such as Markov chain modeling must be used.
 
 

APPLICATIONS
        We believe such a system would find a wide variety of applications. Menu-driven systems such as e-mail readers, and household appliances like washing machines, microwave ovens, pagers, and mobile phones, will become voice controlled in the future. Our project may find applications there because the number of possible inputs is inherently limited. Using our software, these devices could be controlled over a network as well.

 

WEB LINKS



SOURCE CODE

BIBLIOGRAPHY


This proposal was prepared by Gatram Pradeep and Shalabh Gupta as part of the project component of the course on Artificial Intelligence in Engineering in the January semester of 2000.
(Instructor : Amitabha Mukerjee )

[ COURSE WEB PAGE ] [ COURSE PROJECTS 2000 (local CC users) ]