Learning To Play Table Tennis - A Reinforcement Learning Approach
Course project for ME768
Instructor: Dr. Amitabh Mukherjee
Students: Sudipta N. Sinha, Anshuman Rai
Contents
  1. Introduction - aim & objectives
  2. Past Work
  3. Motivation
  4. Methodology
  5. Expected Results
  6. References
  7. Online Resources
Introduction

Aim of the project -
 

In this course project we propose to build a simulator for the table-tennis game, with the main focus on designing various approaches for returning the ball to the other side of the table, using reinforcement learning to make the players learn the shots in the virtual table-tennis environment that we will simulate. The simulator will provide an alternative to experimenting with real robots by modelling virtual robot players controlled by neural networks.

Back to Contents

Why Reinforcement Learning ?
 
 
Reinforcement Learning has been used for control problems like elevator dispatching and dynamic channel allocation, and for strategy games like backgammon and checkers, with very large state spaces of the order of 10^20. An alternative form of learning, supervised learning, learns from examples provided by an external agent, but by itself it is not enough for learning from interaction. In interactive problems such as ours it is impractical to obtain examples of desired behaviour that are both accurate and representative of all the situations in which the agent has to act. In uncharted territory, where one would expect learning to be most useful, an agent must be able to learn from its own experience. A game like table tennis (or a similar racquet sport) provides an interesting mix of control problems and strategy problems, making the task of developing good players quite challenging.
Past Work -

 A Table-Tennis Simulator for Neural Network Players -
 

Some illustrative work has been done by D. d'Aulignac, A. Moschovinos and S. Lucas in building a 2D virtual table-tennis simulator at Vase Labs, Univ. of Essex, with the aim of holding a table-tennis tournament in which different players (robot controllers) could play against each other. They chose a 2D simulation, designed neural network players, and trained them using training sets generated by another program (the algorithmic controller). The simulator they initially built used a multilayer perceptron (MLP) architecture, and comparisons were later made with a radial basis function (RBF) architecture. A second approach, which they suggested but did not implement, is the use of modular neural networks: the task is decomposed into smaller sub-tasks, each handled by a specialist network.
  The Acrobot
 
Reinforcement Learning has been applied to the task of simulating a gymnast, modelled as a two-link robot arm (the Acrobot) that learns to swing up on a high bar. The system has been widely studied by control engineers (e.g. Spong, 1994) and machine learning researchers (e.g. DeJong and Spong, 1994; Boone, 1997). The learning algorithm used was Sarsa(lambda) with linear function approximation, tile coding, and replacing traces, with a separate set of tilings for each action. Although this project has nothing to do with table tennis, the task is modelled as a Markov Decision Problem (MDP), and the reinforcement learning method relies on linear function approximation and tile coding. This introduces the key issue of generalisation: how experience with a limited subset of the state space is generalised to produce a good approximation over a much larger subset. This issue is also important for our table-tennis agent.
 

Back to Contents

Motivation
 
 
Perhaps the greatest motivation for this project is that little work has been done in this area, although similar problems have been tackled. The simulation can also be a precursor to solutions for other aspects of the problem, namely vision and robotics; by integrating those solutions we might eventually get to see an actual robot playing table tennis. By building the simulator we create a framework in which we can test appropriate algorithms for controlling a robot without going into the physical aspects. Moreover, racquet sports like tennis, badminton and squash, and even games like baseball, are essentially the same in principle as far as reaching the ball is concerned; the differences lie in the performance measure of a shot and other physical aspects. With some changes in parameters, and without modifying the basic framework, the simulator can therefore model other sports. By incorporating human-like physical constraints on racket motion, the simulation can also provide insights into the game of table tennis itself.
 

Back to Contents

Methodology
 
The physics of the real game are simulated in a simplified form, adapted to the requirements of a real-time virtual environment.
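
As a rough illustration of the kind of real-time physics involved, the sketch below integrates the ball's flight under gravity with a simple bounce off the table; the constants, the choice of y as the vertical axis, and the function name are illustrative assumptions, not values taken from the actual simulator.

    import numpy as np

    # Minimal sketch of one simulation timestep for the ball, assuming simple
    # projectile motion plus an inelastic bounce off the table surface.  All
    # constants here are placeholders, not the simulator's actual values.

    G = np.array([0.0, -9.8, 0.0])   # gravity along -y (assumed vertical axis)
    TABLE_HEIGHT = 0.0               # table surface at y = 0
    RESTITUTION = 0.9                # fraction of vertical speed kept on a bounce
    DT = 0.02                        # timestep in seconds

    def step_ball(pos, vel):
        """Advance the ball's position and velocity by one timestep (Euler integration)."""
        vel = vel + G * DT
        pos = pos + vel * DT
        # bounce off the table if the ball has crossed the surface while moving down
        if pos[1] < TABLE_HEIGHT and vel[1] < 0.0:
            pos[1] = TABLE_HEIGHT
            vel[1] = -RESTITUTION * vel[1]
        return pos, vel

    # example: advance one frame of ball flight (values are arbitrary)
    pos, vel = np.array([6.5, 0.3, 1.9]), np.array([-5.0, -1.0, -0.5])
    pos, vel = step_ball(pos, vel)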
 
Approach I
 

Underlying Assumptions: We assume that the velocity vector and the coordinates of the bat are independent of each other; under this assumption we can decompose the problem of determining the coordinates and the velocity of the bat into two separate and unrelated parts.

The two modules are as follows:

where,

S is the problem state
A is the action set for the state S
X, Y, Z are coordinates
V is velocity
The superscripts b and r refer to the ball and racket respectively
The subscripts denote the components
Terms superscripted with ' are determined by the simulator
Terms superscripted with * are determined by the agent's action
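
To make the notation concrete, the sketch below represents such a problem state as a small data structure; the field names, and the assignment of the primed (simulator) and starred (agent) marks to the ball and racket respectively, are illustrative guesses rather than the project's exact definitions.

    from dataclasses import dataclass
    import numpy as np

    # Illustrative container for the problem state S in the notation above.
    # Superscript b quantities describe the ball, superscript r the racket; here
    # the ball terms are taken as the primed (simulator-determined) quantities
    # and the racket terms as the starred (agent-determined) ones.

    @dataclass
    class State:
        ball_pos: np.ndarray     # X^b = (X_x, X_y, X_z), from the simulator (')
        ball_vel: np.ndarray     # V^b, from the simulator (')
        racket_pos: np.ndarray   # X^r, set through the agent's action (*)
        racket_vel: np.ndarray   # V^r, set through the agent's action (*)

    # example state (values taken from Frame No. 1 in Expected Results)
    s = State(ball_pos=np.array([6.50, 8.25, 1.90]),
              ball_vel=np.array([-50.00, -2.98, -1.00]),
              racket_pos=np.zeros(3),
              racket_vel=np.zeros(3))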


 
 
  • Determining the coordinates of the bat at the point of interception: this module uses a Q-learning network; from the problem state S the agent selects actions from the set A that move the bat so that it intercepts the incoming ball (this intercept-finding phase is illustrated by the frames in the Expected Results section).
  • Determining the velocity/force of the bat at the point of interception: this phase again uses a neural network and a training set (described below, with a sketch after this list). The input to the neural network is the interception point, and the output is the velocity of the bat that will return the ball to the other side of the table. The input/output pairs (the training set) are generated by a separate program that scores shots against a particular performance measure.
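
The following is a minimal sketch of how the second module could be trained, assuming a small MLP regressor that maps an interception point to a returning bat velocity; the network size, learning rate and the random placeholder training pairs are illustrative only, since the real pairs would come from the algorithmic controller and its performance measure.

    import numpy as np

    # Sketch of module two: an MLP mapping the interception point (x, y, z) to a
    # bat velocity (vx, vy, vz) that returns the ball.  The training pairs below
    # are random placeholders; in the project they would be generated by the
    # separate algorithmic controller using a particular performance measure.

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(500, 3))   # interception points (placeholder)
    Y = rng.uniform(-1.0, 1.0, size=(500, 3))   # target bat velocities (placeholder)

    n_hidden, lr = 16, 0.05
    W1 = rng.normal(0.0, 0.1, (3, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, 3)); b2 = np.zeros(3)

    for epoch in range(500):                    # plain batch gradient descent
        H = np.tanh(X @ W1 + b1)                # hidden layer
        pred = H @ W2 + b2                      # linear output layer
        err = pred - Y                          # prediction error
        # backpropagate the mean squared error
        dW2 = H.T @ err / len(X); db2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1.0 - H ** 2)
        dW1 = X.T @ dH / len(X); db1 = dH.mean(axis=0)
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1

    def bat_velocity(intercept_point):
        """Predict the bat velocity for a given interception point."""
        h = np.tanh(intercept_point @ W1 + b1)
        return h @ W2 + b2

    print(bat_velocity(np.array([0.5, 0.2, -0.1])))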
Approach II

In this case, at each timestep the action set comprises increments in the velocity components of the bat; together with the coordinates of the bat in the previous state, these determine the next state. Each state is characterised by the ball position, ball velocity, bat position and bat velocity. Here also we use a Q-learning network, with a different reward function R:

    R(s) = large negative value,  if the ball is missed altogether
    R(s) = r(x, z),               if the ball is returned, where (x, z) is the point where the ball lands on the table (see fig.)

Here the trajectory of the bat will be continuous, because the increments in the position coordinates depend on the increments in the velocity components.
     


     
where,

S is the problem state
A is the action set for the state S
X, Y, Z are coordinates
V is velocity
The superscripts b and r refer to the ball and racket respectively
The subscripts denote the components
Terms superscripted with ' are determined by the simulator
Terms superscripted with * are determined by the agent's action
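
As a concrete illustration of the update just described, the sketch below applies a standard Q-learning backup with the reward shape above; the coarse state discretisation, the action set of velocity increments, the table dimensions and the particular form of r(x, z) are all illustrative assumptions rather than the project's actual choices.

    import numpy as np

    # Sketch of the Approach II learning rule.  Actions are small increments in
    # the bat's velocity components; the reward is a large negative value for
    # missing the ball and r(x, z) for a returned ball, where (x, z) is the
    # landing point on the table.  All quantities below are placeholders.

    MISS_PENALTY = -100.0
    ACTIONS = [(axis, dv) for axis in (0, 1, 2) for dv in (-0.5, +0.5)]  # velocity increments

    def r_landing(x, z, table_length=9.0, table_width=5.0):
        """Placeholder r(x, z): favours balls landing deep on the opponent's half."""
        depth = float(np.clip(x / (table_length / 2.0), 0.0, 1.0))
        centred = 1.0 - min(abs(z) / (table_width / 2.0), 1.0)
        return depth + centred

    def reward(missed, landing_xz=None):
        return MISS_PENALTY if missed else r_landing(*landing_xz)

    N_STATES = 1000                      # coarse discretisation of the (ball, bat) state
    Q = np.zeros((N_STATES, len(ACTIONS)))

    def q_update(s, a, r, s_next, alpha=0.1, gamma=0.95):
        """One Q-learning backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

    # example backup for a returned ball that lands at (x, z) = (3.2, 0.4)
    q_update(s=42, a=1, r=reward(False, (3.2, 0.4)), s_next=57)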

    Back to Contents


    Expected Results

     
The results of Approach I should be more predictable, especially those of the second module, because it does not involve Q-learning.

    Approach I

The following images show frames captured, with the corresponding state representation, during the phase of finding the intercept point between the ball and the bat. The motion of the bat is shown in dotted lines (our guesses).
Frame No. 1:  ball position = (6.50, 8.25, 1.90)    ball velocity = (-50.00, -2.98, -1.00)

Frame No. 2:  ball position = (2.86, 7.98, 1.83)    ball velocity = (-22.69, -2.33, -0.45)

Frame No. 3:  ball position = (-0.88, 6.46, 1.96)   ball velocity = (-13.53, -5.92, -0.71)

Frame No. 4:  ball position = (-6.26, 5.85, 1.59)   ball velocity = (-5.75, 1.93, -0.14)
     
    Approach II

In this case the motion of the bat will be continuous. We also expect some creative solutions here, for instance the one illustrated below: the approach will favour human-like solutions (shots) with a back-swing, i.e., a better prepared player will play a better shot.

     

     
    Back to Contents
    References
     
  1. C. W. Anderson and Zhaohui Hong, Reinforcement Learning with Modular Neural Networks for Control, Colorado State University, Fort Collins, CO 80523, anderson@cs.colostate.edu.
  2. D. d'Aulignac, A. Moschovinos, and S. Lucas, Virtual table tennis and the design of neural network players, Dept. of Electronic Systems Engineering, Univ. of Essex, Colchester, CO4 3SQ, UK, sml@essex.ac.uk.
  3. C. W. Anderson, Q-Learning with Hidden-Unit Restarting, Dept. of Computer Science, Colorado State University, Fort Collins, CO 80523, anderson@cs.colostate.edu.
    Back to Contents
    Links to Resources:
     
     
  •  Back to top
  •  Back to index
