SUPERVISED LEARNING OF  OPTIMAL GOAL SCORING BEHAVIOURS

FOR A SIMULATED SOCCER ENVIRONMENT

 

Sreangsu Acharyya 9810540

Indian Institute of Technology - Kanpur : August 1998



Contents



 

Introduction

  Informal

        Consider that you are a foot baller. A pass has just been sent your way ,so you have to decide quickly what to do next. i.e  whether to etc. But to do all these things you  need knowledge about these actions and should have learnt to execute them in a hostile noisy environment.

        Consider the utopian case of a static ball before an untended goal :

        If the environment is strictly deterministic you can aim anywhere  between the goal posts . But real worlds are  noisy and hostile so you have to select an aiming point  such that the probability of scoring the goal is maximum considering the uncertainity in your ability of directing the ball exactly and those defenders out their to stop  you etc.

         The problem is not only to get the best shooting solution because their are other  action options.So one needs to consider the best of all these actions and select the one which is  the most relevant. So the problem has an outer  and an inner loop

      The  problem  can be approached in several ways.
 

        The selection of the method has to be based on the relaiability and the ease of hand coding the domain knowledge. If one does have a substantial knowledge about the domain reinforcement learning will spend a lot of time reinventing the wheel .While on the other hand it is dangerous to keep faith on faulty domain knowledge or wrong coaching.



 
 



Past Work

A substantial volume of work on this field emanates from the CMUnited  robosoccer team . There arhitecture is based on Peter Stone and Manuela Veloso's approach of layered learning.

    They have reported using simple feed -forward neural networks for learning shooting  to an un-tended goal . Where the aiming point  within the goal was fixed apriori. [1].

    They have also used memory based learning techniques to learn shooting behaviour in presence of a goalie that  revolves on a circle of a preassigned radius with a fixed  velocity in front of the goal , goal scoring accuracy obtained was of the order of 60%.[2]

    Decision tree learning algorithms have also been used to estimate the probability of succes of a pass to a tem member. This information then can be fed forward for strategiic planning. [3]

    The CMUnited  team is currently working on a re-inforcement learning model that can be applied to hidden state or opaque markov chain models.[4]
 




 
 

METHODOLOGY

 
 

        The approach will be following  Stone & Veloso's paradigm of layered learning , in which  strategies are developed in a bottom up  hierarchy of increasingly complex behaviours.

        At the ground level  will be the most basic: like the locomotive behaviours.

        These will be used  as building blocks to generate more informative behaviours forming the mid level Note : It would be interesting to see if such complex behaviour can be discovered autonomusly from the building blocks .

        The still higher level will include

etc
 
 

2 Methodologies of imparting Behaviours

  1. Reinforcement learning from   "0" domain knowledge.
  2. Hard coding an assumed domain
Considering the present problem the implementation should be somewhere in the continuum spanned by these 2 extremes.Learning can take account of obscure  domain knowledge or a changing environment but it will be slow.
Hard coded strategies assume some domain knowledge thus are not robust in a changing environment.

    The solution will be to incorporate both  depending upon heir applicability
            To impart a substantial amount  of domain knowledge that  one is confident about and is not expected to change and leave the rest to learning. So as to minimise re-inventing the wheel and still get an effective system.
 

 


 
 
 

CONFIGURATION OR STATE SPACE

    X, Y  ,\THETA are the externally  supplied variables internally we may need to keep internal representation as

X , Y, \THETA ,Vx ,VY     for  each player and ball in addition we may have a boolean variable signifying which team has posession.
 


ABSTRACTION QUANTIZATION AND

THE CURSE OF DIMENSIONALITY
 

To facillitate analysis and reduce the search space we need to quantise the search space.

If  Number of position states( x, y , \theta) = P
      Number of  velocity states =                       V
Asuming one position state can be occupied by a single player the state space size will be the order of
p(p-1)(p-2)...(p-2n) v2n *2     [where  n is the size of each team  ]
Which no doubt is a huge workspace and  the curse of dimensionality is very real.

REDUCTION:

    i  ) Through fuzzification of the variables like
                                        penalty  ,  forward ,  midfield
                                        left flank right flank , centre.
    ii)    Consider symmetry i.e tranlation and rotation invariant  variables and rules. like representing position as a relative to position of other players and goal,using abstracted informaton  like direction{North, East,West , South} & accesibility window{broad,Mid , Narrow}  .etc.

CHOOSING THE BEHAVIOUR

        This requires 2 tiered optimisation.i.e.
 
  •  An inner level  optimisation that generates the best parameters for a fixed behaviour. Like selecting the best shooting angle .This inner loop has to be run for all the available behaviours to generate their individual best parameters .
  • Make the most intelligent chice from among those optimised strategies,from the outer level.

  •  

     
     
     
     
     
     
     
     
     
     
     



     
     
     

    THE MATHEMATICAL MODEL OF THE SOCCER FIELD

    FOR SHOOTING BEHAVIOUR

    The system model that is being considered adds uniformly distributed noise with exponentially decreasing magnitude to the ball and player motions. These are guided by the following equations.

    W here   is the added noise.

    The  amount of acceleration a player can impart to the ball is limited by his stamina. The velocity of the ball is the  vector summation of  a decayed fraction of the past  corrupted velocity with an additive noise whose range is proportional to the magnitude of the velocity. The new positiion after unit time step is determined by incrementing the past position with the current velocity.
     
     



















    Since the nature of the noise that is being added to the horizontal and vertical components of the velocities are known it is possible to deduce analytically the probability distribution of the angle  \theta . Knowing this distribution it is possible to estimate the probability that the shot remains within the goal boundary by integrating it  within appropriate limits. This can be understood from the foolowing diagram.

    Real world has another complication ;that of defenders trying to stop the ball from entering the goal. This the probability of goal can be defined as

    P(Goal) = P(shot remains within goal) && P(the ball un-intercepted).

    From the formulation of the angle made with the x axis we get

    The first part of the above equation has already been derived  as.

    Now one has to model the P(ball un-intercepted) term. Because of the complicated model it is not faesible to model the velocity distribution of the players because of there highly nonlinear nature and the abscence of  an applicable inverse function. So for this project simulation techniques were used to model the second term.
     
     


















    The governing equations are  : for the path along the shot

    For the ball to remain unintercepted the time taken by the defender

    has to be greater than the time taken by the ball.
     

    One can combine these 2 probabilities to arrive at the probability of goal given the configuration of the field and the shooting angle.

    Some of the plots obtained through this modelling are shown.

     /TO VIEW SOME OF THE PLOTS GENERATED.
     
     








    RADIAL BASIS FUNCTION NEURAL NETWORKS AND FUZZY LOGIC








                                                                                    The topology of the network is as shown.
     
     









    Radial basis network formulation forms an alternative to modelling with feed-forward sigmoid neural networks. The network consits of a single hidden layer of neurons having radial activation functions . Here Gaussian function is used. The output of the units are given as

    One can note this is same as a fuzzy rule base system using gaussian membership functions to model the individual components of the vector input.. Thus this property of radial basis neural  networks enables us to understand the encoded rules.

    Here the outputs are calculated as



    The final out put is obtained as


     
     
     
     
     
     

    DETERMINATION OF FREE PARAMETERS




    The frre parameters that are to be determined are the

    1. centres C
    2. widths \sigma and
    3. The output weights.
    In terms of  fuzzy logic paradigm it amounts to the optimal placement and shaping of the gaussian membership functions and the  de fuzzified  consequent values.

    Detrmination Of centres:
    The centres of the radial basis units can be obtained either by supervised techniques ( gradient decent) or by unsupervised techniques (clustering). Though supervised  learning is expected to result in further minimisation
    of error here only unsupervised techniques are being used as they are much faster .

    Unsupervised learning can also be divided into 2 categories

     

     

    HARD COMPETITIVE LEARNING:

    Hard competitive learning essentially involves clustering of data vectors into optimum number of classes and finding a prototype  to represent  those clusters.

        The algorithm is same as that of K-means Clustering , usually only the input vectors are clustered ;however in this implementation an extended vector containing all the input  components and some viatal output components have also been included to avoid the possibility of the algorithm to return sets well clustered in the input space but poorly clustered in the output space. Moreover to account for the fact that the means of the individual components are differrnt by orders of magnitude they have been apprpriately scaled.
    The algorithm can be described as.

    The added noise is initially high and allows one to escape local mimima and is similar to the simulated annealing approach.

    SOFT COMPETITIVE LEARNING:

    Unlike hard competitive learning here a training instance may be assigned to different centres with varying degree of membership depending on the distance from th proposed centers. The membership values can range from [0 1]

    The algorithm can be summarised as

    It has been found that hard competitive learning often results in empty clusters, soft clustering is largely immune to this problem but  suffers from the problem of repeated or very close centers which poses problems for determining the weight values of the output nodes.

    DETERMINATION OF  WIDTH SIGMA:

    The sigma parameter or the width of  the radial basis units may also be determined by supervised learning. But here they have been determined heuristically using two rules. Later those estimates wee improved by supervised learning or gradient decent.
    The rules followed for determining the sigma values are

    It was found the 1st rule is better suited for hard clustering whereas the second gave better results with soft clustering.

    DETERMINATION OF WEIGHTS

    The weights are obtained by the minimisationof the squred error with respect to the training set. The minimisation is done by equating the derivative of the error to 0. Though gradient decent may be used for the minimisation this can be solved analytically as it results in a system of linear equations.




    GENERALISATION  ERROR:

    For  good results it is not enough to minimise the error for the training set only. The error should be llow even for cases not contained by the training set. This is the ability of the network to generalise.

    This can be implemented by splitting up the data set into training and validation sets. The obvious drawback being that the entire data set cannot be utilised  as some of the data has to be sacrificed for validation.

    There are other methods by which one can utilise the entire training set and still ensure a good degree of generalisation. eg


     
     
     
     

     Input and Output


    The project consists of two parts

    1. Generation of optimal shooting solutions from off line calculations
    2. Using the above module to generate a training set for a Radial basis neural network whose free parameters are tob determined.
    The  input to the first part will consist of the current configuartion of th soccer field. Here one can have atmost 2 defenders and a golie.
    The configuration vector contains the distance from the centre of the goal  and the angle made with the centreline for each player. It also needs input about the shooting velocity and the velocity of the defenders and the parameters of the associated noise model.

    Using these inputs the code trainx2.c can generate the probability of scoring a goal  for different shooting angles; and thus identify the optimal shooting angle.

    Some of the outputs generated can be seen    HERE

    The second part of the project is devoted to the finding of the free parameters of the radial basis network . It includes
     

    1. The  determination of the centres of the radial  basis units
    2. Determining the sigma  or the width parameters of the radial basis units
    3. Determination of the output weights.
    The  input vector to the network is of the form

        Ds,Dg,Dd1,Dd2,Theta_s,Theta_g,Theta_d1,Theta_d2.
     
     


    RESULTS OBTAINED ,CONCLUSIONS & FUTURE EXTENSION









          It  can be seen that the dimension of the input vector is eight. Thus even with two fuzzy linguistic variable attached to these variables number of rules necessary is quite high i.e  2^8 .

            Till now networks  with 10 ~ 25 radial basis units have been tried and the error has been reduced to the order of  0.3. Thus it can be seen that  with considerably less number of rules the error  over the entire training set has been reduced.

            However  0.3 is still not adequate for precise control of the shooting trajectory as envisioned during prposition of the project. Thus the current trained network  can be used to identify 3 broad zones within the goal post where to shoot and not the exact optimal angle.

        Since only heuristic approaches where taken to locate the centres it is expected that the performance  will be better if those were selected through optimisation procedures. Gentic Algorithms being an obvious choice which will allow similar centres to compete with each other and the dissimilar to co-operate thereby effectively mapping the input and the output space.

            Once  an aceptable error margin has been crossed this network itself can then be used to generate the optimal defensive strategies.
    That will be possible because the nework model being  continuous and differentiable one can consider the free parameters to be fixed and differentiate the output with respect to the goalie and defender location and equate the deriavative to zero to obtain the locations of the goalie and the defenders. Thus the network will be used as a domain knowledge. One advantage of this will be the resolution of conflict between  the defenders  so that each of them do not run to the same spot.
     



     
     











    1.  Peter Stone, Manuela Veloso, Patrick Riley "Learning low-level  Multi agent behaviours"  available at http://www.cs.cmu.edu/afs/cs/usr/pstone/public/papers/

    2.  Peter Stone & Manuela Veloso "A layered approach to Learning Client Behaviours in Robocup soccer"  AAI vol 12 -1998
     available at http://www.cs.cmu.edu/afs/cs/usr/pstone/public/papers/

    3.  Peter Stone & Manuela Veloso " Beating a Defender in a Robotic soccer: Memory based learning of continuous functions" Neural Information Processing  Systems -1995, avilable at  http://www.cs.cmu.edu/afs/cs/usr/pstone/public/papers/

    4. Peter Stone & Manuela Veloo " Using Decision Tree confidence Factors for Multi-agent control" In "RoboCup-97: The First Robot World Cup Soccer Games and Conferences", H. Kitano (ed.),  1998. Springer Verlag, Berlin. available at http://www.cs.cmu.edu/afs/cs/usr/pstone/public/papers/

    5. Peter Stone and Manuela Veloso " Task Decomposition and Dynamic Role Assignment for Real-Time Strategic Teamwork."  Fifth International Workshop on Agent Theories, Architectures, and Languages (ATAL'98 ) avialble at http://www.cs.cmu.edu/afs/cs/usr/pstone/public/papers/
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    Annotated Bibliography