This plot shows the performance of the learning algorithm for one serve over 20,000 trials.
Plot a, where:

a=1 refers to successful shots, i.e. reaching the ball
a=2 refers to completely successful shots, i.e. reaching the ball and also hitting a correct shot

Here

Gamma = 0.9
Lambda = 0.95
Reward Function = 10 for complete success
                =  5 for partial success
                = -1 for failure
                =  0 otherwise
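
The settings above can be written down as a small Python sketch. This is only an illustration, not the code that produced the plot: the names Outcome, reward and discounted_return are hypothetical, and it assumes that Gamma is the usual discount factor and Lambda a trace-decay parameter, which the text does not state explicitly.

```python
from enum import Enum

GAMMA = 0.9    # "Gamma" from the text; assumed to be the discount factor
LAMBDA = 0.95  # "Lambda" from the text; assumed to be a trace-decay parameter


class Outcome(Enum):
    """Hypothetical labels for the result of a single step or shot."""
    COMPLETE_SUCCESS = "complete_success"
    PARTIAL_SUCCESS = "partial_success"
    FAILURE = "failure"
    OTHER = "other"


# Reward values exactly as listed in the text.
REWARDS = {
    Outcome.COMPLETE_SUCCESS: 10,
    Outcome.PARTIAL_SUCCESS: 5,
    Outcome.FAILURE: -1,
}


def reward(outcome):
    """Return the reward for an outcome; anything not listed gets 0."""
    return REWARDS.get(outcome, 0)


def discounted_return(rewards, gamma=GAMMA):
    """Sum a reward sequence, discounting each later step by gamma."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

For example, reward(Outcome.FAILURE) gives -1, and discounted_return can be applied to the list of rewards collected during one serve to see how strongly a late success is discounted under Gamma = 0.9.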