Which utility is being maximized?

Satyadev Nandakumar
Fri Feb 23 17:13:38 2018

Suppose we are betting on n horses, (h1, ..., hn) each of which are publicly expected to have an equal probability of winning the race. Thus the publicly believed probability of any particular horse winning is

	1
       ---
        n
      

Privately, we come to know that the real probabilty of the horses winning is given by the probability vector

        (q , ..., q )
          1        n 
      

How should we best exploit this information?

Fair Betting

In a fair betting scenario (mathematically, a martingale), if you bet N/n rupees on a horse, and it wins, then according to the publicly announced probability, you will receive N rupees.

The above reward scheme is fair in the sense that your expected win amount is N, which is the original amount you had.

No matter which horse wins, you get rewarded N rupees, thus you neither win nor lose any money. Kind of dumb, but at least you emerge unscathed.

Exploiting a skewed distribution

Maximizing Gain

Suppose we have N rupees with us, and we have the inside information about the winning probabilities. Our initial goal is modest - if a more probable horse wins, we must end with more than N rupees. How do we do this?

  1. Strategy 1: If one of the qk is 1, clearly you put all your money on that horse. You get Nn after the race, with probability 1.
  2. Strategy 2: Suppose we bet Nqk on horse hk. Some horse has to win, say the mth horse does. Then the money we get is
    	  N n q
                   m ,
              
    which is greater than N if qm>1n.

    This strategy is beneficial in scenarios like the following: at least one of the qks are 0, and the remaining coordinates are equal. Intuitively, this corresponds to the knowledge that some horse is never going to win, and all the remaining horses are equally likely to win.

    When qm<1n, then the win amount will be less than N. Here, we end up with less than what we had in the "no knowledge scenario", where we end up with N rupees even in this outcome.

Question. Which do you prefer? In the blind allocation strategy, no matter which horse wins, you retain your money at the same level as before. In the strategy with insider information, you may win much more than your original capital if a more probable horse wins, but lose a lot of money if a lesser probability horse wins.

In the next section, we will justify why this allocation is in some sense the best we can do, even though it seems that we are worse off when the less probable horse wins.

Gains for the Gambling Addict : Logarithmic utility and the Kullback-Leibler Divergence

Of course, you might lose money when you go to the races only once. However, no respectable gambler visits the races once and stops. He would like to bet race after race, and over time, make money, if not in all possible outcome sequences, at least in an overwhelming majority of them. He would also want to maximize the money he gains. We enter the region of asymptotic gains.

Let us simplify the problem to (q, 1-q), where q < 1/2, and consider the constant strategy of allocating fraction 'a' of our current capital on the first horse, and fraction '1-a' on the second. What should be 'a' if we want to maximize our gain over time?

After a typical sequence of n races, we expect the first horse to win nq times, and the second horse to win n(1-q) times. Our capital after n races is

              q n          (1-q)n        n  qn    (1-q)n
       [N a 2]    [N(1-a)2]        = [2N]  a  (1-a).
    

The above quantity is maximized when S= aq (1-a)(1-q) is maximized. S is maximized when we maximize

       ln S = q ln a + (1-q) ln (1-a)
    
is maximum. Differentiating both sides wrt a and equating to 0, we get q=a.

Note: ln S = H(q) - D((q,1-q)||(a,1-a)), where H(q) is the entropy of the probability distribution (q,1-q) and D is the Kullback Leibler divergence between the two probability distributions. So maximizing money amounts to minimizing the Kullback-Leibler divergence between (q,1-q) and (a,1-a).

Thus the lens of asymptotic maximal gain justifies why in the previous section, we alloted Nqk to the kth horse. This is what minimizes the Kullback-Leibler divergence between the bet allocation vector (a1,...,an) and the actual probability vector (q1,...,qn). Over a long typical run, this is the unique allocation scheme that will gain us the most amount of money.

Note 2: Minimizing the KL divergence may be related to the famous Kelly criterion, but I do not understand the exact relationship.

Minimizing Loss

Our goal is: if a more probable horse wins, we must end up with more than N, and if a less probable horse wins, we must end up with at least 0.5N, let's say. So we don't want to lose too much, we are okay if we have to balance this by not winning as much as Strategy 2 on the more probable horses.

  1. Strategy 3: If qk < 1/n, then allocate

               0.5 N    q_k
              --------
              min{q_i}
            
    Clearly, on a less probable horse,
                 0.5 N    q_k
              ---------       x  n       >=  0.5N.
              min{q_i}n
            

    What happens when a more probable horse wins? Do we still win more than N? We can show after a brief calculation that if

    	           2              0.5
             (min {q })         >   ---------,
                    i                n(1-1/q)
           
    then we win more than N on a winning horse. (If this is not satisfied, we may have to reduce 0.5 to a smaller constant.)


Last Modified: Fri Feb 23 17:13:50 2018