Implementation of Upper Confidence Bound Algorithm

Amit Ranjan
Analytics Vidhya

--

In this article we will visualize how UCB algorithm works for Multi-Armed Bandit Problem.

UCB Algorithm in Nutshell

In UCB Algorithm we start exploring all the machines at the initial phase and later when we find the machine with highest confidence bound we start exploiting it to get maximum rewards.

If you want to understand it better, you can see the Upper Confidence Bound for Multi-Armed Bandits Problem. Here, we will focus on algorithmic part and visualize it with the help of coding.

Let’s see what are the steps of UCB Algorithm.

Steps of UCB Algorithm

  1. At each round n, we consider two numbers for machine m.
    ->
    Nₘ(n) = number of times the machine m was selected up to round n.
    -> Rₘ(n) = number of rewards of the machine m up to round n.
  2. From these two numbers we have to calculate,
    a.
    The average reward of machine m up to round n, rₘ(n) = Rₘ(n) / Nₘ(n).
    b. The confidence interval [ rₘ(n) — Δₘ(n), rₘ(n)+Δₘ(n) ] at round n with, Δₘ(n)= sqrt( 1.5 * log(n) / Nₘ(n) )
  3. We select the machine m that has the maximum UCB, ( rₘ(n)+Δₘ(n) )

I know these might look confusing to you but let’s make it simpler!

This weekend I was very bored and went to my friend ( Nilesh ) home. After some chit-chat, we planned to watch a movie together. But the problem was which movie to watch. I suggested some sci-fi movies which he didn’t agreed with and he suggested some action movies which I didn’t agreed. Then he suggested that let’s finalize at least 5 movies and then we will decide which one to watch. I also agreed as we have suggested plenty of movies.

So we made our list of 5 movies:
1. Iron Man
2. Angel Has Fallen
3. Hobbs & Shaw
4. 6 Underground
5. Inception

Now, we again started fighting which one to watch as all of them has good IMDB ratings. Then his mom came and asked why we are fighting. We told her our problem. She thought for a second and said why don’t you ask your other friends which movie to watch. We thought it as a good idea. But calling everyone was not a good idea as it would be very time consuming. We made a google form and send it to all of our friends to fill. It almost took 15 minutes and we almost received 100 of reviews. Now we had a data of 100 different friends of movie which they would prefer to watch. The data contains binary values which suggest that for 0 they don’t prefer that movie and for 1 they would prefer.

Nilesh: Great bro! Let’s count the total for each movie and select the maximum one.

Me: We can do that bro but there is a problem.

Nilesh: What bro?

Me: Bro our friends have given their views on which movie they will prefer out of 5 not based on the rank that I like Iron Man most. I think we took wrong review.

Nilesh: Have you gone crazy? You are saying that we got wrong dataset. If you will ask all of them again to fill, they will probably ignore us this time.

Me: I am not saying that we gonna need another dataset. I think we need to visualize this data differently.

Nilesh: How?

Me: We need an unbiased decision as almost every review contains both of our choice movies that our friends have also liked.

Nilesh: Yeah but how we will do that? We can’t just ask mom to choose any movies from each of 100 reviews and see if our friends have liked it or not. Then count for each movies in end that if it is selected by mom and liked by our friend. Dude it would be better that we should drop our plan to watch movie and do something else.

Me: I know bro but why you want it to be done by mom. We can easily solve this problem using Upper Confidence Bound method which I was reading about.

Nilesh: How this problem is same as your Multi- Armed Bandits Problem?

Me: See, we have 5 movies and 100 movie reviews for those 5 movies. We want machine to explore through each reviews and randomly select one movie and check if it is liked by our friend or not. But at the same time we don’t want the machine to get biased at one movie and want it to give the chance to others also. Lastly, we want it to find that movie that is selected by maximum of our friends by exploiting the Upper Confidence Bound. We will get a good visualization too and we will watch the movie according to that.

Nilesh: Yes you got the point. Let’s see where it takes us.

Me: Yea bro.

We ran the following code on our dataset.

We finally got the visualization like this,

Movie 5 ( Inception ) got liked by most of our friends

Nilesh: Wow bro! I was also thinking to watch Inception in case it didn’t worked but the model also suggested the Inception movie itself.

Me: Truly bro! This is amazing. Let’s watch the Inception.

How did it really happened?

Let’s understand step by step. Before that I want you to open the algorithm steps in another tab.

  1. __init__ method:
    In this method we have done step 1 of the algorithm and define certain attributes.
    self.__N → total number of reviews
    self.__m → total number of movies
    self.__movie_selected → list of 100 movies selected at each round
    self.__number_of_selection → the number of times the movie m was selected up to round N.
    self.__sum_of_movie_rank → the sum of ranks of the movie m up to round N.
  2. implement_ucb method:
    In this method, we have implemented the step 2 and step 3 combined.
Step 2 of UCB Algorithm

In If condition we are checking for whether the movie is at least selected for one time or not. Because if it is not selected at least once then according to formula of Step 2(a) of the algorithm we will get infinity, as denominator becomes 0. If the movie is not selected at least once then it will set its upper bound to infinity ( here denoted with high value 1e500 ).

Otherwise, calculate the average rank of the movie from the formula in Step 2(a) and calculate the upper confidence from the formula in Step 2(b).

Now, we have implemented step 2 let’s move to step 3.

Step 3

In Step 3 we will compare the computed upper bound with maximum upper bound if it is greater than max_upper_bound then make upper bound as the maximum upper bound and selected movie as current movie.

This is how UCB is implemented. There can be multiple ways with which one can implement. Feel free to share if you solve with another way.

--

--