Introduction to Thompson Sampling | Reinforcement Learning

Last Updated : 22 Apr, 2023

Reinforcement Learning is a branch of Machine Learning that is closely related to online learning. It is used to decide what action to take at time t+1 based on the data observed up to time t. The concept is used in Artificial Intelligence applications such as teaching a robot to walk. A popular example of reinforcement learning is a chess engine. Here, the agent decides on a series of moves depending on the state of the board (the environment), and the reward can be defined as a win or a loss at the end of the game.

Thompson Sampling (also known as Posterior Sampling or Probability Matching) is an algorithm for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. Trying actions repeatedly to learn how well they perform is called exploration. The algorithm uses training information that evaluates the actions taken rather than instructing by giving the correct actions. This is what creates the need for active exploration, that is, an explicit trial-and-error search for good behavior. Based on the result of each action, the machine receives a reward (1) or a penalty (0). Further actions are then chosen to maximize the reward, which improves future performance. Suppose a robot has to pick up several cans and put them in a container. Each time it puts a can in the container, it memorizes the steps it followed and trains itself to perform the task with better speed and precision (a reward). If the robot fails to put the can in the container, it does not memorize that procedure (so speed and performance do not improve), and the attempt is counted as a penalty.

Thompson Sampling has the advantage that it tends to explore less as it gathers more information. This mirrors the desirable trade-off in the problem, where we want to learn as much as possible in as few trials as possible. Hence, the algorithm is more "search-oriented" when we have little data and less "search-oriented" when we have a lot of data.
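One way to see why the search shrinks with more data is to look at how uncertain the algorithm is about an action's success rate. The sketch below assumes the common Beta-Bernoulli formulation of Thompson Sampling (a uniform Beta(1, 1) prior updated with 0/1 rewards), which the text above does not spell out; the function name beta_std is made up for illustration. It prints the standard deviation of the posterior for the same observed success rate at three data sizes.

```python
import math


def beta_std(successes: int, failures: int) -> float:
    """Standard deviation of a Beta(successes + 1, failures + 1) posterior.

    Assumes a uniform Beta(1, 1) prior over the unknown success rate.
    """
    a = successes + 1
    b = failures + 1
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


# Same observed success rate (30%), but increasing amounts of data.
for pulls in (10, 100, 1000):
    wins = 3 * pulls // 10
    print(pulls, "pulls -> posterior std:", round(beta_std(wins, pulls - wins), 4))
```

A narrower posterior means the values sampled for that action cluster around its estimated rate, so an action that looks inferior is picked less and less often as evidence accumulates.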

Multi-Armed Bandit Problem 
The term multi-armed bandit comes from the image of a slot machine with many arms. Each action selection is like a play of one of the slot machine's levers, and the rewards are the payoffs for hitting the jackpot. Through repeated action selections you are to maximize your winnings by concentrating your actions on the best levers. Each arm pays out rewards drawn from its own probability distribution, with a mean reward specific to that arm. Without knowing these distributions, the gambler has to maximize the total reward earned over a sequence of arm pulls. If you maintain estimates of the action values, then at any time step there is at least one action whose estimated value is greatest. We call this a greedy action.

An analogy to this problem is the choice of advertisements to display whenever a user visits a webpage. The arms are the ads that can be shown. Each time a user connects to the page counts as one round, and at each round we choose one ad to display to the user. At round n, ad i gives reward r_i(n) ∈ {0, 1}: r_i(n) = 1 if the user clicked on ad i, and 0 if the user didn't. The goal of the algorithm is to maximize the total reward.

Another analogy is that of a doctor choosing between experimental treatments for a series of seriously ill patients. Each action selection is a treatment selection, and each reward is the survival or well-being of the patient.
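The ad-display analogy and the notion of a greedy action can be sketched in a few lines of Python. This is an illustrative sketch only: the click rates in true_ctrs are invented numbers, and the helper names simulate_round and greedy_action are hypothetical, not taken from any library.

```python
import random


def simulate_round(true_ctrs, ad, rng):
    """Return the reward r_i(n): 1 if the simulated user clicks ad i, else 0."""
    return 1 if rng.random() < true_ctrs[ad] else 0


def greedy_action(clicks, shows):
    """Index of the ad with the highest estimated click rate (ties: lowest index)."""
    estimates = [c / s if s else 0.0 for c, s in zip(clicks, shows)]
    return max(range(len(estimates)), key=estimates.__getitem__)


rng = random.Random(42)
true_ctrs = [0.05, 0.20, 0.10]  # hypothetical click rates, hidden from the algorithm
clicks = [0, 0, 0]
shows = [0, 0, 0]

# Show every ad in turn for 300 rounds and record the 0/1 rewards.
for n in range(300):
    ad = n % len(true_ctrs)
    clicks[ad] += simulate_round(true_ctrs, ad, rng)
    shows[ad] += 1

print("estimated click rates:", [round(c / s, 3) for c, s in zip(clicks, shows)])
print("greedy action:", greedy_action(clicks, shows))
```

Always playing the greedy action exploits current knowledge but never corrects a bad early estimate, which is the gap that an exploration strategy such as Thompson Sampling is meant to fill.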

Algorithm 
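A widely used form of the algorithm for 0/1 rewards keeps a Beta distribution over each arm's unknown success rate, samples one value from every distribution at each round, and plays the arm with the largest sample. The sketch below follows that Beta-Bernoulli form under stated assumptions: a uniform Beta(1, 1) prior, a simulated environment, and made-up click rates. The function name thompson_sampling and its parameters are hypothetical and not taken from the original article.

```python
import random


def thompson_sampling(true_ctrs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson Sampling on a simulated ad bandit.

    true_ctrs are click rates used only to simulate users (an assumption
    made for this sketch); the algorithm never reads them directly.
    """
    rng = random.Random(seed)
    n_ads = len(true_ctrs)
    successes = [0] * n_ads  # rounds in which ad i earned reward 1
    failures = [0] * n_ads   # rounds in which ad i earned reward 0
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one plausible click rate per ad from its Beta posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_ads)
        ]
        # Play the ad whose sampled rate is highest.
        ad = max(range(n_ads), key=samples.__getitem__)
        # Observe the 0/1 reward and update that ad's counts.
        reward = 1 if rng.random() < true_ctrs[ad] else 0
        if reward:
            successes[ad] += 1
        else:
            failures[ad] += 1
        total_reward += reward

    return successes, failures, total_reward


successes, failures, total = thompson_sampling([0.05, 0.10, 0.60], n_rounds=2000)
pulls = [s + f for s, f in zip(successes, failures)]
print("times each ad was shown:", pulls)
print("total reward:", total)
```

Because each choice depends on a random draw, arms with little data still get picked occasionally, while arms whose posteriors are concentrated at low values are chosen less and less.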
 

Some Practical Applications 

  • Netflix item-based recommender systems: Images related to movies/shows are shown to users in such a way that they are more likely to watch them.
  • Bidding and Stock Exchange: Predicting stocks based on current stock price data.
  • Traffic Light Control: Predicting the delay in the signal.
  • Automation in Industries: Bots and machines for transporting and delivering items without human intervention.
  • Robotics: Reinforcement learning is used in robotics for motion planning, grasping objects, and controlling the robot’s movement. It enables robots to learn from experience and make decisions based on their environment.
  • Game AI: Reinforcement learning has been used to train AI agents to play games like Chess, Go, and Poker. It has been used to develop game bots that can compete against human players.
  • Natural Language Processing (NLP): Reinforcement learning is used in NLP to train chatbots and virtual assistants to provide personalized responses to users. It enables chatbots to learn from user interactions and improve their responses over time.
  • Advertising: Reinforcement learning is used in advertising to optimize ad placements and target audiences. It enables advertisers to learn which ads perform best and adjust their campaigns accordingly.
  • Finance: Reinforcement learning is used in finance for portfolio management, fraud detection, and risk assessment. It enables financial 

 

