Operant Conditioning: Learning Based on Consequences
B.F. Skinner
Operant conditioning focuses on how behaviors are influenced by the consequences that follow them.
The theory is closely associated with the concepts of reinforcement and punishment.
Skinner’s work builds on Thorndike’s law of effect.
Law of effect: The probability that a particular stimulus will repeatedly elicit a particular learned
response depends on the perceived consequences of the response.
Components of Operant Conditioning
Behavior: Operant conditioning deals with voluntary behaviors. These are actions that an
individual chooses to perform, such as pressing a button, raising a hand, or speaking a word.
Reinforcement: Reinforcement refers to the process of strengthening a behavior by following it
with a consequence that increases the likelihood of that behavior being repeated in the future.
There are two main types of reinforcement:
Positive Reinforcement: This involves presenting a desirable stimulus (such as praise, rewards,
or treats) after a behavior occurs, making it more likely that the behavior will be repeated.
Negative Reinforcement: Negative reinforcement involves removing an aversive stimulus (such
as removing pain or discomfort) after a behavior occurs, also increasing the likelihood of the
behavior being repeated.
Primary and Conditioned Reinforcers
Primary Reinforcers: satisfy biological needs, e.g., food, water, and sexual pleasure.
Conditioned Reinforcers: acquire their reinforcing value through association with primary reinforcers, e.g., money, status, grades, trophies, and praise.
Premack Principle: The Premack Principle, often referred to as "Grandma's Law," is a
psychological concept in operant conditioning developed by David Premack in the 1960s. It
states that a more probable behavior can reinforce a less probable behaviour.
In other words, if a person or animal is given the opportunity to engage in a preferred or high-
probability activity, they will be more willing to engage in a less preferred or low-probability
activity as a consequence.
Premack Principle: Example
Studying and Socializing:
College students can apply the Premack Principle by allowing themselves to socialize with
friends only after completing a certain amount of studying or coursework.
Exercise and Watching Movies:
A person who loves watching movies but wants to maintain a regular exercise routine can use
the Premack Principle by allowing themselves to watch a movie as a reward after completing
their workout.
Example: Premack Principle
A parent wants their child to finish their vegetables (less preferred activity) before they can have dessert (more preferred activity).
The child is presented with a plate of vegetables (less preferred) and is informed that they must
finish eating them before they can have a piece of chocolate cake (more preferred).
The opportunity to eat the chocolate cake serves as a reinforcement (reward) for completing the
less preferred activity, which is eating the vegetables.
The child, motivated by the prospect of enjoying the dessert, is more likely to eat the
vegetables, even if they initially didn't want to.
Examples of Positive and Negative Reinforcement
Positive Reinforcement
Employee Bonus for Meeting Sales Targets: In a sales job, if an employee meets or exceeds
their monthly sales target (the desired behavior), they receive a cash bonus (the positive
reinforcer). The bonus encourages them to continue striving for high sales performance.
Negative Reinforcement
Doing Homework to Avoid Grounding: A teenager who has not completed their homework
(undesired behavior) is told by their parent that they can avoid being grounded (the aversive
consequence) if they finish their assignments (the desired behavior). Completing the homework
removes the threat of being grounded, reinforcing the completion of homework through
negative reinforcement.
Skinner’s Experiment
B.F. Skinner developed his theory of operant conditioning through experiments on animals. He used a special box, known as the “Skinner box,” for his experiments on rats.
Skinner Box Setup: The Skinner box is a small enclosure with a lever or bar that the rat can
press, a food dispenser, and a way to deliver reinforcement (food) when the lever is pressed.
There's also a mechanism to record the rat's responses.
Initial Phase: At the beginning of the experiment, the rat is placed inside the Skinner box.
Initially, the rat is likely to explore its surroundings and accidentally press the lever.
Reinforcement: When the rat presses the lever by chance, a small food pellet is delivered into a
dish in the box. This food serves as a positive reinforcer because it strengthens the likelihood of
the rat repeating the behavior (lever-pressing).
Operant Conditioning: Over time, the rat learns that pressing the lever results in receiving a
food pellet. As a result, it begins to press the lever more frequently.
Shaping: Skinner often used a technique called shaping, where he reinforced behaviors that
were closer and closer to the desired behavior. For example, if the rat initially only touched the
lever, it might receive reinforcement for that. Then, it would only receive reinforcement when it
pressed the lever slightly, and so on. This process shapes the rat's behavior towards the desired
response.
Negative Reinforcement: A rat in a Skinner box can also be trained with negative reinforcement: a mild electric shock is turned off when the rat presses the lever, so the rat likewise learns to press the lever more often.
Here, the action of pressing the lever is an operant response/behavior, and the food released inside the chamber is the reward. The experiment is also known as instrumental conditioning, as the response is instrumental in obtaining the food.
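To make the reinforcement loop concrete, here is a minimal, hypothetical Python sketch of the lever-press procedure described above. The update rule and all numbers are illustrative assumptions (not Skinner's actual procedure or data), chosen only to show how each reinforced press makes the next press more likely.

```python
import random

def simulate_lever_pressing(trials=200, boost=0.05, seed=1):
    """Toy model: each reinforced lever press slightly raises the chance of pressing again."""
    rng = random.Random(seed)
    press_probability = 0.05   # at first the rat presses the lever only by accident
    presses = 0
    for _ in range(trials):
        if rng.random() < press_probability:   # the rat happens to press the lever
            presses += 1
            # positive reinforcement: a food pellet follows the press,
            # strengthening the behavior (the press probability increases)
            press_probability = min(1.0, press_probability + boost)
    return presses, press_probability

presses, final_p = simulate_lever_pressing()
print(f"lever presses: {presses}, final press probability: {final_p:.2f}")
```

Running the sketch shows presses becoming more frequent as reinforcement accumulates, which is the pattern the experiment demonstrates.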
Schedules of Reinforcement
Schedules of reinforcement refer to the patterns or rules that determine how and when a
reinforcement (a reward or consequence) is delivered in response to a specific behaviour.
Continuous Reinforcement
Partial Reinforcement, which has four subtypes:
Fixed Ratio
Variable Ratio
Fixed Interval
Variable Interval
Continuous Schedule of Reinforcement
A continuous schedule of reinforcement involves reinforcing a behavior every time it occurs.
Because this reinforcement occurs every time the behavior is displayed, the learner can form an
association between the behavior and the consequence of that behavior quite quickly.
For example, when training a dog to sit, you would start by providing a treat every single time
the dog sits after you give the command.
Partial reinforcement
Partial (or intermittent) schedules of reinforcement do not reinforce every instance of a
behaviour.
Instead, reinforcement is given periodically.
It might be delivered after a certain number of responses or after a certain amount of time has
elapsed.
Fixed-Ratio Schedule
In a fixed-ratio schedule of reinforcement, reinforcement is delivered after a fixed number of
responses. For example, a rat would have to press a button 10 times to receive a food pellet.
Example: Sales Commissions:
Imagine you work as a salesperson, and your employer has set up a Fixed Ratio (FR-10)
reinforcement schedule for your commissions. This means that you receive a commission for
every 10 products you sell.
In this scenario:
For the first 9 products you sell, you don't receive any commission.
However, when you sell the 10th product, you receive a commission on all 10 sales.
Fixed Interval Schedule
In a fixed-interval schedule of reinforcement, a behavior is reinforced after a fixed period has
elapsed.
For example, a rat would have to wait five minutes before pressing the button would deliver a food pellet.
Weekly Paycheck:
Imagine a job that pays you on a Fixed Interval (FI-7 days) schedule, meaning you receive your
paycheck every 7 days. It doesn't matter how much work you do during those 7 days; you will
receive your paycheck once a week, precisely on the 7th day.
In this scenario:
You start working on Day 1, and for the next 6 days, you work diligently without any immediate
rewards.
On the 7th day (exactly one week after you started), you receive your paycheck as reinforcement.
Here's how this Fixed Interval reinforcement schedule plays out:
Day 1 to Day 6: You work without any reinforcement (no paycheck).
Day 7: You receive your paycheck.
Variable-Ratio Schedule of Reinforcement
In a variable-ratio schedule of reinforcement, a behaviour is reinforced after a varied,
unpredictable number of responses.
For example, a rat might be rewarded with a food pellet after 3 responses, then after 8, then 2,
then 10.
Variable-Interval Schedule of Reinforcement
In a variable interval schedule of reinforcement, behaviour is reinforced after an unpredictable
period of time has passed.
For example, a rat might be rewarded with a food pellet for the first response after a random
amount of time has elapsed.
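The four partial schedules can be read as simple rules for deciding which responses (or days) earn reinforcement. The sketch below is an illustrative simplification, assuming one response occurs at every step; the parameter values (10 responses, 7 days, a mean of 5) echo the examples above and are not fixed properties of the schedules themselves.

```python
import random

rng = random.Random(0)

def fixed_ratio(responses, n=10):
    # Reinforce every n-th response (e.g., FR-10: commission on every 10th sale).
    return [i for i in range(1, responses + 1) if i % n == 0]

def variable_ratio(responses, mean_n=5):
    # Reinforce after an unpredictable number of responses (e.g., after 3, then 8, then 2 ...).
    reinforced, count, next_target = [], 0, rng.randint(1, 2 * mean_n - 1)
    for i in range(1, responses + 1):
        count += 1
        if count >= next_target:
            reinforced.append(i)
            count, next_target = 0, rng.randint(1, 2 * mean_n - 1)
    return reinforced

def fixed_interval(days, interval=7):
    # Reinforce the first response after a fixed interval (e.g., FI-7: the weekly paycheck).
    return [d for d in range(1, days + 1) if d % interval == 0]

def variable_interval(days, mean_interval=5):
    # Reinforce the first response after an unpredictable amount of time has elapsed.
    reinforced, elapsed, wait = [], 0, rng.randint(1, 2 * mean_interval - 1)
    for d in range(1, days + 1):
        elapsed += 1
        if elapsed >= wait:
            reinforced.append(d)
            elapsed, wait = 0, rng.randint(1, 2 * mean_interval - 1)
    return reinforced

print("FR-10 reinforced responses:", fixed_ratio(30))   # [10, 20, 30]
print("VR (mean 5) reinforced responses:", variable_ratio(30))
print("FI-7 reinforced days:", fixed_interval(28))      # [7, 14, 21, 28]
print("VI (mean 5) reinforced days:", variable_interval(28))
```

The ratio schedules count responses while the interval schedules count elapsed time, which is exactly the distinction the preceding examples illustrate.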
Stimulus Control
Stimulus control is a term used to describe situations in which a behavior is triggered by the
presence or absence of some stimulus. If a person always eats when watching TV, then (in the
operant conditioning use of the term) eating behavior is controlled by the stimulus of watching TV.
Behaviour Modification
Behavior modification, also known as behavior therapy or applied behavior analysis, is a
therapeutic approach that focuses on changing an individual's behavior through various
techniques and principles of learning and conditioning.
The primary goal of behavior modification is to encourage positive behaviors and reduce or
eliminate undesirable behaviors.
Systematic Desensitization Technique
Systematic desensitization begins by teaching the individual relaxation techniques and having them construct a fear hierarchy: a ranked list of feared situations from least to most anxiety-provoking. Exposure then proceeds in steps:
Exposure to the Fear Hierarchy: The individual is exposed to the fear hierarchy in a systematic
and gradual manner, starting with the least anxiety-provoking item on the list.
For example, if the fear is of flying, they might begin by looking at pictures of aeroplanes while
maintaining relaxation. Once they can do this without experiencing significant anxiety, they
move on to the next item on the hierarchy, and so on.
Pairing Relaxation with Exposure: During each exposure, the individual practices the relaxation
techniques they've learned to maintain a state of relaxation. This is done while confronting the
feared stimulus or situation. The pairing of relaxation with the feared stimulus helps to replace
the anxiety response with a relaxation response.
Repetition and Gradual Progression: The process is repeated for each item on the fear hierarchy,
moving from least anxiety-provoking to most anxiety-provoking. As the individual gains
confidence and experiences reduced anxiety, they progress through the hierarchy.
Maintenance and Generalization: Once the individual can comfortably confront the most
anxiety-provoking situations on the hierarchy, they continue to practice relaxation and exposure
to maintain their progress and generalize their newfound skills to real-life situations.