Logistic Regression: Gradient Descent Example

The document describes a step-by-step process for updating weights and bias in a binary classification model using a dataset with four samples. It includes calculations for the forward pass, binary cross-entropy cost, gradients, and updates for weights and bias across multiple samples. After one epoch, the final updated parameters are a weight of 0.0077 and a bias of 0.0561.

Uploaded by

manchestermilf1
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

Logistic Regression_ Gradient Descent_ Example

The document describes a step-by-step process for updating weights and bias in a binary classification model using a dataset with four samples. It includes calculations for the forward pass, binary cross-entropy cost, gradients, and updates for weights and bias across multiple samples. After one epoch, the final updated parameters are a weight of 0.0077 and a bias of 0.0561.

Uploaded by

manchestermilf1
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

Example (1)

Assume
• We have one data point, with feature 𝒙 = 𝟎.𝟓
• Target label 𝒚 = 𝟏.
• Initial weights 𝒘 = 𝟎. 𝟐
• Initial bias 𝒃 = 𝟎. 𝟏.
• Learning rate 𝜶 = 𝟎. 𝟏.

---------------------------------------------------------------------------------------------------------------------
Step 1: Forward Pass
1 Calculate the linear combination 𝑧 :
𝑧 = 𝑤 ⋅ 𝑥 + 𝑏 = 0.2 ⋅ 0.5 + 0.1 = 0.2
2 Apply the sigmoid function 𝜎(𝑧) to get the prediction 𝑦ˆ :
𝑦ˆ = 𝜎(𝑧) = 1/(1 + 𝑒^(−𝑧)) = 1/(1 + 𝑒^(−0.2)) ≈ 0.5498
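
A minimal Python sketch of this forward pass (the helper name sigmoid and the variable names are illustrative, not from the original):

import math

def sigmoid(z):
    # Plain sigmoid; fine for the small z values used in this example.
    return 1.0 / (1.0 + math.exp(-z))

w, b = 0.2, 0.1      # initial weight and bias
x, y = 0.5, 1        # the single data point and its label

z = w * x + b        # linear combination: 0.2 * 0.5 + 0.1 = 0.2
y_hat = sigmoid(z)   # prediction: sigma(0.2) ≈ 0.5498
print(round(z, 4), round(y_hat, 4))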

Step 2: Compute the Cost (Binary Cross-Entropy)


The Binary Cross-Entropy (BCE) cost function for one data point is:
BCE = −(𝑦 ⋅ log⁡(𝑦ˆ) + (1 − 𝑦) ⋅ log⁡(1 − 𝑦ˆ))
Plugging in 𝑦 = 1 and 𝑦ˆ ≈ 0.5498 :
BCE ≈ −(1 ⋅ log⁡(0.5498) + (1 − 1) ⋅ log⁡(1 − 0.5498)) = −log⁡(0.5498) ≈ 0.5981
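
The same cost in a short Python sketch, reusing the (rounded) prediction from Step 1 (names are illustrative):

import math

y, y_hat = 1, 0.5498   # label and prediction from Step 1

# Binary cross-entropy for a single example, natural log as above
bce = -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
print(round(bce, 4))   # ≈ 0.598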

Step 3: Compute Gradients


To update the weights, we need the gradients of the BCE cost with respect to 𝑤 and 𝑏.
1 Gradient with respect to 𝑤 :
∂BCE/∂𝑤 = (𝑦ˆ − 𝑦) ⋅ 𝑥 = (0.5498 − 1) ⋅ 0.5 = −0.2251
2 Gradient with respect to 𝑏 :
∂BCE/∂𝑏 = 𝑦ˆ − 𝑦 = 0.5498 − 1 = −0.4502
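
The two gradients as a short Python sketch, with values carried over from the previous steps (names are illustrative):

x, y, y_hat = 0.5, 1, 0.5498   # feature, label, prediction

# dBCE/dw = (y_hat - y) * x and dBCE/db = (y_hat - y)
grad_w = (y_hat - y) * x       # ≈ -0.2251
grad_b = y_hat - y             # ≈ -0.4502
print(round(grad_w, 4), round(grad_b, 4))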

Step 4: Update Weights and Bias


Using the learning rate 𝛼 = 0.1, we update 𝑤 and 𝑏 as follows:
1 Update 𝑤 :
𝑤 = 𝑤 − 𝛼 ⋅ ∂BCE/∂𝑤 = 0.2 − 0.1 ⋅ (−0.2251) = 0.2 + 0.0225 = 0.2225
2 Update 𝑏 :
𝑏 = 𝑏 − 𝛼 ⋅ ∂BCE/∂𝑏 = 0.1 − 0.1 ⋅ (−0.4502) = 0.1 + 0.0450 = 0.1450
Summary of Updated Parameters
After one iteration, the updated weights and bias are:
• 𝑤 = 0.2225
• 𝑏 = 0.1450
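
Putting the whole iteration together, a sketch of one gradient-descent step for this example (illustrative names, not a reference implementation):

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, alpha = 0.2, 0.1, 0.1    # initial parameters and learning rate
x, y = 0.5, 1                  # the single data point

y_hat = sigmoid(w * x + b)     # forward pass
grad_w = (y_hat - y) * x       # gradient w.r.t. w
grad_b = y_hat - y             # gradient w.r.t. b
w -= alpha * grad_w            # 0.2 + 0.0225
b -= alpha * grad_b            # 0.1 + 0.0450
print(round(w, 4), round(b, 4))   # ≈ 0.2225, 0.1450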
Example (2)

The dataset with four samples:


Sample 𝑥 𝑦
1 0.5 1
2 1.5 0
3 2.0 1
4 3.0 0

Initial Conditions:
• Initial weight 𝑤 = 0.2
• Initial bias 𝑏 = 0.1
• Learning rate 𝛼 = 0.1
Goal:
We'll update the weights for each sample and go through one epoch of training.
---------------------------------------------------------------------------------------------------------------------

Step 1: Forward Pass, Prediction, and Cost Calculation


For each sample, we'll calculate the prediction 𝑦ˆ and the Binary Cross-Entropy cost.
Sample 1:
1 Calculate the linear combination 𝑧 :
𝑧 = 𝑤 ⋅ 𝑥 + 𝑏 = 0.2 ⋅ 0.5 + 0.1 = 0.2
2 Apply the sigmoid function to get 𝑦ˆ :
𝑦ˆ = 𝜎(𝑧) = 1/(1 + 𝑒^(−0.2)) ≈ 0.5498
3 Compute the BCE Cost:
BCE = −(𝑦 ⋅ log⁡(𝑦ˆ) + (1 − 𝑦) ⋅ log⁡(1 − 𝑦ˆ))
With 𝑦 = 1 and 𝑦ˆ ≈ 0.5498 :
BCE ≈ −log⁡(0.5498) ≈ 0.5981
Sample 2:
1 Calculate 𝑧 :
𝑧 = 𝑤 ⋅ 𝑥 + 𝑏 = 0.2 ⋅ 1.5 + 0.1 = 0.4
2 Apply the sigmoid to get 𝑦ˆ :
𝑦ˆ = 𝜎(𝑧) = 1/(1 + 𝑒^(−0.4)) ≈ 0.5987
3 Compute the BCE Cost: With 𝑦 = 0 and 𝑦ˆ ≈ 0.5987 :
BCE ≈ −log⁡(1 − 0.5987) ≈ 0.9130

Sample 3:
1 Calculate 𝑧 :
𝑧 = 𝑤 ⋅ 𝑥 + 𝑏 = 0.2 ⋅ 2.0 + 0.1 = 0.5
2 Apply the sigmoid to get 𝑦ˆ :
𝑦ˆ = 𝜎(𝑧) = 1/(1 + 𝑒^(−0.5)) ≈ 0.6225
3 Compute the BCE Cost: With 𝑦 = 1 and 𝑦ˆ ≈ 0.6225 :
BCE ≈ −log⁡(0.6225) ≈ 0.4741
Sample 4:
1 Calculate 𝑧 :
𝑧 = 𝑤 ⋅ 𝑥 + 𝑏 = 0.2 ⋅ 3.0 + 0.1 = 0.7
2 Apply the sigmoid to get 𝑦ˆ :
𝑦ˆ = 𝜎(𝑧) = 1/(1 + 𝑒^(−0.7)) ≈ 0.6682
3 Compute the BCE Cost: With 𝑦 = 0 and 𝑦ˆ ≈ 0.6682 :
BCE ≈ −log⁡(1 − 0.6682) ≈ 1.1032
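
The four forward passes and costs above can be reproduced with a short loop; as in this step, every prediction uses the initial 𝑤 = 0.2 and 𝑏 = 0.1 (names are illustrative):

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b = 0.2, 0.1
data = [(0.5, 1), (1.5, 0), (2.0, 1), (3.0, 0)]   # (x, y) pairs from the table

for i, (x, y) in enumerate(data, start=1):
    z = w * x + b                                  # linear combination
    y_hat = sigmoid(z)                             # prediction
    bce = -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
    print(f"Sample {i}: z={z:.1f}, y_hat={y_hat:.4f}, BCE={bce:.4f}")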

Step 2: Compute Gradients for Each Sample


Now we'll compute the gradients of the BCE cost with respect to 𝑤 and 𝑏 for each sample.
Sample 1:
1 Gradient with respect to 𝑤 :
∂BCE/∂𝑤 = (𝑦ˆ − 𝑦) ⋅ 𝑥 = (0.5498 − 1) ⋅ 0.5 = −0.2251
2 Gradient with respect to 𝑏 :
∂BCE/∂𝑏 = 𝑦ˆ − 𝑦 = 0.5498 − 1 = −0.4502
Sample 2:
1 Gradient with respect to 𝑤 :
∂BCE/∂𝑤 = (𝑦ˆ − 𝑦) ⋅ 𝑥 = (0.5987 − 0) ⋅ 1.5 = 0.8980
2 Gradient with respect to 𝑏 :
∂BCE/∂𝑏 = 𝑦ˆ − 𝑦 = 0.5987 − 0 = 0.5987
Sample 3:
1 Gradient with respect to 𝑤 :
∂BCE/∂𝑤 = (𝑦ˆ − 𝑦) ⋅ 𝑥 = (0.6225 − 1) ⋅ 2.0 = −0.755
2 Gradient with respect to 𝑏 :
∂BCE/∂𝑏 = 𝑦ˆ − 𝑦 = 0.6225 − 1 = −0.3775
Sample 4:
1 Gradient with respect to 𝑤 :
∂BCE/∂𝑤 = (𝑦ˆ − 𝑦) ⋅ 𝑥 = (0.6682 − 0) ⋅ 3.0 = 2.0046
2 Gradient with respect to 𝑏 :
∂BCE/∂𝑏 = 𝑦ˆ − 𝑦 = 0.6682 − 0 = 0.6682
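
A corresponding sketch for the per-sample gradients, reusing the (rounded) predictions from Step 1 (names are illustrative):

preds = [0.5498, 0.5987, 0.6225, 0.6682]          # predictions from Step 1
data  = [(0.5, 1), (1.5, 0), (2.0, 1), (3.0, 0)]  # (x, y) pairs

for i, ((x, y), y_hat) in enumerate(zip(data, preds), start=1):
    grad_w = (y_hat - y) * x   # dBCE/dw
    grad_b = y_hat - y         # dBCE/db
    print(f"Sample {i}: grad_w={grad_w:.4f}, grad_b={grad_b:.4f}")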

Step 3: Update Weights and Bias


Using the gradients and learning rate, we update 𝑤 and 𝑏 for each sample.
After Sample 1 Update:
1 Update 𝑤 :
𝑤 = 𝑤 − 𝛼 ⋅ ∂BCE/∂𝑤 = 0.2 − 0.1 ⋅ (−0.2251) = 0.2 + 0.0225 = 0.2225
2 Update 𝑏 :
𝑏 = 𝑏 − 𝛼 ⋅ ∂BCE/∂𝑏 = 0.1 − 0.1 ⋅ (−0.4502) = 0.1 + 0.0450 = 0.1450
After Sample 2 Update:
1 Update 𝑤 :
𝑤 = 0.2225 − 0.1 ⋅ 0.8980 = 0.2225 − 0.0898 = 0.1327
2 Update 𝑏 :
𝑏 = 0.1450 − 0.1 ⋅ 0.5987 = 0.1450 − 0.0599 = 0.0851

After Sample 3 Update:


1 Update 𝑤 :
𝑤 = 0.1327 − 0.1 ⋅ (−0.755) = 0.1327 + 0.0755 = 0.2082
2 Update 𝑏 :
𝑏 = 0.0851 − 0.1 ⋅ (−0.3775) = 0.0851 + 0.03775 = 0.1229
After Sample 4 Update:
1 Update 𝑤 :
𝑤 = 0.2082 − 0.1 ⋅ 2.0046 = 0.2082 − 0.2005 = 0.0077
2 Update 𝑏 :
𝑏 = 0.1229 − 0.1 ⋅ 0.6682 = 0.1229 − 0.0668 = 0.0561
Summary of Updated Parameters
After one epoch, the updated weights and bias are:
• 𝑤 = 0.0077
• 𝑏 = 0.0561
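
A sketch of the whole epoch, mirroring the procedure above: predictions and gradients are evaluated at the initial parameters (Steps 1–2) and the four updates are then applied one after another (Step 3). Names are illustrative.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

alpha = 0.1
w0, b0 = 0.2, 0.1
data = [(0.5, 1), (1.5, 0), (2.0, 1), (3.0, 0)]

# Steps 1-2: gradients for every sample, evaluated at the initial parameters
grads = [((sigmoid(w0 * x + b0) - y) * x, sigmoid(w0 * x + b0) - y)
         for x, y in data]

# Step 3: apply the four updates in turn
w, b = w0, b0
for grad_w, grad_b in grads:
    w -= alpha * grad_w
    b -= alpha * grad_b
    print(f"w={w:.4f}, b={b:.4f}")
# Final values: w ≈ 0.0078, b ≈ 0.0561 (the 0.0077 in the text comes from
# rounding each intermediate update to four decimals).

Note that in a standard per-sample (stochastic) pass, the prediction would be recomputed with the already-updated 𝑤 and 𝑏 before each sample, which gives slightly different numbers.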
