Artificial Intelligence (AI)
We want to find the values of w and b that correspond to the minimum of the
cost function (marked with the red arrow). To start, we initialize w and b with
random numbers. Gradient descent then starts at that point (somewhere around the
top of our illustration) and takes one step after another in the direction of
steepest descent (i.e., from the top toward the bottom of the illustration)
until it reaches the point where the cost function is as small as possible.
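
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in these notes: it assumes the model is a straight line y = w*x + b and the cost function is the mean squared error. The function names (cost, gradient, gradient_descent), the parameters learning_rate and steps, and the toy data are all hypothetical.

```python
import random


def cost(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b (assumed cost function)."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def gradient(w, b, xs, ys):
    """Partial derivatives of the mean squared error with respect to w and b."""
    n = len(xs)
    dw = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
    db = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
    return dw, db


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000, seed=0):
    """Start from random w and b, then repeatedly step against the gradient."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)  # random initialization
    for _ in range(steps):
        dw, db = gradient(w, b, xs, ys)
        w -= learning_rate * dw  # move in the direction of steepest descent
        b -= learning_rate * db
    return w, b


# Toy data generated from y = 2x + 1, so w and b should end up close to 2 and 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(w, b, cost(w, b, xs, ys))
```

The learning rate controls the step size: if it is too large the steps overshoot and the cost can grow instead of shrink, and if it is too small many more steps are needed to reach the minimum.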
17. Calculate gradient descent
18. Why we need gradient descent
The main reason we use gradient descent rather than analytical optimization to
fit our models is that it is generally faster at scale. Analytical solutions
typically require linear algebra operations such as matrix inversion, which are
computationally expensive for large problems and can be numerically unstable.
To demonstrate this, we’ll use OLS linear regression for its simplicity. In
this case, given a matrix of input observations, X, and a target vector, y, the
OLS solution is: