Gradient Descent - Intuition
\(\theta_1:=\theta_1-\alpha \frac{d}{d\theta_1} J(\theta_1)\)
The derivative \(\frac{d}{d\theta_1}\) (when \(J\) has one variable) or partial derivative \(\frac{\partial}{\partial\theta_1}\) (when \(J\) has more than one variable) in the gradient descent formula is the slope of the tangent line at the point \((\theta_1, J(\theta_1))\). The minus sign in the formula is very important: it either decreases or increases the current value of \(\theta_1\) so that we get closer and closer to the minimum of the cost function.
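A minimal sketch of the update rule above for a single parameter, using a made-up cost \(J(\theta_1) = (\theta_1 - 3)^2\) whose derivative is \(2(\theta_1 - 3)\) and whose minimum is at \(\theta_1 = 3\):

```python
def J(theta_1):
    # made-up example cost with its minimum at theta_1 = 3
    return (theta_1 - 3) ** 2

def dJ(theta_1):
    # slope of the tangent line at the point (theta_1, J(theta_1))
    return 2 * (theta_1 - 3)

alpha = 0.1      # learning rate
theta_1 = 0.0    # arbitrary starting point

for _ in range(100):
    # theta_1 := theta_1 - alpha * d/d(theta_1) J(theta_1)
    theta_1 = theta_1 - alpha * dJ(theta_1)

print(theta_1)   # close to 3, the minimum of J
```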
- If the learning rate \(\alpha\) is too small, gradient descent can be slow.
- If the learning rate \(\alpha\) is too large, gradient descent can overshoot the minimum and/or fail to converge (see the sketch after this list).
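A small experiment, reusing the same made-up cost \(J(\theta_1) = (\theta_1 - 3)^2\) from the sketch above, that shows how the choice of \(\alpha\) changes the behaviour:

```python
def dJ(theta_1):
    # derivative of the made-up cost J(theta_1) = (theta_1 - 3)**2
    return 2 * (theta_1 - 3)

def run(alpha, steps=25, theta_1=0.0):
    for _ in range(steps):
        theta_1 = theta_1 - alpha * dJ(theta_1)
    return theta_1

print(run(alpha=0.001))  # too small: after 25 steps still far from 3 (slow progress)
print(run(alpha=0.1))    # reasonable: converges close to 3
print(run(alpha=1.1))    # too large: every step overshoots, theta_1 diverges
```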
If \(\theta_1\) is already at a local optimum of \(J(\theta_1)\), one step of gradient descent leaves \(\theta_1\) unchanged, because the derivative there is zero.
Gradient descent can converge to a local minimum even with the learning rate \(\alpha\) held fixed. This is because the derivative term shrinks as we approach a local optimum, so we automatically take smaller and smaller steps.
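A quick illustration of the last two points with the same made-up cost: with \(\alpha\) fixed, the step \(\alpha \frac{d}{d\theta_1} J(\theta_1)\) shrinks on its own as \(\theta_1\) approaches the minimum, and it is exactly zero once the derivative is zero.

```python
def dJ(theta_1):
    # derivative of the made-up cost J(theta_1) = (theta_1 - 3)**2
    return 2 * (theta_1 - 3)

alpha = 0.1
theta_1 = 0.0
for i in range(10):
    step = alpha * dJ(theta_1)   # step size shrinks as the slope shrinks
    print(f"iteration {i}: theta_1 = {theta_1:.4f}, step = {step:.4f}")
    theta_1 = theta_1 - step

# At the optimum itself dJ(3) == 0, so theta_1 := theta_1 - alpha * 0 leaves it unchanged.
```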