Gradient Descent - Intuition

$\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$

The derivative $\frac{d}{d\theta_1}$ (if the cost function has one variable) or partial derivative $\frac{\partial}{\partial \theta_1}$ (if it has more than one variable) in the gradient descent formula is the slope of the tangent line at the point $(\theta_1, J(\theta_1))$. The minus sign in the formula is very important: it either decreases or increases the current value of $\theta_1$, moving us closer and closer to the minimum of the cost function.
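A minimal sketch of one such update, assuming a made-up one-variable cost $J(\theta_1) = (\theta_1 - 3)^2$ (an illustrative choice, not from these notes) so the derivative can be written out explicitly:

```python
# Assumed example cost: J(theta1) = (theta1 - 3)^2, with derivative dJ/dtheta1 = 2*(theta1 - 3).

def J(theta1):
    return (theta1 - 3.0) ** 2

def dJ(theta1):
    return 2.0 * (theta1 - 3.0)

alpha = 0.1    # learning rate (illustrative value)
theta1 = 0.0   # initial guess

# theta1 := theta1 - alpha * d/dtheta1 J(theta1)
theta1 = theta1 - alpha * dJ(theta1)
print(theta1)  # 0.6 -- the negative slope at 0.0 pushes theta1 up, toward the minimum at 3
```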

  • If the learning rate α is too small, gradient descent can be slow.
  • If the learning rate α is too large, gradient descent can overshoot the minimum and/or fail to converge.
If $\theta_1$ is already at a local optimum of $J(\theta_1)$, one step of gradient descent leaves $\theta_1$ unchanged, because the derivative there is zero.
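A hedged sketch of these three cases, reusing the same assumed cost from above; the specific α values and step counts are illustrative assumptions, not values from the notes:

```python
# Same assumed cost J(theta1) = (theta1 - 3)^2 as above, derivative 2*(theta1 - 3).

def dJ(theta1):
    return 2.0 * (theta1 - 3.0)

def run(alpha, theta1=0.0, steps=10):
    for _ in range(steps):
        theta1 = theta1 - alpha * dJ(theta1)
    return theta1

print(run(alpha=0.01))             # too small: slow -- still far from 3 after 10 steps
print(run(alpha=1.1))              # too large: each step overshoots 3 and the iterates diverge
print(run(alpha=0.1, theta1=3.0))  # already at the optimum: dJ = 0, so theta1 stays 3.0
```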

Gradient descent can converge to a local minimum even with the learning rate α held fixed. This is because the steps automatically become smaller and smaller as we approach the local optimum: the derivative shrinks toward zero, and so does the update $\alpha \frac{d}{d\theta_1} J(\theta_1)$.
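A short sketch of why a fixed α still produces shrinking steps near the minimum (same assumed cost as above):

```python
# With alpha fixed, the step size alpha * |dJ/dtheta1| shrinks on its own,
# because the slope itself shrinks as theta1 approaches the minimum at 3.

def dJ(theta1):
    return 2.0 * (theta1 - 3.0)

alpha, theta1 = 0.1, 0.0
for i in range(5):
    step = alpha * dJ(theta1)
    print(f"iteration {i}: theta1 = {theta1:.4f}, step size = {abs(step):.4f}")
    theta1 = theta1 - step
# The printed step sizes decrease every iteration even though alpha never changes.
```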
