Gradient Descent - Intuition
\(\theta_1:=\theta_1-\alpha \frac{d}{d\theta_1} J(\theta_1)\)
The derivative \(\frac{d}{d\theta_1}\) (when \(J\) has one variable) or partial derivative \(\frac{\partial}{\partial\theta_1}\) (when \(J\) has more than one variable) in the gradient descent formula is the slope of the tangent line at the point \((\theta_1, J(\theta_1))\). The minus sign in the formula is very important: it either decreases or increases the current value of \(\theta_1\) so that we get closer and closer to the minimum of the cost function.
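A minimal sketch of the update rule above for a single parameter, using a made-up cost \(J(\theta_1) = (\theta_1 - 3)^2\) whose derivative is \(2(\theta_1 - 3)\) and whose minimum is at \(\theta_1 = 3\):

```python
def J(theta_1):
    # made-up example cost with its minimum at theta_1 = 3
    return (theta_1 - 3) ** 2

def dJ(theta_1):
    # slope of the tangent line at the point (theta_1, J(theta_1))
    return 2 * (theta_1 - 3)

alpha = 0.1      # learning rate
theta_1 = 0.0    # arbitrary starting point

for _ in range(100):
    # theta_1 := theta_1 - alpha * d/d(theta_1) J(theta_1)
    theta_1 = theta_1 - alpha * dJ(theta_1)

print(theta_1)   # close to 3, the minimum of J
```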
- If the learning rate \(\alpha\) is too small, gradient descent can be slow.
- If the learning rate \(\alpha\) is too large, gradient descent can overshoot the minimum and/or fail to converge (see the sketch after this list).
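A small experiment, reusing the same made-up cost \(J(\theta_1) = (\theta_1 - 3)^2\) from the sketch above, that shows how the choice of \(\alpha\) changes the behaviour:

```python
def dJ(theta_1):
    # derivative of the made-up cost J(theta_1) = (theta_1 - 3)**2
    return 2 * (theta_1 - 3)

def run(alpha, steps=25, theta_1=0.0):
    for _ in range(steps):
        theta_1 = theta_1 - alpha * dJ(theta_1)
    return theta_1

print(run(alpha=0.001))  # too small: after 25 steps still far from 3 (slow progress)
print(run(alpha=0.1))    # reasonable: converges close to 3
print(run(alpha=1.1))    # too large: every step overshoots, theta_1 diverges
```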
If \(\theta_1\) is already at a local optimum of \(J(\theta_1)\), one step of gradient descent leaves \(\theta_1\) unchanged, because the derivative there is zero.
Gradient descent can converge to a local minimum even with the learning rate \(\alpha\) held fixed. This is because the derivative term shrinks as we approach a local optimum, so we automatically take smaller and smaller steps.
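A quick illustration of the last two points with the same made-up cost: with \(\alpha\) fixed, the step \(\alpha \frac{d}{d\theta_1} J(\theta_1)\) shrinks on its own as \(\theta_1\) approaches the minimum, and it is exactly zero once the derivative is zero.

```python
def dJ(theta_1):
    # derivative of the made-up cost J(theta_1) = (theta_1 - 3)**2
    return 2 * (theta_1 - 3)

alpha = 0.1
theta_1 = 0.0
for i in range(10):
    step = alpha * dJ(theta_1)   # step size shrinks as the slope shrinks
    print(f"iteration {i}: theta_1 = {theta_1:.4f}, step = {step:.4f}")
    theta_1 = theta_1 - step

# At the optimum itself dJ(3) == 0, so theta_1 := theta_1 - alpha * 0 leaves it unchanged.
```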