Gradient Descent
Also known as steepest descent, or the method of steepest descent. It helps us find a local minimum of a cost function. Idea: start somewhere, then keep changing the parameters to reduce the cost function.
Gradient Descent Algorithm:
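In symbols, one iteration of the algorithm (using the same \(a_0, a_1, \alpha, J\) notation as below, and with the update applied simultaneously for \(j = 0\) and \(j = 1\)) is:

\[ a_j := a_j - \alpha \frac{\partial}{\partial a_j} J(a_0, a_1) \]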
Key concepts:
- Repeat until convergence, i.e. until some threshold we decide on tells us to stop the algorithm ;)
- \(\alpha\) = learning rate, how big of a step we take downhill in the image above
- Simultaneously update \(a_0\) and \(a_1\)
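For instance, here is a minimal Python sketch of that loop with a simultaneous update and a simple convergence check. The cost function, its partial derivatives, the starting point, \(\alpha\), and the tolerance are all illustrative choices, not anything fixed by these notes:

```python
def gradient_descent(dJ_da0, dJ_da1, a0, a1, alpha=0.1, tol=1e-6, max_iters=10000):
    for _ in range(max_iters):
        # Both temps are computed from the *current* (a0, a1) ...
        temp0 = a0 - alpha * dJ_da0(a0, a1)
        temp1 = a1 - alpha * dJ_da1(a0, a1)
        # ... and only then assigned together: the simultaneous update.
        if abs(temp0 - a0) < tol and abs(temp1 - a1) < tol:
            return temp0, temp1  # "convergence": the parameters barely moved
        a0, a1 = temp0, temp1
    return a0, a1

# Example: J(a0, a1) = a0**2 + a1**2, minimised at (0, 0).
print(gradient_descent(lambda a0, a1: 2 * a0,
                       lambda a0, a1: 2 * a1,
                       a0=3.0, a1=-4.0))
```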
Here is an example where, if you don't do a simultaneous update, your algorithm outputs a different result:
Let \(J(a_0, a_1) = a_0 a_1\), starting point \((a_0, a_1) = (0, 1)\), learning rate \(\alpha = 1\)
then \(\frac{\partial J}{\partial a_0} = a_1\) and \(\frac{\partial J}{\partial a_1} = a_0\)
Correct simultaneous update:
\(temp_0 := a_0 - 1 * a_1 = 0 - 1 * 1 = -1 \)
\(temp_1 := a_1 - 1 * a_0 = 1 - 1 * 0 = 1 \)
\(a_0 = temp_0 = -1\)
\(a_1 = temp_1 = 1\)
Incorrect non-simultaneous update:
\(temp_0 := a_0 - 1 * a_1 = 0 - 1 * 1 = -1 \)
\(a_0 = temp_0 = -1\)
\(temp_1 := a_1 - 1 * a_0 = 1 - 1 * (-1) = 2 \)
\(a_1 = temp_1 = 2\)
As you can see, the values of \(a_1\) (1 vs. 2) are not the same even after just one iteration
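If you want to check this quickly, here is a small Python snippet that reproduces both versions of the update for this \(J\) (variable names are just for illustration):

```python
# J(a0, a1) = a0 * a1, so dJ/da0 = a1 and dJ/da1 = a0.
a0, a1, alpha = 0.0, 1.0, 1.0

# Simultaneous update: both temps use the original (a0, a1).
temp0 = a0 - alpha * a1          # 0 - 1*1 = -1
temp1 = a1 - alpha * a0          # 1 - 1*0 = 1
print("simultaneous:", temp0, temp1)      # -1.0 1.0

# Non-simultaneous update: a0 is overwritten before a1 is computed.
a0 = a0 - alpha * a1             # -1
a1 = a1 - alpha * a0             # 1 - 1*(-1) = 2
print("non-simultaneous:", a0, a1)        # -1.0 2.0
```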
Resources:
- So when is convergence reached? http://stackoverflow.com/questions/17289082/gradient-descent-convergence-how-to-decide-convergence