Gradient Descent
Also known as steepest descent, or the method of steepest descent. It helps us find a local minimum of a cost function. Idea: start somewhere, then keep changing the parameters to reduce the cost function.
Gradient Descent Algorithm:
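In symbols, one iteration of the algorithm (using the same \(a_0, a_1, \alpha, J\) notation as below, and with the update applied simultaneously for \(j = 0\) and \(j = 1\)) is:

\[ a_j := a_j - \alpha \frac{\partial}{\partial a_j} J(a_0, a_1) \]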
Key concepts:
- Repeat until convergence, i.e. until some threshold we decide on tells us to stop the algorithm ;)
- \(\alpha\) = learning rate, how big of a step we take downhill in the image above
- Simultaneously update \(a_0\) and \(a_1\)
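For instance, here is a minimal Python sketch of that loop with a simultaneous update and a simple convergence check. The cost function, its partial derivatives, the starting point, \(\alpha\), and the tolerance are all illustrative choices, not anything fixed by these notes:

```python
def gradient_descent(dJ_da0, dJ_da1, a0, a1, alpha=0.1, tol=1e-6, max_iters=10000):
    for _ in range(max_iters):
        # Both temps are computed from the *current* (a0, a1) ...
        temp0 = a0 - alpha * dJ_da0(a0, a1)
        temp1 = a1 - alpha * dJ_da1(a0, a1)
        # ... and only then assigned together: the simultaneous update.
        if abs(temp0 - a0) < tol and abs(temp1 - a1) < tol:
            return temp0, temp1  # "convergence": the parameters barely moved
        a0, a1 = temp0, temp1
    return a0, a1

# Example: J(a0, a1) = a0**2 + a1**2, minimised at (0, 0).
print(gradient_descent(lambda a0, a1: 2 * a0,
                       lambda a0, a1: 2 * a1,
                       a0=3.0, a1=-4.0))
```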
Here is an example where, if you don't do a simultaneous update, your algorithm outputs a different result:
Let \(J(a_0, a_1) = a_0 a_1\), starting point \((a_0, a_1) = (0, 1)\), learning rate \(\alpha = 1\)
then \(\frac{\partial J}{\partial a_0} = a_1\) and \(\frac{\partial J}{\partial a_1} = a_0\)
Correct simultaneous update:
\(temp_0 := a_0 - 1 * a_1 = 0 - 1 * 1 = -1 \)
\(temp_1 := a_1 - 1 * a_0 = 1 - 1 * 0 = 1 \)
\(a_0 = temp_0 = -1\)
\(a_1 = temp_1 = 1\)
Incorrect non-simultaneous update:
\(temp_0 := a_0 - 1 * a_1 = 0 - 1 * 1 = -1 \)
\(a_0 = temp_0 = -1\)
\(temp_1 := a_1 - 1 * a_0 = 1 - 1 * (-1) = 2 \)
\(a_1 = temp_1 = 2\)
As you can see, the values of \(a_1\) (1 vs. 2) are not the same even after just one iteration
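If you want to check this quickly, here is a small Python snippet that reproduces both versions of the update for this \(J\) (variable names are just for illustration):

```python
# J(a0, a1) = a0 * a1, so dJ/da0 = a1 and dJ/da1 = a0.
a0, a1, alpha = 0.0, 1.0, 1.0

# Simultaneous update: both temps use the original (a0, a1).
temp0 = a0 - alpha * a1          # 0 - 1*1 = -1
temp1 = a1 - alpha * a0          # 1 - 1*0 = 1
print("simultaneous:", temp0, temp1)      # -1.0 1.0

# Non-simultaneous update: a0 is overwritten before a1 is computed.
a0 = a0 - alpha * a1             # -1
a1 = a1 - alpha * a0             # 1 - 1*(-1) = 2
print("non-simultaneous:", a0, a1)        # -1.0 2.0
```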
Resources:
- So when is convergence reached? http://stackoverflow.com/questions/17289082/gradient-descent-convergence-how-to-decide-convergence