Gradient Descent

Also known as steepest descent, or the method of steepest descent. It helps us find a local minimum of the cost function.
Idea: start somewhere, then keep changing the parameters to reduce the cost function.

Gradient Descent Algorithm:

Repeat until convergence:
\(a_j := a_j - \alpha \frac{\partial}{\partial a_j} J(a_0, a_1)\) (simultaneously for \(j = 0\) and \(j = 1\))

Key concepts:

  • Repeat until convergence, i.e. until the change in the cost function falls below a threshold we pick for stopping the algorithm ;)
  • \(\alpha\) = learning rate, how big of a step we take downhill on each update
  • Simultaneously update \(a_0\) and \(a_1\) (see the sketch right after this list)
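
To make the update rule concrete, here is a minimal Python sketch of the loop with simultaneous updates. The function name, the quadratic example cost, and the learning rate / tolerance values are assumptions made up for illustration, not part of these notes:

```python
def gradient_descent(grad_a0, grad_a1, a0, a1, alpha=0.1, tolerance=1e-6, max_iters=10000):
    """Minimize J(a0, a1) given its partial derivatives, using simultaneous updates."""
    for _ in range(max_iters):
        # Both gradients are evaluated at the *current* (a0, a1) before either is changed.
        temp0 = a0 - alpha * grad_a0(a0, a1)
        temp1 = a1 - alpha * grad_a1(a0, a1)
        # Convergence: stop once the step size drops below the chosen threshold.
        if abs(temp0 - a0) < tolerance and abs(temp1 - a1) < tolerance:
            break
        a0, a1 = temp0, temp1  # simultaneous update
    return a0, a1

# Hypothetical example: J(a0, a1) = (a0 - 3)^2 + (a1 + 2)^2, minimum at (3, -2).
a0, a1 = gradient_descent(lambda a0, a1: 2 * (a0 - 3),
                          lambda a0, a1: 2 * (a1 + 2),
                          a0=0.0, a1=0.0)
print(a0, a1)  # approaches 3.0 and -2.0
```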


Here is an example where, if you don't do a simultaneous update, your algorithm will behave differently and output a different result:

Let \(J(a_0, a_1) = a_0 a_1\), starting point \((a_0, a_1) = (0, 1)\), learning rate \(\alpha = 1\),
then \(\frac{\partial J}{\partial a_0} = a_1\) and \(\frac{\partial J}{\partial a_1} = a_0\)

Correct simultaneous update:
\(temp_0 := a_0 - 1 * a_1 = 0 - 1 * 1 = -1 \)
\(temp_1 := a_1 - 1 * a_0 = 1 - 1 * 0 = 1 \)
\(a_0 = temp_0 = -1\)
\(a_1 = temp_1 = 1\)

Incorrect non-simultaneous update:
\(temp_0 := a_0 - 1 * a_1 = 0 - 1 * 1 = -1 \)
\(a_0 = temp_0 = -1\)
\(temp_1 := a_1 - 1 * a_0 = 1 - 1 * (-1) = 2 \)
\(a_1 = temp_1 = 2\)

As you can see, the values of \(a_1\) already differ (1 vs. 2) after just one iteration.
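
A few lines of Python reproduce this example and make the difference explicit (the variable names below are just for this illustration):

```python
alpha = 1.0

# Simultaneous update: both gradients use the original (a0, a1) = (0, 1).
a0, a1 = 0.0, 1.0
temp0 = a0 - alpha * a1   # 0 - 1*1 = -1
temp1 = a1 - alpha * a0   # 1 - 1*0 =  1
a0, a1 = temp0, temp1
print("simultaneous:", a0, a1)   # -1.0 1.0

# Non-simultaneous update: a0 is overwritten before a1's gradient is computed.
b0, b1 = 0.0, 1.0
b0 = b0 - alpha * b1      # -1.0
b1 = b1 - alpha * b0      # uses the already-updated b0: 1 - 1*(-1) = 2
print("sequential:  ", b0, b1)   # -1.0 2.0
```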

