Gradient Descent

Also known as steepest descent, or the method of steepest descent. It helps us find a local minimum of the cost function.
Idea: start somewhere, then keep changing the parameters so that the cost function decreases.

Gradient Descent Algorithm:

repeat until convergence {
    temp0 := a0 - α · ∂J/∂a0(a0, a1)
    temp1 := a1 - α · ∂J/∂a1(a0, a1)
    a0 := temp0
    a1 := temp1
}

Key concepts:

  • Repeat until convergence, i.e. until a threshold we choose tells us to stop the algorithm ;)
  • α = learning rate, how big a step we take downhill on each update
  • Simultaneously update a0 and a1 (see the Python sketch after this list)
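
To make the update rule and the simultaneous-update idea concrete, here is a minimal Python sketch. The function names (grad_a0, grad_a1), the tolerance, and the example cost function are my own illustrative choices, not part of the original notes:

    def gradient_descent(grad_a0, grad_a1, a0, a1, alpha=0.1, tol=1e-6, max_iters=10000):
        """Minimize J(a0, a1) given its partial derivatives, using simultaneous updates."""
        for _ in range(max_iters):
            # Both temps are computed from the OLD values of a0 and a1 ...
            temp0 = a0 - alpha * grad_a0(a0, a1)
            temp1 = a1 - alpha * grad_a1(a0, a1)
            # ... and only then assigned: this is the simultaneous update.
            if abs(temp0 - a0) < tol and abs(temp1 - a1) < tol:   # convergence threshold
                return temp0, temp1
            a0, a1 = temp0, temp1
        return a0, a1

    # Example: J(a0, a1) = a0**2 + a1**2 has partials 2*a0 and 2*a1,
    # so the algorithm should converge to the minimum at (0, 0).
    print(gradient_descent(lambda a0, a1: 2 * a0,
                           lambda a0, a1: 2 * a1,
                           a0=3.0, a1=-4.0))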


Here is an example showing that if you don't do a simultaneous update, your algorithm will compute a different (wrong) result:

Let J(a0, a1) = a0 · a1, starting point (a0, a1) = (0, 1), learning rate α = 1.
Then ∂J/∂a0 = a1 and ∂J/∂a1 = a0.
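
To double-check those partial derivatives, here is a quick check using Python's sympy library (any symbolic tool would do; this snippet is just an illustration):

    import sympy as sp

    a0, a1 = sp.symbols("a0 a1")
    J = a0 * a1

    print(sp.diff(J, a0))   # prints a1, i.e. ∂J/∂a0 = a1
    print(sp.diff(J, a1))   # prints a0, i.e. ∂J/∂a1 = a0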

Correct simultaneous update:
temp0 := a0 - 1·a1 = 0 - 1·1 = -1
temp1 := a1 - 1·a0 = 1 - 1·0 = 1
a0 := temp0 = -1
a1 := temp1 = 1

Incorrect, non-simultaneous update:
temp0 := a0 - 1·a1 = 0 - 1·1 = -1
a0 := temp0 = -1
temp1 := a1 - 1·a0 = 1 - 1·(-1) = 2
a1 := temp1 = 2

As you can see, the values of a1 (1 vs. 2) already differ after just one iteration, because the non-simultaneous version uses the freshly updated a0 when computing temp1.
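
A few lines of Python make the difference easy to check; this just replays the arithmetic above (the variable names are mine):

    alpha = 1.0

    # Correct: simultaneous update (both temps use the old values)
    a0, a1 = 0.0, 1.0
    temp0 = a0 - alpha * a1          # ∂J/∂a0 = a1
    temp1 = a1 - alpha * a0          # ∂J/∂a1 = a0
    a0, a1 = temp0, temp1
    print("simultaneous:", a0, a1)   # -1.0 1.0

    # Incorrect: sequential update (a0 is overwritten before temp1 is computed)
    a0, a1 = 0.0, 1.0
    a0 = a0 - alpha * a1
    a1 = a1 - alpha * a0             # uses the already-updated a0 = -1.0
    print("sequential:  ", a0, a1)   # -1.0 2.0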

