Gradient Descent
Also known as steepest descent, or the method of steepest descent. It helps us find a local minimum of the cost function.
Idea: start somewhere, then keep changing the parameters to reduce the cost function.
Gradient Descent Algorithm:
repeat until convergence: aj := aj − α ∗ ∂J(a0,a1)/∂aj, simultaneously for j = 0 and j = 1
Key concepts:
- Repeat until convergence, i.e., until a stopping threshold we choose is met ;)
- α = learning rate, i.e., how big a step we take downhill at each iteration
- Simultaneously update a0 and a1 (see the sketch below)
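Putting the pieces together, here is a minimal Python sketch of the loop; the cost J(a0,a1) = a0² + a1² and its partial derivatives are illustrative assumptions, not from this post:

```python
# Minimal gradient descent sketch. The cost J(a0, a1) = a0**2 + a1**2
# and its partial derivatives are illustrative choices, not from the post.

def J(a0, a1):
    return a0**2 + a1**2

def dJ_da0(a0, a1):
    return 2 * a0

def dJ_da1(a0, a1):
    return 2 * a1

def gradient_descent(a0, a1, alpha=0.1, tolerance=1e-9, max_iters=10_000):
    for _ in range(max_iters):
        # Simultaneous update: compute BOTH steps from the old values
        # before assigning either one.
        temp0 = a0 - alpha * dJ_da0(a0, a1)
        temp1 = a1 - alpha * dJ_da1(a0, a1)
        # Convergence check: stop once the cost barely changes.
        if abs(J(a0, a1) - J(temp0, temp1)) < tolerance:
            break
        a0, a1 = temp0, temp1
    return a0, a1

print(gradient_descent(3.0, -4.0))  # converges near (0, 0), the minimum
```

With α = 0.1 this walks toward (0, 0), the minimum of the toy cost; a larger α takes bigger steps and can overshoot.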
Here is an example where, if you don't do the simultaneous update, your algorithm will output a different (wrong) result:
Let J(a0,a1) = a0∗a1, starting point (a0,a1) = (0,1), learning rate α = 1.
Then ∂J/∂a0 = a1 and ∂J/∂a1 = a0.
Correct simultaneous update:
temp0 := a0 − 1∗a1 = 0 − 1∗1 = −1
temp1 := a1 − 1∗a0 = 1 − 1∗0 = 1
a0 = temp0 = −1
a1 = temp1 = 1
Incorrect non-simultaneous update:
temp0 := a0 − 1∗a1 = 0 − 1∗1 = −1
a0 = temp0 = −1
temp1 := a1 − 1∗a0 = 1 − 1∗(−1) = 2
a1 = temp1 = 2
As you can see, the values of a1 differ (1 vs. 2) after just one iteration
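A quick way to verify the arithmetic above: this minimal sketch runs one iteration of the same toy example both ways.

```python
# One iteration of gradient descent on J(a0, a1) = a0 * a1,
# starting at (a0, a1) = (0, 1) with alpha = 1, as in the example above.
# Partials: dJ/da0 = a1 and dJ/da1 = a0.
alpha = 1

# Correct: simultaneous update (both temps use the old values).
a0, a1 = 0, 1
temp0 = a0 - alpha * a1   # 0 - 1*1 = -1
temp1 = a1 - alpha * a0   # 1 - 1*0 = 1
a0, a1 = temp0, temp1
print(a0, a1)             # -1 1

# Incorrect: sequential update (a1's step sees the already-updated a0).
a0, a1 = 0, 1
a0 = a0 - alpha * a1      # 0 - 1*1 = -1
a1 = a1 - alpha * a0      # 1 - 1*(-1) = 2
print(a0, a1)             # -1 2
```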
Resources:
- So when do we declare convergence? http://stackoverflow.com/questions/17289082/gradient-descent-convergence-how-to-decide-convergence