Learning Rate
\(a_j := a_j - \alpha\frac{\partial}{\partial a_j}J(a)\)
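As a concrete illustration, here is a minimal sketch of this update in Python/NumPy. It assumes the squared-error cost of linear regression, so \(\frac{\partial}{\partial a_j}J(a)\) has a closed form; the function name and data layout are hypothetical.

```python
import numpy as np

def gradient_descent_step(a, X, y, alpha):
    """One simultaneous update a_j := a_j - alpha * dJ/da_j for every j.

    Assumes the squared-error cost J(a) = 1/(2m) * sum((X @ a - y)**2),
    whose gradient is (1/m) * X.T @ (X @ a - y).
    """
    m = len(y)
    gradient = (X.T @ (X @ a - y)) / m
    return a - alpha * gradient
```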
Debugging gradient descent: how to make sure gradient descent is working correctly
The job of gradient descent is to minimize \(J(a)\) as a function of the parameter vector \(a\) (also written \(\theta\)). So if we plot \(J(a)\) against the number of iterations, the curve should decrease after every iteration.
Automatic convergence test: declare convergence if \(J(a)\) decreases by less than \(10^{-3}\) in one iteration. However, in practice it is difficult to choose this threshold, so this test is not recommended.
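A minimal sketch of this monitoring loop (Python/NumPy, assuming the same squared-error cost as above; `tol` plays the role of the \(10^{-3}\) threshold and can be set to `None` to rely on the plot instead):

```python
import numpy as np

def run_gradient_descent(a, X, y, alpha, num_iters=1000, tol=1e-3):
    """Run gradient descent, recording J(a) at every iteration.

    Plotting the returned `history` against the iteration number should show
    a curve that decreases on every iteration. If `tol` is set, the loop also
    stops early once J(a) drops by less than `tol` in one iteration (the
    automatic convergence test described above).
    """
    m = len(y)
    history = []
    for _ in range(num_iters):
        cost = np.sum((X @ a - y) ** 2) / (2 * m)        # J(a)
        if tol is not None and history and history[-1] - cost < tol:
            history.append(cost)
            break
        history.append(cost)
        a = a - alpha * (X.T @ (X @ a - y)) / m          # simultaneous update
    return a, history
```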
How to choose learning rate \(\alpha\):
- If \(J(a)\) is increasing, or repeatedly decreases and increases (oscillates), use a smaller learning rate \(\alpha\).
- For sufficiently small \(\alpha\), \(J(a)\) should decrease on every iteration.
- If \(\alpha\) is too small, gradient descent can be slow to converge.
- To choose \(\alpha\), try a sequence of values such as ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ... (each roughly 3 times the previous one), plot \(J(a)\) for each, and pick the value just below the largest \(\alpha\) that still works, i.e. the largest \(\alpha\) for which \(J(a)\) decreases on every iteration (see the sketch after this list).
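As a sketch of that procedure (Python/NumPy and matplotlib; the data here is synthetic and `run_gradient_descent` is the helper sketched above, both purely for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data purely for illustration; in practice use your own X and y.
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]      # intercept + 2 features
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.1, size=100)
a0 = np.zeros(X.shape[1])

# Try learning rates that each grow by roughly 3x, plot J(a) per iteration,
# and keep the largest alpha whose curve still decreases on every iteration.
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
    _, history = run_gradient_descent(a0.copy(), X, y, alpha,
                                      num_iters=100, tol=None)
    plt.plot(history, label=f"alpha={alpha}")
plt.xlabel("iteration")
plt.ylabel("J(a)")
plt.legend()
plt.show()
```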