Learning Rate

a_j := a_j - α · ∂J(a)/∂a_j   (update every j simultaneously)
Debugging gradient descent: how to make sure it is working correctly
The job of gradient descent is to minimize J(a) as a function of the parameter vector a (or θ). So if we plot J(a) against the number of iterations, the curve should decrease after every iteration.
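
To make this concrete, here is a minimal sketch (not from the original notes) of batch gradient descent for linear regression that records J(a) at every iteration; the data, cost function, and helper names are assumptions for illustration.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=500):
    """Batch gradient descent for linear regression with squared-error cost
    J(a) = (1 / (2m)) * sum((X @ a - y) ** 2); records J(a) every iteration."""
    m, n = X.shape
    a = np.zeros(n)                                # parameter vector (theta)
    J_history = []
    for _ in range(num_iters):
        error = X @ a - y                          # prediction error on all m examples
        J_history.append(error @ error / (2 * m))  # cost before this update
        a = a - alpha * (X.T @ error) / m          # a_j := a_j - alpha * dJ(a)/da_j for all j
    return a, J_history
```

Plotting J_history against the iteration index should then give a curve that falls after every iteration when α is well chosen.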

Automatic convergence test: declare convergence if J(a) decreases by less than 10⁻³ in one iteration. However, in practice it's difficult to choose this threshold well, so this test is not recommended.
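
As a sketch of what that test might look like in code (the helper name and the handling of an increase are my own additions):

```python
def has_converged(J_history, epsilon=1e-3):
    """Automatic convergence test: True once J(a) decreased by less than
    epsilon in the latest iteration. A negative decrease means J(a) went up,
    which points to a learning-rate problem rather than convergence."""
    if len(J_history) < 2:
        return False
    decrease = J_history[-2] - J_history[-1]
    return 0 <= decrease < epsilon
```
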
How to choose learning rate α:
  • If J(a) is increasing, or repeatedly going up and down, use a smaller learning rate α.
  • For sufficiently small α, J(a) should decrease on every iteration.
  • If α is too small, gradient descent can be slow to converge.
  • To choose α, try values roughly 3× apart: ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ..., then pick the value just below the largest one that still works (see the sketch after this list).
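
One possible way to automate this sweep is sketched below. It reuses the hypothetical gradient_descent helper from above; the candidate list and the "decreases on every iteration" criterion are assumptions, not a prescribed recipe.

```python
import numpy as np

def pick_alpha(X, y, candidates=(0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0),
               num_iters=100):
    """Run a short trial per candidate alpha (spaced roughly 3x apart) and
    return the value just below the largest one for which J(a) fell on
    every iteration."""
    working = []
    for alpha in candidates:
        _, J = gradient_descent(X, y, alpha=alpha, num_iters=num_iters)
        decreasing = all(J[i + 1] <= J[i] for i in range(len(J) - 1))
        if decreasing and np.isfinite(J[-1]):
            working.append(alpha)
    if not working:
        return None            # nothing converged: try candidates below 0.001
    return working[-2] if len(working) > 1 else working[-1]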

