Learning Rate
a_j := a_j − α · ∂/∂a_j J(a)
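A minimal sketch of one such update step in Python with NumPy, assuming the squared-error cost J(a) = (1/(2m)) Σ (X·a − y)²; the names X, y, and alpha are illustrative, not from the notes:

```python
import numpy as np

def gradient_descent_step(a, X, y, alpha):
    """One simultaneous update a_j := a_j - alpha * dJ/da_j for all j,
    assuming the squared-error cost J(a) = (1/(2m)) * sum((X @ a - y)**2)."""
    m = len(y)
    grad = (X.T @ (X @ a - y)) / m  # vector of partial derivatives dJ/da_j
    return a - alpha * grad         # update every parameter simultaneously
```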
Debugging gradient descent: how to verify that it is working correctly
The job of gradient descent is to minimize J(a) as a function of the parameter vector a (also written θ). So if we plot J(a) against the number of iterations, the value should decrease after every iteration.
Automatic convergence test: declare convergence if J(a) decreases by less than 10^-3 in one iteration. In practice, however, it is difficult to choose this threshold, so this test is not recommended.
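To make this concrete, here is a rough sketch that records J(a) at every iteration so it can be plotted against the iteration number, and optionally applies the convergence test above; the helper names and the squared-error cost are assumptions for illustration:

```python
import numpy as np

def cost(a, X, y):
    """Assumed squared-error cost J(a) = (1/(2m)) * sum((X @ a - y)**2)."""
    m = len(y)
    return np.sum((X @ a - y) ** 2) / (2 * m)

def run_with_history(a, X, y, alpha, num_iters, tol=1e-3):
    """Record J(a) after each iteration; stop early if J(a) decreases
    by less than tol in one iteration (the automatic convergence test)."""
    m = len(y)
    history = [cost(a, X, y)]
    for _ in range(num_iters):
        a = a - alpha * (X.T @ (X @ a - y)) / m
        history.append(cost(a, X, y))
        if history[-2] - history[-1] < tol:
            break
    return a, history  # plot history against iteration number to debug
```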
How to choose learning rate α:
- If J(a) is increasing or oscillating (repeatedly going up and down), use a smaller learning rate α.
- For sufficiently small α, J(a) should decrease on every iteration.
- If α is too small, gradient descent can be slow to converge.
- To choose α, try a sequence like ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ... (each value roughly 3× the previous), then pick the value just below the largest α that still works (see the sketch after this list).
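A rough sketch of that search, assuming the same squared-error setup as above; the candidate list, the monotone-decrease check, and the helper names are illustrative assumptions:

```python
import numpy as np

def decreases_every_iteration(alpha, X, y, num_iters=100):
    """Run gradient descent from a = 0 and check that the assumed
    squared-error cost J(a) drops on every single iteration."""
    m, n = X.shape
    a = np.zeros(n)
    prev = np.inf
    for _ in range(num_iters):
        a = a - alpha * (X.T @ (X @ a - y)) / m
        j = np.sum((X @ a - y) ** 2) / (2 * m)
        if not np.isfinite(j) or j > prev:
            return False  # diverging or oscillating: alpha is too large
        prev = j
    return True

def pick_alpha(X, y):
    """Try rates spaced roughly 3x apart and, per the heuristic above,
    take the value just below the largest rate that still works."""
    candidates = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]
    working = [a for a in candidates if decreases_every_iteration(a, X, y)]
    if len(working) >= 2:
        return working[-2]
    return working[-1] if working else None
```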