This was an exciting week with a lot of new information. Andrew created a post where people share the top five things they learned this week, which I find awesome. Below are the most upvoted answers:
Shaji Parol:
- Linear regression with multiple variables is also known as "multivariate linear regression". The gradient descent equation is generally the same form; we just have to repeat it for 'n' features
- Speed up gradient descent by getting each of the input values into roughly the same range. Two techniques that help with this are feature scaling and mean normalization (a small sketch after this list shows both together with gradient descent).
- Feature scaling involves dividing the input values by the range (maximum - minimum) of the input variable
- Mean normalization involves subtracting the average value of an input variable from its values, so that the new average of that variable is zero.
- Make a plot of the cost function J(θ) (y-axis) against the number of iterations of gradient descent (x-axis). If J(θ) ever increases, you probably need to decrease the learning rate α.
- Declare convergence if J(θ) decreases by less than some threshold E (around 10^-3) in one iteration. If the learning rate α is sufficiently small, J(θ) will decrease on every iteration. If α is too small, convergence is slow; if α is too large, J(θ) may not decrease on every iteration and thus may not converge.
- Change the behavior or curve of our hypothesis function by making it a quadratic, cubic or square root function.
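To make these points concrete, here is a minimal NumPy sketch of mean normalization followed by vectorized gradient descent with the convergence test described above. The toy data, learning rate, and stopping threshold are my own choices for illustration, not from the course materials.

```python
import numpy as np

def scale_features(X):
    """Mean-normalize each column: subtract the mean, divide by the (max - min) range."""
    mu = X.mean(axis=0)
    spread = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / spread, mu, spread

def gradient_descent(X, y, alpha=0.1, max_iters=500, tol=1e-3):
    """One shared update repeated for all n features; stop when J barely changes."""
    m = len(y)
    Xb = np.c_[np.ones(m), X]                         # prepend the bias column x0 = 1
    theta = np.zeros(Xb.shape[1])
    J_history = []
    for _ in range(max_iters):
        error = Xb @ theta - y                        # h(x) - y for every training example
        J_history.append((error @ error) / (2 * m))   # cost before this update
        theta -= (alpha / m) * (Xb.T @ error)         # simultaneous update of all thetas
        if len(J_history) > 1 and J_history[-2] - J_history[-1] < tol:
            break                                     # J dropped by less than ~1e-3: converged
    return theta, J_history

# Toy data: two features with very different ranges (invented for illustration).
X = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0], [1416.0, 2.0], [3000.0, 4.0]])
y = np.array([400.0, 330.0, 369.0, 232.0, 540.0])

X_scaled, mu, spread = scale_features(X)
theta, J_history = gradient_descent(X_scaled, y)
print("theta:", theta.round(2), "iterations:", len(J_history))
```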
Jayanth Bharadwaj:
- It helps when all the features are on a similar scale; this is feature scaling.
- Get every feature into a range approximately between -1 and 1 (some deviation is acceptable). E.g., a feature between -3 and 3, or between -1/3 and 1/3, is considered acceptable.
- Feature scaling is done to speed up gradient descent, because the parameters (theta) descend quickly on small ranges and slowly on large ranges.
- The cost function J(theta) should decrease after every iteration; declare convergence when J(theta) decreases by less than 0.001 in one iteration. An appropriately sized alpha (learning rate) should therefore be chosen (see the learning-rate sketch after this list).
- It's especially important to apply feature scaling when creating polynomial features, since squared and cubic terms have much larger ranges than the original feature.
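Here is a hedged sketch of choosing the learning rate by watching whether J(theta) decreases on every iteration, using the 0.001 convergence threshold mentioned above. The candidate alphas and toy data are made up for illustration.

```python
import numpy as np

# Already-scaled toy features plus a bias column (invented data).
X = np.array([[0.43, 0.5], [0.12, -0.5], [-0.31, 0.5], [-0.24, -0.5]])
y = np.array([1.2, 0.8, 1.0, 0.6])
m = len(y)
Xb = np.c_[np.ones(m), X]

for alpha in (0.01, 0.03, 0.1, 0.3, 1.0, 3.0):
    theta = np.zeros(Xb.shape[1])
    J_prev = None
    status = "still decreasing after 200 iterations (maybe too slow)"
    for i in range(200):
        error = Xb @ theta - y
        J = (error @ error) / (2 * m)
        if J_prev is not None:
            if J > J_prev:                        # J went up: alpha is too large
                status = f"J increased at iteration {i} -> decrease alpha"
                break
            if J_prev - J < 1e-3:                 # decrease below 0.001: declare convergence
                status = f"converged after {i} iterations"
                break
        J_prev = J
        theta -= (alpha / m) * (Xb.T @ error)     # simultaneous update of all thetas
    print(f"alpha={alpha:<5} {status}")
```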
Nancy:
- Dealing with a problem with more than one variable/feature, including how to formulate its hypothesis function and gradient descent algorithm.
- Implementing a more efficient gradient descent algorithm by applying feature scaling with or without mean normalization.
- Debugging strategy for the Gradient descent algorithm using the cost function versus number of iterations plot, in addition to insights for choosing the appropriate learning rate (alpha) value.
- Appropriate choice and creation of features to get a powerful linear regression learning algorithm that can capture the shape and nature of your dataset, either by combining features or by fitting more complex functions to your data (a short feature-creation sketch follows this list).
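As one concrete example of feature creation, here is a small sketch that combines two raw features into one (frontage times depth gives area, as in the housing example from the lectures), adds quadratic and cubic terms, and then rescales everything. The numbers are invented for illustration.

```python
import numpy as np

# Two raw features of a house lot (invented numbers).
frontage = np.array([60.0, 40.0, 80.0, 55.0])
depth    = np.array([30.0, 25.0, 40.0, 35.0])

area = frontage * depth                        # combine two raw features into one engineered feature
X_poly = np.c_[area, area**2, area**3]         # quadratic and cubic terms bend the hypothesis curve

# Mean normalization column by column, so the x^2 and x^3 terms end up on a similar scale.
mu = X_poly.mean(axis=0)
spread = X_poly.max(axis=0) - X_poly.min(axis=0)
X_scaled = (X_poly - mu) / spread
print(X_scaled.round(3))
```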
Resources:
- https://www.coursera.org/learn/machine-learning/discussions/weeks/2/threads/hiHFNfMdEeaLjw4_t1TJig?sort=upvotesDesc&page=1