This was an exciting week with a lot of new information. Andrew created a post where people share the top five things they learned this week, which I find awesome. Below are the most upvoted answers:
Shaji Parol:
- Linear regression with multiple variables is also known as "multivariate linear regression". The gradient descent equation is generally the same form; we just have to repeat it for 'n' features
- Speed up gradient descent by getting each of the input values into roughly the same range. Two techniques that help with this are feature scaling and mean normalization (a small sketch after this list shows both together with gradient descent).
- Feature scaling involves dividing the input values by the range (maximum - minimum) of the input variable
- Mean normalization involves subtracting the average value of an input variable from its values, so that the new average of that variable is zero.
- Make a plot of the cost function J(θ) (y-axis) against the number of iterations of gradient descent (x-axis). If J(θ) ever increases, you probably need to decrease the learning rate α.
- Declare convergence if J(θ) decreases by less than some threshold E (around 10^-3) in one iteration. If the learning rate α is sufficiently small, J(θ) will decrease on every iteration. If α is too small, convergence is slow; if α is too large, J(θ) may not decrease on every iteration and thus may not converge.
- Change the behavior or curve of our hypothesis function by making it a quadratic, cubic or square root function.
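To make these points concrete, here is a minimal NumPy sketch of mean normalization followed by vectorized gradient descent with the convergence test described above. The toy data, learning rate, and stopping threshold are my own choices for illustration, not from the course materials.

```python
import numpy as np

def scale_features(X):
    """Mean-normalize each column: subtract the mean, divide by the (max - min) range."""
    mu = X.mean(axis=0)
    spread = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / spread, mu, spread

def gradient_descent(X, y, alpha=0.1, max_iters=500, tol=1e-3):
    """One shared update repeated for all n features; stop when J barely changes."""
    m = len(y)
    Xb = np.c_[np.ones(m), X]                         # prepend the bias column x0 = 1
    theta = np.zeros(Xb.shape[1])
    J_history = []
    for _ in range(max_iters):
        error = Xb @ theta - y                        # h(x) - y for every training example
        J_history.append((error @ error) / (2 * m))   # cost before this update
        theta -= (alpha / m) * (Xb.T @ error)         # simultaneous update of all thetas
        if len(J_history) > 1 and J_history[-2] - J_history[-1] < tol:
            break                                     # J dropped by less than ~1e-3: converged
    return theta, J_history

# Toy data: two features with very different ranges (invented for illustration).
X = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0], [1416.0, 2.0], [3000.0, 4.0]])
y = np.array([400.0, 330.0, 369.0, 232.0, 540.0])

X_scaled, mu, spread = scale_features(X)
theta, J_history = gradient_descent(X_scaled, y)
print("theta:", theta.round(2), "iterations:", len(J_history))
```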
Jayanth Bharadwaj:
- It helps when all the features are on a similar scale; this is feature scaling.
- Get every feature into a range approximately between -1 and 1 (some deviation is acceptable). E.g., a feature between -3 and 3, or between -1/3 and 1/3, is considered acceptable.
- Feature scaling is done to speed up gradient descent, because the parameters (theta) descend quickly on small ranges and slowly on large ranges.
- The cost function J(theta) should decrease after every iteration; declare convergence when J(theta) decreases by less than 0.001 in one iteration. An appropriately sized alpha (learning rate) should therefore be chosen (see the learning-rate sketch after this list).
- It's especially important to apply feature scaling when creating polynomial features, since squared and cubic terms have much larger ranges than the original feature.
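Here is a hedged sketch of choosing the learning rate by watching whether J(theta) decreases on every iteration, using the 0.001 convergence threshold mentioned above. The candidate alphas and toy data are made up for illustration.

```python
import numpy as np

# Already-scaled toy features plus a bias column (invented data).
X = np.array([[0.43, 0.5], [0.12, -0.5], [-0.31, 0.5], [-0.24, -0.5]])
y = np.array([1.2, 0.8, 1.0, 0.6])
m = len(y)
Xb = np.c_[np.ones(m), X]

for alpha in (0.01, 0.03, 0.1, 0.3, 1.0, 3.0):
    theta = np.zeros(Xb.shape[1])
    J_prev = None
    status = "still decreasing after 200 iterations (maybe too slow)"
    for i in range(200):
        error = Xb @ theta - y
        J = (error @ error) / (2 * m)
        if J_prev is not None:
            if J > J_prev:                        # J went up: alpha is too large
                status = f"J increased at iteration {i} -> decrease alpha"
                break
            if J_prev - J < 1e-3:                 # decrease below 0.001: declare convergence
                status = f"converged after {i} iterations"
                break
        J_prev = J
        theta -= (alpha / m) * (Xb.T @ error)     # simultaneous update of all thetas
    print(f"alpha={alpha:<5} {status}")
```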
Nancy:
- Dealing with a problem with more than one variable/feature, including how to formulate its hypothesis function and gradient descent algorithm.
- Implementing a more efficient gradient descent algorithm by applying feature scaling with or without mean normalization.
- Debugging strategy for the Gradient descent algorithm using the cost function versus number of iterations plot, in addition to insights for choosing the appropriate learning rate (alpha) value.
- Appropriate choice and creation of features to get a powerful linear regression learning algorithm that can capture the shape and nature of your dataset, either by combining features or by fitting more complex functions to your data (a short feature-creation sketch follows this list).
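As one concrete example of feature creation, here is a small sketch that combines two raw features into one (frontage times depth gives area, as in the housing example from the lectures), adds quadratic and cubic terms, and then rescales everything. The numbers are invented for illustration.

```python
import numpy as np

# Two raw features of a house lot (invented numbers).
frontage = np.array([60.0, 40.0, 80.0, 55.0])
depth    = np.array([30.0, 25.0, 40.0, 35.0])

area = frontage * depth                        # combine two raw features into one engineered feature
X_poly = np.c_[area, area**2, area**3]         # quadratic and cubic terms bend the hypothesis curve

# Mean normalization column by column, so the x^2 and x^3 terms end up on a similar scale.
mu = X_poly.mean(axis=0)
spread = X_poly.max(axis=0) - X_poly.min(axis=0)
X_scaled = (X_poly - mu) / spread
print(X_scaled.round(3))
```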
Resources:
- https://www.coursera.org/learn/machine-learning/discussions/weeks/2/threads/hiHFNfMdEeaLjw4_t1TJig?sort=upvotesDesc&page=1