The Problem of Overfitting
There are three possible scenarios:
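- Under-fitting: high bias. The model does not fit the training data well because it has a strong preconception (bias) about the data, e.g. that the data follow a straight line.
- Just right: the model fits the training data less closely than an over-fitted model, but well enough, and it generalizes to new data.
- Over-fitting: high variance. The model tries too hard to fit every training example, so it fails to generalize to new data. This typically happens when we have many features relative to the number of training examples.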
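As a rough illustration of the three scenarios, the sketch below fits polynomials of degree 1, 2 and 9 to a small synthetic dataset drawn from a noisy quadratic. The data, the chosen degrees and the helper names `make_data` and `mse` are assumptions made for this example, not part of the course material. Comparing training error with error on held-out data is one way to tell the cases apart.

```python
import numpy as np

# Assumed synthetic data: a noisy quadratic, y = 0.5x^2 - x + 2 + noise.
rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-3, 3, n)
    y = 0.5 * x**2 - x + 2 + rng.normal(0, 1.0, n)
    return x, y

x_train, y_train = make_data(12)   # small training set
x_test, y_test = make_data(200)    # held-out data to measure generalization

def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree, label in [(1, "under-fitting"), (2, "just right"), (9, "over-fitting")]:
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_err, test_err = mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test)
    results[degree] = (train_err, test_err)
    print(f"degree {degree} ({label}): train MSE {train_err:.2f}, test MSE {test_err:.2f}")
```

Training error can only go down as the degree grows, because each higher-degree polynomial family contains the lower-degree ones; it is the held-out error that typically exposes both the under-fitted and the over-fitted model.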
How to address overfitting:
The options below apply to both linear regression and logistic regression (classification).
Reduce the number of features:
- Manually select which features to keep (neither easy nor efficient)
- Use a model selection algorithm to automatically decide which features to keep and which to throw out (covered later)
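One common model selection algorithm is greedy forward selection: start with no features and repeatedly add the feature that most reduces error on a validation set, stopping when no remaining feature helps. The sketch below is a minimal version assuming plain least-squares linear regression; the function names `validation_mse` and `forward_selection` and the synthetic data are hypothetical, and this is not necessarily the algorithm the course covers later.

```python
import numpy as np

def validation_mse(X_tr, y_tr, X_val, y_val, cols):
    """Fit least squares on the given feature columns (plus an intercept); return validation MSE."""
    A_tr = np.column_stack([np.ones(len(X_tr))] + [X_tr[:, j] for j in cols])
    A_val = np.column_stack([np.ones(len(X_val))] + [X_val[:, j] for j in cols])
    theta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.mean((A_val @ theta - y_val) ** 2))

def forward_selection(X_tr, y_tr, X_val, y_val):
    """Greedily add the feature that most lowers validation error; stop when none helps."""
    selected, remaining = [], list(range(X_tr.shape[1]))
    best_err = validation_mse(X_tr, y_tr, X_val, y_val, selected)  # intercept-only model
    while remaining:
        errs = {j: validation_mse(X_tr, y_tr, X_val, y_val, selected + [j]) for j in remaining}
        j_best = min(errs, key=errs.get)
        if errs[j_best] >= best_err:
            break  # no remaining feature improves validation error
        best_err = errs[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best_err

# Hypothetical data: 8 features, but y depends only on features 0 and 3.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(0, 0.5, 200)
selected, err = forward_selection(X[:100], y[:100], X[100:], y[100:])
print("selected features:", selected, "validation MSE:", round(err, 3))
```

Using a separate validation set (rather than training error) to decide which feature to add matters here: training error would almost always drop when a feature is added, so it could not signal when to stop.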
Regularization:
- Keep all the features, but reduce the magnitude of the parameters \(\theta_j\)
- Works well when we have a lot of features, each of which contributes a bit to predicting y
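For linear regression, regularization adds a penalty \(\lambda \sum_{j=1}^{n} \theta_j^2\) to the squared-error cost, giving \(J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n} \theta_j^2\right]\), where by convention \(\theta_0\) is not penalized. The sketch below (NumPy; the function names and data are hypothetical) minimizes this cost in closed form by solving \((X^T X + \lambda L)\theta = X^T y\), with \(L\) the identity matrix except for a zero in the top-left entry, and shows the penalized parameters shrinking as \(\lambda\) grows.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = (1/2m) * [sum of squared errors + lam * sum_{j>=1} theta_j^2]."""
    m = len(y)
    residuals = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)  # theta_0 (intercept) is not penalized
    return float((residuals @ residuals + penalty) / (2 * m))

def ridge_normal_equation(X, y, lam):
    """Closed-form minimizer of regularized_cost: solve (X^T X + lam * L) theta = X^T y."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0  # zero entry so the intercept theta_0 is not shrunk
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Hypothetical data: 20 examples, many polynomial features of a single input x.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 20)
y = np.sin(3 * x) + rng.normal(0, 0.1, 20)
X = np.column_stack([x ** p for p in range(8)])  # column 0 is x^0 = 1 (intercept)

norms = []
for lam in [0.0, 0.1, 10.0]:
    theta = ridge_normal_equation(X, y, lam)
    norms.append(float(np.linalg.norm(theta[1:])))
    print(f"lambda={lam}: ||theta_1..n|| = {norms[-1]:.3f}, "
          f"J = {regularized_cost(theta, X, y, lam):.4f}")
```

A larger \(\lambda\) trades a slightly worse fit on the training data for smaller parameters and a smoother hypothesis; too large a value pushes the model back toward under-fitting. For logistic regression the same penalty term is added to the logistic cost, but there is no closed-form solution, so the cost would be minimized iteratively (e.g. with gradient descent).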