Lecture 7
Notes for the Coursera Machine Learning course by Andrew Ng.
Regularization
The problem of overfitting
Overfitting example (Intro)
- Overfitting: If we have too many features, the learned hypothesis may fit the training set very well ($J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 \approx 0$), but fail to generalize to new examples (e.g., predict prices on new examples).
Example of overfitting in a classification problem
Addressing overfitting
For example, suppose we have an overfit hypothesis with 100 features ($x_1, x_2, \ldots, x_{100}$).
- In order to address it, we have the following options:
- Reduce the number of features.
  - Manually select which features to keep.
  - Model selection algorithm (later in course).
- Regularization
  - Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
  - Works well when we have a lot of features, each of which contributes a bit to predicting $y$.
Link to coursera section
https://www.coursera.org/learn/machine-learning/supplement/VTe37/the-problem-of-overfitting
Cost function
Intuition
- Suppose we penalize and make $\theta_3$, $\theta_4$ really small.
  - Where the hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$
  - and the modified objective is $\min_\theta \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$

Then we can flatten the graph by doing the above, since any fit with sizable $\theta_3$, $\theta_4$ becomes expensive (see the pink curve in the figure).
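As a minimal numpy sketch of this intuition (the function and variable names are my own, not from the course): with heavy penalties on $\theta_3$ and $\theta_4$ added to the squared-error cost, an optimizer will drive those two parameters toward zero, leaving a nearly quadratic fit.

```python
import numpy as np

def penalized_cost(theta, X, y):
    """Squared-error cost plus heavy penalties on theta_3 and theta_4.

    X: (m, 5) design matrix with columns [1, x, x^2, x^3, x^4]; y: (m,) targets.
    """
    m = len(y)
    residual = X @ theta - y
    cost = (residual @ residual) / (2 * m)
    # Penalize theta_3 and theta_4 so the learned curve stays close to quadratic.
    return cost + 1000 * theta[3] ** 2 + 1000 * theta[4] ** 2
```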
Regularization
- Small values for parameters $\theta_0, \theta_1, \ldots, \theta_n$
  - "Simpler" hypothesis
  - Less prone to overfitting
- Housing:
  - Features: $x_1, x_2, \ldots, x_{100}$
  - Parameters: $\theta_0, \theta_1, \ldots, \theta_{100}$ (cf. the example in the intuition part)
Below is the cost function with the regularization term:

$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda \sum_{j=1}^{n}\theta_j^2\right]$
- The $\lambda$, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated. The $\theta_j^2$ used here makes sure we are dealing with the magnitude/values of the parameters.
What if $\lambda$ is set to an extremely large value?
- It will cause underfitting: all parameters $\theta_1, \ldots, \theta_n$ are penalized toward zero, leaving roughly $h_\theta(x) \approx \theta_0$, a flat line. See the sketch below.
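The regularized cost above translates directly to numpy. This is a sketch of one possible implementation (not the course's Octave code); note that the penalty skips the bias term $\theta_0$:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = 1/(2m) * [sum of squared errors + lam * sum_{j>=1} theta_j^2]."""
    m = len(y)
    residual = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)  # exclude the bias term theta_0
    return (residual @ residual + penalty) / (2 * m)
```

A very large `lam` makes the penalty dominate the squared-error term, which is why every $\theta_j$ (for $j \ge 1$) is driven toward zero and the model underfits.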
Link to coursera section
https://www.coursera.org/learn/machine-learning/supplement/1tJlY/cost-function
Regularized linear regression
Gradient descent
The update rules with regularization are:

$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$

$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right], \quad j \in \{1, 2, \ldots, n\}$

Note!
The regularization term sums from $j = 1$, so the bias term $\theta_0$ is updated without the regularization term.
For more info:
Bias term info:
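A sketch of one way to implement a single update step in numpy (the function name and structure are mine, not from the course):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient descent update; theta_0 is not shrunk."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m   # unregularized gradient for all j
    reg = (lam / m) * theta
    reg[0] = 0.0                       # do not regularize the bias term
    return theta - alpha * (grad + reg)
```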
Normal equation
With regularization, the normal equation becomes

$\theta = \left(X^T X + \lambda L\right)^{-1} X^T y$

where $L$ is the $(n+1) \times (n+1)$ matrix with a 0 in the top-left entry (so $\theta_0$ is not regularized) and 1s down the rest of the diagonal.
Normal equation non-invertibility
- If $m \le n$ (no more examples than features), $X^T X$ is non-invertible; however, when $\lambda > 0$, $X^T X + \lambda L$ is always invertible. For a proof and more details, see below:
https://web.mit.edu/zoya/www/linearRegression.pdf
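A numpy sketch of the regularized normal equation (using `np.linalg.solve` rather than forming an explicit inverse; the function name is mine):

```python
import numpy as np

def normal_equation(X, y, lam):
    """Solve theta = (X^T X + lam * L)^(-1) X^T y, with L = diag(0, 1, ..., 1)."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0                      # leave the bias term unregularized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```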
Link to coursera section
https://www.coursera.org/learn/machine-learning/supplement/pKAsc/regularized-linear-regression
Regularized logistic regression
The cost function with the regularization term:

$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$

- The second sum, $\sum_{j=1}^{n}\theta_j^2$, means to explicitly exclude the bias term, $\theta_0$. I.e. the $\theta$ vector is indexed from 0 to n (holding n+1 values, $\theta_0$ through $\theta_n$), and this sum explicitly skips $\theta_0$, by running from 1 to n, skipping 0.
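A sketch of this cost in numpy (not the course's Octave code; names are mine). As in the linear case, the penalty runs over `theta[1:]` only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam):
    """Cross-entropy cost plus (lam / 2m) * sum_{j>=1} theta_j^2."""
    m = len(y)
    h = sigmoid(X @ theta)
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return cross_entropy + penalty
```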
Gradient descent
The update rule looks identical to the one for regularized linear regression, but here $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ (the sigmoid), so it is not the same algorithm.
Advanced optimization
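The course demonstrates this step with Octave's `fminunc`. As a rough Python analogue (my own sketch, not course code), `scipy.optimize.minimize` can take the cost and its gradient; this reuses `sigmoid` and `regularized_logistic_cost` from the sketch above:

```python
import numpy as np
from scipy.optimize import minimize

def regularized_logistic_grad(theta, X, y, lam):
    """Gradient of the regularized logistic cost; theta_0 is not penalized."""
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    reg = (lam / m) * theta
    reg[0] = 0.0
    return grad + reg

# Hypothetical usage, assuming X is (m, n+1) with a leading column of ones
# and y is (m,) with 0/1 labels:
# result = minimize(regularized_logistic_cost, x0=np.zeros(X.shape[1]),
#                   args=(X, y, 1.0), jac=regularized_logistic_grad,
#                   method="BFGS")
# theta_opt = result.x
```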
Link to coursera section
https://www.coursera.org/learn/machine-learning/supplement/v51eg/regularized-logistic-regression