Index
Lecture4
note: Lecture 3 is about Linear Algebra review, check at the end of “week2 note” for more info.
Note for Coursera Machine Learning made by Andrew Ng.
Multiple features
We still use the previous examp;e (house price pridiction problem). Then we add the following notations for convenience.
Notations
= number of features. = input (features) of training example. = value of feature in training example.
Because the hypothesis is defined as
Which is equal to
We can then represent the hypothesis in martix form
Link to coursera section
Gradient descent for multiple variables
Below is the proof I did for better understand the vectorized gradient descent
Link to coursera section
Gradient descent in practice I: Feature Scaling
The purpose of Feature Scaling is to speedup gradient descent
- Idea: Make sure features are on a similar scale. (eg. see as below)
We can see from above figure that the gradient requires a lot more iterations to approch the minimum point if features are no on a similar scale.
Mean normalization
This is a technic which normalize the range (optimze gradient descent)
details see the figure below:
Link to coursera section
Gradient descent in practice II: Learning rate
Link to coursera section
Features and polynomial regression
Link to coursera section
Normal equation
Normal equation is a method to solve for
Proof see below
Comparison between gradient descent & normal equation
Link to coursera section
Normal equtaion and non-invertibility
- In Ocatave/Matlab, we use pinv(pseudo inverse) instead of inv. So it is rare that we can’t find the inverse of
.
Useful links
- Pseudo-inverse: https://www.youtube.com/watch?v=pTUfUjIQjoE
- Generalized_inverse(synonym of Moore–Penrose inverse): https://en.wikipedia.org/wiki/Generalized_inverse
- Reason of using pinv instead of inv (eg. when matrix in singular):https://stats.stackexchange.com/a/69459