Note: Lecture 3 is about the Linear Algebra review; check the end of the “week2 note” for more info.
Notes for the Coursera Machine Learning course by Andrew Ng.
We still use the previous example (the house price prediction problem). We add the following notation for convenience.
$n$ = number of features. $x^{(i)}$ = input (features) of the $i^{th}$ training example. $x^{(i)}_j$ = value of feature $j$ in the $i^{th}$ training example.
Because the hypothesis is defined as

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

which, with the convention $x_0 = 1$, is equal to

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$

we can then represent the hypothesis in matrix form:

$$h_\theta(x) = \theta^T x$$
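As a quick illustration of the matrix form (a minimal NumPy sketch; the feature values and parameters below are just example numbers), stacking the training examples as rows of a design matrix $X$ lets us predict for every example at once:

```python
import numpy as np

# Each row is one training example: (size in sqft, number of bedrooms).
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])

# Prepend the x0 = 1 column so that theta_0 acts as the intercept.
X = np.hstack([np.ones((X.shape[0], 1)), X])

theta = np.array([10.0, 0.1, 50.0])  # example parameters [theta_0, theta_1, theta_2]

# Vectorized hypothesis: X @ theta gives h_theta(x^(i)) for every example i.
predictions = X @ theta
print(predictions)
```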
Below is the proof I did to better understand vectorized gradient descent.
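For reference, here is a sketch of the vectorized form (assuming the usual course symbols: $m$ training examples, design matrix $X$ of shape $m \times (n+1)$, target vector $y$, learning rate $\alpha$, and squared-error cost $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2$):

$$
\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x^{(i)}_j
\quad\Longrightarrow\quad
\nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - y)
$$

so the simultaneous update of all parameters becomes

$$
\theta := \theta - \frac{\alpha}{m}\, X^T (X\theta - y)
$$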
The purpose of Feature Scaling is to speed up gradient descent.
- Idea: Make sure features are on a similar scale (e.g., see the figure below).
We can see from the figure above that gradient descent requires many more iterations to approach the minimum if the features are not on a similar scale.
This is a technique that normalizes the range of each feature (to optimize gradient descent).
For details, see the figure below (and the sketch after it):
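A minimal sketch of mean normalization in NumPy (assuming the rule $x_j := \frac{x_j - \mu_j}{s_j}$, with $\mu_j$ the mean and $s_j$ the standard deviation of feature $j$; using the range $\max - \min$ for $s_j$ also works):

```python
import numpy as np

def feature_scale(X):
    """Mean-normalize each feature column: (x - mean) / std."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Example features: (size in sqft, number of bedrooms).
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])

X_scaled, mu, sigma = feature_scale(X)
print(X_scaled)  # every column now has mean ~0 and a comparable scale
```

Note that when predicting on a new example, the same $\mu$ and $s$ computed from the training set should be applied to it.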
The normal equation is a method to solve for $\theta$ analytically:

$$\theta = (X^T X)^{-1} X^T y$$

The proof is below.
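A short sketch of that derivation (assuming the same cost $J(\theta)$ as in the gradient descent section): set the gradient of $J$ to zero and solve for $\theta$.

$$
\nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - y) = 0
\quad\Longrightarrow\quad
X^T X\, \theta = X^T y
\quad\Longrightarrow\quad
\theta = (X^T X)^{-1} X^T y
$$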
- In Octave/Matlab, we use pinv (pseudo-inverse) instead of inv, so it is rare that we can’t compute the inverse of $X^T X$ (see the sketch after this list).
- Pseudo-inverse: https://www.youtube.com/watch?v=pTUfUjIQjoE
- Generalized inverse (synonym of Moore–Penrose inverse): https://en.wikipedia.org/wiki/Generalized_inverse
- Reason for using pinv instead of inv (e.g., when the matrix is singular): https://stats.stackexchange.com/a/69459
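A minimal NumPy sketch of the normal equation using a pseudo-inverse (the data below are just example numbers; np.linalg.pinv plays the role of Octave’s pinv):

```python
import numpy as np

# Design matrix with the x0 = 1 intercept column, and example target prices.
X = np.array([[1.0, 2104.0, 3.0],
              [1.0, 1600.0, 3.0],
              [1.0, 2400.0, 4.0]])
y = np.array([400.0, 330.0, 369.0])

# theta = pinv(X'X) X'y -- pinv still returns a sensible answer when
# X'X is singular (e.g. redundant features), where inv would fail.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)
```

Unlike gradient descent, this needs no feature scaling and no choice of learning rate, but it becomes expensive when the number of features is large.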