机器学习笔记 02 · 简单易懂的现代魔法

考虑下面一种情况：我们打算用机器学习预测某地的房价，为此我们收集了一些样本作为参考

为了方便表示，我们用如下符号定义该样本

\[ \begin{align*} x_j^{(i)} &= \text{value of feature } j \text{ in the }i^{th}\text{ training example} \newline x^{(i)}& = \text{the input (features) of the }i^{th}\text{ training example} \newline m &= \text{the number of training examples} \newline n &= \text{the number of features} \end{align*} \]

对应的假设函数可以写作： \[ \begin{align*} h_\theta(x) &= \theta_0 + \theta_1x_1 + \theta_2x_2 + \theta_3x_3 + \cdots + \theta_nx_n \newline &=\begin{bmatrix}\theta_0 \hspace{2em} \theta_1 \hspace{2em} ... \hspace{2em} \theta_n\end{bmatrix}\begin{bmatrix}x_0 \newline x_1 \newline \vdots \newline x_n\end{bmatrix}= \theta^T x \end{align*} \] 对假设函数\(h_\theta(x)=\theta^T x\)使用梯度下降方法： \[ \begin{align*}& \text{repeat until convergence:} \; \lbrace \newline \; & \theta_j := \theta_j - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)} \; & \text{for j := 0...n}\newline \rbrace\end{align*} \] 即可得到最优的假设函数\(h_\theta(x)\). 这就是 Multivariate Linear Regression (多元线性回归) 的基本方法。

Feature Scaling (特征缩放)

当样本中某些特征值 (\(x\)) 的范围过大或过小时，针对其的梯度下降算法可能会收敛的非常慢，因此我们引入 Feature Scaling (特征缩放) 来加快这一过程： \[ x_i := \frac{x_i-\mu_i}{s_i} \] 其中，\(\mu_i\)是所有\(x_i\)的平均值，\(s_i\)是其范围 \(max(x_i) - min(x_i)\) ，或者\(x_i\)的均方差。

Learning Rate (学习速率)

即梯度下降方程 \[ \theta_j := \theta_j - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \] 中的迭代步长\(\alpha\)。在实际问题中，\(\alpha\)一般都应取得一个适当的值：太小的\(\alpha\)会使梯度下降收敛地太慢，而过大的\(\alpha\)也有可能使方程在局部极小值附近震荡，无法收敛 (如下图示) α过大导致代价方程无法收敛

Normal Equation (正规方程)

除了梯度下降以外，还有另一种常用的选择 \(\theta\) 的方法：设有代价函数 \[ J(\theta_0,\theta_1,\dots,\theta_m)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2 \] 对每个\(\theta_j\)，令 \[ \frac{\partial}{\partial\theta_j}J(\theta)=0 \] 解得 \[ \theta=(X^TX)^{-1}X^Ty \] 称作 正规方程 (Normal Equation)，其中，\(X\) 及 \(y\) 对应的值如下图示

梯度下降与正规方程两种算法的时间复杂度分别为 \(O(kn^2)\) (梯度下降) 和 \(O(n^3)\) (正规方程)。与梯度下降方法相比，求解正规方程无需进行迭代，但可以看出，随着 \(n\) ，即特征的数目上升，求解正规方程消耗的时间比梯度下降消耗的时间增加的要快，因此正规方程更适合 \(n\) 较小的情况。