The Jacobian matrix and the Hessian matrix are tools used constantly in machine learning, yet university math courses rarely discuss them in detail. Any discussion of them has to start from derivatives, partial derivatives, and gradients. The derivative is the most basic of the three, so we only state its definition here:

$$ f^{\prime}(x)=\lim _{\Delta x \rightarrow 0} \frac{f(x+\Delta x)-f(x)}{\Delta x} $$

The partial derivative (Partial Derivative) is the derivative of a multivariate function with respect to a single variable. Given a multivariate function $f\left(x_{1}, x_{2}, \cdots, x_{n}\right)$, its partial derivative with respect to the variable $x_i$ is defined as:

$$ \frac{\partial f}{\partial x_{i}}=\lim _{\Delta x_{i} \rightarrow 0} \frac{f\left(x_{1}, \cdots, x_{i}+\Delta x_{i}, \cdots, x_{n}\right)-f\left(x_{1}, \cdots, x_{i}, \cdots, x_{n}\right)}{\Delta x_{i}} $$

For example:

$$ \begin{array}{l}{\left(x^{3}+x y-y^{3}\right)_{x}^{\prime}=3 x^2+y} \\ {\left(x^3+x y-y^3\right)_{y}^{\prime}=x-3 y^2}\end{array} $$
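These two partials can be checked numerically with a central difference. This is only an illustrative sketch; the helper name `partial` is made up for this example:

```python
def f(x, y):
    return x**3 + x*y - y**3

def partial(func, point, i, h=1e-6):
    """Central-difference approximation of the partial derivative
    of func with respect to its i-th argument."""
    p = list(point)
    p[i] += h
    fp = func(*p)
    p[i] -= 2 * h
    fm = func(*p)
    return (fp - fm) / (2 * h)

# At (x, y) = (1, 2): 3x² + y = 5 and x − 3y² = −11
print(partial(f, (1.0, 2.0), 0))  # ≈ 5.0
print(partial(f, (1.0, 2.0), 1))  # ≈ -11.0
```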

The gradient (Gradient) is the vector of a multivariate function's partial derivatives with respect to each of its variables:

$$ \nabla f(x)=\left(\frac{\partial f}{\partial x_{1}}, \cdots, \frac{\partial f}{\partial x_{n}}\right)^{\mathrm{T}} $$

where $\nabla$ is the gradient operator. For example:

$$ \nabla\left(x^{3}+x y-y^{3}\right)=(3 x^2+y, x-3y^2)^{\mathrm{T}} $$
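Since the gradient is just the vector of partials, a general numerical gradient takes one central difference per coordinate. A sketch, with the hypothetical helper name `num_grad`:

```python
import numpy as np

def num_grad(func, x, h=1e-6):
    """Numerical gradient: one central difference per coordinate."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (func(x + e) - func(x - e)) / (2 * h)
    return g

f = lambda v: v[0]**3 + v[0]*v[1] - v[1]**3
# Analytic gradient at (1, 2): (3x² + y, x − 3y²) = (5, −11)
print(num_grad(f, [1.0, 2.0]))  # ≈ [5., -11.]
```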

By Fermat's theorem, a necessary (but not sufficient) condition for a differentiable function to attain an extremum at a point is that the gradient there is zero. Points where the gradient vanishes are called stationary points of the function.

Jacobian Matrix

The Jacobian matrix is defined for a vector $\rightarrow$ vector map $f$, i.e. a function whose input and output are both vectors:

$$ y=f(x) $$

where $x \in \mathbb{R}^{n}$ and $y \in \mathbb{R}^{m}$. Written element-wise, the map is:

$$ y_{i}=f_{i}(x) $$

The entry in row $i$, column $j$ of the Jacobian matrix is the partial derivative of the $i$-th output with respect to the $j$-th input variable, $\partial y_i / \partial x_j$. The full matrix is:

$$ \left[\begin{array}{cccc}{\frac{\partial y_{1}}{\partial x_{1}}} & {\frac{\partial y_{1}}{\partial x_{2}}} & {\cdots} & {\frac{\partial y_{1}}{\partial x_{n}}} \\ {\frac{\partial y_{2}}{\partial x_{1}}} & {\frac{\partial y_{2}}{\partial x_{2}}} & {\cdots} & {\frac{\partial y_{2}}{\partial x_{n}}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {\frac{\partial y_{m}}{\partial x_{1}}} & {\frac{\partial y_{m}}{\partial x_{2}}} & {\cdots} & {\frac{\partial y_{m}}{\partial x_{n}}}\end{array}\right] $$

For example, consider:

$$ u=x^{2}+2 x y+z\\ v=x-y^{2}+z^{2} $$

Its Jacobian matrix is:

$$ \left[\begin{array}{lll}{\frac{\partial u}{\partial x}} & {\frac{\partial u}{\partial y}} & {\frac{\partial u}{\partial z}} \\ {\frac{\partial v}{\partial x}} & {\frac{\partial v}{\partial y}} & {\frac{\partial v}{\partial z}}\end{array}\right]=\left[\begin{array}{ccc}{2 x+2 y} & {2 x} & {1} \\ {1} & {-2 y} & {2 z}\end{array}\right] $$
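This $2 \times 3$ Jacobian can be verified by finite differences, one column per input variable. A sketch; the helper name `num_jacobian` is made up for this example:

```python
import numpy as np

def num_jacobian(func, x, h=1e-6):
    """Numerical Jacobian: column j holds the partials of all
    outputs with respect to input j (central differences)."""
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(func(x))
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (np.asarray(func(x + e)) - np.asarray(func(x - e))) / (2 * h)
    return J

def F(v):
    x, y, z = v
    return np.array([x**2 + 2*x*y + z,   # u
                     x - y**2 + z**2])   # v

x, y, z = 1.0, 2.0, 3.0
analytic = np.array([[2*x + 2*y, 2*x, 1.0],
                     [1.0, -2*y, 2*z]])
print(np.allclose(num_jacobian(F, [x, y, z]), analytic, atol=1e-4))  # True
```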

Hessian Matrix

The Hessian matrix is the matrix of a multivariate function's second-order partial derivatives. If the multivariate function $f\left(x_{1}, \cdots, x_{n}\right)$ is twice differentiable, its Hessian matrix is:

$$ \left[\begin{array}{cccc}{\frac{\partial^{2} f}{\partial x_{1}^{2}}} & {\frac{\partial^{2} f}{\partial x_{1} \partial x_{2}}} & {\cdots} & {\frac{\partial^{2} f}{\partial x_{1} \partial x_{n}}} \\ {\frac{\partial^{2} f}{\partial x_{2} \partial x_{1}}} & {\frac{\partial^{2} f}{\partial x_{2}^{2}}} & {\cdots} & {\frac{\partial^{2} f}{\partial x_{2} \partial x_{n}}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {\frac{\partial^{2} f}{\partial x_{n} \partial x_{1}}} & {\frac{\partial^{2} f}{\partial x_{n} \partial x_{2}}} & {\cdots} & {\frac{\partial^{2} f}{\partial x_{n}^{2}}}\end{array}\right] $$

In most cases the order of differentiation does not matter — by Schwarz's theorem this holds whenever the second partials are continuous — i.e. $\frac{\partial^{2} f}{\partial x_{i} \partial x_{j}}=\frac{\partial^{2} f}{\partial x_{j} \partial x_{i}}$, and the Hessian is then a symmetric matrix.

The Hessian matrix is often written $\nabla^2f(x)$.

For example, let:

$$ f(x, y, z)=3 x^{2}+x y+y^{2}+4 z^{2} $$

Its Hessian matrix is:

$$ \left[\begin{array}{ccc}{\frac{\partial^{2} f}{\partial x^{2}}} & {\frac{\partial^{2} f}{\partial x \partial y}} & {\frac{\partial^{2} f}{\partial x \partial z}} \\ {\frac{\partial^{2} f}{\partial y \partial x}} & {\frac{\partial^{2} f}{\partial y^{2}}} & {\frac{\partial^{2} f}{\partial y \partial z}} \\ {\frac{\partial^{2} f}{\partial z \partial x}} & {\frac{\partial^{2} f}{\partial z \partial y}} & {\frac{\partial^{2} f}{\partial z^{2}}}\end{array}\right]=\left[\begin{array}{rrr}{6} & {1} & {0} \\ {1} & {2} & {0} \\ {0} & {0} & {8}\end{array}\right] $$
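Entry $(i,j)$ of the Hessian can be approximated by a central second difference; for a quadratic like this $f$ the result is exact up to roundoff at any evaluation point. A sketch, with the hypothetical helper name `num_hessian`:

```python
import numpy as np

def num_hessian(func, x, h=1e-4):
    """Entry (i, j) ≈ ∂²f/∂x_i∂x_j via a central second difference."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (func(x + ei + ej) - func(x + ei - ej)
                       - func(x - ei + ej) + func(x - ei - ej)) / (4 * h * h)
    return H

f = lambda v: 3*v[0]**2 + v[0]*v[1] + v[1]**2 + 4*v[2]**2
H = num_hessian(f, [0.5, -1.0, 2.0])
print(np.round(H, 3))  # matches the analytic Hessian [[6,1,0],[1,2,0],[0,0,8]]
```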

By the second-derivative test for multivariate functions, if the gradient of the function is zero at some point, that point is a stationary point, and then:

  • if the Hessian matrix is positive definite there, the function has a local minimum at that point
  • if the Hessian matrix is negative definite there, the function has a local maximum at that point
  • if the Hessian matrix is indefinite there, the point is not an extremum (it is a saddle point)
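Applying this test to the example above: the gradient $(6x+y,\; x+2y,\; 8z)$ of $f(x,y,z)=3x^2+xy+y^2+4z^2$ vanishes only at the origin, and since the Hessian is symmetric its definiteness can be read off its eigenvalues. A sketch:

```python
import numpy as np

# Hessian of f(x,y,z) = 3x² + xy + y² + 4z² (constant everywhere)
H = np.array([[6.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 8.0]])

eigvals = np.linalg.eigvalsh(H)  # symmetric matrix → real eigenvalues
print(eigvals)  # ≈ [1.764, 6.236, 8.0] = [4−√5, 4+√5, 8], all positive
# Positive definite Hessian at the stationary point (0, 0, 0),
# so f has a local (in fact strict) minimum there.
```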

Here, positive definite means that for every nonzero $n$-dimensional vector $x$:

$$ x^{\mathrm{T}} A x>0 $$

There are several tests for positive definiteness:

  • all eigenvalues of the matrix are greater than 0
  • all leading principal minors of the matrix are greater than 0
  • the matrix is congruent to the identity matrix $E$
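The first two tests are easy to run numerically on the example Hessian; in practice, attempting a Cholesky factorization is also a common positive-definiteness check, since it succeeds exactly when the matrix is (numerically) positive definite. A sketch:

```python
import numpy as np

A = np.array([[6.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 8.0]])

# Test 1: all eigenvalues positive
print(np.all(np.linalg.eigvalsh(A) > 0))       # True

# Test 2: all leading principal minors positive
minors = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]
print(minors)                                  # ≈ [6.0, 11.0, 88.0]
print(all(m > 0 for m in minors))              # True

# Practical check: Cholesky raises LinAlgError if A is not positive definite
np.linalg.cholesky(A)
```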

If instead the matrix only satisfies, for every vector $x$,

$$ x^{\mathrm{T}} A x \geqslant 0 $$

then the matrix is positive semidefinite.
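A matrix can be semidefinite without being definite: for $A=\begin{bmatrix}1&1\\1&1\end{bmatrix}$ we have $x^{\mathrm{T}}Ax=(x_1+x_2)^2 \geqslant 0$, which is zero along the whole line $x_1=-x_2$. A quick numerical illustration:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])  # xᵀAx = (x₁ + x₂)²

w = np.linalg.eigvalsh(A)
print(w)  # ≈ [0., 2.]: nonnegative but not all positive → semidefinite only

x = np.array([1.0, -1.0])  # a nonzero vector with xᵀAx = 0
print(x @ A @ x)  # 0.0
```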

Last modified: June 1, 2021, 02:05 PM