$$
\begin{aligned}
E(w) &= \frac{1}{2}\sum_{i=1}^{N}\bigl(y(x_i,w)-t_i\bigr)^2 \\
E(w) &= \frac{1}{2}(Xw-T)^T(Xw-T) \\
&= \frac{1}{2}(w^TX^T-T^T)(Xw-T) \\
&= \frac{1}{2}(w^TX^TXw-T^TXw-w^TX^TT+T^TT) \\
\min E(w) &= \min\left(\frac{1}{2}w^TX^TXw-w^TX^TT\right) = \min\frac{1}{2}\left(w^TX^TXw-2T^TXw\right) \\
\frac{\partial E}{\partial w} &= X^TXw-X^TT = 0 \\
w &= (X^TX)^{-1}X^TT
\end{aligned}
$$
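The closed-form solution $w=(X^TX)^{-1}X^TT$ can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the helper names (`transpose`, `matmul`, `solve`, `fit_least_squares`) and the toy data are assumptions made for illustration. Rather than forming the inverse explicitly, it solves the normal equations $X^TXw=X^TT$ by Gaussian elimination, which gives the same $w$ whenever $X^TX$ is invertible.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        # Move the row with the largest pivot into position c.
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * w[k] for k in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit_least_squares(X, T):
    """Return w satisfying the normal equations X^T X w = X^T T."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtT = [sum(x * t for x, t in zip(row, T)) for row in Xt]
    return solve(XtX, XtT)

# Hypothetical toy data generated exactly by t = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
T = [1.0, 3.0, 5.0, 7.0]
X = [[1.0, x] for x in xs]   # design matrix with rows [1, x_i]
w = fit_least_squares(X, T)  # approximately [1.0, 2.0]
```

Because the toy targets lie exactly on a line, the recovered weights match the generating coefficients up to floating-point error. With noisy targets the same code returns the minimizer of $E(w)$.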
$$
\begin{aligned}
E(w) &= \frac{1}{2}\sum_{i=1}^{N}\bigl(y(x_i,w)-t_i\bigr)^2+\frac{\lambda}{2}\|w\|^2 \\
E(w) &= \frac{1}{2}(Xw-T)^T(Xw-T)+\frac{\lambda}{2}w^Tw \\
\min E(w) &= \min\left(\frac{1}{2}w^TX^TXw-w^TX^TT+\frac{\lambda}{2}w^Tw\right) \\
\frac{\partial E}{\partial w} &= X^TXw-X^TT+\lambda w = 0 \\
w &= (X^TX+\lambda I)^{-1}X^TT
\end{aligned}
$$
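To see the effect of the penalty term, the regularized solution $w=(X^TX+\lambda I)^{-1}X^TT$ can be written out for a straight-line model $y=w_0+w_1x$, where the matrix is only $2\times 2$ and its inverse has an explicit form. This is a hedged sketch: the function name `fit_ridge_line` and the toy data are assumptions for illustration, and, following the formula above, the penalty is applied to every component of $w$, including the bias $w_0$.

```python
def fit_ridge_line(xs, ts, lam):
    """Fit y = w0 + w1*x using w = (X^T X + lam*I)^{-1} X^T T (2x2 case)."""
    n = len(xs)
    # X has rows [1, x_i], so X^T X + lam*I = [[a, b], [b, d]].
    a = n + lam
    b = sum(xs)
    d = sum(x * x for x in xs) + lam
    # X^T T = [r0, r1].
    r0 = sum(ts)
    r1 = sum(x * t for x, t in zip(xs, ts))
    # Inverse of a symmetric 2x2 matrix: (1/det) * [[d, -b], [-b, a]].
    det = a * d - b * b
    w0 = (d * r0 - b * r1) / det
    w1 = (a * r1 - b * r0) / det
    return w0, w1

# Hypothetical toy data generated exactly by t = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]

w_plain = fit_ridge_line(xs, ts, 0.0)  # lam = 0 reduces to ordinary least squares
w_ridge = fit_ridge_line(xs, ts, 1.0)  # lam > 0 pulls the weights toward zero

def sq_norm(w):
    return sum(v * v for v in w)
```

With $\lambda=0$ the function returns the unregularized least-squares weights. With $\lambda=1$ the squared norm `sq_norm(w_ridge)` is smaller than `sq_norm(w_plain)`, which is the shrinkage the $\frac{\lambda}{2}\|w\|^2$ term is meant to produce.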
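References: Harbin Institute of Technology (HIT) machine learning lecture slides; the "Watermelon Book" (Zhou Zhihua, Machine Learning); PRML (Bishop, Pattern Recognition and Machine Learning).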
