https://datascience.stackexchange.com/questions/52157/why-do-we-have-to-divide-by-2-in-the-ml-squared-error-cost-function
It is simple: when you take the derivative of the cost function, which is used to update the parameters during gradient descent, the 2 from the power cancels with the 1/2 multiplier, so the derivative is cleaner. This technique, or something similar, is widely used in mathematics "to make the derivations mathematically more convenient". You can simply remove the multiplier (see here, for example) and expect the same result, since scaling the cost function by a positive constant does not change where its minimum is.
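To see the cancellation explicitly, here is a minimal sketch with the usual squared-error cost carrying the 1/2 factor, assuming a linear hypothesis $h_\theta(x) = \theta^\top x$ over $m$ training examples (this notation is my choice, not from the original question):

$$
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
$$

$$
\frac{\partial J}{\partial \theta_j}
= \frac{1}{2m}\sum_{i=1}^{m} 2\left(h_\theta(x^{(i)}) - y^{(i)}\right)\frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}
= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}
$$

The 2 brought down by the power rule cancels the 1/2, leaving a tidy gradient; without the 1/2 you would simply carry an extra factor of 2, which changes nothing but the effective learning rate.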
In short, it makes the computation easier during backpropagation. This is a common trick in mathematics.