What is the role of Numerical Gradient Computation in the Backpropagation algorithm?
05-11-2019
I was listening to the CS231n (2017) lectures and noticed that a lot of attention is given to Numerical Gradient Computation (NGC). It starts @5:53 in this video and appears a few times later.
Also, looking at the batch normalization materials (example), I found a lot of attention drawn to exactly the same topic (well, probably because it is the same backpropagation...).
As I understand it, the gradients we use in various optimization methods (vanilla SGD, Adam) require us to know the derivative of the activation function. I suppose that if the activation function is complex, or if we are too lazy to take the derivative analytically, we need to compute the gradient numerically, and that is where NGC comes in. A minimal sketch of what I mean is below.
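For reference, here is a minimal sketch (my own illustration, not from the course materials) of the central-difference numerical gradient I have in mind, compared against an analytic gradient; the function names and the test function are hypothetical:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Central-difference numerical gradient of f at x (x is a NumPy array)."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        fxph = f(x)          # f(x + h)
        x[idx] = old - h
        fxmh = f(x)          # f(x - h)
        x[idx] = old         # restore the original value
        grad[idx] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad

# Compare against the analytic gradient of a simple test function.
f = lambda x: np.sum(x ** 2)      # analytic gradient is 2x
x = np.random.randn(3)
num = numerical_gradient(f, x)
ana = 2 * x
print(np.max(np.abs(num - ana)))  # should be tiny, roughly 1e-9 or smaller
```

This is the kind of comparison I understand "gradient checking" to be: estimate the gradient numerically and check it against the analytic one.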
Questions:
Is that the only purpose of NGC in backpropagation?
Isn't it faster to use the analytic form of the activation function's derivative to calculate gradients?