Question

I just found the animation below from Alec Radford's presentation:

[Animation: several optimization algorithms traversing a saddle-shaped loss surface]

As you can see, all algorithms slow down considerably at the saddle point (where the derivative is 0) and speed up once they escape it. Plain SGD simply gets stuck at the saddle point.

Why does this happen? Isn't the "movement speed" a constant that depends only on the learning rate?

For example, the weight update at each step of plain SGD would be:

$$w_{t+1} = w_t - v \, \frac{\partial L}{\partial w}$$

where $v$ is the learning rate and $L$ is the loss function.
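To make the question concrete, here is a minimal sketch of that update rule on a toy loss I made up for illustration, $L(w_1, w_2) = w_1^2 - w_2^2$, which has a saddle point at the origin (this example is my own, not from the animation):

```python
import numpy as np

# Toy saddle-shaped loss: L(w1, w2) = w1^2 - w2^2 (hypothetical example).
# Its gradient is (2*w1, -2*w2), which vanishes at the saddle point (0, 0).
def grad(w):
    w1, w2 = w
    return np.array([2.0 * w1, -2.0 * w2])

v = 0.1                        # learning rate
w = np.array([1.0, 1e-4])      # start close to the saddle axis

for t in range(10):
    g = grad(w)
    step = v * g               # the step actually taken this iteration
    w = w - step
    print(f"t={t}  |gradient|={np.linalg.norm(g):.4f}  |step|={np.linalg.norm(step):.4f}")
```

Printing the step size each iteration shows what I mean by "movement speed": the learning rate $v$ stays fixed, but the step itself is $v$ times the gradient.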

In short, why do all optimization algorithms slow down near the saddle point even though the learning rate is constant? Shouldn't the movement speed stay the same?

No accepted solution
