문제

I have a question regarding appropriate activation functions with environments that have both positive and negative rewards.

In reinforcement learning, our output, I believe, should be the expected reward for all possible actions. Since some options have a negative reward, we would want an output range that includes negative numbers.

This would lead me to believe that the only appropriate activation functions would either be linear or tanh. However, I see any many RL papers the use of Relu.

So two questions:

If you do want to have both negative and positive outputs, are you limited to just tanh and linear?

Is it a better strategy (if possible) to scale rewards up so that they are all in the positive domain (i.e. instead of [-1,0,1], [0, 1, 2]) in order for the model to leverage alternative activation functions?

올바른 솔루션이 없습니다

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 datascience.stackexchange
scroll top