If you have a priori knowledge of the environment, but imperfect ability to sense your location in the environment (and probably imperfect actions) then what you're describing may be a partially observable markov decision process (POMDP.) Your four-button sensor fits nicely into that idea.
If you don't even have prior knowledge of the environment, then you need to augment the POMDP notion with elements of exploration or machine learning.
Note that POMDPs are geared toward scenarios with "rewards", and that while POMDPs are pretty well understood, efficient solutions are still a topic of research. As are ML/POMDP hybrids.