I am experimenting with Reinforcement Learning and Behavior Trees in the Engine. The obvious first step is creating “Learning” nodes that can run the k-armed bandit algorithm. One thing I am stuck on is “the right way to gather reward from the environment”. Is there a perception component for this purpose, or am I supposed to write that myself?
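As a rough, engine-agnostic sketch of what such a learning node could wrap: an epsilon-greedy k-armed bandit with sample-average value estimates. The class name `KArmedBandit` and its methods are hypothetical and not part of any engine API, and epsilon-greedy is only one of several possible selection rules.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not an engine class): an epsilon-greedy k-armed bandit
// that a custom "Learning" node could own to choose among its k options.
class KArmedBandit {
public:
    KArmedBandit(std::size_t numArms, double epsilon, unsigned seed = 0)
        : values_(numArms, 0.0), counts_(numArms, 0), epsilon_(epsilon), rng_(seed) {}

    // With probability epsilon pick a random arm (explore),
    // otherwise pick the arm with the highest estimate (exploit).
    std::size_t SelectArm() {
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        if (coin(rng_) < epsilon_) {
            std::uniform_int_distribution<std::size_t> pick(0, values_.size() - 1);
            return pick(rng_);
        }
        return BestArm();
    }

    // Incremental sample-average update: Q <- Q + (r - Q) / n.
    void Update(std::size_t arm, double reward) {
        ++counts_[arm];
        values_[arm] += (reward - values_[arm]) / static_cast<double>(counts_[arm]);
    }

    // Index of the arm with the highest current estimate (lowest index wins ties).
    std::size_t BestArm() const {
        std::size_t best = 0;
        for (std::size_t i = 1; i < values_.size(); ++i) {
            if (values_[i] > values_[best]) best = i;
        }
        return best;
    }

    double Value(std::size_t arm) const { return values_[arm]; }

private:
    std::vector<double> values_;       // estimated value per arm
    std::vector<std::size_t> counts_;  // number of updates per arm
    double epsilon_;                   // exploration probability
    std::mt19937 rng_;                 // seeded for reproducibility
};
```

In this sketch the node would call `SelectArm()` when it starts executing, run the chosen child, and call `Update()` once a reward for that choice is available.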
The reward system I have in mind is based on the player's score. One way is to hardcode the reward system in C++ and invoke its methods from Blueprint behavior tree tasks, but something tells me that won’t be scalable or modular.
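One possible way to keep this modular, sketched in plain C++ under the assumption that the learning node only needs a number per step: hide the reward behind a small interface so the score-based rule is just one implementation. `IRewardSource`, `ScoreDeltaReward`, and the `readScore` callback are hypothetical names for illustration, not engine types.

```cpp
#include <functional>
#include <utility>

// Hypothetical interface: the learning node depends only on this,
// never on the concrete game rules that produce the reward.
class IRewardSource {
public:
    virtual ~IRewardSource() = default;
    // Returns the reward accumulated since the previous call.
    virtual double ConsumeReward() = 0;
};

// One possible implementation: reward is the change in the player's score
// since the last query. How the score is read is injected as a callback.
class ScoreDeltaReward : public IRewardSource {
public:
    explicit ScoreDeltaReward(std::function<double()> readScore)
        : readScore_(std::move(readScore)), lastScore_(readScore_()) {}

    double ConsumeReward() override {
        const double current = readScore_();
        const double delta = current - lastScore_;
        lastScore_ = current;  // the next call measures from here
        return delta;
    }

private:
    std::function<double()> readScore_;  // declared first: lastScore_ is initialized from it
    double lastScore_;
};
```

With this shape, swapping the score-based reward for another signal means adding another `IRewardSource` implementation rather than editing the learning node.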