Good questions!
Is there any way to see/visualize things that are happening in the learning? More specifically, is there a way to check if any observation that has been set is actually being used?
Also, to expand on this: do observations on their own need any extra rewards or something set up for them, or do observations, once observed, 'just work' for the learning?
To use new observations, you simply declare that they exist in your interactor by calling their "Add X Observation" function during Setup Observations. Then, in Set Observations, you fill in their data. If you don't set the values of an observation, you will get a warning and your agent will not be used during training (look for LogLearning in the Output Log).
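If it helps to see the shape of this in code rather than Blueprint, here is a rough C++ sketch of an interactor with a single float observation. It assumes the UE 5.3-era Learning Agents API from memory; the class and function names (UFloatObservation, AddFloatObservation, SetFloatObservation, the _Implementation overrides) may differ in your engine version, and "Speed" / GetAgentSpeed are made-up names for the example, so check the actual headers before copying anything.

```cpp
// Rough sketch of a custom interactor, assuming the UE 5.3-era Learning Agents API.
// Class and function names here are from memory and may differ in your engine version.
#include "LearningAgentsInteractor.h"
#include "LearningAgentsObservations.h"
#include "MyDrivingInteractor.generated.h"

UCLASS()
class UMyDrivingInteractor : public ULearningAgentsInteractor
{
    GENERATED_BODY()

public:
    // Declare that the observation exists. Runs once during setup.
    virtual void SetupObservations_Implementation() override
    {
        // "Speed" is a made-up observation name; Scale keeps the value roughly in (-1, 1).
        SpeedObservation = UFloatObservation::AddFloatObservation(this, TEXT("Speed"), /*Scale=*/ 2000.0f);
    }

    // Fill in the observation's data for every agent, every decision step.
    virtual void SetObservations_Implementation(const TArray<int32>& AgentIds) override
    {
        for (const int32 AgentId : AgentIds)
        {
            // GetAgentSpeed is a hypothetical helper that queries your vehicle pawn.
            SpeedObservation->SetFloatObservation(AgentId, GetAgentSpeed(AgentId));
        }
    }

private:
    UPROPERTY()
    TObjectPtr<UFloatObservation> SpeedObservation;

    float GetAgentSpeed(const int32 AgentId) const;
};
```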
One thing that can help with debugging/understanding is Unreal's Visual Logger (Visual Logger | Unreal Engine 4.27 Documentation). Learning Agents already logs all of the observations and actions to the Visual Logger (using the names you provided during "Add X Obs/Action"). I've used this tool a lot to look at the data getting fed to the network.
In my case, I've added a few raycasts to the vehicle to probe walls around the track, assuming that simply providing it with more information about its surroundings would help the vehicle learn.
I think this is unlikely to help significantly, but I never tried it. One thing to be careful of is ensuring that your new observations are normalized correctly. You want all the inputs to roughly live in the range (-1, 1), so set an appropriate Scale to achieve this.
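To make the normalization point concrete, here is a hedged sketch of one wall-probe raycast whose distance is divided by the trace length before being fed into a float observation. The trace calls are standard UWorld/FHitResult usage; the function name, MaxTraceDistance, and the exact way the plugin applies Scale are assumptions for the example.

```cpp
// Hedged sketch: one wall-probe raycast, normalized to roughly (0, 1) before being fed
// into a float observation. MaxTraceDistance and the function itself are made-up names.
float UMyDrivingInteractor::TraceWallDistanceNormalized(const FVector& Start, const FVector& Direction) const
{
    const float MaxTraceDistance = 5000.0f; // ray length; pick something that suits your track scale

    FHitResult Hit;
    const bool bHit = GetWorld()->LineTraceSingleByChannel(
        Hit, Start, Start + Direction * MaxTraceDistance, ECC_Visibility);

    // If nothing was hit, treat it as "wall at max range".
    const float HitDistance = bHit ? Hit.Distance : MaxTraceDistance;

    // Dividing by the max range keeps the input in (0, 1). Alternatively, feed the raw
    // distance and pass MaxTraceDistance as the Scale of the Add Float Observation call
    // (assumption: check how Scale is applied in your engine version before relying on it).
    return HitDistance / MaxTraceDistance;
}
```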
If you want to improve the behavior of the vehicles (i.e. getting proper racing lines to emerge), you probably need to tune the reward function more than anything. The vehicles need to be rewarded based on achieving the fastest time on a fairly big chunk of the track, but I felt like setting this up was too much work for the tutorial.
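For reference, the simplest form of a progress-based reward (reward the distance gained along the track spline each decision step) looks roughly like the sketch below. This is not the longer-horizon "fastest time over a big chunk of the track" reward described above, just a starting point. The spline calls are standard USplineComponent functions; everything else is a made-up name, and lap wrap-around is not handled.

```cpp
// Hedged sketch of a simple progress-based reward: reward the distance gained along the
// track spline since the last decision step.
#include "Components/SplineComponent.h"

float ComputeProgressReward(const USplineComponent* TrackSpline,
                            const FVector& VehicleLocation,
                            float& InOutPreviousDistanceAlongSpline)
{
    // How far along the track is the vehicle right now?
    const float InputKey = TrackSpline->FindInputKeyClosestToWorldLocation(VehicleLocation);
    const float CurrentDistance = TrackSpline->GetDistanceAlongSplineAtSplineInputKey(InputKey);

    // Reward the distance gained this step (negative if the vehicle went backwards),
    // scaled down so the per-step reward stays roughly in (-1, 1).
    const float DistanceGained = CurrentDistance - InOutPreviousDistanceAlongSpline;
    InOutPreviousDistanceAlongSpline = CurrentDistance;

    return DistanceGained / 100.0f; // made-up scale; tune for your track and decision rate
}
```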
Should I be feeding some kind of reward to these raycast results, or should the mere act of feeding it the numbers in an observation be enough?
Rewards are not given to individual observations; the agent as a whole gets rewarded. The agent is trying to learn a function (called the policy) that, given a state, produces the best action (State → Action). The combination of all your observations is the "state" and the combination of all the actions is the "action". The best action is the one which will give the agent the highest return (Part 1: Key Concepts in RL — Spinning Up documentation).
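To make "return" concrete: it is the (typically discounted) sum of all future rewards, which is what the policy is trained to maximize rather than just the reward on the current step. A tiny standalone illustration:

```cpp
// Tiny illustration of "return": the discounted sum of all future rewards. The policy
// is trained to pick actions that maximize this, not just the immediate reward.
#include <vector>

float DiscountedReturn(const std::vector<float>& Rewards, const float Gamma = 0.99f)
{
    float Return = 0.0f;
    float Discount = 1.0f;
    for (const float Reward : Rewards)
    {
        Return += Discount * Reward;
        Discount *= Gamma;
    }
    return Return;
}
```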
Thanks again for this excellent plugin!
Anytime, thanks for providing excellent questions and good feedback.
edit:
For future readers:
When adding new observations to the interactor, you MUST reset the associated neural network asset, or it won't work and your entities will be braindead.
If you try to use a model which was trained on different obs/actions, you should see errors every frame like:
LogLearning: Error: BP_DrivingPolicy: Setup not complete.
LogLearning: Error: BP_DrivingPolicy: Setup not complete.
LogLearning: Error: BP_DrivingPolicy: Setup not complete.
And if you scroll further up, you should see:
LogLearning: Error: BP_DrivingPolicy: Neural Network Asset provided during Setup is incorrect size: Inputs and outputs don't match what is required.
This error is trying to tell you that your observations + actions don’t match how the network was previously trained.
Thanks,
Brendan