First of all, thanks to anyone who helps me with this topic. I copied this message from the Learning Agents 5.4 course because it makes more sense for it to belong in the forum. Sorry for the double posting.
POST1
However, I have a question about the observations—I’m not sure if I’m understanding them correctly.
I tried to add obstacles to the track, and if a car collides with them, it should receive negative rewards to encourage avoiding the obstacles. I created an actor that produces overlaps and stores information about who collided with it.
Interactor:
- In the “specify agent observation” section, I added a location observation and constructed an array observation to add to the map containing the track and car observations. Then, I added a pin there called “Obstacles.”
- In the “gather agent observation” section, I added a pin to the sequence to gather the observations of the obstacles.
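For illustration, the kind of obstacle observation described above can be sketched in plain Python. This is not the Learning Agents Blueprint API: the function name, its parameters, and the choice to express each obstacle as an offset in the car's own frame (nearest first, padded to a fixed count with a validity flag) are all assumptions made for the example.

```python
import math

def obstacle_observations(car_pos, car_yaw, obstacles, max_obstacles=3):
    """Return a fixed-length list of (forward, side, valid) tuples.

    car_pos: (x, y) world position of the car (hypothetical input).
    car_yaw: car heading in radians (hypothetical input).
    obstacles: list of (x, y) world positions of obstacles.
    """
    cos_y, sin_y = math.cos(car_yaw), math.sin(car_yaw)
    local = []
    for ox, oy in obstacles:
        dx, dy = ox - car_pos[0], oy - car_pos[1]
        # Rotate the world-space offset into the car's frame, so the
        # observation does not depend on where the car is on the track.
        forward = dx * cos_y + dy * sin_y
        side = -dx * sin_y + dy * cos_y
        local.append((forward, side, 1.0))
    # Nearest obstacles first, then cap to a fixed count.
    local.sort(key=lambda p: math.hypot(p[0], p[1]))
    local = local[:max_obstacles]
    # Pad with zeroed entries; the last value marks them as "no obstacle".
    local += [(0.0, 0.0, 0.0)] * (max_obstacles - len(local))
    return local
```

The fixed length matters because the array observation has to keep the same size in the specify step and the gather step; the validity flag lets a padded slot be told apart from an obstacle sitting exactly at the car's position.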
I didn’t change anything in the actions of the agent because I thought the actions should remain the same.
Trainer:
- I updated the reward function based on whether an obstacle was hit or not, giving -1 per obstacle hit by the agent.
- I also updated the reset function to return the obstacles to a clean state.
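A minimal sketch of that reward and reset logic, again in plain Python rather than Blueprint. The `Obstacle` class, `gather_reward`, and `reset_episode` are hypothetical stand-ins for the obstacle actor and the trainer functions, not the actual graph.

```python
class Obstacle:
    """Stand-in for the obstacle actor: remembers which agents overlapped it."""

    def __init__(self):
        self.hit_by = set()

    def on_overlap(self, agent_id):
        self.hit_by.add(agent_id)

    def reset(self, agent_id):
        # Forget the hit so it does not leak into the agent's next episode.
        self.hit_by.discard(agent_id)


def gather_reward(base_reward, agent_id, obstacles, penalty=1.0):
    """Subtract `penalty` for every obstacle that currently records a hit."""
    hits = sum(1 for obstacle in obstacles if agent_id in obstacle.hit_by)
    return base_reward - penalty * hits


def reset_episode(agent_id, obstacles):
    """Return all obstacles to a clean state for this agent."""
    for obstacle in obstacles:
        obstacle.reset(agent_id)
```

In this sketch a recorded hit keeps subtracting the penalty on every reward step until the reset clears it, so a single collision can outweigh the rest of the episode's reward. Whether the real setup behaves the same way depends on when the hit record is cleared.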
After these changes, I broke it, haha. Now, the score is not going above 0.85, no matter how long I let it run.
Could it be that my reward function or my observations are wrong? It would be nice to have an example of how to gather observations of objects on the track that are not part of the spline. I tried following the example of the robot with the gun and so on, but I don’t know if I’m doing something wrong.
POST2
These are the obstacles. They are fixed in place, just to test whether the agents are able to avoid something that I put on the track:
The obstacle actor:
Now the interactor:
Specify Agent Observations function:
I added the Obstacle observation part, plus the entry in the map for the obstacles.
Gather Agent Observations function:
I added this as a step of the sequence:
I added this to put the observations gathered in the sequence into the map:
Now the trainer:
Gather Agent Reward function
I added this as part of a sequence to check whether the agent hit any of the obstacles; if it did, it adds the negative reward.
Reset Agent Episode function
I added the reset of the obstacles so that a hit recorded for an agent does not carry over into the next iteration.
I also modified the Reset function of the pawn so that the cars do not land on an obstacle when they are moved at the beginning of the episode.
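One way to express that kind of reset is to reject spawn points that are too close to an obstacle. A rough Python sketch, assuming a list of candidate points sampled along the track and a clearance radius; `pick_spawn` and its parameters are hypothetical and not taken from the pawn's Blueprint.

```python
import math
import random

def pick_spawn(candidates, obstacles, min_clearance, rng=random):
    """Pick a random candidate (x, y) at least `min_clearance` from every obstacle."""
    safe = [
        point for point in candidates
        if all(math.dist(point, obstacle) >= min_clearance for obstacle in obstacles)
    ]
    if not safe:
        # Better to fail loudly than to spawn a car inside an obstacle.
        raise ValueError("no candidate spawn point is clear of the obstacles")
    return rng.choice(safe)
```

Passing a seeded `random.Random` instance as `rng` makes the spawn choice reproducible between runs.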
And these were my modifications. I was able to run it headless, using TensorBoard, etc. The only step I still need to do is restart from a snapshot to continue the training, but that is something I can try later.
Thanks to anyone who can shed some light on the topic!