Using Learning Agents to create a simple cube bot

Greetings, fellow devs. I'm doing some research into the capabilities of learning agents in various engines, and I'm currently dealing with UE5. I wanted to create a simple scenario: the agent is a cube, tasked with reaching a randomly positioned goal. It worked just fine in other engines (Unity, Godot), but in UE5 it's struggling and I don't know why. Maybe y'all could help and steer me in the right direction.

As you can see, I am using the Learning Agents 5.5 plugin for this project. The only tutorial on it was about cars, though, so I had to improvise and figure things out on my own. The cube can move in four directions, and it can observe its own location and the location of the target. It gains a reward for reaching the goal and a penalty for hitting a wall. There's also a location-difference calculator, which seems to give slightly better results when it rewards approaching the target rather than penalizing the agent for not being close enough. Finally, positions reset after a time-out, and the goal (a green ball) moves to a random location on the board.
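For reference, here's the reward scheme described above as a rough Python sketch. The function name, the shaping weight, and the terminal reward magnitudes are my own placeholders, not the actual Blueprint values:

```python
import math

def step_reward(agent_pos, goal_pos, prev_dist, reached_goal, hit_wall):
    """Illustrative per-step reward: terminal bonus/penalty plus shaping."""
    if reached_goal:
        return 1.0   # reward for reaching the goal
    if hit_wall:
        return -1.0  # penalty for hitting a wall
    # Shaping term: reward *approaching* the target (positive when the agent
    # moved closer this step) rather than penalizing it for being far away.
    dist = math.dist(agent_pos, goal_pos)
    return 0.1 * (prev_dist - dist)
```

The approach-based shaping rewards progress each step, which tends to give a denser learning signal than a flat distance penalty.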

This setup worked well enough in other engines, but no matter how long I let the program train, its performance gets worse. At some point it picks a wall or a corner and starts racing towards it, episode after episode, completely ignoring the goal. I've tried adding multiple agents to speed up the learning process, but the results are the same. It's as if the agent weren't even receiving the location or reward data. According to the visual logger, all the observations are clearly working, and the output log gives me no errors either.


Trying to get this to work correctly is like herding cats, and the lack of tutorials on the topic doesn't help. It's the most basic example of an ML agent; surely someone else has attempted this in the past? If anyone has any insight at all, I would greatly appreciate it.

Don’t hesitate to ask me for additional info.

Can you share a screenshot or the code for 1) the interactor and 2) the TrainingEnv? Also, what is your manager's tick rate?

I use the 5.5 plugin and it works really well for both reinforcement learning and imitation learning, so keep at it. I generally train with 64-128 agents at a time; for simple cubes you can probably do more. Make sure your rewards make sense… to keep things from blowing up, try to scale everything (rewards and NN inputs) to the -1 to 1 range if possible. I can watch it train for 10-15 mins and tell if my rewards aren't set up properly; I'll pause the training and tweak them, restarting without reinitializing to continue training if it hasn't hit some weird minimum.
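The "scale everything to -1 to 1" advice boils down to a clamped linear remap. A minimal sketch (the function name and ranges are illustrative, not plugin API):

```python
def scale_to_unit(value, in_min, in_max):
    """Linearly map value from [in_min, in_max] to [-1, 1], clamped at the ends."""
    t = (value - in_min) / (in_max - in_min)  # normalized position, 0..1
    t = min(max(t, 0.0), 1.0)                 # clamp so outliers can't blow up
    return 2.0 * t - 1.0

# e.g. a world distance of 0..2000 units becomes a network-friendly value:
scale_to_unit(1000.0, 0.0, 2000.0)  # -> 0.0
```

Clamping matters as much as scaling: one unbounded spike in an input or reward can destabilize training even if typical values are well behaved.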

I would say start simple: the cube shouldn't know its own world XY location, but it should know its location relative to the center of the board (normalized to close to -1 to 1), and it should know the target's location relative to itself (normalized). Make sure your values are ego-centric to the agent, not world-relative. Add a reward based on distance to the target and one based on distance to the center. Training it to avoid specific walls will be more difficult and depends on your design intent… for that type of stuff, giving your agents 'whiskers' using line traces is a pretty good solution.
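The ego-centric observation layout described above, sketched in 2D Python (names, board size, and normalization constants are made up for illustration):

```python
def make_observation(agent_pos, target_pos, board_center, half_size):
    """Ego-centric, normalized observation vector for a square board."""
    # Agent position relative to the board center, roughly in [-1, 1]
    # when the agent stays on a board of half-width `half_size`.
    rel_self = [(a - c) / half_size for a, c in zip(agent_pos, board_center)]
    # Target position relative to the *agent* (ego-centric), normalized
    # by the board diameter so it also stays near [-1, 1].
    rel_target = [(t - a) / (2 * half_size) for t, a in zip(target_pos, agent_pos)]
    return rel_self + rel_target
```

With ego-centric inputs the policy learned at one spot on the board transfers to every other spot, which is a big part of why this framing trains so much faster than raw world coordinates.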

As an example, my rewards for a flight combat game are set up to reward staying within an altitude range, keeping the target in front (dot product with the forward vector), maintaining a distance to the target, and minimizing roll (I use Map Range Clamped a lot here to scale world values to reward values). The agent controls float yaw/pitch/roll/throttle actions and a boolean 'fire weapon'. I can have somewhat workable agents trained in a couple of hours with that setup that fly around pretty well… for more advanced behavior I use imitation learning, with surprisingly good results if you take care to create good training data. It honestly feels like magic when it works.
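A rough Python equivalent of that kind of reward shaping; `map_range_clamped` mimics what UE's Map Range Clamped node does, and every range, weight, and threshold below is invented purely for illustration:

```python
def map_range_clamped(value, in_a, in_b, out_a, out_b):
    """Remap value from [in_a, in_b] to [out_a, out_b], clamped (like the UE node)."""
    t = (value - in_a) / (in_b - in_a)
    t = min(max(t, 0.0), 1.0)
    return out_a + t * (out_b - out_a)

def flight_reward(altitude, facing_dot, dist_to_target, roll_deg):
    """Illustrative combined reward for a flight agent, kept in roughly [-1, 1]."""
    # Stay within an altitude band (band edges are made-up values).
    r_alt = 1.0 if 500.0 <= altitude <= 5000.0 else -1.0
    # facing_dot = dot(forward, direction_to_target): 1 means dead ahead.
    r_facing = facing_dot
    # Closer to the target is better, so the output range is reversed.
    r_dist = map_range_clamped(dist_to_target, 0.0, 10000.0, 1.0, -1.0)
    # Penalize banking: zero roll maps to 1, 90 degrees or more maps to -1.
    r_roll = map_range_clamped(abs(roll_deg), 0.0, 90.0, 1.0, -1.0)
    return 0.25 * (r_alt + r_facing + r_dist + r_roll)
```

Averaging the terms keeps the total in the same -1 to 1 range as each component, which matches the scaling advice above.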


Certainly. Here are the blueprints. (The forum will only let me upload one image per post, so I had to merge them all together like this, apologies >.<)

As for the tick rate, I’m assuming you mean the tick interval, which is at 0. Anything more than that and the bot moves too slowly. Thanks for taking a look ^^

I know using multiple instances can speed up the process by a lot, but I’m having trouble figuring out how to do that in UE5. In Unity and Godot, you can designate the whole platform as the training environment and just copy paste it. The tutorial only shows multiple agents training on the same track, so I couldn’t extrapolate much from that, other than placing all the agents on the same board and disabling collision. But I am trying to normalize the rewards at least. Not sure if that’s the culprit though.

You bring up some interesting points. However, there are limits to how much I can alter the observations, as the entire point of the experiment is to compare how the same ML-Agent setup works in different game engines. While I can play around with how it calculates distance to the target, I can’t add sensors that use line traces and that kinda stuff, since I didn’t use those in previous examples. Maybe UE5 handles distance differently, and that’s my oversight. I’ll start messing around with the observations like you suggested.

Thanks for the advice! ^^

As a starting point, I would try adjusting the manager tick rate from 0 to 0.2 (5 Hz) and then adjust the cube's movement until it works well. Your interactor's actions are an issue because the agent will only move on the frames where the interactor is ticking. I would have the interactor output a velocity and then have the cube apply that velocity every frame in its own tick, e.g. create a public SetVelocity function on the cube and have the interactor call it periodically.
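That decoupling pattern, sketched as plain Python (the class and method names are stand-ins for the Blueprint setup, not Learning Agents API):

```python
class Cube:
    """Toy actor: decisions arrive at 5 Hz, movement is applied every frame."""

    def __init__(self):
        self.position = [0.0, 0.0]
        self.velocity = [0.0, 0.0]

    def set_velocity(self, velocity):
        # Called by the interactor only when it ticks (e.g. every 0.2 s).
        self.velocity = list(velocity)

    def tick(self, delta_seconds):
        # Called every frame: keep applying the last commanded velocity,
        # so motion stays smooth between the interactor's decisions.
        self.position[0] += self.velocity[0] * delta_seconds
        self.position[1] += self.velocity[1] * delta_seconds
```

The action becomes "hold this velocity until the next decision" instead of "nudge once per interactor tick", so lowering the decision rate no longer slows the cube down.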

The reason I want you to make these changes is because it can be hard to learn to control an object when ticking at a very high rate.

BTW, for your issues with gyms, we've added a simple gym manager called ALearningAgentsGymsManager and a gym class ALearningAgentsGymSimple, which make it easier to spawn and manage basic gyms like you need. I don't recall whether this was added by 5.5, though, and it's an undocumented feature, so you'll need to look at the two classes I mentioned plus ALearningAgentsGymBase to figure it out.


Probably the main reason it's not training is your GatherAgentObservation location setup: you are gathering the agent's location and using the same actor's transform to localize it, so it will always return (0, 0, 0).

To give the ObsActor pawn knowledge of its world location, I would pass world (0, 0, 0) into that RelativeTransform pin, or wherever the center of the pawn's 'world' is. You can right-click the pin and 'split struct' and just leave the defaults.

For the goal, use the ObsActor's transform to localize it, so the target location is 'ego-centric', i.e. from the agent's perspective.
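In plain terms, the transform math behind that fix looks roughly like this, in a 2D Python sketch with made-up coordinates (ignoring rotation and scale, which the real RelativeTransform also handles):

```python
def localize(world_location, frame_origin):
    """Express a world-space location relative to a frame origin (translation only)."""
    return [w - o for w, o in zip(world_location, frame_origin)]

agent_world = [300.0, -150.0]
goal_world = [500.0, 250.0]

# Bug: localizing the agent's own location with its own transform
# always yields zero, so the observation carries no information.
broken_self_obs = localize(agent_world, agent_world)

# Fix 1: localize the agent against the board center (here the world origin).
self_obs = localize(agent_world, [0.0, 0.0])

# Fix 2: localize the goal against the agent, making it ego-centric.
goal_obs = localize(goal_world, agent_world)
```

This is why the visual logger looked fine: observations were being gathered and sent, they were just constant zeros.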


I would def change the tick interval on the Manager BP as Brendan suggested, otherwise your training will be super noisy and will have a really hard time converging. I personally use a 0.1 tick rate for training.


Oh wow, now I feel silly. No wonder the agent couldn’t learn how to do anything, it was practically blind!

I've applied the changes you suggested, as well as Brendan's adjustments to the movement and the tick rate, and now the agent works like a charm! I could barely believe it. A whole week messing around with this program, and it turns out I had just put in the wrong values. The tick rate part I would never have figured out on my own, though, so I thank you guys for that one especially! Y'all saved the project ^^

Next step is to set up multiple instances, I suppose. Fun stuff.

Thanks again y’all. Hope to see this extension get even better in the future ^^


Genius solution. I was wondering how to get the agent to move more at a lower tick rate; I never thought to make use of a continuous action in between ticks like that. Had to adjust the speed a little, but it works like a charm now!

As for the gyms, I’ve found the blueprints (though they’re named a little differently, there’s no A). Currently messing around with those to figure something out, but it’s gonna take a while without a tutorial or documentation, all I have to go off of is how they’re made in Unity and Godot. I suppose a future version of the plugin will have more tutorials? I would love to see it ^^

Thanks again!

If you do any comparisons in a blog or video or whatever, I'd be happy to check it out :)
