DeepMimic-like System With Learning Agents Plugin

ncmcclure · September 29, 2023, 12:08am

I’m creating a new thread upon @Deathcalibur 's request based on a performance problem I’ve been encountering with my Learning Agents system. This should act as a place to discuss these issues and hopefully share findings should we get to the bottom of it.

Since 5.3 launched I’ve been working on a DeepMimic-like system to train a 100% physics-based character to learn animations with the help of the Physics Controller plugin. It’s taken a couple weeks, but I’ve finally gotten the system to a good point and have begun training.

Problem: I’m working on a pretty high-end machine (i9 13900k, RTX 4090, 64GB RAM, NVMe storage) and I’m only getting ~60fps when training with only a single agent, and Unreal only seems to be consuming around 25% of both my GPU and CPU, which would indicate that Unreal is somehow the bottleneck here. For reference, with my Learning Agents systems disabled, I can run 48 of these Physics Control driven (target orientation & strength set every frame) skeletal meshes before dipping below 60fps, and my CPU is at 50% utilization.

I understand that the system I’ve built is quite intricate and requires significantly more resources than the provided driving tutorial, seeing as my character is driven by 23 physics controls, with my interactor having 46 total actions (Target Rotation & Strength for each joint), 69 Observations (Angular velocity, Linear Velocity, Position for each anim reference spline), three rewards, and a couple termination events… with my learning manager ticking at 30hz.

What’s even more strange is that when increasing the number of agents to 16, my framerate tanks to around 5fps… but my CPU and GPU utilization decrease to roughly 10%, from 25% with 1 agent. I’m really stumped what’s going on here. Keep in mind this is with my Learning Manager ticking at 30hz. If I decrease this to 1hz, I get 100fps, but 1hz obviously isn’t very constructive for learning in this case.

Paths to begin troubleshooting: Deathcaliber mentioned that they’ve created even more intensive scenarios with LearningAgents and haven’t had much issue, so there’s likely an issue somewhere in my code that I can uncover through Unreal Insights. This is something that I’ll begin looking into but would appreciate any good suggestions since I haven’t used Unreal Insights before.

ncmcclure · September 29, 2023, 1:09am

Unreal Insights Update:

So after doing some digging into what’s causing these long frametimes it looks like the function GetPositionAtSplineDistance is definitely the culprit with each call taking approximately ~290μs on the game thread. As you can see in the attached screenshots, GetPositionAtSplineDistance is being called numerous times each frame during SetAgentRewards and SetAgentObservations, taking significantly longer than any other function.

I’m sure it can be gathered why it’s being called so many times. I’m generating spline components at runtime (with accompanying per frame linear/angular velocity data) for each needed bone before the training begins, totaling 23 splines in total, not per agent; I’m doing some clever transform conversions so each agent is technically only checking against these generated master spline components in a local reference frame. These spline components (and their accompanying data) end up being used as helper splines during training for rewarding, as per the DeepMimic paper.

I’m very curious why this function alone would be causing such a drastic slowdown though compared to all the other SplineComponentHelper functions.

FYI: For this trace I was using 4 agents, hence why there are so many function calls.

Total frame overview:

GetPositionAtSplineDistance Highlighted:

PM_Zahid · September 29, 2023, 4:06am

i am religiously eager to follow your progress and witness success.

ncmcclure · September 29, 2023, 4:36am

I found a solution … kind of

So for each spline that I’m generating before training begins, I was creating a new point each frame during the animation, and I wasn’t limiting the framerate during this step. Since I’m easily hitting around 400fps during this step, each one of these spline points would end up having over 10k points for 2.5s of an animation. So when GetPositionAtSplineDistance was called, it was checking against a highly complex spline.

I’ve now written a C++ function that greatly simplifies each spline after generation by about 180% while still retaining an almost identical path, bringing that function call duration down to <10μs. After simplification I can now run 9 agents while still hitting around 45fps which is certainly an improvement.

The reason why I say “kind of” a solution, is because Unreal is still only utilizing 25% of my CPU and 25% of my GPU, which I would imagine isn’t right… right? There must be something else with Unreal outside of my implementation that is the bottleneck now.

PM_Zahid · September 29, 2023, 4:07pm

u cld try using perplexity to search for UE cpu bottleneck results. and come up with a list of possible causes.

ncmcclure · September 29, 2023, 9:05pm

Okay so since my last reply I’ve optimized a few functions by converting them to C++, and now I’m getting a pretty consistent 60fps with 9 agents. Unreal Insights is really amazing, it’s too bad I hadn’t come across it earlier.

I think the “low” CPU utilization is starting to make more sense to me now, which I guess should have been more obvious this entire time. The heaviest LearningAgents events (SetRewards, SetObservations) are all being called on the game thread, so only a single one of my threads is actually pegged at 100% during training… Pretty incredible that only a single thread is currently training this many complex agents at 60fps; the 13900k really is a beast!

The obvious next step would be to move over each agent’s SetReward & SetObservations functionality (which includes a lot of the expensive function calls) to separate threads such that each agent gets its own thread to calculate rewards and observations. In my case this should allow me to utilize all 32 threads on my CPU with 32 agents… ideally.

I’m definitely curious to know how LearningAgents handles multithreading. I’m a little worried that multithreading some of my functions could break some LearningAgents functionality if SetReward & SetObservations must be called in a specific order by the RunTraining function. I guess I’ll find out!

PM_Zahid · September 29, 2023, 9:50pm

how do you plan on moving the agent’s functions to other threads?

ncmcclure · September 30, 2023, 7:06am

So I successfully implemented multithreading! I can now easily run 32 of my agents at 60fps with my CPU at ~90% utilization!

One of the bigger hurdles to get it working was I had to ensure (as per my original worry) that each needed training event (SetRewards & SetObservations in my case) would properly wait for all of my multithreaded tasks to complete before the game thread resumed. If any of the training events are called out of order, then my training would fail and I would get warnings about “non-matching iteration numbers”, which makes perfect sense.

@PM_Zahid Unfortunately, there’s quite a lot of intricacies to my implementation and I’m not quite in a position just yet to elaborate on all of the details at the moment. I do plan on trying to release this as a plugin (or project, depending on marketplace guidelines) in the near-ish future once I’ve ironed out more bugs and implemented the task portion of the training/inference as laid out in the DeepMimic paper.

Deathcalibur · October 3, 2023, 4:04pm

Leonard-nimoy-the-simpsons GIFs - Get the best GIF on GIPHY

I’m glad I could help by pointing you to unreal insights. Thanks for sharing your progress! Exciting stuff

Deathcalibur · October 3, 2023, 4:12pm

That’s cool you were able to multi-thread those events. I guess exposing the full agents array was a good idea after all. Originally we had the event called with one agent id at a time but then what you accomplished would have been pretty much impossible.

Did you find the iteration number thing more helpful or annoying? We wanted to help users avoid silent failures where they were not doing what they thought they were.

tomhalpin8 · October 26, 2023, 5:29pm

@ncmcclure The anticipation is killing me. Do you have a sneak peak?

MEIJW1 · July 23, 2024, 4:24am

@ncmcclure Very much looking forward to your multithreaded optimization sharing

tomhalpin8 · January 17, 2025, 1:11am

@ncmcclure Any update to this system?