What is the proper way to use Async Pathfinding?

I created a custom AI MoveTo node that uses the Async Path request instead of sync path requests. My current flow for using it is as follows:

  • ExecuteTask is called and I call the function FindPathAsync to receive a uint32 handle for the query result and store the result in the node memory block (non-instanced C++ native Task node)
  • In TickTask, I check each frame to see when the Async path is formed
  • In TickTask, if a query is pending and the uint32 handle is valid, I collect the NavPathSharedPtr and I call RequestMove() on the AI’s AIController which passes the path to the PathFollowingComponent
  • Sometimes (I will clarify later), when RequestMove is called, reallocation of memory for the multi-cast delegate FPathObserverDelegate ObserverDelegate on the FNavPathSharedPtr cached on the UPathFollowingComponent will fail on a memory allocation on an AddObserver call with a read access violation 0xFFFFFFFF, which I believe is uninitialized memory

To clarify on the sometimes, this only becomes a reliable repro if I crank my unit test stress test of many AI’s requesting and re-requesting Async paths constantly. If I don’t stress test, then this crash has a pretty low repro to almost non-existent rate with ~12 AI’s pathing. When the AI’s do path, they do path properly to the correct locations.

What I first want to know is if this is the proper way to incorporate Async pathing for AI’s? Is the flow of requesting an Async path and then calling RequestMove on the PathFollowingComponent with the FNavSharedPathPtr the intended way it is used? Or, could the problem be in the use/setup of the Async MoveTo BehaviorTask Node? I know C++ native nodes are non-instanced, so I’ve only used the memory pool for the node for any persistent data and I am not making changes to any Node member data itself.

I do have a way to test the Async pathfinding outside of the behavior node. I wrote some functions to draw a path based on NavPolyRefs (Async path doesn’t populate more than the start and end for path points and instead uses PolyRefs from the NavMesh). So, I can separate and test if needed.

I am happy to share code or Pseudo-code if it helps. Thanks for any advice.

Edit:
Adding some more information to this. It looks like the current way Async Path Queries are done is not thread-safe. It is accessing an Actor (UObject derivative outside of the main game-thread. When using CrowdAvoidance, they process the same data set and ocassionally allocate delegate memory outside of the main game-thread, when certain steering conditions hit.

Here is an image of the parallel stacks:

The For loop on the non-main-game thread mentions this itself:

for (FAsyncPathFindingQuery& Query : PathFindingQueries)
{
	// @todo this is not necessarily the safest way to use UObjects outside of main thread. 
	//	think about something else.
	const ANavigationData* NavData = Query.NavData.IsValid() ? Query.NavData.Get() : GetDefaultNavDataInstance(FNavigationSystem::DontCreate);

	// perform query
	if (NavData)
	{
		if (Query.Mode == EPathFindingMode::Hierarchical)
		{
			Query.Result = NavData->FindHierarchicalPath(Query.NavAgentProperties, Query);
		}
		else
		{
			Query.Result = NavData->FindPath(Query.NavAgentProperties, Query);
		}
	}
	else
	{
		Query.Result = ENavigationQueryResult::Error;
	}

	// @todo make it return more informative results (bResult == false)
	// trigger calling delegate on main thread - otherwise it may depend too much on stuff being thread safe
	DECLARE_CYCLE_STAT(TEXT("FSimpleDelegateGraphTask.Async nav query finished"),
		STAT_FSimpleDelegateGraphTask_AsyncNavQueryFinished,
		STATGROUP_TaskGraphTasks);

	FSimpleDelegateGraphTask::CreateAndDispatchWhenReady(
		FSimpleDelegateGraphTask::FDelegate::CreateStatic(AsyncQueryDone, Query),
		GET_STATID(STAT_FSimpleDelegateGraphTask_AsyncNavQueryFinished), NULL, ENamedThreads::GameThread);
}

My guess is that the Async queries were never tested rigorously while CrowdAvoidance was turned on. Would the best current workaround be to override the NavigationSystem and get rid of the reliance on accessing a UObject outside of the main game-thread?

2 Likes

For the very few that might care about this, I was able to fix this crash by putting a scoped critical section on any use of a NavPathSharedPtr that is also used as a part of a RequestMove. While FNavPathSharePtr is supposed to be thread-safe, its contents are certainly not, such as the PathCorridors. Fortunately, we only draw paths when the user opts into it to see a path preview, so the lock won’t hinder performance very much. I haven’t had much time to dive deeper into this but for those who also ran into this issue, this is a potential solution

TArray navPathPointsDoubleBuffer;
TArray navPolyRefsDoubleBuffer;
{
   check( IsInGameThread() );
   FScopeLock scopedLock( &PathPointPolyEdgeCorridorsCriticalSection );

   const TArray& navPathPoints = navPath->GetPathPoints();

   // According to their docs, FNavPathSharedPtr is valid when PathPoints is > 1
   constexpr int ValidPathResults = 1;
   const bool bHasValidPath = ( navPathPoints.Num() > ValidPathResults );
   if ( bHasValidPath == false )
   {
      return false;
   }

   const TArray& navPolyRefs = navMeshPath->PathCorridor;

   // Append takes care of Reserving Memory Blocks
   navPathPointsDoubleBuffer.Append( navPathPoints );
   navPolyRefsDoubleBuffer.Append( navPolyRefs );
}
5 Likes

Five years later but this looks like a good solution to the problem. In theory, even though it violates thread-safety, the uobjects that Unreal are accessing in background threads are essentially de facto read-only, with the only exception being the nav mesh, but if the nav mesh is marked dirty while recalculating, pathfinding will abort and reset. As you discovered, the issue is that RVO can tamper with this delicate balance. Your solution is elegant and simple, as the other option would be to force RVO async logic onto the same thread as async path-finding, which could end up clogging up another thread and defeat the purpose of async-ing in the first place. I thought about taking a read-only snapshot of the nav mesh before launching path-finding but with 100+ AI, I didn’t like the thought of so much memory overhead.

It seems odd that Unreal doesn’t offer an Async move to out of the box. Maybe they don’t have faith in their own approach. Regardless, I’m very glad I came across your thread before I started working on my own node. Great work! :slight_smile:

Ok so after a few hours of poking around and thinking, I came up with a slightly different solution. Note that this is pretty advanced but if you’re at the point where you’re even considering sending pathfinding to background threads, I assume you can handle it. This solution begins from the assumption that RVO is of secondary importance to path finding, and blocks RVO if any async pathfinding is in progress.

Firstly, we create a child class of CrowdManager, override tick, create a boolean gate and an int that keeps track of async operations:

// MyCrowdManager.h
public:
  static void SetAsyncInFlight(UWorld* World, bool bInFlight);

protected:
	virtual void Tick(float DeltaTime) override;
	bool bAsyncPathfindingInFlight;
	int32 OperationsInProgress;

then we keep track of how many ongoing pathfinding requests are ongoing and only allow tick to fire if there are none:

// MyCrowdManager.cpp

void UMyCrowdManager::Tick(float DeltaTime)
{
	if (!bAsyncPathfindingInFlight)
	{
		Super::Tick(DeltaTime);
	}
}

void UMyCrowdManager::SetAsyncInFlight(UWorld* World, bool bInFlight)
{
    if (auto* NavSys = FNavigationSystem::GetCurrent<UNavigationSystemV1>(World))
    {
        if (auto* CM = Cast<UShoniCrowdManager>(NavSys->GetCrowdManager()))
        {
            CM->OperationsInProgress += (bInFlight ? 1 : -1);
            CM->OperationsInProgress = FMath::Max(0, CM->OperationsInProgress);
            CM->bAsyncPathfindingInFlight = (CM->OperationsInProgress > 0);
        }
    }
}

Then, once you’ve built your async MoveTo, just make sure you fire off SetAsyncInFlight (true) before calling the asyn pathfind and then again (false) when your bound delegate is fired.