Optimization for Detour addTile

I’ve been getting reports from a couple of projects having traversal stutters with Recast, and finally had a chance to reproduce. I tracked down the problem to Detour and also how Unreal uses Detour, and I’m offering a fix I developed that removes the issue entirely at a small memory cost.

I’m going to try to explain what happens as best as I can, but feel free to ask if you have questions.

This project is a World Partition project, and when a new section is streamed in, tile generators are launched in async tasks. Recast actually tries to spread this over various frames (generally every 3rd frame). Once the tile generation is complete, it calls FRecastNavMeshGenerator::AddGeneratedTilesAndGetUpdatedTiles() in the game thread to add the new generated tiles.

In this function it first checks if the tiles were fully regenerated and it removes all the tiles first if that’s the case:

if (TileGenerator.IsFullyRegenerated()) { // remove all layers ResultTileRefs = RemoveTileLayersAndGetUpdatedTiles(TileX, TileY, &OldLayerTileIdMap); }Then it proceeds to go into a loop adding all the newly generated tiles, by calling FRecastNavMeshGenerator::AddGeneratedTileLayer().

Now, in FRecastNavMeshGenerator::AddGeneratedTileLayer(), Unreal first tries to remove the tile if it exists, then tries to add the new tile using an old reference to the removed tile. This is an optimization in Detour to avoid having to recreate the seed for the tile. If it fails to add it that way, then it tries to add the new tile without an old reference.

The issue we were running into is that dtNavMesh::addTile() can sometimes take inordinate amounts of time to add every tile. I tracked it down with the profiler to this bit of code:

while (tile && tile != target) { prev = tile; tile = tile->next; }That code is fulfilling two tasks:

1) Ensure that the node is on the free list, thus still free.

2) Get the prev pointer in case it’s free and we need to unlink it from the list.

It looks pretty innocuous, and fast, but if you look at how the m_nextFree free nodes linked list is implemented in Detour, you’ll notice that as you remove tiles, the nodes get appended to the head of the list. So if you’re just deleting a single node, and then using the old ref to re-add it, it’s super fast, because the node you just removed is right there at the head of the free list.

Now, if you remove 5000 tiles and then try to do the old-ref match with the first node you removed, you’ll be traversing the entire free list to find it. Then when you do the same with the next tile, you do it all over again. Rinse and repeat. Noteworthy as well is that while the tiles array is in a block of contiguous memory, the free list links to arbitrary nodes, meaning they’re scattered through the memory block, so the traversal can also thrash the cache.

FRecastNavMeshGenerator::AddGeneratedTileLayer() does the right thing. It removes the node first, collects the old-ref then re-adds it. That’s the fast case. But if we go back to the TileGenerator.IsFullyRegenerated() condition, we’ll run into the problem that it’ll remove all the tiles first, bulk up the free list in Detour, then add the new ones individually, still trying to re-use, which triggers the worst case of dtNavMesh::addTile().

What I did to address this, is effectively turn the free list into a double-linked list, keeping a prev node, and reusing it also to mark if the node is still effectively free. This avoids the entire iteration through the free list, at the cost of an extra 8 bytes per tile node. The memory cost is negligible and completely eliminates the traversal stuttering. That said, if you wanted to turn this into only 4 bytes per node, you could store the node index instead at the expense of some extra pointer math.

Attached is the patch applied to Unreal 5.5. I marked the change with OPTIMIZE_FREELIST_LOOKUP.

Hope it helps!

Ernesto.

Steps to Reproduce
This happens on a large world partition project, using dynamic Recast navmesh, watershed generation (the default) and normal cell sizes.

Whenever you trigger this code in FRecastNavMeshGenerator::AddGeneratedTilesAndGetUpdatedTiles() with a sizeable amount (5000+) of tiles:

if (TileGenerator.IsFullyRegenerated()) { // remove all layers ResultTileRefs = RemoveTileLayersAndGetUpdatedTiles(TileX, TileY, &OldLayerTileIdMap); }The following call to FRecastNavMeshGenerator::AddGeneratedTileLayer() will take an inordinate amount of time to run (8+ msec), causing hitches.

Thank you for bringing this to our attention. It is not something we have run into ourselves. I have added a task to our JIRA to investigate adding the change similar to a GitHub PR, but I do not know when/if it might get integrated. Our current thoughts are likely not until 5.8.

-James

No worries for any time on your part! We are very appreciative of the additional info and captures you provided. This change was being discussed during the AI team sync today. We will need to test it with FN and Lego FN to see how the changes affect perf (if any).

Hi James.

Thanks. I’m attaching a zip file containing screenshots of Razor captures, unpatched and patched, along with the instruction heatmap on the unpatched version and the Recast settings used in this project. Sorry I couldn’t get this to you earlier, we’ve been out all last week.

Ernesto.