Yeah. If you only need distances for comparisons, then the squared distance is sufficient. It’s a common minor optimization. Only compute the exact distance when you actually need the value, for instance when linearly interpolating damage according to distance from a blast or some such.
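For illustration, something like this (variable names are just placeholders):
    // Picking the closer of two targets only needs squared distances:
    const bool bAIsCloser =
        FVector::DistSquared(MissileLocation, TargetA) < FVector::DistSquared(MissileLocation, TargetB);

    // The real distance (and its sqrt) only matters when you use the value itself,
    // e.g. scaling blast damage by distance from the explosion:
    const float Dist = FVector::Dist(BlastCenter, VictimLocation);
    const float Damage = FMath::Lerp(MaxDamage, 0.f, FMath::Clamp(Dist / BlastRadius, 0.f, 1.f));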
You definitely want to pool actors to avoid constant respawning, yeah. Which is another point in favour of a centralized missile manager.
Remember we’re looking to speed up iterating over the missiles. The critical path of the algorithm is processing the locations, filtering, etc. If you take dereferencing actors out of that critical path, the important part can go that much faster. Inputs for the algorithm are target location, missile location and missile direction. Outputs depend on your missile’s movement component; currently it seems to take the missile’s homing target, but it could also just be a missile direction. Your missile’s data buffer would end up looking like this:
struct FMissileData
{
    FVector Location;     // input: copied from the missile actor
    FVector Direction;    // input: copied from the missile actor
    AActor* HomingTarget; // output: copied back into the movement component
};
With a manager that arranges all of this information in a contiguous, cache-friendly fashion, the costly part of the algorithm can go much faster. It might not sound very intuitive, but the added overhead of copying data in and out of these data buffers can still result in a net gain. Each missile tracked by the manager has an index property that keeps track of where its data is in the buffers, and you use that index to copy data to/from the buffers during the copy pass.
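As a rough sketch of what the registration and copy pass could look like (class and member names here are mine, not an existing API):
    class FMissileManager
    {
    public:
        // Called when a missile is spawned (or pulled from the pool); the missile
        // stores the returned index so both sides know which buffer slot is its.
        int32 RegisterMissile(AActor* Missile)
        {
            const int32 Index = MissileData.AddDefaulted();
            Missiles.Add(Missile);
            return Index;
        }

        // Copy pass: pull locations/directions out of the actors into the flat buffer.
        void CopyActorsToBuffer()
        {
            for (int32 i = 0; i < Missiles.Num(); ++i)
            {
                MissileData[i].Location = Missiles[i]->GetActorLocation();
                MissileData[i].Direction = Missiles[i]->GetActorForwardVector();
            }
        }

    private:
        TArray<FMissileData> MissileData; // contiguous, cache-friendly data the algorithm runs on
        TArray<AActor*> Missiles;         // same ordering, only touched during copy passes
    };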
Single threaded
- Data copy: Copy all target locations to a buffer, copy all missile locations and directions to a buffer
- Data update: Iterate over missile data, choose targets, write outputs
- Data copy: Iterate over missile data, copy missile targets from buffer into movement component
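The “Data update” pass above could be as simple as this, with TargetLocations/TargetActors being buffers filled during the copy pass (just a sketch):
    // Picks the closest target for one missile; squared distances are enough here.
    void ChooseTargetForMissile(FMissileData& Missile,
                                const TArray<FVector>& TargetLocations,
                                const TArray<AActor*>& TargetActors)
    {
        float BestDistSq = TNumericLimits<float>::Max();
        int32 BestIndex = INDEX_NONE;
        for (int32 t = 0; t < TargetLocations.Num(); ++t)
        {
            const float DistSq = FVector::DistSquared(Missile.Location, TargetLocations[t]);
            if (DistSq < BestDistSq)
            {
                BestDistSq = DistSq;
                BestIndex = t;
            }
        }
        Missile.HomingTarget = (BestIndex != INDEX_NONE) ? TargetActors[BestIndex] : nullptr;
    }

    // Single-threaded data update: one tight loop over the flat buffer.
    void UpdateMissileData(TArray<FMissileData>& MissileData,
                           const TArray<FVector>& TargetLocations,
                           const TArray<AActor*>& TargetActors)
    {
        for (FMissileData& Missile : MissileData)
        {
            ChooseTargetForMissile(Missile, TargetLocations, TargetActors);
        }
    }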
Remember that I’m always keeping multithreading on the table as an eventual further optimization, and that’s where data rearrangement stands to gain the most. To illustrate:
Multithreaded
- Data copy: Copy all target locations to a buffer, copy all missile locations and directions to a buffer
- Kick off (1…n) worker threads
- Blocking wait: Until worker threads are finished
- Parallel data update: 1…n threads iterate over missile data, choose targets, write outputs
- Data copy: Iterate over missile data, copy missile targets from buffer into movement component
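If you go through the engine’s ParallelFor, the parallel update is nearly the same code; ParallelFor doesn’t return until all work items are done, which is exactly the blocking wait in the list above:
    #include "Async/ParallelFor.h"

    // Parallel data update: same per-missile work, split across worker threads.
    void UpdateMissileDataParallel(TArray<FMissileData>& MissileData,
                                   const TArray<FVector>& TargetLocations,
                                   const TArray<AActor*>& TargetActors)
    {
        ParallelFor(MissileData.Num(), [&](int32 Index)
        {
            ChooseTargetForMissile(MissileData[Index], TargetLocations, TargetActors);
        });
    }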
That still doesn’t make the best use of parallelism, because you’re still blocking the game thread while waiting for jobs to finish. A fully multithreaded and asynchronous solution would avoid blocking the main thread, so that the only penalty from a sudden, huge spike in missiles flying around is that they’re slower to acquire their targets, rather than the frame rate taking a hit. Async would look roughly like this:
Multithreaded async
- Non-blocking wait: If there are still worker threads busy, skip this tick
- Data copy: Copy all missile targets from buffer into movement component, copy missile location/direction into buffer, copy all target locations to a buffer
- Kick off (1…n) worker threads
- Parallel data update: 1…n threads iterate over missile data, choose targets, write outputs
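One way to get that non-blocking behaviour is the engine’s Async() plus a TFuture you poll each tick. Member and function names below are placeholders for whatever your manager ends up with, and it’s shown as a single task for brevity; you could split the range across several:
    #include "Async/Async.h"

    void FMissileManager::Tick()
    {
        // Non-blocking wait: if last tick's update is still running, just skip this tick.
        // UpdateFuture would be a TFuture<void> member on the manager.
        if (UpdateFuture.IsValid() && !UpdateFuture.IsReady())
        {
            return;
        }

        CopyTargetsToMovementComponents(); // previous results -> movement components
        CopyActorsToBuffer();              // current missile/target state -> flat buffers

        // Kick off the update on the thread pool; the game thread carries on immediately.
        UpdateFuture = Async(EAsyncExecution::ThreadPool, [this]()
        {
            for (FMissileData& Missile : MissileData)
            {
                ChooseTargetForMissile(Missile, TargetLocations, TargetActors);
            }
        });
    }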
Then you could further refine this by adding double buffering, so that missile update threads can keep working on a “back” buffer even while the main thread copies data to/from the “front” buffer. But that would only be a gain if you further rearrange the entire algorithm so that even missile movement is done on the worker threads, and those threads essentially never have to stop for a data copy.
The biggest caveat with this approach is that you have a nested loop, i.e. iterate over missiles, then iterate over targets. There’s no way to tell without writing it and profiling it, but I think having to switch between missile and target “contexts” will still end up causing cache misses and defeat the purpose of cramming all your data into a linear buffer. You might need to rearrange the algorithm so that you iterate over targets and then over missiles. Or maybe somehow include target data inline with missile data – if you have a quadtree, your data copy could single out the (n) closest targets and only operate on those.
Another possible approach is to ignore the linear buffer altogether but still use multithreading. Each worker thread would still dereference actors and get their location, direction, etc. This is a pretty risky rabbit hole to go down, however, as UObject access is not very thread-safe. It might be made to work if your missile manager strictly controls UObject allocations so that worker threads never have a missile object deleted out from under them. Instead of destroying spent missiles, you just flag them, and worker threads ignore flagged missiles when they encounter them. But Epic has repeatedly stated that UObjects are not safe to use outside of the game thread; I’ve just never investigated to what extent that is the case.
Yet another potential lead for multithreading is that scene components support using a “custom location” through bRequiresCustomLocation. You could turn that on and have your missile’s root components return their location as known by the missile manager instead of the RelativeLocation stored in the component. Hell, you could take this further and, similarly to what kamrann suggests, have a single actor manage all your missiles while still using StaticMeshComponents for each of them.
tl;dr Most likely this will leave you more confused than you were before; just remember this is basically a long-term overview of possible optimization paths. It may well be that you don’t need to go that far. I certainly haven’t seen enough missiles flying around in that video of yours to justify anything this aggressive; I’ve been approaching this with the idea of 1000+ missiles in mind.
Also, multithreading might sound scary, but once you get past the initial hump of learning the terminology, synchronization and the common pitfalls, it’s not that different from single-threaded programming. The above algorithms wouldn’t even need locks, just a shared counter for the position in the missile buffer that each worker thread advances using FPlatformAtomics::InterlockedIncrement.
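To make that last point concrete, each worker’s loop could look roughly like this (again just a sketch, reusing the ChooseTargetForMissile helper from earlier):
    #include "HAL/PlatformAtomics.h"

    // Shared between all workers; reset to -1 before each batch is kicked off.
    volatile int32 NextMissileIndex = -1;

    void MissileWorkerLoop(TArray<FMissileData>& MissileData,
                           const TArray<FVector>& TargetLocations,
                           const TArray<AActor*>& TargetActors)
    {
        while (true)
        {
            // InterlockedIncrement returns the new value, so starting at -1 hands out 0, 1, 2, ...
            const int32 Index = FPlatformAtomics::InterlockedIncrement(&NextMissileIndex);
            if (Index >= MissileData.Num())
            {
                break; // no missiles left to claim, this worker is done
            }
            ChooseTargetForMissile(MissileData[Index], TargetLocations, TargetActors);
        }
    }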