How to efficiently simulate hundreds of instances of something at once?

Jambax · March 24, 2015, 7:32pm

I’m wondering if somebody can give me a mini-lecture on simulating huge instances of objects. I’ve posted about this on twitter and got some promising ideas, but am yet to really come up with something I can get my teeth into.

I’m attempting to make a very simple heat-tracking missile of some kind. The behaviour of the missile is extremely simple and mimics a ‘Fire-And-Forget’ style behaviour. It simple scans ahead of itself in a cone shape, finds all actors with a certain actor component I’ve created (GameObjectComponent), and reads a ‘Heat’ float variable, a ‘Team’ number and it’s current distance to the target. Based on this, I just want the missile to pick the best target in it’s field of view and home in on it.

The logic of adjusting velocity and homing in on a target is relatively simple and Epic even gives an example of it in their own code. The hard part is efficiently picking a target to home in on. My current method is pretty basic and not very efficient at all, I have a huge (invisible) collision cone out of the front of the missile that looks for overlaps, and from that gets all of the actors with a certain Actor Component and gets the variables, it sorts through them all and picks one to home in on.

The thing is, it’s acceptable for say 1-10 missiles, but anything more than that and I start hitting slow-downs. The worst part is that because I’m using a collision overlap to find targets, it’s very unscalable, so the more missiles I add, the worse it’s going to get. I’m basing the game off of a much older game from around 1999, and although it doesn’t have the same complex physics/graphical overhead of Unreal, it still manages to simulate several hundreds (possibly thousands) of these missiles at once, and accross a multiworld-lockstep style network connection too.

–

I’m not massively experienced with programming but I know enough to know that I need some kind of very fast look-up for the missiles to be able to find a target really quickly and fluidly. I want to be able to simulate hundreds if not thousands of these things without batting an eyelid, so how would I go about setting up an efficient system for this? I need it to work in multiplayer, and in the game I’m basing it off of, the missiles don’t seem to follow any kind of group/leader behaviour, they really act like individual missiles.

I expect I might have to get my hands dirty with some advanced programming techniques to do this, like storing an array of all of my custom ActorComponents in the world, maybe in a quad-tree (no idea how to implement that) to look them up based on distance etc. I’ve just got no idea where to really start, and what I should be looking at, so any help at this stage is most welcome!

kamrann · March 24, 2015, 8:21pm

It’s no doubt the use of a collision cone that is causing your performance problems, it’s total overkill for something like this. Essentially all you need to do is as follows:

For each candidate target actor:

Get the vector from missile to candidate, call it RelLocation.
Take the dot product of RelLocation and the missile forward vector.
If Dot <= 0, skip this candidate.
Get the SquaredDistance = RelLocation.DistSquared(). If > MaxDistanceSquared, skip.
Get Angle = FMath::Acos(Dot / FMath::Sqrt(SquaredDistance)). If > MaxAngle, skip.
Score the candidate and save if it’s the best so far.

I think you’ll probably find doing that alone would allow you to deal with fairly large numbers. If you still need to optimize further, you can do a couple of things.

Only update the missiles as often as is really needed. You could batch them, do a chunk one tick, next chunk the next tick, etc.
Keep a spatial data structure of your candidate targets like you mentioned. There’s an octree in the engine code, though I couldn’t make much of it when I took a look a while ago. Even just using a regular 3D grid could make a big difference though if you have heaps of targets as well as missiles. The target actor component would update which grid cell its actor was in in the component tick. Then when you’re updating a missile, you can test first against grid cells - if the cell doesn’t intersect the missile’s cone, you don’t bother testing the actors inside it.

anonymous_user_4fe9f58b1 · March 24, 2015, 8:28pm

Basically, anything you would normally have to do for costly queries. Lots of optimizations, staggered updates, asynchronous updates, etc.

For the optimization path, do you do any checks before performing the cone overlap? While overlaps are one of the cheaper physics queries, they still have a decent overhead if overly large. While I’m not privy to the innards of PhysX, it’s fairly clear they already have heavy amounts of spatial optimizations. So you’d have to either whittle down the overhead from your PhysX requests, or reduce the amount of requests performed altogether.

Depending on how you built it, a cone can be a very expensive collision primitive. An actual convex mesh in the shape of a cone will be insanely more expensive to use than a plain sphere. It would likely be faster to overlap a sphere and discard results outside of a certain dot product (the cosine of your cone half-angle).

If you’re tracing the World Dynamic channel and you have lots of actors that shouldn’t be considered for homing, then you’re getting overhead from useless checks. You would likely want to create your own collision channel and only enable it for the actors you do want missiles to home in on.

On the game side of things, it could be worth manually iterating your actors rather than using an overlap. If you have a limited subset of actors to home in on, you could maintain an array as you’ve said and just iterate through it. If you have a limited radius, compare against distance squared first. Then filter out using a cheap dot product check equivalent to your cone (be sure to cache your cosine to avoid recalculating it on each cone check). Then do any extra checks you might have, unless they happen to be cheaper than the dot product.

You could also use a spatial structure, yeah. For a 3D scenario, you’d be looking for an octree, not a quadtree. The engine includes a generic octree you can use, TOctree in the conveniently named GenericOctree.h file. It’s not a simple concept to wrap your head around, but essentially what you would be doing is inserting all your actors in the octree, then insert a node for eahc missile of the size you want to check, and loop through the children of that node. For lots of moving items, you’d have to rearrange the tree regularly, which might or might not turn out to be a gain in the end.

There’s the simple solution of staggering updates. Let’s say you want each missile to update its homing once per second, then you’d set up their tick so that it only occurs that often. In order to avoid sudden spikes from lots of missiles updating in the same frame, you’d need some sort of mechanism to stagger those updates, likely a manager or some wrapped value that is assigned to each missile in turn.

A centralized missile manager would also allow you to time box your updates. Say you want missile update to cost at most 2ms per tick, have your missile manager keep track of when its tick starts, start looping through the missiles, and stop as soon as the alloted budget has elapsed. Since world time is only updated between ticks, you’d have to use platform time:


uint32 CycleEnd = FPlatformTime::Cycles() + uint32((kMaxTimeSliceMS*1.0e-3) / FPlatformTime::GetSecondsPerCycle());
while( FPlatformTime::Cycles() < CycleEnd )
{
	UpdateNextMissile();
}

Lastly, there’s async updates so you don’t end up stalling the main thread while these updates occur. This could be tricky, because a lot of functions are not safe to call outside of the game thread. A hardcore approach would involve batching all your missile updates in a separate thread, double buffer this so the game thread can continuously update positions and other relevant information, then lock and flip the buffer whenever you’re done with an update. This is obviously glossing over a lot, but I need to get back to work and I’m thinking it probably won’t come to that.

Also, it’s worth mentioning that PhysX traces all have async equivalents, such as UWorld::AsyncOverlap. I haven’t used those yet but it looks straightforward enough.

Slayemin · March 24, 2015, 9:25pm

I wish I knew more about the internals of UE4’s scene management. Do they use octrees to process object collisions or not, and if so, do we get access to it? I just haven’t needed to look into the internals yet and I probably wouldn’t know where to start digging for it yet. I’m assuming they use something really efficient, so it may be unnecessary to re-implement something like an octree if the backend already contains one which you can query against.

As for optimizations, I’d also consider setting a “target” reference within the heat seeking missile. It’ll lock onto this object at the beginning of its lifetime. Your “seek and relock on nearest heat signature” update can be run once every few seconds or at a random interval to fake the appearance of real time target switching. The fewer times you have to find a new heat signature, the better performance. You can also create a sphere centered on your missile. You know the missiles orientation vector, so you can use a bit of vector and angle math to find which objects lie within the sensory cone by comparing angles.

If you want, you can also try to avoid refinding valid heat targets every frame. Find those targets once every 5 seconds or so and store them in an internal list. Then process your seek AI against that internal list more frequently. You don’t have to run the expensive “find nearby stuff” method each frame, but you can still track them with a local list. Make sure to check whether these objects are valid though! if an object dies, it may still exist in your missiles list of known objects.

Jambax · March 24, 2015, 9:39pm

Yeah the target selection is the easy part, it’s more getting the list of targets and iterating through them as quickly as possible. The easiest way I can think of that’s more optimized than a collision cone is a massive array that stores all of the ‘GameObject’ actor components. My only worry is that that array itself can grow to a huge size, probably over 1K in some circumstances. Iterating through all of those and getting the closest ones within the cone would be slow, so I landed on the Octree idea. Problem is, I’m not really sure of how to go about implementing those things, still fairly new to programming.

I don’t do any checks as such, but the cone itself is huge, up to 8K units long by 4K units wide, so that it has enough time to hit the target (they also travel pretty quickly, since they’re at a vehicle-scale, not a character-scale). The code quite literally get’s all overlapping actors and iterates through them to find the best target, but there are casts and for loops and just ugh. Too much codez, I don’t really want to stick with the physics / overlap route since it just seems like such a bad way to go about it.

The cone itself is indeed a mesh, with one triangle but a cone-shaped collision. I just set it’s 3D scale to match the missiles cone settings. The sphere/angle approach is interesting, I always thought that reducing the area-of-effect for the collision would make it somewhat cheaper.

Almost every actor in the world that I can interact with in some way will have the ActorComponent. Things like pickups, vehicles, characters, buildings etc, maybe even other pieces or ordnance in some cases. Manually iterating over the actors definitely seems like the best way to go about it since it avoids using the collision system altogether, but to make it uber efficient so I can simulate hundreds (maybe thousands!) of them at a time, it would also be good to cull out the ones that can never be considered targets since they’re just so far away. With an Octree I might be able to do something like that maybe… if I can figure out how on earth to implement one. The advantage is that I can use that Octree for other future endeavours too, like AI stuff etc.

The manager approach is also interesting, I think a combination of those things could make this work. Guess I’ll pester the tech manager at work about setting up octrees later down the line

By the way just to help explain the situation, this is the game in question (from 1999), and essentially what I’d like to be able to do. I couldn’t find a good way to show you the missiles dynamically changing targets as closer / hotter objects moved into their field of view (imagine a flare / countermeasure or something). I have however switched off the smoke to keep the framerate up, and I’m spawning 500 everytime I hit the mouse button. I even added some random movement to each missle as well. Stays way above 60 FPS. Obviously this game has the advantage of being programmed to do it’s own specific task so it doesn’t have any engine overhead, but it is surely possible to achieve.

It DOES however, run at 10 TPS (turns-per-second), and smoothing is done in-between the turns. I imagine this is how it got away with doing this kind of stuff in 1999 when the hardware wasn’t up to the par it is today. Picture this + 100’s of units with their own AI, and ten players all shooting hundreds of bits of ordnance at each other all the time. Pretty hectic.

I’m already hitting latency issues with just 30-40 straight-flying bullets in UE4 (using the default replication method), so if anything this project of mine is going to teach me a lot about netcode and optimization…

kamrann · March 24, 2015, 9:57pm

Yep, understood. However, my point was that (I would guess) your current bottleneck is the overlap tests. The pseudocode I posted is essentially doing a ‘Point inside cone’ test, which is a great deal more simple than a cone/sphere overlap test (especially if it’s a mesh approximation to a cone!). So what I was suggesting is that you try just removing the collision shape from your missile, and for now just use a TActorIterator on the world to give you your candidate set (ie. everything). Sure it’s super suboptimal, but you may find that doing a much simpler test on a larger candidate set works out to be a lot more efficient than expensively maintaining a small candidate set. Add to that some of the more simple optimizations suggested, and you may not even need to bother with spatial subdivision (though if you eventually want thousands of both missiles and targets, you probably will).

Jambax · March 24, 2015, 10:04pm

kamrann;256026:

Yep, understood. However, my point was that (I would guess) your current bottleneck is the overlap tests. The pseudocode I posted is essentially doing a ‘Point inside cone’ test, which is a great deal more simple than a cone/sphere overlap test (especially if it’s a mesh approximation to a cone!). So what I was suggesting is that you try just removing the collision shape from your missile, and for now just use a TActorIterator on the world to give you your candidate set (ie. everything). Sure it’s super suboptimal, but you may find that doing a much simpler test on a larger candidate set works out to be a lot more efficient than expensively maintaining a small candidate set. Add to that some of the more simple optimizations suggested, and you may not even need to bother with spatial subdivision (though if you eventually want thousands of both missiles and targets, you probably will).

Thanks! I’ll certainly try that approach first, I didn’t like the idea of huge invisible collisions on the missiles, it seemed so hacky.

Also @, that code snippet for the Platform-based timer, is that literally referencing the system “clock”? Is it possible to get a delta value from that too? My physics simulation for all the hovering vehicles for example could also reach a point of being quite expensive, that too might be something I batch into a managed update function soon, but I’d need to pass in a delta value to correctly adjust all the forces. Well, I guess I wouldn’t really since that value shouldn’t ever really change…

Edit: Oh an one other thing, how does one go about Caching a Cosine?

kamrann · March 24, 2015, 11:01pm

He’s referring to simply working out the cosine of your cone’s angle only once, since it doesn’t vary. Which makes me realise my pseudocode is suboptimal. Better to compare cosines instead of comparing angles. Step 4 reduces to:
if(Dot < ConeCosine * FMath::Sqrt(SquaredDistance)) then skip.

Where you calculate ConeCosine = FMath::Cos(ConeHalfAngle) once and store it. By the way in case you’re wondering, this just comes from the dot product formula A dot B = |A||B|(Cosine of angle between A & B).

Thanks !

Jambax · March 24, 2015, 11:29pm

Oh okay I see what you’re saying, makes sense now!

anonymous_user_4fe9f58b1 · March 25, 2015, 2:43pm

It’s she, by the way.

And yeah, I was going to point out that you don’t need to do an arc cosine since you don’t actually care about the actual value of the angle, only rule it out. And since you’re already doing a dot product check, no need to first check for negative dot products since the cosine check includes the sign check.

You’ll likely want to further condense your algorithm by combining your target selection with the target culling. For instance, I assume you’ll pick a homing target based on things like proximity, angle, etc. Since you’re already dealing with those when filtering candidates, you can do apply your selection heuristic within the filtering loop and end up with only your best candidate instead of a list of candidates.

Yes, FPlatformTime is just the cross platform wrapper around the system clock. It is what the engine uses to update the world time every tick and calculate the DeltaTime as well. See UEngine::UpdateTimeAndHandleMaxTickRate(), which updates FApp times, which in turn is used by UWorlds to update their own values. FApp has a DeltaTime but it is only updated once per tick, so if you want finer grained deltas, you’ll have to compute your own using FPlatformTime requests.

The snippet I grabbed is from our other engine programmer who is much more old fashioned, so it uses cycles, but you can also just get plain old seconds through FPlatformTime::Seconds(). Bear in mind that comparing cycles might be slightly faster since it is an integer rather than a floating point value.

kamrann · March 25, 2015, 7:16pm

My bad. I had assumed so, but recently saw someone post referring to you in the masculine. I made the mistake of assuming they knew better! Apologies, I can imagine that could get pretty annoying.

As for the negative dot product check, it’s true it’s logically superfluous however I believe it’s faster in practice. It should generally cull 50% with a simple comparison, thereby saving doing unnecessary more expensive checks such as a sqrt in those cases. I’m behind the times in terms of technology so I suppose it’s possible this kind of thing isn’t helpful on modern hardware, though I’d be surprised.

Jambax · March 25, 2015, 9:58pm

Thanks for all the help folks!

I’ve started looking at Quad & Octrees. A Quadtree might actually be beneficial enough for me, since I can just generate it from a top-down perspective. I think the extra processing and overhead of the Octree might actually make it slower in the long run, given that most of my objects will always be in near enough the same Z-altitude anyway, and therefore I probably won’t get a great deal of benefit.

So, say I do a Quatree, containing pointers to all of my GameObjectComponents (and of course, the locations of the owning actors in World-Space). I’m predicting into the future here, but I imagine I’m going to want to access this big list of sorted actors a lot in the future. Say I generate more weapons or AI that needs to tie into this stuff, does it mean I can just have one big list of GameObjectComponents in a Quadtree variable? Obviously, I’m fairly new to this.

The way I imagine it works is, I create a ‘Quadtree’ class, and store a Quadtree variable somewhere that’s globally accessible, so maybe a GameInstance class or something? Now I imagine that this Quadtree that everything references from could only exist, or at least only be used Server-Side, since replicating the QuadTree would just be horrific, especially if it’s changing frame-by-frame? That said, it could be that I let Clients generate their own Quadtrees and rely on replicated data from the Actors themselves to put everything in the correct position… though I’m not sure if that’s very safe.

So, everytime I spawn or create a GameObjectComponent in the world, it’s added to this big global Quadtree. Now, any outside actor can just get all nodes within a radius and the objects within them, to perform functions on? Is that the basic gist of it? Found a good article here, it’s done in Java but the languages are pretty close.

To be honest, my worries about optimizing the amount I can do locally is subsiding thanks to this new knowledge, my main concern is the replication of this stuff. The engine struggles to keep up with replication with less than 20 straight-flying bullets on screen using localhost, so how it’ll manage this I have no idea.

anonymous_user_4fe9f58b1 · March 26, 2015, 3:14pm

No worries. is actually a gender neutral name historically, though recently it’s been pretty much feminine in the French speaking world.

By normalizing both vectors, you remove the need to adjust the dot product as it is already the cosine of the cone angle. Normalizing (through GetSafeNormal, not GetSafeNormalPrecise) uses SSE extensions that are a fair amount faster than a full on square root, at a negligible cost in accuracy. There’s a lot of factors in play when dealing with optimization, removing one extra check could actually speed up the process if branch predication is happening, etc. etc. The only way to find out would be for TheJamsh to profile and optimize as needed.

TheJamsh, the situation you described sounded like a space shooter, so I had assumed that your missiles and actors would have a lot more Z variety to them. You’re right, a quadtree would likely be enough. I’d recommend trying plain old optimizations before you go down that route, though! There’s that old adage about premature optimization, and you may well find that a quadtree wouldn’t be necessary after all.

This is just my gut feeling and you’ll have to profile to find out, but if you need to further optimize, I think you might get bigger gains from working to get your missile calculations into a more cache friendly algorithm instead. As it is, each missile actor requires a pointer dereference so getting the location each for each missile is not cheap. By instead storing all the locations and direction vectors needed into a contiguous buffer, you could see a huge gain. Plus, if done in a thread safe fashion, you could actually parallelize the use of that buffer using a few worker threads. The game thread would end up feeding locations to this buffer (likely a double buffer) and have your worker threads consume this data, then have the game thread flip the buffer and update locations as needed. It’s fairly grueling low level work, mind you, and I may be too much in the mindset of console optimization vs PC.

But if you just want the exercise in implementing a quadtree, then by all means, go hog wild.

For the networking details, it’s ultimately up to how you want to simulate them. If the server is the only network authority, then it would likely have to do the calculations itself. You could reduce replication load by sending less updates for missiles, and have clients keep a local version of the quadtree to interpolate between updates. It’s the sort of thing that’s tricky to get looking smooth, though.

kamrann · March 26, 2015, 9:25pm

Interesting to know that this doesn’t make use of a standard sqrt call, I wasn’t aware of that. Yeah, like you say, with tweaks of this size it’s all about trial and error in the end. I guess this stuff is getting harder and harder to predict as hardware gets more complex.

@TheJamsh
Please report back with your progress on this if you’re willing. I’m interested in this kind of thing, but I’ve forgotten a lot of what I used to know when I was working in graphics years ago, so happy to both try to help and also learn.

illYay · March 26, 2015, 10:27pm

I’d think that using scene queries should be pretty fast and done by the physics engine. Maybe you can instead use a simple box since that is the most basic collision shape. A sphere might work too.

There should be a scene graph inside PhysX if Unreal doesn’t override it with their own somehow, and it should be relatively fast. I doubt you need to implement your own octree otherwise something as simple as view frustum culling wouldn’t even be scalable.

Using large scene query objects results in large results per missile to sort through, so if you try to narrow it down to smaller areas being checked per missile should make it faster. Now that I think about it, a cone may be better for this. Maybe your cone is too broad, but then you’d miss objects. Maybe a sphere is good but make the range not too distant.

You could also try using a less strict sort if possible, like not choosing the closest object, but instead something that is somewhat close.

It could also be that the missiles work in groups. Like say you fired 100 missiles. It may be that a swarm of 10 of them stay close together and there’s one swarm manager or something that tracks a target. Each individual missile just goes to the target. I’m not really sure what the game you’re talking about is, (although your video really reminds me of Mech Warrior 3 which is like the best one IMO!!! anyway) it’s possible that all missiles go to the same target and there isn’t a check being done per missile at all. If it is MW3, there were weapons like LRM 20’s which only fired 20 missiles at once, not hundreds, and those required the mech to lock on once.

Jambax · March 27, 2015, 12:22pm

Re-writing this, forums ate my original post.

cmartel;257438:

This is just my gut feeling and you’ll have to profile to find out, but if you need to further optimize, I think you might get bigger gains from working to get your missile calculations into a more cache friendly algorithm instead. As it is, each missile actor requires a pointer dereference so getting the location each for each missile is not cheap. By instead storing all the locations and direction vectors needed into a contiguous buffer, you could see a huge gain. Plus, if done in a thread safe fashion, you could actually parallelize the use of that buffer using a few worker threads. The game thread would end up feeding locations to this buffer (likely a double buffer) and have your worker threads consume this data, then have the game thread flip the buffer and update locations as needed. It’s fairly gruelling low level work, mind you, and I may be too much in the mindset of console optimization vs PC.

To give you some idea of my lack of expertise, most of that paragraph went over my head haha, but I know what you mean about the premature optimization stuff. When you mention the contiguous buffer, do you mean purely storing the values somehow, rather than the actual actor component references themselves? Almost sounds like generating an influence map of some kind or something.

I suppose the first step is out of the way, I know that iterating over the actors is a gazillion times faster than doing huge collision cones on each missile, so avoiding the physics system is pretty much a given now, and I kind of prefer it that way, it seems more solid. Avoiding Physics gives me a lot more control.

If you read above in regards to the cone, that’s exactly what I did. Unfortunately in reality it’s actually a LOT slower using Physics Queries to get the actors, and the problem is that it’s not very scalable either. Even using the collision primitives wasn’t that much quicker. I can’t batch that kind of behaviour together either so it’ll always have an upper limit on what it can do, and the performance varies by a number of factors. The other problem, is that for the missile to have enough time to react and change it’s behaviour, the query radius needs to be massive, at the moment the cone is 8K * 4K units in size to give the missile the desired behaviour and even then it’s still not quite good enough.

As for group missiles etc, I want to avoid that simply by design. The point of this is to be able to simulate so many different objects individually as fast as possible. The networking side will sure be a challenge, currently I just rely on replicated movement from the engine, but that in itself is not very optimized, and it struggles to cope with more than a few actors using it. To be honest, I have massively over-scoped myself by trying to do Planetside-2 style networking when I know no C++ outside of what I’ve learnt from the Engine, but hey it’s a fun learning exercise!

EDIT:

Oh I also have relatively loose ties with one of the original programmers of the game, the above test was done in the latest version. Running that test in 1999 on a mid-spec box wouldn’t have yielded the same results. He apparently wrote a lot of SSE Intrinsic functions (thanks MSDN for telling me what they are), not specifically for missiles but for the game in general, which he says allows him to do a lot more work in parallel, and ultimately allow for stuff like this.

anonymous_user_4fe9f58b1 · March 27, 2015, 3:49pm

Yes and no. It’s basically being aware of how CPU cache works and coding in a way that makes the best use of it. It’s a broad topic, but the jist of it is that memory was not created equal. A RAM access is incredibly costly because the CPU has to wait many cycles before being able to use the read value. While the CPU can sometimes schedule other things while waiting for the RAM access, it’s not always optimal. So the CPU cache is a layer in between RAM and the CPU that is much faster and allows access in just a few cycles. It’s also fairly limited due to an exponential cost in storage the closer it is to the CPU. L1 and L2 cache, for instance, are generally measured in terms of kilobytes, but are incredibly fast (single cycle reads for L1). (As a related anecdote, one of the reasons AMD has been lagging behind in terms of CPU performance is because of the comparatively slow architecture of their cache compared to Intel’s.)

When you read from memory, the architecture assumes you’ll likely be reading close to it, so an entire “row” of memory gets pulled into each cache level for subsequent reads. If the memory you are reading is not in L1 cache, then L2 is tried, then L3, and so on. Subsequent cache levels sometimes exist, but in current technology, L3 is the last on-chip cache level. Failing to read from the cache and having to pull from memory is called a cache miss, and the ensuing stall that comes with it is undesirable for high speed algorithms.

Being aware of this, you can plan algorithms so that your data is contiguous and causes less cache misses. Since actor locations is stored in its root component and those actors are scattered all over the place in memory, it is very hard to cache this data. On the other hand, if you were to create a special buffer where you copy all those locations, then you end up in a situation where the CPU can just walk through its cache and churn out all its calculations with little to no cost in read cycles. An added advantage in multi core CPUs is that L2 and L3 caches are normally shared between cores, so if you have multiple worker cores working on the same algorithm, the same data will be there for all of them to access.

Making the best use of caching is a tricky thing but it’s one of the biggest gains when optimizing. Profilers for console platforms tend to have a lot of information on memory access, it’s trickier on desktop platforms because the game doesn’t have exclusive access to the CPU and the OS can schedule time away from it. Visual Studio’s profiler provides information about cache misses, though I haven’t used it for that on Windows yet.

Anyway, if it comes to that level of optimization, you’ll be in for a treat. In the meantime, use UE4’s and try to eke out every little bit of performance.

Also, when illYay mentioned querying PhysX’s scene, it wasn’t meant as actual traces as much as trying to harness whatever spatial organization structure it is using internally. I’m not familiar with its internals, but I do know that static collision is baked into some sort of octree. I don’t know if dynamic objects are also stored within that tree or if they use a different mechanism. In any event, that information is all tucked away behind the physics interface, so getting access to that information is likely fairly challenging.

Jambax · March 28, 2015, 1:12pm

Well, most of this is going way over my head but I’ll try and boil it down to psuedo-steps so I can start with this

What I think you mean is, store an array of pointers to ActorComponents, and their locations in that array too. Store that somewhere, say the GameInstance, and have the missiles read from that? At which point, the missiles first check the locations, then does some Math to work out if they’re valid targets, then if they are it then goes through the pointer to figure out which is the best target based on it’s distance and heat.

My question now is, does that Array manage itself in terms of the locations stored in it? It seems expensive to call GetActorLocation() on those items constantly.

I’ve probably massively misunderstood, I’m still fairly new to programming in general!

Jambax · March 28, 2015, 5:18pm

Okay, here’s the fastest method I have so far, it seems to work pretty well. I can simulate around 200-250 of these and keep FPS above 60. I’m using a search radius of 50K units and a Cone Angle of 0.4 for the moment (Radians). That’s on a fairly high-spec machine though, (i7-3820K + GTX 980).

I think with a manager class that monitors the update times of the missiles, I can get that a little higher (not sure where to start on that though). The ObjectInterators are pretty useful, though I need to find a way to have it only scoped to objects that are actually running in the GameWorld, as ObjectIterator seems to pick up anything from objects in the Content browser to objects in disconnected instances of the game…

One other hit I’m noticing, is the collisions when the ordnance actually hits it’s target object. It’s more noticeable when I have around 150 hitting at once (which is going to be rare, let’s be honest) It’s either the collision itself or the spawning of the effect actor. I wonder if it’s possible to do that collision check using the ASync scene?

Missile.cpp



// Copyright James Baxter 2015

#include "BZGame.h"
#include "BZGame_Missile.h"

ABZGame_Missile::ABZGame_Missile(const FObjectInitializer& ObjectInitializer)
	: Super(ObjectInitializer.SetDefaultSubobjectClass<UBZGame_MissileMovementComponent>(ABZGame_Ordnance::ProjectileMovementComponentName))
{
	UpdateRate = 0.1f;

 	ConeAngle = 0.4f;
	MaxSearchRadius = 50000.0f;
	ConeCosine = 0.f;

	MovementComp->bIsHomingProjectile = true;
}

void ABZGame_Missile::BeginPlay()
{
	Super::BeginPlay();

	/* Start Timer */
	GetWorldTimerManager().SetTimer(TimerHandle_ConeCheck, this, &ABZGame_Missile::UpdateTarget, UpdateRate, true);
}

void ABZGame_Missile::PostInitializeComponents()
{
	Super::PostInitializeComponents();

	ConeCosine = FMath::Cos(ConeAngle);
}

void ABZGame_Missile::UpdateTarget()
{
	/* Store to prevent multiple function calls (isn't location accessible anyway?!) */
	FVector MyLocation = GetActorLocation();

	/* Reset the Highest Target Factor */
	HighestTargetFactor = 0.f;

	/* For all GameObjectComponents In the World */
	for (TObjectIterator<UBZGame_GameObjectComponent> Itr; Itr; ++Itr)
	{
		AActor* GOCOwner = Itr->GetOwner();
		if (GOCOwner)
		{
			FVector TargetLocation = GOCOwner->GetActorLocation();

			/* Discard Instigator and any actors being destroyed */
			if (GOCOwner == NULL || GOCOwner == Instigator || GOCOwner->IsPendingKillPending() || !GOCOwner->GetWorld())
			{
				continue;
			}

			/* Calculate Distance - Discard Targets Outside of Search Radius */
			float Distance = FVector(MyLocation - TargetLocation).Size();
			if (Distance > MaxSearchRadius)
			{
				continue;
			}

			/* Calculate Dot Product / Angle to Target. Discard Target outside of Cone Radius */
			float Dot = FVector::DotProduct(GetActorForwardVector().GetSafeNormal(), FVector(TargetLocation - MyLocation).GetSafeNormal());
			if (Dot < ConeCosine)
			{
				continue;
			}

			/* If we passed all these checks, consider using this actor for the target */
			float TestFactor = Itr->Signatures.HeatSignature;
			if (TestFactor > HighestTargetFactor)
			{
				HighestTargetFactor = TestFactor;
				MovementComp->HomingTargetComponent = Cast<USceneComponent>(GOCOwner->GetRootComponent());
				continue;
			}
			else
			{
				continue;
			}
		}
		else
		{
			continue;
		}
	}
}

kamrann · March 28, 2015, 6:13pm

A few minor things with your code.

This is speculation, I haven’t looked into how it works, but it may be faster to use TActorIterator and check each one for having the relevant component, rather than using an object iterator. That way the issue of finding objects not in the world is also taken care of.
You’re calculating the difference vector twice, so cache it: RelativeLocation = TargetLocation - MyLocation.
Compare the squared distance rather than the distance. Precalculate MaxSearchRadius * MaxSearchRadius, then compare that with RelativeLocation.SizeSquared().
GetActorForwardVector() will already be normalized, so just use it as it is.
Since your search radius is big, you’ll do most culling through your cone test, so it’s probably best to put that before your distance test.

Even then there would still be a little redundancy in there using both SizeSquared and GetSafeNormal, which you could get rid of, but I think that would come under premature optimization at this stage.

Still, just a couple of hundred seems like a low limit to be getting. What kind of number of actors with the component do you have? Also, you removed the cone collision from the missile, but presumably still have some collision component for testing for target hits. Is it just a sphere? Can they collide with each other? And what about on the target actor/component side? It’s possible you’re still being limited by the collision overlap testing.