Best Practice - Data Actors or Giant Struct Arrays?

I thought about posting this on AnswerHub, but I feel a better discussion could be had here.

I am pondering the best way to store and retrieve data. I need to store lots of specific data for up to 40,000 tiles. Each tile will have information related to its terrain, neighbors, units, improvements, cities, which mesh to use, and so on. There are dozens of data points that any given tile can have.

I am looking at two main options for storing and accessing this data:

  1. Data actor Blueprints: one Blueprint instance per tile, where each spawned Blueprint holds all the data the tile will ever need. Up to 40,000 of these might need to be spawned into the world at any given time.

  2. Giant, all-encompassing struct arrays: one, or a small number of, struct arrays that each hold all the data for all the tiles. Any time tile data must be accessed, these giant arrays would have to be accessed.
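To make the two shapes concrete, here is roughly what they would look like in C++ (just a sketch; the names ATileDataActor and FTileData are made up for illustration):

```cpp
// Option 1: one data actor per tile (up to 40,000 spawned actors).
UCLASS()
class ATileDataActor : public AActor
{
    GENERATED_BODY()
public:
    UPROPERTY() int32 Elevation;
    UPROPERTY() int32 Precipitation;
    UPROPERTY() bool  bHasRiver;
    // ...dozens more per-tile variables
};
// The map would hold: TArray<ATileDataActor*> TileActors;

// Option 2: one giant struct array (no actors spawned at all).
USTRUCT()
struct FTileData
{
    GENERATED_BODY()

    UPROPERTY() int32 Elevation;
    UPROPERTY() int32 Precipitation;
    UPROPERTY() bool  bHasRiver;
    // ...dozens more per-tile variables
};
// The map would hold: TArray<FTileData> Tiles;
```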

So of the two options, which would be the most efficient? Are there any viable alternatives to these two methods?

I’d like to get this discussion going as well, as these questions concern my project. For my large tile maps I’m personally using multiple giant arrays for the turn-based tile toolkit, and it does seem to work very well. I have not attempted to use data actors, however, so I’m not able to compare the two directly. I remember there was already a bit of discussion about this a year ago when I started working on my project (between you, Ian Shadden and others), where I believe you landed on using actors. For me, following my gut since I had very little programming experience, it seemed more efficient to store data in arrays and thus avoid going back and forth between actors in the level and variables in a Blueprint. Ian Shadden’s turn-based strategy system proves that actors are certainly a viable way of doing it, and our pathfinding algorithms are both efficient and quick even though we use different solutions.

As for my own testing with arrays, I discovered that getting data from large struct arrays and arrays of actors is slightly slower than using simpler arrays. The difference is very small, however: less than a 10% speed decrease for setting items in a struct array (with structs containing three variables) compared to setting items in an integer array. This was still enough to convince me not to use a single array of huge structs, as I was obsessed with making the most efficient pathfinding possible. Still, I tried to keep the number of map-sized arrays to a minimum, and to use only simple integer arrays for anything that would be read and written hundreds of times per tick during gameplay.

It is worth noting that I’ve only used Blueprints, since C++ is not allowed on the Marketplace, and for these sorts of things it makes a lot more sense to use C++. Speed differences between the different methods might be almost nonexistent in C++, which might mean your only concern is finding the method that uses the least memory. I do not know whether it is more memory efficient to use one array of two-variable structs compared to two arrays of single variables.
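One factor I would expect to matter is alignment padding: a struct can take up more memory than the sum of its fields. A quick plain-C++ illustration (exact sizes depend on the compiler, so treat the numbers as indicative):

```cpp
#include <cstdio>

// One array of two-variable structs vs. two arrays of single variables.
struct FTileEntry
{
    float Cost;     // 4 bytes
    bool  bBlocked; // 1 byte, but the struct is padded up to 8 bytes total
};

int main()
{
    const int NumTiles = 40000;
    // Two separate arrays: 4 + 1 = 5 bytes per tile.
    printf("two arrays:   %zu bytes\n", NumTiles * (sizeof(float) + sizeof(bool)));
    // One struct array: padding rounds each entry up to sizeof(FTileEntry) = 8.
    printf("struct array: %zu bytes\n", NumTiles * sizeof(FTileEntry));
    return 0;
}
```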

I look forward to seeing what thoughts others on the forums might have on the question you’ve presented.

I am still using giant arrays just as you are. Most of them aren’t even combined into a struct. They get the job done.

Data actors seem like they could have some potential performance benefits, because all the tile data can be stored in one object that is referenced in one array. Let’s say I need Elevation, Precipitation, and whether the tile has a river or not: that could be three Gets from three 80K-member arrays. With a data actor, you can do one Get from an 80K-member array, and then three smaller gets (from plain variables) for the specific information held in that actor. Maybe a cast or some other trivial stuff in there too.
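In rough C++ terms (made-up variable names, just to show the access pattern):

```cpp
// Three separate giant arrays: three indexed lookups per tile.
float Elevation     = ElevationArray[TileIndex];
float Precipitation = PrecipitationArray[TileIndex];
bool  bHasRiver     = HasRiverArray[TileIndex];

// One array of data actors: a single indexed lookup, then plain member
// reads (plus possibly a cast if the array is typed to a base class).
ATileDataActor* Tile = TileActors[TileIndex];
float Elevation2     = Tile->Elevation;
float Precipitation2 = Tile->Precipitation;
bool  bHasRiver2     = Tile->bHasRiver;
```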

How many arrays do you use when pathfinding? Just a tile cost array? Do you create other arrays to hold each tile’s distance from the end point? If you only need to look at one array, and the rest is math, then I can see data actors not being useful in that situation. If there is more than one piece of information, and you can get all of it while you are already in the actor, then there should be performance benefits, theoretically.

For cases where I only need a single array with a single piece of data, I will still have the giant arrays hanging around. If having 80K data actors (worst case for me) sitting in the world isn’t a big deal, then that could be a good way to speed certain other actions up. Hopefully. :)

Oh, by the way: I spawned data actors (about 500 KB each) into my map in 40K increments, and it didn’t seem to affect anything until I got to several hundred thousand. Then there were hiccups every few seconds where FPS would drop to 0. Most of the variables in the BP had data in them, but they were all the same data, so I might have to try randomizing it for each instance to see if some kind of batching made it easier on performance.

When pathfinding I use one cost array as well as an integer array holding the index locations of pawns, so I Get from two different arrays on each step of the search. I also Get from a vector array when placing the instanced meshes that display which tiles can be moved to.
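In sketch form, the per-search data looks something like this (a C++ rendering of my Blueprint arrays; the names are made up):

```cpp
// Parallel arrays, each sized to the whole map and indexed by tile.
TArray<int32>   TileCost;     // movement cost per tile
TArray<int32>   PawnIndex;    // which pawn occupies each tile (-1 for none)
TArray<FVector> TileLocation; // world position, used to place the instanced meshes

// Each step of the search reads from two of them:
int32 Cost      = TileCost[NeighborTile];
bool  bOccupied = (PawnIndex[NeighborTile] != -1);
```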

So I just did some testing. I first made two ForLoops: one where I added the loop index to six different float arrays, and one where I set all six floats of a struct to the index and added that struct to a struct array. Adding to the struct array took approximately half the time of adding to the six float arrays. I then ran a ForEachLoop on the struct array where I printed the values of all the floats in each struct, and another loop where I printed the floats from the six separate arrays. The time this took was identical in both scenarios.
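For reference, here is the shape of that test translated to C++ (hypothetical code; FSixFloats is a made-up struct, and the timing would be done with something like FPlatformTime rather than print strings):

```cpp
struct FSixFloats { float V0, V1, V2, V3, V4, V5; };

const int32 Num = 40000;
TArray<float> A, B, C, D, E, F;  // six separate float arrays
TArray<FSixFloats> Structs;      // one array of six-float structs

// Variant A: six Adds per iteration, one per array.
double T0 = FPlatformTime::Seconds();
for (int32 i = 0; i < Num; ++i)
{
    A.Add(i); B.Add(i); C.Add(i); D.Add(i); E.Add(i); F.Add(i);
}
double TimeSixArrays = FPlatformTime::Seconds() - T0;

// Variant B: one Add of a whole struct per iteration.
double T1 = FPlatformTime::Seconds();
for (int32 i = 0; i < Num; ++i)
{
    const float V = (float)i;
    Structs.Add({V, V, V, V, V, V});
}
double TimeStructArray = FPlatformTime::Seconds() - T1;
```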

From this, it seems to me that getting from multiple arrays when doing pathfinding does not matter, and is just as quick as getting from a single struct array. Adding to a struct array is quicker than adding to a number of arrays equal to the number of variables in your struct. However, setting a single variable is quicker with simple arrays than with struct arrays, as you have to get and reset all the variables in each struct every time you want to change a single one. I rarely add many variables to different arrays at the same time, while I often set a single item in an array, which means that keeping multiple arrays still seems like the superior solution in my case. So, to reiterate your example: three Gets from three 80K arrays should be just as quick as getting all three variables from an 80K struct array, if my results are correct.
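Worth noting that the get-and-reset step is where C++ would differ: there, setting one field of one element is a direct in-place write. A one-line sketch, reusing the made-up struct array from above:

```cpp
// No need to copy the struct out, change a field, and set the whole thing back.
Structs[Index].V0 = NewValue;
```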

Using actors, this is different, since you can change a single variable without resetting all the others. This might be the best of both worlds performance-wise, but that presupposes that casting to and getting actors in the game world is very quick and efficient. Ian Shadden’s pathfinding seems to indicate that this is the case, but I’m unsure which solution would be quickest in a direct comparison.

That you saw so little performance drop from spawning that many invisible actors is promising. However, I will only go that route if there are significant performance gains in using them. To me, placing thousands of actors feels a bit inelegant when the data could instead be stored in arrays, though such matters of taste should probably be left out of this. That performance drops at very high numbers also shows that the invisible actors do have an impact, even if it is small; and even though it takes a lot of actors to get there, I’m assuming you’re using a pretty beefy computer.

I am perhaps a bit biased in this, since I’ve been using arrays and would have to make some major changes if invisible actors turned out to be superior. I’m trying to keep an open mind here, though, and will test things out and let the results speak for themselves.

OK, I’ve done some testing with invisible actors, and to my surprise, getting a single float from an actor in an actor array and printing it is actually slightly quicker than getting a single float from an array of single floats and printing it. The difference is very slight, but I was expecting a noticeable difference in the other direction. I have no idea why this should be the case, but it is a good argument for using actor arrays. Actors also bypass the problem of having to reset all the other variables when setting, as you would in a struct array, since a single variable can be set directly on an actor. Getting six floats from an actor with six float variables was similarly slightly quicker than getting six floats from a struct array. If the impact of invisible actors is indeed minimal, there seem to be few reasons not to use them. Now I have to start testing the impact of invisible actors.

This is a pretty great discussion. I’m glad to see the two people I follow most talking about something I’m about to get into for my project.

By “giant struct array” I was just generalizing; I really meant using arrays in general. Structs may make sense for performance somewhere, but right now they would mostly be useful for data organization.

I did see someone do a test on casts, and after doing thousands or millions of them he seemed to suggest that they are pretty efficient operations. Of course, when you have to do a billion iterations, one man’s “efficient” may be different from yours or mine... :P

I just tried randomizing the values for 29 variables in each data actor, and it doesn’t seem to make a difference. Seems stable enough to test under fire.

Yeah, I have a lot of work ahead of me if this turns out to be significantly faster. :P I am in no rush to revamp the generator, though, so I would probably focus on current projects and get to the revamp at a later date.

Would be nice to hear from Epic how they view it. I sometimes hear people say you don’t want that many actors in the world due to overhead, but they don’t really lay out what constitutes too many. Millions of voxel or tree actors would clearly be too many; at least from my testing, you start to hit a wall after a few hundred thousand. Of course, if you are targeting mobile, the limit is probably much lower.

Let us know how your work turns out. The more info the better. :)

I was just wondering: I know you’re not supposed to use object-based Blueprints, but with Rama’s plugin you can at least create them, and it might be worth looking into. I’ve just done some testing with creating object-based Blueprints in 50K batches, and besides the occasional hiccup it didn’t seem to affect my performance, even when I reached 3 million objects.

What is the difference between an object-based Blueprint and an empty actor Blueprint?

I just did a more extensive test with the actors, and I was seeing hiccups at 80,000 actors. Each actor BP was about 2 MB in size. That makes it seem like the hiccups are memory-related, and not dependent on the quantity of actors. I could streamline it now, but I am going to have far more data to store in the future, so I probably won’t be going this route.

I was trying to find a memory logger similar to the “stat scenerendering” console command, but none of the memory stat loggers seemed to show any difference in my testing. Does anyone know which stat I should look at for this kind of testing?

Given the potential benefit of not having to search several giant arrays, I think you could emulate data actors using struct arrays with proper organization/indexing. To mirror the data actor concept, create a giant struct array in which each member struct holds all the info the data actor would hold (dozens of variables if need be). In both systems you have the same array call to get the tile, and then a cast (data actor) or a struct break to get the needed info.
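In C++ terms, the idea would look something like this (a sketch with made-up names; the struct stands in for everything the data actor would have held):

```cpp
// One struct holds all the info the data actor would have held.
USTRUCT()
struct FTileData
{
    GENERATED_BODY()

    UPROPERTY() int32 Elevation;
    UPROPERTY() int32 Precipitation;
    UPROPERTY() bool  bHasRiver;
    // ...dozens more variables if need be
};

// One giant array, one entry per tile, sized to the map.
TArray<FTileData> Tiles; // e.g. Tiles.SetNum(80000);

// Same access pattern as a data actor: one array call to get the tile...
const FTileData& Tile = Tiles[TileIndex];
// ...then the "break struct" step is just direct member access, with no
// cast and no actor in the level.
const bool bRiver = Tile.bHasRiver;
```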