Framedrop after looping over a big array

Hi,

I’m currently developing a system that validates data against a datatable imported from a CSV file. When a “Validate” button is pressed, the system loops over every row in the datatable to find the corresponding data.
In the beginning, this worked perfectly. But since then, the datatable has grown to 450+ rows and I get a framedrop every time the system loops over it. Each row consists of 70 fields, so it is a lot of data that gets processed.

At BeginPlay the system converts the datatable to an array of rows, which in turn are converted to an array of a self-defined struct type. When validating, each row (an array of that self-defined struct type) is compared against the current setup in the system (also converted to an array of that self-defined struct type). I simply use the “==” node because I sort both arrays beforehand. Once a match is found, the loop breaks. But if no match is found, that means every item in the data rows array has been sorted and compared, which I think can be a heavy operation.
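Roughly, the logic I described looks like this sketch in C++ terms (the project itself is pure Blueprint, so the struct and field names here are just illustrative): every “Validate” press walks all 450+ rows, and each row gets sorted and compared against the current setup.

```cpp
#include <algorithm>
#include <vector>

// FField stands in for one of the ~70 fields in a row (names are made up).
struct FField
{
    int Id = 0;
    int Value = 0;
    bool operator==(const FField& Other) const { return Id == Other.Id && Value == Other.Value; }
};

using FRow = std::vector<FField>;

// Returns the index of the matching row, or -1 if no row matches.
int FindMatchingRow(std::vector<FRow> Rows, FRow Current)
{
    auto ByKey = [](const FField& A, const FField& B) { return A.Id < B.Id; };
    std::sort(Current.begin(), Current.end(), ByKey);

    for (size_t i = 0; i < Rows.size(); ++i)
    {
        std::sort(Rows[i].begin(), Rows[i].end(), ByKey); // every row is sorted on every Validate press
        if (Rows[i] == Current)                           // the "==" itself is a cheap element-wise compare
        {
            return static_cast<int>(i);                   // match found, break out of the loop
        }
    }
    return -1; // worst case: every row was sorted and compared
}
```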

Is there any way to make this system more performant so I don’t get framedrops?
How much impact does a “==” node have on the processing time?

Can’t you search the row name directly in the table? Have you considered using hashes?

No, each row represents a situation (active elements, some variables). The system checks each row against the situation the system is currently in. If it finds a row where the data is exactly the same as currently present in the system, it fires a function.

I’m not really familiar with hashes, could you explain it?

Also, a side note: I’m working in a Blueprint project.

Does your ‘situation’ consist of any single element that defines that situation? Perhaps give us an example of what a situation looks like?

Let’s say that a given situation is identified by these 4 elements:

  • Armour: float
  • Mana: float
  • HasWeapon: boolean
  • Location: Enum

So, the goal is to transform all these elements into a unique identifier or number.

There are a myriad of ways to calculate this. So, in order to avoid extensive math, we go with the simplest way possible:

  1. convert all elements into strings and append them, creating just one string
  2. sum all the characters of that string
  3. calculate hash = sum % N, where N is recommended to be a prime number and, in this case, a lot higher than the predicted future number of rows in the table, so as to avoid collisions (the same hash number for different element values).

Note: be careful when using floats to calculate the hash. It is better to ceil, floor, or otherwise turn them into integers before using them. Also, the value of N and the choice of sum operation depend not only on the size of the table but also on the nature of the data to be hashed. A more complex operation may be required to avoid collisions.

Example (without considering the float issue, and assuming a table with around 500 elements and 5503 as the prime number for the modulo):
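A minimal sketch of those three steps in C++ (the actual graph would be string Append, character-sum and modulo nodes; the enum values and function name here are just placeholders):

```cpp
#include <cstdint>
#include <string>

enum class ELocation { Dungeon, Forest, Town }; // placeholder enum values

int32_t SituationHash(float Armour, float Mana, bool bHasWeapon, ELocation Location)
{
    // 1. Convert every element to a string and append them into one string.
    std::string Combined = std::to_string(Armour)
                         + std::to_string(Mana)
                         + (bHasWeapon ? "true" : "false")
                         + std::to_string(static_cast<int>(Location));

    // 2. Sum all the characters of that string.
    int64_t Sum = 0;
    for (char C : Combined)
    {
        Sum += static_cast<unsigned char>(C);
    }

    // 3. hash = sum % N, with N a prime well above the expected row count (here 5503).
    return static_cast<int32_t>(Sum % 5503);
}
```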

And that’s it.

Now the problem is that you already have 450+ rows, and you will need to calculate the hash for all rows and set the respective hash in the row name column.
So, you will need to export your table to CSV and use a spreadsheet, R or some other software to calculate the hash and replace the current row names with it. And then import it again into Unreal.

Now that each row or situation is uniquely identified, when you need to search for a given situation, calculate the hash and search for it directly in the table in O(1) time, or whatever time it takes “Get Data Table Row” to search.
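In C++ terms the lookup would look roughly like this (FSituationRow, the field and the file name are placeholders; in Blueprint it is just “Get Data Table Row” with the computed hash converted to a Name and plugged into Row Name):

```cpp
#include "Engine/DataTable.h"
#include "SituationRow.generated.h" // placeholder generated header for the USTRUCT below

USTRUCT(BlueprintType)
struct FSituationRow : public FTableRowBase
{
    GENERATED_BODY()

    UPROPERTY(EditAnywhere, BlueprintReadWrite)
    bool bAcceptable = false; // example output field, e.g. "is the situation acceptable"
};

void ValidateSituation(UDataTable* SituationTable, int32 Hash)
{
    // Row names are now the hashes, so the lookup is a single Find instead of a loop.
    const FName RowName(*FString::FromInt(Hash));
    const FSituationRow* Row =
        SituationTable->FindRow<FSituationRow>(RowName, TEXT("ValidateSituation"));

    if (Row)
    {
        // Match found: fire the function for this situation (e.g. report Row->bAcceptable).
    }
}
```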

All the different scenarios are unique but are slight variations of one another. The program is a simulator for dental prosthesis building. A model of the jaw is present with all teeth. On each tooth, there are several types of extras that can be toggled on or off. So one scenario can be tooth x with a certain thing attached, tooth y with 3 extra things attached (things 1, 2, 3), tooth z with 2 extra things attached, etc., with a maximum of 6 teeth that have extras attached. The datatable row has 11 fields per tooth for checking the active extras, so that already gives 66 fields. After those fields, there are some other fields that aren’t used for validating the situation but rather give output values like “is the situation acceptable”.

I understand! I’ll try to implement it, thanks!

I was able to efficiently hash each row in the same way as the nodes you shared calculate the hash to find the corresponding row. The main problem with this approach is that although each row in my spreadsheet is unique, some rows have exactly the same characters, which makes hashing those rows uniquely impossible this way.

For example, one row has a column with value “a” and a column with value “b”, while the row beneath has “b” in the first column and “a” in the second column. The rows are unique but are identical in characters.

How do I create a hashing algorithm that produces a unique hash ID for each row and that can also be easily calculated in my system?

There are a myriad of algorithms that can be used. The one I presented was one of the simplest to implement. To avoid the case you presented, you can try using left circular shifts and XOR operations.

Check for example the PJW or xxHash hash functions.
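As a rough sketch of the rotate-and-XOR idea (simplified, not the exact PJW or xxHash algorithms), the position of each character now affects the result, so “ab” and “ba” no longer collide:

```cpp
#include <cstdint>
#include <string>

// Left circular shift (rotate) of a 32-bit value; Bits should be in 1..31.
uint32_t RotateLeft(uint32_t Value, int Bits)
{
    return (Value << Bits) | (Value >> (32 - Bits));
}

// Order-sensitive hash over the appended string, reduced by the same prime modulus.
uint32_t RotXorHash(const std::string& Combined, uint32_t Modulus = 5503)
{
    uint32_t Hash = 0;
    for (char C : Combined)
    {
        // Rotate before mixing in each character, so character position matters.
        Hash = RotateLeft(Hash, 5) ^ static_cast<unsigned char>(C);
    }
    return Hash % Modulus;
}
```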

After some code reviewing and checking the program with the Session Frontend, we were able to find where the system stalls. The code loops over a very big array, but that in itself wasn’t the problem. The problem was a bit more complicated. The ‘big array’ I described was actually a map whose keys each refer to a struct consisting of 3 arrays; this was to make a preliminary division of rows so the actual arrays would be smaller and quicker to loop over. What I did wrong was place the ‘Find’ node just before where the separate arrays were needed. This approach fetches the map and finds the specific array every time, so instead of doing the ‘Find’ once, the system actually ran that node 3 times, which slowed everything down considerably. I didn’t know at the time of coding that Unreal evaluates nodes backwards from the point where their output is used, so the same nodes can run several times and slow things down.

How we fixed it:
We moved the Find node to the point where the key to the arrays that need looping is available. There we save the resulting arrays as variables and use those variables to loop over. Where the system had a stall of about 0.5-1 s before, it now runs smoothly with no noticeable stall, just a little framedrop of a few frames because of looping over the big arrays.
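In C++ terms, the mistake and the fix look roughly like this (struct, map and variable names are made up): look the struct up once, store it locally, and reuse it for every loop instead of re-running the lookup per array.

```cpp
#include "Containers/Array.h"
#include "Containers/Map.h"
#include "UObject/NameTypes.h"

struct FSituationRowData { /* the ~70 fields of one table row */ };

// The map value: one struct holding the three pre-divided arrays of rows.
struct FRowBuckets
{
    TArray<FSituationRowData> BucketA;
    TArray<FSituationRowData> BucketB;
    TArray<FSituationRowData> BucketC;
};

void Validate(const TMap<FName, FRowBuckets>& RowMap, FName Key)
{
    // One Find, saved to a local variable, instead of re-running it for each bucket.
    const FRowBuckets* Buckets = RowMap.Find(Key);
    if (Buckets == nullptr)
    {
        return;
    }

    // Loop over the cached arrays; the same pattern applies to BucketB and BucketC.
    for (const FSituationRowData& Row : Buckets->BucketA)
    {
        (void)Row; // compare Row against the current situation here
    }
}
```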