Well, tick rate alone is overrated for performance optimization (it can help in certain cases, but it is not a silver bullet). The key is to spread the work of several objects across ticks.
E.g. assume you have 10k objects doing heavy computation in their tick:
if they all tick together, even just once per second, you will get a hiccup on that specific tick.
The goal is, let's say, to divide the objects into X groups that tick separately (when group A ticks, the other groups are not ticking).
For example, let's divide the objects into 2 groups of 5k objects each.
Now, regardless of the tick rate, we keep an integer that we increment on each tick,
then we check whether the integer is even or odd…
For example, group A will bypass the whole tick if it is even and execute its operations if it is odd.
We set the opposite rule for group B.
Now you don't have 10k objects ticking together but only half, which (in theory) doubles the performance. A rough sketch of the idea is below.
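A minimal sketch of that idea in UE4-style C++ (AGroupedActor, GroupParity, and DoHeavyWork are made-up names for illustration; GFrameCounter simply stands in for the manually incremented integer):

// Rough sketch: stagger two groups of actors across alternating ticks.
// (UCLASS/GENERATED_BODY boilerplate omitted for brevity.)
class AGroupedActor : public AActor
{
public:
    // 0 = do the heavy work on even ticks, 1 = on odd ticks.
    int32 GroupParity = 0;

    virtual void Tick(float DeltaSeconds) override
    {
        Super::Tick(DeltaSeconds);

        // The engine's global frame counter plays the role of the incremented integer.
        const bool bOddTick = (GFrameCounter & 1) != 0;

        // Group A (parity 1) works only on odd ticks, group B (parity 0) only on even ones.
        if (bOddTick != (GroupParity != 0))
        {
            return; // bypass the heavy work this tick
        }

        DoHeavyWork(DeltaSeconds);
    }

    void DoHeavyWork(float DeltaSeconds) { /* the expensive per-object logic */ }
};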
Well that was only a simplified example to illustrate the idea.
And yes GroupB won’t tick when GroupA ticks → this is the idea.
I don’t really get all your theory there.
However, yes, the even-or-odd check is not good performance-wise, since it will cause lots of cache misses.
However,
“Then test the low order bit of the quotient to tell us even or odd.”
leads to just as many cache misses…
If you want a "silver bullet" performance "load balancer" algorithm, you would do it this way.
Have a dedicated object that has its own tick method.
In this object you keep arrays of the objects you want to tick (the number of arrays corresponds to the number of groups)…
In that object's tick you just manually call a tick function on each actor in the current group's array.
This keeps cache misses as low as possible, and it can also automagically assign each object you want to optimize to a group, keeping the groups evenly balanced.
Or, even better but a bit more of a headache to manage: keep one C-like array
and tick its objects with C-like pointer arithmetic, based on the array length and the desired group size. A rough sketch of that is below.
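A rough sketch of such a manager, following the single-array variant (FManagedObject, ATickManager, ManagedTick, and the fixed group count are all made-up names and choices for illustration, not engine features):

// Sketch of a central "load balancer" ticker that drives its registered
// objects in round-robin slices of one contiguous array.
class FManagedObject
{
public:
    virtual ~FManagedObject() {}
    virtual void ManagedTick(float DeltaSeconds) = 0;
};

class ATickManager : public AActor   // UCLASS boilerplate omitted for brevity
{
public:
    static const int32 NumGroups = 4;

    TArray<FManagedObject*> Managed;  // everything we balance lives in one array
    int32 CurrentGroup = 0;

    void Register(FManagedObject* Obj) { Managed.Add(Obj); }

    virtual void Tick(float DeltaSeconds) override
    {
        Super::Tick(DeltaSeconds);

        // Only the slice belonging to the current group is ticked this frame.
        for (int32 i = CurrentGroup; i < Managed.Num(); i += NumGroups)
        {
            Managed[i]->ManagedTick(DeltaSeconds);
        }
        CurrentGroup = (CurrentGroup + 1) % NumGroups;
    }
};

The idea is that the objects register themselves with the manager and disable their own engine tick, so the engine only ever calls one Tick.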
Hello, I have a few questions. I want to use a tick rate of about 20 times a second because I want to keep things light, but my question is: does this really save computing performance, or is it a waste of my time?
Second, how do I do it? With a TimerManager?
And my last question: what really saves computing performance in an FPS game?
20 times a second may or may not be a bit fast for your game. There is no answer that anyone else can give you, because no one else knows your expectations in terms of what is good and bad performance. No one else knows what you wish to do per tick. Is it just a few Blueprint nodes being kicked in, or are you going to have 100 different objects that you wish to act upon? So yes, changing from 20 times a second to 10 times a second can make a large difference if you are doing a lot per tick.
How to do it? I doubt I'm the person to ask, because I have always just used one tick call, and whether it was Blueprints or C++, I kept my own lists of actions to perform, sorted them by the number of milliseconds into the future they should execute, and handled all of that processing myself. So I just needed the UE4 engine to call me on tick, and nothing else. A sketch of that pattern is below.
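Roughly this kind of thing, as a sketch only (FPendingAction and AActionScheduler are made-up names; the sort direction and pop-from-the-end choice are just one way to do it):

// Sketch of a single-tick scheduler draining a time-sorted list of actions.
struct FPendingAction
{
    double ExecuteAtSeconds;      // world time at which to run
    TFunction<void()> Action;     // what to run
};

class AActionScheduler : public AActor   // UCLASS boilerplate omitted
{
public:
    // Kept sorted so the soonest action sits at the end of the array.
    TArray<FPendingAction> Pending;

    void Schedule(double DelaySeconds, TFunction<void()> Action)
    {
        const double When = GetWorld()->GetTimeSeconds() + DelaySeconds;
        Pending.Add({ When, MoveTemp(Action) });
        Pending.Sort([](const FPendingAction& A, const FPendingAction& B)
        {
            return A.ExecuteAtSeconds > B.ExecuteAtSeconds; // latest first, soonest last
        });
    }

    virtual void Tick(float DeltaSeconds) override
    {
        Super::Tick(DeltaSeconds);

        const double Now = GetWorld()->GetTimeSeconds();
        while (Pending.Num() > 0 && Pending.Last().ExecuteAtSeconds <= Now)
        {
            FPendingAction Due = Pending.Pop();
            Due.Action();
        }
    }
};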
But when speaking of performance, always examine the algorithms you are using: do they inherently give good performance over a wide range of conditions? You would never use a bubble sort if you knew you were sorting 5000 objects. Yet I have coded up a bubble sort in Blueprints when I KNEW that the max would be about 10 objects.
It cannot be stressed enough, though: the performance comes from the algorithm selection. Algorithms that perform badly in a certain situation just perform badly, and that's it.
And this example of an even or odd integer shows exactly what I meant by algorithm selection.
If we:
A. use a brute-force method of just allocating an integer variable, then incrementing it and checking it whenever we need to know, a lot of cycles will be wasted.
Or we could go with:
B. grab the clock, which has better-than-millisecond accuracy. Save the initial clock at object creation; then, when we need to know even or odd, we do a simple subtraction of the original time from the current time, divide by the average time at which a tick occurs (on most machines I have seen that is around 20 to 22 milliseconds, but remember the time span of the tick is not guaranteed at all), and then test the low-order bit of the quotient to tell us even or odd.
But this is a great example, if you will, of why the selection of the algorithm is far more important than the tick itself. In method B we don't even need the tick; we only need the object creation time and the time at which we wish to know even or odd. A sketch of method B is below.
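A minimal sketch of method B (FParityClock is a made-up name; the 20 ms figure is only the assumed average tick period from the post above, not a guaranteed value):

// Sketch of method B: derive "even or odd tick" from elapsed wall-clock time,
// without touching the engine tick at all.
class FParityClock
{
public:
    // Assumed average tick period; the real spacing between ticks is not guaranteed.
    static constexpr double AverageTickSeconds = 0.020;

    FParityClock()
        : CreationSeconds(FPlatformTime::Seconds())
    {
    }

    bool IsOddTick() const
    {
        const double Elapsed = FPlatformTime::Seconds() - CreationSeconds;
        // How many "average ticks" have passed since creation...
        const uint64 Quotient = static_cast<uint64>(Elapsed / AverageTickSeconds);
        // ...then test the low-order bit of the quotient.
        return (Quotient & 1) != 0;
    }

private:
    double CreationSeconds;
};

A running average of the measured tick spacing could replace the constant, as mentioned further down in the thread.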
In the example above, of doing half the objects on tick N and then half the objects on tick N+1: remember that you're only doing half the work as well. The half of the objects that you did not do anything with on tick N are in an invalid state until tick N+1 occurs. So yes, you're going to have twice the performance, because you're only doing half of the work.
Of course, this works under the assumption that your logic doesn't have to tick every frame and can wait a certain number of frames.
If that is not the case → you will have to find another way…
Usually I follow these rules: #1 Find the performance bottleneck and try to optimize the algorithm. #2 If you fail to optimize the algorithm → reduce the calls if possible (my example is one of those). #3 If that fails, try to offload certain heavy calculations to a worker thread (sometimes I prefer this over #2 when possible); a sketch of that is below.
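For #3, a minimal sketch of the kind of offload I mean, assuming UE4's Async()/AsyncTask() helpers from "Async/Async.h" are available in your engine version (HeavyCalculation and ApplyResult are placeholder functions):

#include "Async/Async.h"

// Placeholder work: an expensive, self-contained computation and a game-thread consumer.
static int32 HeavyCalculation() { /* expensive pure computation */ return 0; }
static void ApplyResult(int32 /*Result*/) { /* touch game state here */ }

// Sketch of rule #3: run the heavy part on a pool thread, then marshal the
// result back to the game thread before it touches any game state.
void StartHeavyWorkOffGameThread()
{
    Async(EAsyncExecution::ThreadPool, []()
    {
        const int32 Result = HeavyCalculation(); // runs on a worker thread

        AsyncTask(ENamedThreads::GameThread, [Result]()
        {
            ApplyResult(Result); // safe to touch UObjects/game state here
        });
    });
}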
What I proposed as a "silver bullet" is a kind of tree (I don't remember its name, sorry; I went too deep into implementation details → implementing it with a C array, but it's the same idea as a tree, actually. If you are interested I can draw it).
The term "silver bullet" was applied to my previous example, where you have those 10k lazy actors who are willing to calculate something unnecessary ;-).
Cache misses due to a branch-predictor misprediction are very expensive.
If we are talking about the even/odd example, we will have a cache miss every time :-).
I'm glad that my example was good enough; I thought you wanted to tell me that it is bad.
Yes, it's kind of bad → but simple enough to illustrate the idea :).
Of course you gave a simple example, but it showed perfectly what I meant by algorithm selection. I'm unsure why you think there will be some form of cache miss. To test the variable as to odd or even, it's either going to be a straight compare of some form, or the masking off of all other bits and then a compare, or just going straight to the bit. The point was that, via algorithm selection, one can bypass the tick altogether and go straight to the bit of importance: bit on, odd; bit off, even. If one were to play around in assembler, you could do it with rotates, setting the condition code directly, and bypass the test, assuming that the processor supports that form of instruction.
I have never seen a "silver bullet" in terms of performance. In general, all algorithms will fail under certain conditions. That may be only 1% of the conditions that exist, but still they will fail. A binary tree will do this if it becomes left- or right-skewed. A bubble sort is faster than most general sorts if the number of objects is kept small, because the setup time for the more optimal sorts can be atrocious as well.
Still, your simple example highlighted exactly what I was trying to get across to the OP: the algorithm is what is important.
Then the wiki is not explaining a cache miss very well. A cache miss generally refers to requesting a line from caching memory (assuming there is locality in the references; this would also be found in the processor's on-board N levels of cache). Hard drives also experience this: before there was read-ahead of tracks and the like, one would use what was called RPS (Rotational Position Sensing), so that the code would know when the start of the track was about to pass under the head and could then read in an entire track in one shot by using chained CCWs (IBM terminology for Channel Command Words); these were passed to the SIO (Start I/O) instruction as its parameter, later changed to Start Subchannel in the XA/ESA series of machines.
While a branch prediction failure can have its costs, there is no way around it unless one just doesn't code up any branches at all. That would be a program that made very few decisions! While I agree that not predicting which way a branch is going to go will lead to releasing some amount of data out of the pipeline, with a subsequent refill of the pipeline, that "fill" should be coming from the on-chip cache itself, assuming locality of code. It's always the far jumps/branches where one starts to see paging happening.
Nope, your example was perfect: it showed how algorithm selection could lead to a different path, where the tick was not needed at all. With a few simple computations, the same answer could be achieved, assuming some margin of error in the divisor of the "average" amount of time between ticks. Even that margin of error could be reduced by just keeping a running average, which, even as bad as some C/C++ compilers are, shouldn't be that many instructions.
The "fewer branches" came from a many-to-one relationship (many objects, one state). What I was actually trying to get at is that the state would only be computed when it was needed. Hence the only instructions lost would have been retrieving the clock and storing the clock with the object, and that assumes the object never needed to know the current state. Which, to my way of coding, is very reflective of real-world coding, in that one only generates what is needed, exactly at the time it is needed.
My solution would be hard to pin down as N/something, if you will, in that it would have to be determined at run time, because the whole basis was to illustrate moving from a tick (brute-force method) to a "compute when needed" method, while still retaining the "average tick". So my solution could be N/N (i.e. occurring every tick anyway), or it could be N divided by any divisor you choose.
Exactly what I proposed in my 2nd solution: fewer branches.
"While a branch prediction failure can have its costs, there is no way around it unless one just doesn't code up any branches at all."
The first solution had N-to-N branches.
Your solution, if I understood correctly, has N/2 branches (but is applicable only to the even/odd case).
The "not silver bullet" would have 1 branch for all groups (of course, in the even/odd case, 1 = 2/2).
Yes, the wiki probably wasn't the best link → but it points to a detail of branching:
when the branch predictor makes a guess, it caches the subsequent instructions into the L# cache.
A misprediction leads to flushing that cache and loading the instructions again, which is very taxing in a real-world situation.
You can change the tick interval in the actor settings for every actor. But if you don't change it (the default setting), the actor will tick every frame. So to change the tick frequency globally, you can lower or raise the frame rate.
But the lower the frame rate, the less smoothly the game will run.
And you can raise the frame rate if the device running the app has enough power. A small example of the per-actor interval is below.
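For the per-actor route, a minimal C++ sketch (AMyActor is a hypothetical actor class; 0.05 s is just the value that gives roughly 20 ticks per second, and the same setting is exposed in the actor's tick settings in the editor):

// Make this actor tick roughly 20 times per second instead of every frame.
AMyActor::AMyActor()
{
    PrimaryActorTick.bCanEverTick = true;
    PrimaryActorTick.TickInterval = 0.05f; // seconds between ticks (~20 Hz)
}

// The interval can also be changed at runtime:
// SetActorTickInterval(0.05f);

A looping timer set via GetWorldTimerManager().SetTimer(...) is another common way to run logic at a fixed rate without relying on Tick at all, which is roughly what the TimerManager question was about.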