Thanks. We will review and implement this.
Also, we will soon re-add our streaming-based terrain strategy and review its impact.
From the post, it is unclear to me what action is needed besides activating the plugin. Should we review the individual parameters mentioned in the posts (at least those pointed out by future traces)? Is there any reference documentation for this plugin?
Do you know whether the plugin must be enabled before cooking (our terrain is cooked as a pak)? In other words: must it be enabled while cooking, or is it enough to enable it in the shipped application that loads the paks?
Well, pre-alignment will ensure they start at the same time. That is not much of a concern as long as, as you say, they stop simultaneously. That is the part the traces call into question.
IG2 has 50 ms between two consecutive returns, while over the same time frame both other nodes see two consecutive frames of 33 ms each.
Like OpenGL, DX provides its own Present() approach. The idea is the same as glSwap(): swap the output buffers. In DX it actually goes through the swap chain, but the result is the same. Here are a few important points I have to mention.
1. For proper synchronization, your nDisplay instances must run either in exclusive fullscreen or in independent flip mode (windowed, but covering the whole display). This is what allows DWM to give the application full control over the swap chain and exclude itself from it, which the driver-level NVIDIA synchronization requires.
2. You should not have any pop-ups or other transparent windows that may appear on the screen. Those could force DWM to take control of the swap chain back and probably break the synchronization.
There may be other preconditions for NVIDIA sync worth double-checking as well, just to make sure we’re good on this.
To support hardware-level synchronization, NVIDIA provides its own presentation API function, NvAPI_D3D1x_Present(). It’s part of the “NVIDIA Swap Barrier” synchronization approach. This is what nDisplay calls when the “NVIDIA” sync policy is used, and what you can see in those traces.
In UE D3D11, there was a way to force non-buffered presentation to decrease latency (the time between the moment the simulation is done and the moment the frame appears on screen). We used it by explicitly setting the latency to 1 (i.e. a single presentation buffer). This made presentation calls block if the previous frame had not been presented yet. We also used an additional Ethernet-barrier-based synchronization before calling NvAPI_D3D1x_Present(), so we were 100% sure that every cluster node presented the same frame (same frame number). It worked pretty well, but DX11 is no longer an option.
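To illustrate the ordering guarantee of a barrier placed before presentation, here is a minimal Python sketch in which threads stand in for cluster nodes and threading.Barrier stands in for the Ethernet barrier. fake_present() is a hypothetical placeholder, not NvAPI_D3D1x_Present(); the sketch only models call ordering, not the hardware swap barrier or the single-buffer blocking.

```python
import threading

NUM_NODES = 3    # assumption: a 3-node cluster (IG 1..IG 3)
NUM_FRAMES = 5

# threading.Barrier is only a stand-in for nDisplay's Ethernet barrier.
ethernet_barrier = threading.Barrier(NUM_NODES)
present_log = []             # (node_id, frame) in the order presents happened
log_lock = threading.Lock()

def fake_present(node_id, frame):
    """Hypothetical placeholder for NvAPI_D3D1x_Present(); only records the call."""
    with log_lock:
        present_log.append((node_id, frame))

def node_loop(node_id):
    for frame in range(NUM_FRAMES):
        # ... simulate and render `frame` here ...
        ethernet_barrier.wait()      # wait until every node has finished this frame
        fake_present(node_id, frame)

threads = [threading.Thread(target=node_loop, args=(n,)) for n in range(NUM_NODES)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# No node presents frame N + 1 before every node has presented frame N.
frames = [frame for _, frame in present_log]
print(frames == sorted(frames))  # True
```

The barrier only guarantees that every node submits the same frame number; when that frame actually reaches the screen is still up to the presentation path.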
In UE D3D12, there is either no way to control the presentation latency, or it doesn’t work (as I remember, it didn’t, and the corresponding warnings always appeared in the logs). As far as I remember, 3 buffers are used in the presentation queue by default. So technically, when nDisplay calls NvAPI_D3D1x_Present(), it doesn’t actually flip the back buffers; it just schedules the flip in the queue.
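To make the queueing point concrete, here is a toy Python model (not UE, D3D12 or NVAPI code) of a flip queue: the display flips one queued frame per 60 Hz vsync, and a present call only blocks when the queue is full. The simulate() helper, the 5 ms render time and the queue depths are assumptions for illustration only.

```python
from collections import deque

def simulate(depth, num_frames=10, render_ms=5.0, period_ms=1000 / 60):
    """Toy flip-queue model: returns (present_return_times, flip_times) in ms.

    depth     -- number of frames the queue can hold (assumption, not a UE setting)
    render_ms -- time per frame; 5 ms means the app outruns a 60 Hz display
    """
    queue = deque()          # frames presented but not yet flipped to screen
    flipped = {}             # frame -> vsync time it became visible
    returns = []             # time each present call returned to the app
    next_vsync = period_ms
    t = 0.0

    def drain_until(time):
        # At every vsync up to `time`, the display flips the oldest queued frame.
        nonlocal next_vsync
        while next_vsync <= time:
            if queue:
                flipped[queue.popleft()] = next_vsync
            next_vsync += period_ms

    for frame in range(num_frames):
        t += render_ms
        drain_until(t)
        while len(queue) >= depth:
            t = next_vsync       # queue full: present blocks until a slot frees up
            drain_until(t)
        queue.append(frame)
        returns.append(t)

    drain_until(t + (len(queue) + 1) * period_ms)   # let the display catch up
    return returns, flipped

for depth in (1, 3):
    returns, flipped = simulate(depth)
    last = len(returns) - 1
    print(f"depth {depth}: present returned {flipped[last] - returns[last]:.1f} ms before the flip")
```

In this model, with a depth of 1 the present call returns one vsync (about 16.7 ms) before its frame is flipped, so the return time tracks the screen closely. With a depth of 3 it returns three vsyncs (about 50 ms) early, so present-return timestamps alone say little about when each node actually flipped.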
To play with the latency, you can try the following CVar:
nDisplay.sync.nvidia.ForceLatencyUpdateEveryFrame=1
To improve synchronization, NVIDIA introduced a newer approach called “NVIDIA Present Barrier”, which is what I mentioned earlier. It’s supposed to work better with the presentation queue, and probably with DWM, since it’s integrated somewhere deeper. With it, we don’t even have to call NvAPI_D3D1x_Present() anymore.
Now to the main point: why those intervals are different. Here is what I see:
IG 1:
1:55.131
1:55.147 +16
1:55.176 +29
1:55.214 +38 - the longest
1:55.231 +16
IG 2:
1:25.262
1:25.279 +16
1:25.329 +50 - the longest
1:25.345 +16
IG 3:
1:09.740
1:09.756 +16
1:09.804 +48 - the longest
1:09.823 +19
Yeah, the longest intervals are 38, 48 and 50 ms, and roughly they all cover 3 periods of 16.(6) ms. 50 ms is exactly 3 × 16.(6) ms, while the other two are incomplete three-period spans (but still longer than two full periods, i.e. 33.(3) ms). Another confusing factor is that everything goes through the presentation queue, and we don’t know how full it is during those longest presentation calls. That makes the traces harder to read and predict, especially without the pre-sync I also mentioned previously.
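The interval arithmetic above can be reproduced with a short Python sketch. It assumes a 60 Hz refresh and the timestamps from the traces, normalized to m:ss.mmm (the parser also accepts m:ss:mmm); the helper names are hypothetical.

```python
PERIOD_MS = 1000 / 60   # one refresh period at an assumed 60 Hz: 16.(6) ms

# Present-return timestamps from the traces above, normalized to m:ss.mmm.
traces = {
    "IG 1": ["1:55.131", "1:55.147", "1:55.176", "1:55.214", "1:55.231"],
    "IG 2": ["1:25.262", "1:25.279", "1:25.329", "1:25.345"],
    "IG 3": ["1:09.740", "1:09.756", "1:09.804", "1:09.823"],
}

def to_ms(stamp):
    """Convert 'm:ss.mmm' or 'm:ss:mmm' to integer milliseconds."""
    minutes, seconds, millis = stamp.replace(".", ":").split(":")
    return (int(minutes) * 60 + int(seconds)) * 1000 + int(millis)

def intervals(stamps):
    """Differences between consecutive timestamps, in ms."""
    times = [to_ms(s) for s in stamps]
    return [b - a for a, b in zip(times, times[1:])]

for node, stamps in traces.items():
    deltas = intervals(stamps)
    longest = max(deltas)
    print(f"{node}: {deltas} ms, longest {longest} ms = {longest / PERIOD_MS:.2f} periods")
```

Recomputed from the millisecond timestamps, the longest intervals come out at about 2.28, 3.00 and 2.88 refresh periods, all above two full periods. A couple of the short intervals come out as 17 ms rather than 16 ms, which is within timestamp rounding.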
Again, the synchronization implementation is a black box. I believe there is a wait threshold inside to prevent infinite locks: if a node takes too long to arrive at the barrier, the master sync node may decide to release the others early and wait for the one that has probably gotten stuck.
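Since the real implementation is a black box, the following tiny Python model only illustrates the guess above: a barrier that releases the waiting nodes once a hypothetical timeout expires. The function name, the timeout value and the rule that the late node presents on arrival are all assumptions, not documented NVIDIA behavior.

```python
def barrier_release_times(arrivals_ms, timeout_ms):
    """Guess at a swap barrier with a wait threshold (NOT documented NVIDIA behavior).

    arrivals_ms -- node name -> time (ms) the node reached the barrier
    timeout_ms  -- hypothetical maximum wait, counted from the first arrival
    Returns node name -> time (ms) the node was allowed to present.
    """
    first = min(arrivals_ms.values())
    last = max(arrivals_ms.values())
    deadline = first + timeout_ms
    if last <= deadline:
        # Everyone arrived in time: all nodes are released together.
        return {node: last for node in arrivals_ms}
    # A node is late: the waiting nodes are released at the deadline,
    # and the late node presents on its own whenever it arrives.
    return {node: deadline if t <= deadline else t for node, t in arrivals_ms.items()}

arrivals = {"IG 1": 0.0, "IG 2": 30.0, "IG 3": 2.0}
print(barrier_release_times(arrivals, timeout_ms=50.0))  # all released at 30.0
print(barrier_release_times(arrivals, timeout_ms=20.0))  # IG 1/IG 3 at 20.0, IG 2 at 30.0
```

In the second case, IG 2 presents 10 ms after the others, which is the kind of per-node difference that would show up as uneven intervals in the traces.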
As you can see, there are many things that can make the sync traces look bad. The best way to detect broken synchronization is a moving object. We always use a bouncing ball or some moving/rotating objects to visually spot any discrepancy across vertical and horizontal seams. That is a reliable way to find out that synchronization doesn’t work as expected, and I strongly recommend using it.
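To put a number on why a moving object works so well, here is a small Python sketch of a bouncing ball whose position depends only on the frame number, as it would when every node renders the same frame. The 600 px/s speed, 60 Hz refresh and 1920 px width are arbitrary assumptions.

```python
def ball_x(frame, speed_px_s=600.0, refresh_hz=60.0, width_px=1920.0):
    """X position of a ball bouncing between 0 and width_px, driven only by frame number."""
    distance = speed_px_s * frame / refresh_hz
    d = distance % (2 * width_px)          # one full back-and-forth trip
    return d if d <= width_px else 2 * width_px - d

def seam_mismatch_px(frame, frames_behind, **kwargs):
    """Horizontal jump at a seam when one node shows a frame `frames_behind` older."""
    return abs(ball_x(frame, **kwargs) - ball_x(frame - frames_behind, **kwargs))

print(seam_mismatch_px(30, 0))   # 0.0: nodes in sync, the ball lines up across the seam
print(seam_mismatch_px(30, 1))   # 10.0: one frame of desync at 600 px/s and 60 Hz
```

In general, one frame of desync shifts the object by its speed divided by the refresh rate, so even a modest speed produces a step at the seam that is easy to see, whereas trace intervals can look uneven for reasons unrelated to what is on screen.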