Vulkan and DX12 parity (SingleLayerWater issues?)

Some of our artists have been testing various approaches for rendering water in UE. As of now, our artists are on Linux.

They started experimenting with the Niagara Fluids plugin to see examples of water and quickly ran into issues. After some digging, we found that the issues generally stem from using Vulkan as the RHI method.

Generally speaking, there appear to be many visual artifacts and rendering issues when using Vulkan, and the output does not match what we see with an identical setup on DX12. It’s somewhat difficult to describe, so please see the uploaded screen recordings. The problem may be more specifically related to ray tracing and/or translucency.

We’re curious if there is something we’re doing wrong when it comes to the Project configuration, or if Vulkan & certain shader combinations haven’t reached visual parity with their DX12 counterparts. We’ve attached our Project Rendering settings, as well as Nvidia driver versions and hardware specs.

example_screen_recordings_vulkan_issues.zip (36.6 MB)

Steps to Reproduce

  1. Ensure the project’s Platform RHI Method is set to Vulkan (see below for more specific Project Rendering config)
  2. Enable Niagara Fluids plugin, restart the Engine
  3. Either:
    1. Create/Add a default Grid 3D Flip Hose, or any other 3D Flip Fluid Niagara System to a Level. Observe over time.
    2. Open up the Grid_3D_FLIP_Hose Niagara sim asset, and observe the strange rendering artifacts.

This rendering issue is not visible if the RHI Method is set to DirectX12. Tested across both 5.6 and 5.7, on Windows and Linux (Vulkan).

---

Project Rendering settings:

Reflections

Reflection Method: Lumen

Capture resolution: 128

Reduce lightmap mixing on smooth surfaces: True

Global clip plane: False

Lumen

HWRT when available: True

Ray Lighting Mode: Hit Lighting for Reflections

HQ Translucency Reflections: True

Software Ray Tracing Mode: Detail Tracing

Screen Tracing Source: Scene Color

Ray Traced Translucent Refractions: True

Direct Lighting

MegaLights: True

Ray Traced Shadows: True

Shadow Map Method: VSM

Hardware Ray Tracing

Support HWRT: True

Generate Ray Tracing Proxies: False

Texture LOD: False

Path Tracing: False

Software Ray Tracing

Generate Mesh Distance Fields: True

Translucency

Separate Translucency: True

Translucent Sort Policy: Sort by Distance

Local Fog Volume Apply on Translucent: False

Enable Order Independent Transparency: False

---

Machine details:

Ryzen 9 9950X3D, RTX4080 (Driver: 591.86)

Windows 11 Pro 23H2 (Build 22631.6199)

Vulkan SM6 tested

UE 5.6.1, 5.7.3

Intel Xeon E-2246G, RTXA4500 (Driver: 570.153.02)

Kubuntu 22.04

Vulkan SM6 demonstrates same behaviour as on Windows

UE 5.6.0, 5.6.1

Good to know regarding the rigorous testing Vulkan undergoes!

I am in a position to build the Editor from source, especially on Linux (but we can scrounge up a Windows machine if need be). Generally speaking, we are trying to avoid custom builds of the Engine, but if it’s for debugging / solution confirmation purposes, we are more than happy to.

Was there a reason this was closed?

Hi Jackson,

You can try running with “-gpucrashdebugging” at the command line which should dump breadcrumbs to identify the pass that may have caused the crash.

You can also attach a set of logs from a crashed session for me to look at if you want (they are located in the Saved/Logs folder of your project). It’s a quick way for me to see all of your session’s information: features that were enabled, driver version, OS version, etc.

If you narrow it down to a specific cause with a repro, be sure to let me know!

Just curious, is there a reason you went with 5.6 instead of 5.7? Was it because of the water issue?

Cheers,

JN

Hello Jackson,

The Vulkan RHI should be able to deliver the same rendering as d3d12 in the editor, especially with the machine specs you listed. We run the same tests and validation before it goes out the door, but the truth is that it probably gets less traffic overall during development so it’s possible that some corner cases slip through.

You give a very detailed repro here (thanks for that!), let me have a look and get back to you.

Out of curiosity, are you currently in a position to build the editor from source if a solution is found?

Cheers,

JN

OK, that’s good to know. So if ever the solution happens to be in code, I can point you to a changelist from our Main branch for you to try out.

Also just to let you know, I’ve been able to reproduce the issue… I’m looking into it now.

JN

Hi Jackson,

Here’s what I have so far:

  • With that configuration (the default SM6 configuration really), the single layer water is rendered with a depth prepass to work properly with the virtual shadow maps. When the actual rendering occurs later with DepthOp=Equal (meaning only pixels at exactly the same depth as the one written by the earlier prepass pass the test), some pixels fail the test.
  • You can see it doesn’t happen if you load Vulkan in SM5: no Nanite and no VSMs, so no depth prepass by default. Not a great solution, but just saying… :)
  • You can also forcefully disable the depth prepass with “r.Water.SingleLayer.DepthPrepass=0” (in Engine/Config/ConsoleVariables.ini for example). That should make the issue go away, but again this isn’t an ideal solution.
  • I saw that SingleLayerWaterRendering also marked the pixels in the stencil and then tested for both depth and stencil… If you make it only test for the stencil, things render correctly as well in simple scenes:
// SingleLayerWaterRendering.cpp around line 2146
FMeshPassProcessor* CreateSingleLayerWaterPassProcessor(ERHIFeatureLevel::Type FeatureLevel, const FScene* Scene, const FSceneView* InViewIfDynamicMeshCommand, FMeshPassDrawListContext* InDrawListContext)
{
...
		// Modified: depth test disabled (CF_Always), so the draw only passes where the stencil equals the value written by the prepass
		DrawRenderState.SetDepthStencilState(TStaticDepthStencilState<
			false, CF_Always /* previously CF_Equal */,		// Depth test
			true, CF_Equal, SO_Keep, SO_Keep, SO_Keep,	// Front face stencil 
			true, CF_Equal, SO_Keep, SO_Keep, SO_Keep,	// Back face stencil
			STENCIL_SANDBOX_MASK, 0x0 				// Stencil read/write masks
		>::GetRHI());
		DrawRenderState.SetStencilRef(STENCIL_SANDBOX_MASK);

None of these are permanent solutions, however. I can see in the system’s history that changes were made a couple of weeks ago for exactly these types of z-fighting issues, but it’s not clear to me why those changes were sufficient for D3D12 and not for Vulkan. I’ll check with the system owners to see if they have suggestions for a way forward.
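As a rough illustration of why an exact depth-equal test is fragile and why the stencil-only variant hides the problem, here is a toy C++ model of a single pixel. It is only a sketch, not engine code: `PrepassPixel`, `kWaterStencilBit`, and the helper functions are hypothetical names, the stencil value is a placeholder rather than the real STENCIL_SANDBOX_MASK, and the one-ULP offset is a simulated mismatch, not a claim about what actually differs between the two passes on Vulkan.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the stencil bit the prepass writes.
// The real STENCIL_SANDBOX_MASK value is not assumed here.
constexpr std::uint8_t kWaterStencilBit = 0x80;

// What the depth prepass left behind for one pixel.
struct PrepassPixel {
    float depth;
    std::uint8_t stencil;
};

// Depth compare "Equal" plus stencil compare "Equal": both must match exactly.
bool PassesDepthEqualAndStencil(const PrepassPixel& pixel, float fragmentDepth) {
    const bool depthOk = (fragmentDepth == pixel.depth);
    const bool stencilOk = ((pixel.stencil & kWaterStencilBit) == kWaterStencilBit);
    return depthOk && stencilOk;
}

// Depth compare "Always" plus stencil compare "Equal": only the stencil mark matters.
bool PassesStencilOnly(const PrepassPixel& pixel, float /*fragmentDepth*/) {
    return (pixel.stencil & kWaterStencilBit) == kWaterStencilBit;
}

// Simulates a main pass whose depth lands one representable float above the
// prepass value. Expects depth < 1.0f so that a larger neighbour exists.
float OffByOneUlp(float depth) {
    const float shifted = std::nextafter(depth, 1.0f);
    assert(shifted != depth);
    return shifted;
}
```

For a pixel that stores depth 0.25f with the stencil bit set, a fragment at exactly 0.25f passes both tests, while a fragment at `OffByOneUlp(0.25f)` is rejected by `PassesDepthEqualAndStencil` and accepted by `PassesStencilOnly`. Whatever the real source of the mismatch is, a rejected fragment like this would show up as a missing water pixel.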

I’ll get back to you when I have something!

Cheers,

JN

Fascinating, thanks for the update!

Hello Jackson,

Sorry about that; tickets close automatically after a prolonged period without any activity. That’s on me: I transferred the information to our water rendering group and forgot to follow up (mostly because there is still no solution). The latest information from the investigation is that we can actually reproduce the issue in D3D12 as well, although to a much lesser degree (it results in “holes” in the water mesh). The workarounds I suggested are still the best I have to offer, and it’s not clear when the water rendering group might have time to tackle this (my field is more Vulkan and/or RHI).

JN

No worries! Figured it was an inactivity thing - all good. Thanks for the update, will share with the team internally.

On a slightly-related note, as we’ve started rolling out 5.6 internally for artists to tinker with, we’ve started to get reports of seemingly Vulkan-related crashes. There isn’t anything concrete to share just yet as far as repro steps or really anything specific, especially since a lot of the logs just seem to abruptly end with:

LogVulkanRHI: Error: Result failed, VkResult=-3
at ./Runtime/VulkanRHI/Private/VulkanRayTracing.cpp:1528
with error VK_ERROR_INITIALIZATION_FAILED
LogRenderer: Using fallback RTPSO
LogVulkanRHI: Error: Result failed, VkResult=-3
at ./Runtime/VulkanRHI/Private/VulkanRayTracing.cpp:1528
with error VK_ERROR_INITIALIZATION_FAILED
LogCore: FUnixPlatformMisc::RequestExit(1, VerifyVulkanResult

In an effort to dive into this, are there any CLI flags or ENV vars to set prior to launching UE that might provide additional logs or other data as to what went wrong? (Or any other suggestions you have, I’m all ears as I try to find root causes)

Once we have more concrete data, I’ll definitely open a separate EPS post for this.

Hi Jean-Noe,

Thanks! That’s useful, I’ll start recommending that flag to users who are hitting hard walls with GPU crashes.

Part of my investigation is also to determine whether we’re simply hitting “normal” limitations of massive scenes (200K actors [nulls + Static Meshes, no ISMs] were the circumstances around one crash). So I’m trying to get a Windows box provisioned for me to compare against DX12 on Windows.

The only reason we’re on 5.6 is purely a development reason - we have a bevy of internal plugins that haven’t been built against 5.7 yet, so as a facility we’re still on 5.6. We’re planning on leapfrogging to 5.8 when it drops. In the meantime, as I investigate these crashes, I’ve been using an unmodified source build of 5.7.3. I’m trying to avoid sending us down wild goose chases for bugs that have already been fixed in 5.7+.

Not too sure how quickly I’ll be getting a machine to properly test, but will keep you updated.

Best,

Jackson

Hi Jackson,

Pushing the limits with large scenes might explain *some* of your crashes
(they would show up with “out of memory” somewhere in the logs)… Memory
management in Linux is great for CPU memory, but for GPU memory there are
parts that are offloaded to the applications (using Vulkan in Windows would
potentially give better results, you can compare by just adding “-vulkan”
to the editor in Windows when you run it). Currently in Linux if you
allocate too much memory too fast, it’s possible to outpace the engine’s
memory “balancing” in our Vulkan RHI. Also, if you have another
application using VRAM at the same time (say you also have
3D authoring software running), it can create noise in the engine’s
available memory readings. We’ve been meaning to improve the Linux memory
handling in Unreal for a while now, but it got pushed back. I’m scheduled
(no promises!) to work on memory management right after we ship 5.8,
hopefully we can solve this for good.

In the meantime, if you hit a situation where the crash is confirmed to be
because of memory, you can try setting these for the project as a
workaround:
r.Vulkan.EvictionLimitPercentage=30
r.Vulkan.EvictionLimitPercentageRenableLimit=15
This configuration forces Unreal to use host memory for resources much
earlier, providing more space for large peaks. However, setting these
values too low on a project that doesn’t require it will negatively affect
performance. This manual balancing act is why we intend to revisit the
system to create a more seamless experience.
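To make the trade-off a bit more concrete, here is a toy hysteresis model in C++. It is a sketch under assumptions, not the Vulkan RHI's actual logic: `EvictionModel` and its members are hypothetical, and it assumes the first value is the VRAM usage percentage above which new resources start going to host memory and the second is the usage percentage below which device-local placement resumes. The engine's real interpretation of these two values may differ.

```cpp
#include <cassert>

// Toy two-threshold (hysteresis) model; NOT the engine's implementation.
struct EvictionModel {
    int evictAbovePercent;   // assumed role of r.Vulkan.EvictionLimitPercentage
    int resumeBelowPercent;  // assumed role of r.Vulkan.EvictionLimitPercentageRenableLimit
    bool evicting;           // true while resources are being steered to host memory

    // Returns true if a new resource would be placed in host memory
    // at the given device-local memory usage (0..100).
    bool PlaceInHostMemory(int usedPercent) {
        assert(usedPercent >= 0 && usedPercent <= 100);
        if (!evicting && usedPercent > evictAbovePercent) {
            evicting = true;   // crossed the upper threshold: start using host memory
        } else if (evicting && usedPercent < resumeBelowPercent) {
            evicting = false;  // dropped below the lower threshold: back to device-local
        }
        return evicting;
    }
};
```

With `EvictionModel{30, 15, false}`, usage of 35% already switches the model to host memory, and it stays there until usage falls below 15%. In this simplified picture, that is why low values leave headroom for large peaks but cost performance on projects that would otherwise fit comfortably in VRAM.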

If you get a chance, let me know how the comparisons to 5.7.3 go! (or
5.7.4 if you have a chance, there’s a swapchain issue that was resolved for
Vulkan if I remember correctly)… And the comparison to Windows too
(Vulkan and/or D3D)! Feedback on issues you encounter in real world
scenarios is precious information for us.

Cheers,
JN