Performance issue with Vulkan target (issue redirected from Khronos)

Hello. I've run into a problem: I get really poor performance with Vulkan (when it should be better than OpenGL4).

For example:

  1. Windows 10 = 100 FPS (+/- 5)
  2. Ubuntu 18.04 (KDE Neon) with OpenGL4 target = 85-90 FPS
  3. Ubuntu 18.04 (KDE Neon) with Vulkan target = 25-35 FPS
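
The gap is easier to judge as time per frame than as FPS. A minimal sketch of the conversion, assuming the midpoints of the reported ranges (that choice is mine, not a measurement):

```python
# Convert the reported FPS figures to per-frame time in milliseconds.
# Midpoints of the reported ranges are used; that is an assumption.
reported_fps = {
    "Windows 10": 100.0,
    "Ubuntu 18.04, OpenGL4": 87.5,  # midpoint of 85-90
    "Ubuntu 18.04, Vulkan": 30.0,   # midpoint of 25-35
}


def frame_time_ms(fps: float) -> float:
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


frame_times = {name: frame_time_ms(fps) for name, fps in reported_fps.items()}
for name, ms in frame_times.items():
    print(f"{name}: {ms:.1f} ms per frame")

# How much longer a Vulkan frame takes than an OpenGL4 frame on the same machine.
slowdown = frame_times["Ubuntu 18.04, Vulkan"] / frame_times["Ubuntu 18.04, OpenGL4"]
print(f"Vulkan frame takes about {slowdown:.1f}x as long as an OpenGL4 frame")
```

With these midpoints a Vulkan frame takes roughly 33 ms against roughly 11 ms for OpenGL4.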

AFAIK, Vulkan should perform better than OpenGL4.

Here is the answer from the Khronos community:

Khronos only provides the api specification for Vulkan. The actual implementation is done inside the driver. Judging from my experience with NVIDIA and Vulkan their driver is very good and performance with Vulkan is usually better than with OpenGL due to the much lower overhead. And most of the time low Vulkan performance is an application problem, so you’d have to ask the people that implemented Vulkan in the Unreal Engine why performance is so low compared to OpenGL. Usually that’s caused by bad api usage, e.g. over-synchronization, not moving buffer data to the GPU, etc.

Here are the threads on the Khronos forum and on the NVIDIA forum.

With UE 4.22 / 4.23 I was happy with the OpenGL4 target.
However, UE 4.24 (master branch), which has some fixes that are critical for me, can no longer be launched with OpenGL4 (it constantly crashes during shader compilation). So I have to use the default Vulkan target.

Here is detailed information about my hardware and software.

Has anybody else seen the same difference between Vulkan and OpenGL 4?
Please help!

Fix: NVIDIA forum.

Here are the nvidia-smi results when UE4 is launched with the OpenGL4 and Vulkan targets.

SMI output without UE4 loaded:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 930MX       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P0    N/A /  N/A |    515MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2189      G   /usr/lib/xorg/Xorg                           182MiB |
|    0      2427      G   /usr/bin/kwin_x11                              4MiB |
|    0      2433      G   /usr/bin/plasmashell                          74MiB |
|    0      2498      G   /usr/bin/latte-dock                           17MiB |
|    0      7246      G   /snap/anbox/158/usr/bin/anbox                230MiB |
+-----------------------------------------------------------------------------+

SMI output: UE4 scene loaded with OpenGL4 target ~65FPS

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 930MX       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   56C    P0    N/A /  N/A |    794MiB /  2004MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2189      G   /usr/lib/xorg/Xorg                           185MiB |
|    0      2427      G   /usr/bin/kwin_x11                              1MiB |
|    0      2433      G   /usr/bin/plasmashell                          74MiB |
|    0      2498      G   /usr/bin/latte-dock                           17MiB |
|    0      3429    C+G   ...lEngine/Engine/Binaries/Linux/UE4Editor   277MiB |
|    0      7246      G   /snap/anbox/158/usr/bin/anbox                230MiB |
+-----------------------------------------------------------------------------+

SMI output: UE4 scene loaded with Vulkan target ~25-30FPS:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 930MX       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   53C    P0    N/A /  N/A |   1111MiB /  2004MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2189      G   /usr/lib/xorg/Xorg                           190MiB |
|    0      2427      G   /usr/bin/kwin_x11                              1MiB |
|    0      2433      G   /usr/bin/plasmashell                          74MiB |
|    0      2498      G   /usr/bin/latte-dock                           17MiB |
|    0      6243    C+G   ...lEngine/Engine/Binaries/Linux/UE4Editor   586MiB |
|    0      7246      G   /snap/anbox/158/usr/bin/anbox                230MiB |
+-----------------------------------------------------------------------------+
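
To compare the two loaded-scene runs directly, the process rows can be parsed and the UE4Editor figures put side by side. A minimal sketch: the rows and the memory totals are copied from the outputs above, while the regular expression and helper names are my own and assume this exact table layout.

```python
import re

# Rows copied from the nvidia-smi process tables above.
OPENGL_ROWS = """\
|    0      2189      G   /usr/lib/xorg/Xorg                           185MiB |
|    0      3429    C+G   ...lEngine/Engine/Binaries/Linux/UE4Editor   277MiB |
"""
VULKAN_ROWS = """\
|    0      2189      G   /usr/lib/xorg/Xorg                           190MiB |
|    0      6243    C+G   ...lEngine/Engine/Binaries/Linux/UE4Editor   586MiB |
"""

# Columns: GPU index, PID, type, process name, memory in MiB.
ROW = re.compile(r"^\|\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d+)MiB\s+\|$")


def process_memory(table: str) -> dict:
    """Map process name -> GPU memory in MiB for every process row."""
    usage = {}
    for line in table.splitlines():
        match = ROW.match(line.strip())
        if match:
            usage[match.group(4)] = int(match.group(5))
    return usage


def editor_mib(usage: dict) -> int:
    """Memory of the (path-truncated) UE4Editor process."""
    return next(mib for name, mib in usage.items() if name.endswith("UE4Editor"))


gl_mib = editor_mib(process_memory(OPENGL_ROWS))
vk_mib = editor_mib(process_memory(VULKAN_ROWS))
print(f"UE4Editor: OpenGL4 {gl_mib} MiB, Vulkan {vk_mib} MiB ({vk_mib / gl_mib:.2f}x)")

# Whole-card usage from the summary rows: 794 and 1111 MiB out of 2004 MiB.
TOTAL_MIB = 2004
gl_used_pct = 100.0 * 794 / TOTAL_MIB
vk_used_pct = 100.0 * 1111 / TOTAL_MIB
print(f"Card usage: OpenGL4 {gl_used_pct:.0f}%, Vulkan {vk_used_pct:.0f}%")
```

On these numbers the editor process holds a bit more than twice as much video memory under Vulkan, and the card as a whole goes from about 40% to about 55% used.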

Related old thread

I'm surprised you're able to run the Editor on Vulkan with only a 2GB card. There are currently some issues with running out of video memory on smaller amounts such as 2GB. I would be curious to see a perf result of the engine running for something like 30s, OpenGL vs Vulkan, to try to capture what the major bottleneck is here:

perf record -g -p <pid> sleep 30

That should record for 30 seconds, and from there you can take a look at it with:

perf report -g --no-children (--no-children for just the leaf)

and just show the top 10-20 entries by % from there.

Currently the OpenGL drivers do a better job of managing memory than the Vulkan driver, so you should see increased video memory usage, but it is odd to see directly lower performance. That is something I have not seen.
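
The two perf steps above can be wrapped in a small helper so the same invocation is used for the OpenGL and the Vulkan run. This is only a sketch: the function names, the output file name, and the PID are placeholders I made up, and the `--sort` and `--stdio` options are my addition for a plain-text report.

```python
import shlex
import subprocess


def perf_record_cmd(pid: int, seconds: int = 30, out: str = "perf.data") -> list:
    """Sample call graphs (-g) of one process (-p) for as long as `sleep` runs."""
    return ["perf", "record", "-g", "-p", str(pid), "-o", out,
            "--", "sleep", str(seconds)]


def perf_report_cmd(data: str = "perf.data") -> list:
    """Plain-text report showing self cost only (--no-children)."""
    return ["perf", "report", "-i", data, "--no-children",
            "--sort", "comm,dso,sym", "--stdio"]


def run(cmd: list) -> None:
    """Run one command; perf usually needs root or a relaxed perf_event_paranoid."""
    subprocess.run(cmd, check=True)


# 12345 is a placeholder; substitute the PID of the running UE4Editor process.
record = perf_record_cmd(12345)
report = perf_report_cmd()
print(shlex.join(record))
print(shlex.join(report))
```

Calling `run(record)` and then `run(report)` executes the two steps. The script only prints the commands by default so they can be checked first.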

As far as I understand, this is what I did:

  1. I launched UE4 with a scene and the Vulkan target.
  2. I ran these commands:

$ sudo perf record -a -g sleep 30
[ perf record: Woken up 189 times to write data ]
[ perf record: Captured and wrote 49.684 MB perf.data (330148 samples) ]


$ sudo perf report --no-children --sort comm,dso,sym 
+   32.11%  swapper          [kernel.kallsyms]                          [k] intel_idle
+    3.40%  swapper          [kernel.kallsyms]                          [k] update_blocked_averages
+    2.56%  swapper          [unknown]                                  [.] 0000000000000000
+    1.25%  swapper          [kernel.kallsyms]                          [k] menu_select
+    1.03%  swapper          [kernel.kallsyms]                          [k] __update_load_avg_cfs_rq
+    0.77%  swapper          [kernel.kallsyms]                          [k] psi_task_change
     0.74%  swapper          [kernel.kallsyms]                          [k] cpuidle_enter_state
+    0.63%  AudioMi-nder(1)  libUE4Editor-AudioMixer.so                 [.] Audio::FMixerSubmix::FormatChangeBuffer
+    0.61%  swapper          [kernel.kallsyms]                          [k] switch_mm_irqs_off
+    0.54%  alsa-sink-CX820  [kernel.kallsyms]                          [k] pci_azx_readl
     0.49%  swapper          [kernel.kallsyms]                          [k] __schedule
     0.43%  swapper          [kernel.kallsyms]                          [k] load_new_mm_cr3
...
...
...

I'm surprised you're able to manage running the Editor on Vulkan with only a 2GB card.

Why are you surprised?
With UE4 launched and a simple scene loaded I still have about 50% of the GPU memory free.

I'm also a bit confused: how can Vulkan have better performance if it is supposed to require better hardware?

However, Khronos said that they got about an 80% performance increase on the same hardware that they had previously tested with OpenGL4.

Here is another test. I was changing the camera position while the first command was running:

+   15.36%  swapper          [kernel.kallsyms]                            [k] intel_idle
+   13.73%  RenderThread 2   [kernel.kallsyms]                            [k] do_syscall_64
+    7.50%  RenderThread 2   [kernel.kallsyms]                            [k] syscall_return_via_sysret
+    6.96%  RenderThread 2   [kernel.kallsyms]                            [k] entry_SYSCALL_64
+    2.17%  RenderThread 2   [kernel.kallsyms]                            [k] __schedule
+    1.70%  RenderThread 2   [kernel.kallsyms]                            [k] pick_next_task_fair
+    1.60%  RenderThread 2   [vdso]                                       [.] 0x00000000000006ac
+    1.48%  RenderThread 2   libUE4Editor-VulkanRHI.so                    [.] FVulkanOcclusionQueryPool::InternalTryGetResults
+    1.26%  RenderThread 2   libc-2.27.so                                 [.] __sched_yield
+    1.23%  RenderThread 2   [kernel.kallsyms]                            [k] _raw_spin_lock
+    1.09%  RenderThread 2   [kernel.kallsyms]                            [k] yield_task_fair
+    1.01%  RenderThread 2   [kernel.kallsyms]                            [k] native_sched_clock
+    0.99%  RenderThread 2   [kernel.kallsyms]                            [k] update_curr
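
Summing the listed entries per thread makes the picture clearer. A minimal sketch: the lines are copied from the report above, the parsing pattern is my own and assumes columns separated by two or more spaces, and since the report is truncated the totals are lower bounds.

```python
import re
from collections import defaultdict

# Entries copied from the perf report above.
PERF_LINES = """\
+   15.36%  swapper          [kernel.kallsyms]                            [k] intel_idle
+   13.73%  RenderThread 2   [kernel.kallsyms]                            [k] do_syscall_64
+    7.50%  RenderThread 2   [kernel.kallsyms]                            [k] syscall_return_via_sysret
+    6.96%  RenderThread 2   [kernel.kallsyms]                            [k] entry_SYSCALL_64
+    2.17%  RenderThread 2   [kernel.kallsyms]                            [k] __schedule
+    1.70%  RenderThread 2   [kernel.kallsyms]                            [k] pick_next_task_fair
+    1.60%  RenderThread 2   [vdso]                                       [.] 0x00000000000006ac
+    1.48%  RenderThread 2   libUE4Editor-VulkanRHI.so                    [.] FVulkanOcclusionQueryPool::InternalTryGetResults
+    1.26%  RenderThread 2   libc-2.27.so                                 [.] __sched_yield
+    1.23%  RenderThread 2   [kernel.kallsyms]                            [k] _raw_spin_lock
+    1.09%  RenderThread 2   [kernel.kallsyms]                            [k] yield_task_fair
+    1.01%  RenderThread 2   [kernel.kallsyms]                            [k] native_sched_clock
+    0.99%  RenderThread 2   [kernel.kallsyms]                            [k] update_curr
"""

# percent, comm (may contain a single space), dso, [k]/[.] marker, symbol
ENTRY = re.compile(r"^\+?\s*(\d+\.\d+)%\s+(.+?)\s{2,}(\S+)\s+\[(.)\]\s+(.+)$")

by_comm = defaultdict(float)    # total percent per thread name
kernel_by_comm = defaultdict(float)  # kernel-side percent per thread name
for line in PERF_LINES.splitlines():
    match = ENTRY.match(line)
    if not match:
        continue
    percent, comm, dso, marker, symbol = match.groups()
    by_comm[comm] += float(percent)
    if marker == "k":
        kernel_by_comm[comm] += float(percent)

for comm, total in sorted(by_comm.items(), key=lambda item: -item[1]):
    print(f"{comm}: {total:.2f}% listed, {kernel_by_comm[comm]:.2f}% in kernel")
```

In the listed entries, RenderThread 2 accounts for about 40% of all samples, most of it in syscall entry/exit and scheduler code. Together with `__sched_yield` and `InternalTryGetResults` this may point to the render thread yielding in a loop while it waits for occlusion query results, but that is a guess from symbol names only.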

I've got a slightly funny situation: I moved UE4 from the HDD to the SSD and the viewport FPS increased from 25-30 to 35-40. How can that be related?