@BMAliens: I am fully aware that there are massive differences between CPU and GPU processing architectures, and that they therefore require entirely different approaches in many respects, but they aren’t so fundamentally different that it’s impossible to use one to do some of the work of the other. Of course it requires significant changes to the code and to how the algorithms make use of the hardware, but it can still be a viable solution. I understand why you made the point you did, and I appreciate it nonetheless.
@jwatte: I’ve done extensive profiling and can rest easy that it is neither the number of draw calls nor texture bandwidth. Many of my static meshes rely on a small assortment of seamless albedo, roughness/metal/AO, and detail normal maps. These are applied through material instances that parameterize Luo’s WorldAlignedTexture/Normal material functions, plus a base normal map UV-mapped to static meshes that have been traced from high-poly versions in SP; none of these maps stray above 2k. I also changed the GBuffer to use only 8-bit precision for normal maps instead of 16-bit. And all of the static meshes either use HISM in a blueprint or have been merged together for draw call reduction while maintaining effective occlusion culling bounds.
My texture streaming pool never strays above 300MB in indoor levels and 1100MB in outdoor levels, so as far as I know I should be alright given that my GPU has 4GB of GDDR5, unless I’ve missed another crucial detail, which is certainly possible. Nearly everything is statically lit as well. Aside from that, I’m only using a small number of DBuffer decals and only two (at the moment) particle effects that take up minimal on-screen pixels (maybe 1/16th of the actual screen at 1920x1080 at the absolute most).
For the most part it seems to be an issue with Screen-Space Reflections. SSR usually takes somewhere around 4-8ms at 1920x1080, which seems a bit absurd, especially since I have it set below the defaults. I’m not at my computer atm, but if I remember correctly I have it at 100 intensity, 30 quality, and 0.4 max roughness. I can’t seem to narrow the issue down beyond that. I would much prefer using Planar Reflections, but while they work great in some areas, they absolutely destroy performance in larger scenes. And since “support global clipping plane” is a project-wide setting, I can’t turn it off on a per-level basis, so it’s just not a practical option. I also can’t imagine any benefit from somehow moving the apparently heavy SSR workload onto the CPU.
So I am a bit stuck, because dynamic real-time reflections are so important for my project’s visual style, but at the same time the performance toll is downright absurd.
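To put the 4-8ms figure above in perspective, here is a quick back-of-the-envelope sketch of how much of a frame that is. The 30 and 60 fps targets are assumptions for illustration only (I haven't stated a target above), and the function names are made up for this example.

```python
def frame_budget_ms(target_fps):
    """Total time available per frame, in milliseconds."""
    return 1000.0 / target_fps


def budget_share(pass_ms, target_fps):
    """Fraction of the frame budget consumed by one render pass."""
    return pass_ms / frame_budget_ms(target_fps)


if __name__ == "__main__":
    # Measured SSR cost range from profiling (4-8 ms at 1920x1080).
    ssr_costs_ms = (4.0, 8.0)
    # Assumed frame-rate targets, not taken from the project itself.
    for fps in (30, 60):
        for ssr_ms in ssr_costs_ms:
            share = budget_share(ssr_ms, fps)
            print(f"SSR {ssr_ms:.0f} ms at {fps} fps target: {share:.0%} of the frame")
```

At a 60 fps target the budget is roughly 16.7ms per frame, so the upper end of that range would be close to half the frame spent on a single post-process pass.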