Hello!
Thanks for your continued interest in this thread!
Like RDNA2 GPUs right? If someone wanted UE5 games to run the best on their PC, and thought to equip it with a RDNA2 GPU, would the engine actually check the system and use a more optimized pipeline (the next gen console pipeline) for that RDNA2 GPU?
The amount of optimisation we can offer on PC is directly limited by what is exposed to us in the D3D specifications and also by what the driver says it can do. For the 16bit ops in 5.3’s TSR, it’s directly based on D3D12_FEATURE_DATA_D3D12_OPTIONS4::Native16BitShaderOpsSupported. And the draw events of TSR in ProfileGPU or DumpGPU will say which TSR shaders are taking advantage of them:
LogRHI: 8.4% 1.88ms TSR RejectShading(WaveSize=32 FlickeringFramePeriod=2.276193 ComposeTranslucency) 1857x721 1 dispatch 155x61 groups
LogRHI: 2.5% 0.51ms TSR RejectShading(WaveSize=32 FlickeringFramePeriod=2.425195 16bit ComposeTranslucency) 1857x721 1 dispatch 155x61 groups
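For reference, this is roughly what that capability check looks like on the application side. It is only a minimal sketch against the raw D3D12 API (not engine code), and the helper name is purely illustrative:

```cpp
#include <d3d12.h>

// Minimal sketch: ask the D3D12 runtime/driver whether native 16bit shader ops
// are supported. This is the cap that gates the "16bit" TSR shader permutation
// shown in the second log line above.
bool SupportsNative16BitShaderOps(ID3D12Device* Device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS4 Options4 = {};
    if (FAILED(Device->CheckFeatureSupport(
            D3D12_FEATURE_D3D12_OPTIONS4, &Options4, sizeof(Options4))))
    {
        // Older runtime or driver that doesn't report OPTIONS4: assume no 16bit ops.
        return false;
    }
    return Options4.Native16BitShaderOpsSupported == TRUE;
}
```

If that reports false (or on D3D11, where it cannot be queried at all), you end up with the non-16bit permutation like in the first log line.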
How much this saves in runtime cost depends largely on whether the driver says it supports it, but also on whether the hardware is actually capable of taking advantage of this optimisation. 16bit often reduces register pressure, which on a register-bound shader can yield up to a 2x performance improvement, but RDNA GPUs for instance also have packed instructions like v_pack_mul_f16 capable of doing two multiplications for the price of one, which is another 2x. So the use of 16bit instructions in that shader can bring almost a 4x perf improvement.
Also a concern of mine, if this is true: is there a list of GPUs that have these 16bit instructions? I have a 3060, and Fortnite 5.1 runs at 42-47fps at native 1080p, low textures, high settings with High TSR; 50-53fps with no AA.
Frame rate is not a great measurement of the cost of something because it is inversely proportional to actual runtime cost, whereas milliseconds are proportional to GPU runtime cost. It’s hard to comment on your frame rate as is, since so many things can be at play when it comes to frame rate. This is where stat gpu, ProfileGPU and stat tsr (or stat unit in versions before that) will give you further insight into where the GPU runtime cost goes for your scene on your GPU.
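To make that concrete, here is a tiny sketch that converts frame rate into frame time, using the 3060 figures quoted above purely as example numbers (the delta covers everything that changed between the two settings, not just TSR, which is exactly why the dedicated stats above are more useful):

```cpp
#include <cstdio>

// Frame time in milliseconds is proportional to GPU runtime cost;
// frame rate is its inverse.
static double FrameTimeMs(double Fps)
{
    return 1000.0 / Fps;
}

int main()
{
    const double FpsWithTSR = 45.0;  // middle of the quoted 42-47 fps with High TSR
    const double FpsNoAA    = 51.5;  // middle of the quoted 50-53 fps with no AA

    std::printf("High TSR: %.1f ms/frame\n", FrameTimeMs(FpsWithTSR)); // ~22.2 ms
    std::printf("No AA:    %.1f ms/frame\n", FrameTimeMs(FpsNoAA));    // ~19.4 ms
    std::printf("Delta:    %.1f ms\n",
                FrameTimeMs(FpsWithTSR) - FrameTimeMs(FpsNoAA));       // ~2.8 ms
    return 0;
}
```

The same ~5 fps drop starting from 100 fps would only be about 0.5 ms, which is why fps deltas alone don’t tell you how expensive a feature is.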
It seems to me that when targeting a card’s recommended resolution (1080p for the A770, 3060), more optimization and cuts need to be made for those brands.
That is not based just on the runtime cost of TSR, but also on its input feed shown in stat tsr in megapixel/s that I explained earlier, which is also an important metric that directly influences the displayed image quality.
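As a rough worked example of that feed metric, assuming the feed is simply the rendering resolution multiplied by the frame rate (the numbers here are only illustrative):

```cpp
// Purely illustrative: input feed as rendering resolution x frame rate.
constexpr double InputFeedMegaPixelsPerSec(double Width, double Height, double Fps)
{
    return Width * Height * Fps / 1.0e6;
}

// e.g. the 1857x721 rendering resolution from the ProfileGPU output above at
// ~45 fps feeds TSR with roughly 1857 * 721 * 45 / 1e6 ≈ 60 MP/s.
```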
So the way I understand it atm: the PC version is not using the 16bit instructions in 5.2, and even that is only supported in DX12, right? I’m asking because we are still on DX11 and SM5. So is DX12 required to get reasonable TSR performance? (The buggy Slate API rework really slowed migration a lot, so I haven’t had time to migrate stuff that isn’t throwing errors or deprecation warnings at me :D).
Yes that is right, 16bit ops are in 5.3 and currently only on D3D12. It’s not possible to query the driver in D3D11 to know whether 16bit ops are available. Also, to be able to compile HLSL 2021 with Microsoft’s DXC on D3D11, we have to use a crazy shader compilation pipeline that compiles TSR shaders as such: TSR shader in HLSL 2021 -(DXC)> Vulkan SPIR-V -(SPIRV-Cross)> HLSL 2018 -(FXC)> D3D11 bytecode. The problem is that DXC’s SPIR-V backend currently hits multiple-minute-long compilation times on TSR shaders due to their complexity. So we had to disable these optimisations where the 16bit ops are the most important too. Maintenance of TSR on D3D11 is becoming ever more difficult overall, and I would not be surprised if at some point support of D3D11 has to be abandoned for TSR.
So the valid usage of TSR is to always render at a lower res, since it should look the same as native but performance won’t tank? Because that’s what has eluded me all this time. If I can render native 4K faster with TAA, why bother with TSR (ignoring all the temporal-related stuff that has been improved since, which could be done for TAA as well)? But if it is just as you say about DX12, the 16bit stuff etc., then I understand, I guess. The thing is I don’t want to upgrade to the newest generation of GPUs, so I can optimize for the lower end first, but if there is a “radical” difference and not just “MORE FASTER BIGGER”, then it’s a different story.
If the quality difference between TSR and TAA is not obvious in your content, then perhaps TSR isn’t the right fit for your project if it is only slower. Though maybe some of your players will have a different opinion too. That is why in Fortnite we expose as many anti-aliasing settings as possible.
With all that said, I’m basically just asking if there are plans to write the documentation only about configuration stuff, or also a little bit about what is actually happening, because as someone who searched for “UAVs” and got mostly pictures of drones, I’d very much appreciate every bit of info you can spare on the “why” as opposed to “this is how you do it”.
I would love to publish more technical details of how TSR works, and I have tried multiple times! But with all the great rendering tech coming in 5.0, TSR publications had to be abandoned two years in a row because there was just too much great tech to talk about already in Unreal Engine. Add to that how the pandemic has impacted personal lives, and you end up where we are today.
EDIT: @Guillaume.Abadie I heard The Matrix Awakens was locked at 30fps on console, but how was it 4K 60fps in the video presentation on YouTube?
Yes, The Matrix Awakens was 30hz. I imagine the YouTube video ended up 60hz because that is the refresh rate of the HDMI signal it was recorded from, regardless of the refresh rate of the actual game. Recording only odd frames, only even frames, or even an average of both from the HDMI signal could lead to issues, because the game can switch between presenting on odd or even refreshes depending on when a frame is completed a bit late. So in a sense, recording at the HDMI 60hz refresh rate even for a lower frame rate game is the most thorough option to preserve the fidelity of the game fluidity experience: frames are displayed to viewers exactly as they would experience them plugging their console directly into their TV, and this also includes showing in the video when frames were completed a bit late by the console and the presentation switched from odd to even refreshes (and vice versa). In a way it is a consequence of us not wanting to cheat the fluidity experience by editing the video after recording.
Please note the updated TSR doc is now live: