TSR feedback thread

Hi, a lot to unpack there, thanks for taking the time!

I’ll start off with the character’s motion (I think her name is Echo?). When she walks, runs, rotates her body, starts flying: I NEVER see ghosting.

Yes, that is right: we shipped this demo on TAA even though it was eating away a lot of Nanite’s amount of detail. We got away with murder a little bit there, thanks to the public seeing this amount of geometric detail in real time for the first time anyway. This very demo is what motivated the start of development of our new anti-aliasing tech. The reason the character mostly doesn’t ghost is thanks to TemporalAA.usf’s AA_DYNAMIC_ANTIGHOST, which rejects the history when a dynamic pixel (one that draws velocity, like an animated skeletal mesh) no longer draws velocity (like the grand majority of this environment, which is just static Nanite meshes).
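
For the curious, the core of that heuristic looks roughly like this; a minimal sketch with hypothetical names, not the actual TemporalAA.usf code:

// Minimal sketch of the AA_DYNAMIC_ANTIGHOST idea (hypothetical names, not
// the actual TemporalAA.usf code): discard the history when a pixel that was
// covered by a velocity-drawing object last frame no longer draws velocity.
float3 ResolveWithAntiGhost(
    float3 CurrentColor,      // this frame's filtered input color
    float3 HistoryColor,      // reprojected accumulated history color
    bool bHistoryWasDynamic,  // history pixel drew velocity last frame
    bool bDrawsVelocity,      // something draws velocity here this frame
    float BlendWeight)        // usual temporal accumulation weight
{
    if (bHistoryWasDynamic && !bDrawsVelocity)
    {
        // The dynamic object (e.g. the animated character) moved off this
        // pixel: its accumulated history is stale and would smear a ghost
        // over the static background, so fall back to the current frame.
        return CurrentColor;
    }
    return lerp(CurrentColor, HistoryColor, BlendWeight);
}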

The Matrix Awakens, now I know this one used TSR.

Yeap. 5.0’s TSR.

After the City Sample released, it was on. A lot of people, including myself, are very confused as to why we devs can’t achieve the level of clarity seen in these two presentations from Epic.
Everything seems so blurry/fuzzy unless the camera and content are slow and boring. I haven’t been able to test it on hardware higher than my 3060, but UE5 looks blurry and fuzzy on every project: the third person starter content, the City Sample, etc.

I even tested the City Sample at 4K with my 3060 at 40fps. TSR didn’t look sharp like that Matrix demo. This is painful. I tested with and without motion blur and Lumen and still got disappointing results compared to those Unreal demos.

Me and some others lost hope in UE5’s ability to actually achieve the desired clarity from those demos. I’m not the only one who has spent hours and hours researching and testing this. For months now, I and others I’ve met online have been trying to achieve that simple clarity from those Unreal demo videos.

I can’t speak in the name of all the users we have, but I’ve very often seen this happen when supporting our licensees. For instance, the City Sample in 5.0 is very CPU intensive: the entire simulation of the city is very demanding, tanking the frame rate and, as a result, the amount of MP/s fed into TSR, which matters a lot for its quality.

But what I’ve learnt throughout answering our users is that this connection between CPU performance and the consequences it can have on the look of TSR was really non-intuitive and unexpected to users. This is why, in 5.2, stat tsr not only shows the microscopic and macroscopic metrics that directly display how fast TSR can accumulate details in the history, but also everything from stat unit too. So with one command you can identify that TSR is being pushed beyond its limits due to very low MP/s, which might be caused not by the rendering resolution but by the frame rate; and since stat unit also shows the game thread, render thread and RHI thread timings, it becomes a lot more intuitive that the image quality can in fact be the consequence of something completely unrelated to TSR bottlenecking the entire frame rate.
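
To put rough numbers on it, taking the rendering resolution from the log further down as an example:

1857×721 ≈ 1.34 MP per frame
1.34 MP × 30 fps ≈ 40 MP/s
1.34 MP × 60 fps ≈ 80 MP/s

Same rendering resolution, yet half the data for TSR to accumulate details from when the CPU caps the frame rate at 30fps.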

TSR in Fortnite looked okay at 4K and 1080p.
I didn’t find a lot of ghosting, even with the shader compilation stutter poking at the temporal frame rate stability.
The image didn’t look too fuzzy, if fuzzy at all, when sprinting (I tested High and Epic TSR at native resolution).

I appreciate the compliment, but that is not entirely representative either. For instance, water waves could ghost, and that was only recently fixed in 5.3 ( https://github.com/EpicGames/UnrealEngine/commit/7c20c06df00b9a8f8fc5641c468800dd6f7c0199 ). There are also still quality problems on the constantly moving grass, due to the tricky challenge of upscaling a velocity buffer for better accuracy in history reprojection. :upside_down_face:

But there was one major problem I found: characters’ eyelids have major ghosting. It looks like the eyes are glitching, and it looks painfully bad.

Yeap, this is a challenging case where there are clear motion vectors for the eyelid, but because the eyelid is stylized to be so thin, it is impossible to reliably detect the eyelid actually occluding and disoccluding the eye with motion vectors. We didn’t run into this issue in The Matrix Awakens because the eyelids there are a lot thicker than in Fortnite, and a lot closer to the camera.

The eyelid, once fully reopened, also completely disappears into the head, not drawing velocity at all. So there, the only thing we can work with is the difference in shading between the eye and the eyelid, but the challenge becomes telling apart an eyelid that closes and opens within very few frames from a moire pattern between structured geometry and the structured pixel grid. So until a better and cheap solution is found, we had to come up with a compromise. The problem with this compromise is that one issue only happens on a small part of the screen, while the other can happen much more widely in an environment. Except we are human, and our brains are trained at looking at other humans for biological and social reasons, which makes these very few pixels of the eye matter a lot more.
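
To illustrate with a minimal sketch (hypothetical names, not TSR’s actual shader code): once the eyelid stops drawing velocity, a shading difference against the history is all that is left to detect the blink, and that signal alone is ambiguous:

// Minimal sketch (hypothetical names, not TSR's actual shader code).
bool DetectShadingChange(float3 CurrentColor, float3 HistoryColor, float Threshold)
{
    // Fires when a thin eyelid closes and reopens within very few frames...
    // ...but also fires for a moire pattern flickering between structured
    // geometry and the structured pixel grid, so on its own this test cannot
    // tell whether to reject the history (blink) or keep it (moire).
    return length(CurrentColor - HistoryColor) > Threshold;
}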

Second: gameplay footage from the new Tekken 8 trailers after that first reveal looks ridiculously fuzzy.

I find it hard to comment on the quality of the anti-aliasing here, mostly because in 4K the gameplay video looks heavily compressed, with all sorts of tiled wavelet compression artifacts that have quite an impact on the overall image quality. I understand the role of anti-aliasing tech is to deliver image quality, but blaming the anti-aliasing for every reason an image’s quality can be compromised seems a little bit unfair, don’t you think? :melting_face:

Also one last thing. It would be great if we could switch the Unreal Engine anti-aliasing pipeline to tensor cores, Xe cores, or AMD matrix cores, for performance reasons. (A console command would allow devs to let players decide to tap into unused GPU components if DLSS, XeSS, etc. are not available.)
TSR could run on tensor cores instead of the main graphics compute units producing the game’s main render frames.

Would be great! But we are limited to the APIs exposed to us: for instance, DirectML requires a roundtrip to main memory between a programmed shader and the matrix multiplications, which is not in the interest of performance. This is where each IHV has a lot fewer constraints compared to us, because they know exactly what their hardware can do and how, and can exploit many advantages that are not standardized, like avoiding that roundtrip to main memory, to squeeze as much runtime GPU performance as possible and invest even more in quality.

What we can do specifically on PS5 and XSX benefits from the fact that, conveniently, most of their AMD GPU architecture is public ( https://www.amd.com/system/files/TechDocs/rdna2-shader-instruction-set-architecture.pdf ), so we can go really deep into hardware details and imagine and experiment with crazy ideas. And this is what TSR did in 5.1: it heavily exploits the performance characteristics of RDNA’s 16bit instructions, which can have huge performance benefits. In UE5/Main for 5.3, we added a shader permutation ( https://github.com/EpicGames/UnrealEngine/commit/c83036de30e8ffb03abe9f9040fed899ecc94422 ) to finally tap into these instructions exposed in standard HLSL in Shader Model 6.2 ( 16 Bit Scalar Types · microsoft/DirectXShaderCompiler Wiki · GitHub ), and for instance on an AMD 5700 XT, the performance savings in TSR mirror how much these consoles are optimised too:

AMD 5700 XT 32bit ops:

LogRHI: 12.5% 2.77ms TemporalSuperResolution() 1857x721 -> 3039x1179 8 dispatches
LogRHI: 0.2% 0.04ms TSR ClearPrevTextures 1857x721 1 dispatch 117x46 groups
LogRHI: 0.1% 0.03ms TSR ForwardScatterDepth 1857x721 1 dispatch 233x91 groups
LogRHI: 0.7% 0.16ms TSR DilateVelocity(MotionBlurDirections=0 OutputIsMoving SubpixelDepth) 1857x721 1 dispatch 233x91 groups
LogRHI: 0.6% 0.13ms TSR DecimateHistory(ReprojectMoire) 1857x721 1 dispatch 233x91 groups
LogRHI: 8.4% 1.88ms TSR RejectShading(WaveSize=32 FlickeringFramePeriod=2.276193 ComposeTranslucency) 1857x721 1 dispatch 155x61 groups
LogRHI: 0.3% 0.07ms TSR SpatialAntiAliasing(Quality=1) 1857x721 1 dispatch 233x91 groups
LogRHI: 0.1% 0.03ms TSR FilterAntiAliasing 1857x721 1 dispatch 233x91 groups
LogRHI: 2.0% 0.44ms TSR UpdateHistory(Quality=High R11G11B10 OutputMip1) 3039x1179 1 dispatch 380x148 groups

AMD 5700 XT 16bit ops:

LogRHI: 6.5% 1.33ms TemporalSuperResolution() 1857x721 -> 3039x1179 8 dispatches
LogRHI: 0.2% 0.03ms TSR ClearPrevTextures 1857x721 1 dispatch 117x46 groups
LogRHI: 0.1% 0.03ms TSR ForwardScatterDepth 1857x721 1 dispatch 233x91 groups
LogRHI: 0.7% 0.15ms TSR DilateVelocity(MotionBlurDirections=0 OutputIsMoving SubpixelDepth) 1857x721 1 dispatch 233x91 groups
LogRHI: 0.7% 0.14ms TSR DecimateHistory(ReprojectMoire) 1857x721 1 dispatch 233x91 groups
LogRHI: 2.5% 0.51ms TSR RejectShading(WaveSize=32 FlickeringFramePeriod=2.425195 16bit ComposeTranslucency) 1857x721 1 dispatch 155x61 groups
LogRHI: 0.3% 0.07ms TSR SpatialAntiAliasing(Quality=1) 1857x721 1 dispatch 233x91 groups
LogRHI: 0.1% 0.03ms TSR FilterAntiAliasing 1857x721 1 dispatch 233x91 groups
LogRHI: 1.8% 0.37ms TSR UpdateHistory(Quality=High 16bit R11G11B10 OutputMip1) 3039x1179 1 dispatch 380x148 groups

What makes TSR in 5.1+ so different from 5.0 is how many more convolutions it does using these exposed hardware capabilities. That RejectShading pass is doing something like 15 convolutions in 5.1+, each 3x3, compared to 3 in 5.0, which makes TSR substantially smarter thanks to some very neat properties we discovered in chaining certain very particular convolutions. And while the number of convolutions massively increased, by a factor of 5, the runtime cost of this part of TSR didn’t change; yet this gain in smartness of the algorithm allowed cutting a significant amount of other costs no longer required in the rest of the TSR algorithm, which is the core reason behind the performance saving from 3.1ms to 1.5ms on these consoles. Sadly, these hardware capabilities exposed in standard HLSL don’t benefit all GPUs equally, because of how each vendor decided to architect their hardware too.
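
To give an idea of what this looks like in standard HLSL, here is a minimal sketch (not TSR’s actual shader code) of one such 3x3 convolution written with Shader Model 6.2’s explicit 16bit types, which require DXC’s -enable-16bit-types flag:

// Minimal sketch (not TSR's actual shader code) of a 3x3 convolution using
// Shader Model 6.2's explicit 16bit types (requires -enable-16bit-types).
// Keeping every operand in half precision lets hardware with fast 16bit
// ALUs, like RDNA, execute packed math at roughly twice the 32bit rate.
float16_t3 Convolve3x3(Texture2D<float16_t3> Input, int2 Pixel, float16_t Kernel[9])
{
    float16_t3 Sum = float16_t3(0.0, 0.0, 0.0);
    for (int y = -1; y <= 1; y++)
    {
        for (int x = -1; x <= 1; x++)
        {
            // Accumulate entirely in 16bit so the compiler can emit packed
            // 16bit instructions instead of promoting to 32bit.
            Sum += Kernel[(y + 1) * 3 + (x + 1)] * Input[Pixel + int2(x, y)];
        }
    }
    return Sum;
}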

We can’t do miracles using specifically marketed hardware features in the most efficient manner with what is only a subset of those capabilities exposed to us at the moment. But we can still do some surprising stuff on existing hardware that the public wrongly assumes incapable of doing particular things. And we are able to do it just with standard features and a better understanding of how the GPU works, thanks for instance to that AMD pdf I linked above.

Even when we venture into new creative ways to use the GPU, we can hit all sorts of roadblocks, for instance in the shader compiler too. Some of these optimisations in TSR push the shader compiler in such unexpected ways that they must be disabled on SPIRV platforms (for instance Metal and Vulkan), due to the shader compiler used there blocking editor startup for tens of minutes ( https://github.com/EpicGames/UnrealEngine/commit/e123c0e4c1c428312550060961559eb77c147291 ).

Thanks again for your continued feedback! :upside_down_face: