5.4.4 Color Regions crash with SMPTE 2110

Hi Rus,

Thank you for jumping on this with us. I have been looking into the CCR render parts on our side in the meanwhile and I believe we have hotfixes for several critical issues within CCR. I would like to share my findings, observations and mitigations with you here.

All the issues encountered or at least the symptoms of them appear in ::RenderRegion(). Prior to the issue we have faced recently, there was a narrowing conversion issue in the Scale() function due to a hard assumption of IntType being 64bit, which had been addressed already with:

UE::Math::TIntRect<int64> BoundingRectToBeTruncated(BoundingRectangle);One of the new issues is the Viewport dimensions of the BoundingRectangle being rogue sometimes. The behavior is quite non-determinstic. One might think this shouldn’t happen as the render code attempts to do a mitigation with:

BoundingRectangle.Clip(PrimaryViewRect);However, based on the crash report from above and the fact that the code *is* already clipping against PrimaryViewRect, the mitigation is simply not robust enough as we are reaching limits above allowed platform sizes, i.e. D3D12_VIEWPORT_BOUNDS_MAX in our case for DX12. Could also be that PrimaryViewRect is rogue and hence not clipping Min *always* towards 0. Also, we can’t straight away clip Min.XY with the function as it would clip negative values towards the Max platform number instead of 0 first for Min.XY, which would result in a Viewport with Min.XY == Max.XY (basically 0 Extent). That said, since Min.XY can be negative we have to ensure to clip Min.XY first towards 0 and then towards upper bounds with platform bounds of DX12, like so:

`// ++ Dimension Markus : Tentative fix to resolve ICVFX crash due to invalid DX12 dimensions on RHI viewport
include “D3D12.h”
// Dimension Markus : Tentative fix to resolve ICVFX crash due to invalid DX12 dimensions on RHI viewport

// ++Dimension Markus : Tentative fix to resolve ICVFX crash due to invalid DX12 dimensions on RHI viewport
// Preventing Graphics API platform crash for hard-limitation of viewport sizes (DX12 only for now)
// We are in a war zone here on-set, so we only considering DX12 platform for the hotfix
FIntRect ViewportLimits{ FIntRect(0, 0, D3D12_VIEWPORT_BOUNDS_MAX, D3D12_VIEWPORT_BOUNDS_MAX) };
BoundingRectangle.Clip(ViewportLimits);

BoundingRectangle.Min.X = FMath::Min(BoundingRectangle.Min.X, D3D12_VIEWPORT_BOUNDS_MAX);
BoundingRectangle.Min.Y = FMath::Min(BoundingRectangle.Min.Y, D3D12_VIEWPORT_BOUNDS_MAX);
// --Dimension Markus : Tentative fix to resolve ICVFX crash due to invalid DX12 dimensions on RHI viewport`

The crash, which was probably the nastiest one, is another undeterminstic crash, potentially due to a race condition between the RenderThread and the RenderGraph execution. More precisely, it seems to me since we are not in an RDG pass but reference an RDG managed resource, it could be that the RDG is updating that particular resource without knowing obviously that we also rely on read access while doing so. Let me elaborate step by step. After the Viewport clips, ::RenderRegion() creates a platform RHI viewport with this particular Constructor :

const FScreenPassTextureViewport RegionViewport(SceneColorRenderTarget.Texture, BoundingRectangle);which chains further into a FScreenPassTextureViewport construction via:

FScreenPassTextureViewport(FRDGTextureRef InTexture, FIntRect InRect) : FScreenPassTextureViewport(FScreenPassTexture(InTexture, InRect)) {}and ultimately results in the explicit FScreenPassTextureViewport Constructor here, which can crash on the runtime assertion:

inline FScreenPassTextureViewport::FScreenPassTextureViewport(FScreenPassTexture InTexture) { check(InTexture.IsValid()); Extent = InTexture.Texture->Desc.Extent; Rect = InTexture.ViewRect; }Now, this case should not happen, although a hot fix might be to check if the RHI resource of FScreenPassTexture is pointing towards valid memory. So, one might think we could fix this in ::RenderRegion() with:

`// Check RHI resource valid before creating platform Viewport
if (!SceneColorRenderTarget.IsValid() || !SceneColorRenderTarget.Texture)
{
return false;
}

const FScreenPassTextureViewport RegionViewport(SceneColorRenderTarget.Texture, BoundingRectangle);`However, sadly this is where the undeterminstic race condition behavior kicks in. This prevention is definitely not enough and the runtime assertion can quite likely still trigger on the inline constructor. Looking at SceneColorRenderTarget a bit further, the derived .Texture resource from FScreenPassTexture is an FRDGTextureRef, so clearly intended to be managed by the RenderGraph. I noticed the FRDGTextureRef by the time of executing ::RenderRegion() is seems to be valid *before* creating the platform viewport, however the underlying FRHIResource* was already null. There is no check on the actual FRHIREsource* specifically here, because ::RenderRegion() is also outside a Graph RenderPass so we probably would need to be within a RenderGraph.Pass() in order to access the RDG managed RHIResource natively. Or de-register the resource before accessing in RenderThread, or better even don’t relying on it if possible. However, that said, this gave me a clearer picture on what was going on and how we could prevent this on-set in the very limited time frame we have left.

By looking through the available constructors of FScreenPassTextureViewport(), I could circumvent the assertion by creating the platform viewport without passing the actual FRDGTextureRef but instead fetching the Extend and using the according FScreenPassTextureViewport() constructor like so:

`// ++Dimension Markus : Tentative fix to resolve ICVFX crash due to RenderGraph RHIResource RaceCondition
// This is not robust alone, as there is a very nasty race condition here having RHI pointer mismatch between the RDG and the RHI resource

FScreenPassTextureViewport RegionViewport = FScreenPassTextureViewport();

if (!SceneColorRenderTarget.IsValid() || !SceneColorRenderTarget.Texture)
{
return false;
}

// We altered the Constructor from FScreenPassTextureViewport to not crash anymore on invalid RHI memory, but set the
// RHI resource of the Viewport to Null as well as the extent to 0
// Since we can’t access the underlying RHI texture in this function due to RDG restrictions, we come up with this double gating and a modified Constructor
// const FScreenPassTextureViewport RegionViewport(SceneColorRenderTarget.Texture, BoundingRectangle);

// Circumventing the crash by using a different constructor chain
RegionViewport = FScreenPassTextureViewport(SceneColorRenderTarget.Texture->Desc.Extent, BoundingRectangle);

if (!(RegionViewport.Extent.X > 0 && RegionViewport.Extent.X < D3D12_VIEWPORT_BOUNDS_MAX) ||
!(RegionViewport.Extent.Y > 0 && RegionViewport.Extent.Y < D3D12_VIEWPORT_BOUNDS_MAX) ||
!SceneColorRenderTarget.IsValid())
{
return false;
}
// --Dimension Markus : Tentative fix to resolve ICVFX crash due to RenderGraph RHIResource RaceCondition`

Another way would be to modifyh the FScreenPassTextureViewport in the Renderer module directly with maybe instead of asserting make a conditional 0 viewport if the RDG resource is not valid. However one could argue it’s good to leave the runtime assertion there because in the perspective of the Renderer module, the RDG resource should arguably always be valid so I believe it is a fair assumption to leave that runtime assertion in there.

We had 2 stable overnight runs with regards to the race condition hotfix. We are still in testing phase with the rest of it but we thought it’s definitely worth to share these fixes and insights from our side here with you.

I could imagine if one would prefer to fix the root of the race condition with the FRDGTextureRef entirely, maybe one could wrap at least the accessing part of the ::RenderRegion() within an actual RenderGraph.Pass() to make sure the RDG Graph schedules our RHI access here without causing a race condition like we experience.

Anyway, we hope this is somewhat helpful.

Thank you again Rus for jumping on this with us.

Best,

Markus