Hey,
Looks good. I've started optimizing/rewriting it a bit as well, to take advantage of some tracing optimizations Brian Karis found while writing screen space reflections. The gist of his optimization is to perform the ray trace in groups of 4 steps: each loop iteration fetches and tests four height samples at once instead of one, which apparently speeds up how the GPU handles these kinds of lookups quite a bit.
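A rough CPU sketch of the grouping idea, written in Python rather than HLSL so it can be run and checked. `height_at_step(k)` is a hypothetical stand-in for the heightmap fetch at step k, and the ray is assumed to start at height 1 and descend by `step_size` per step. Both searches find the same first hit; the grouped one just needs a quarter of the loop iterations. Any GPU speedup would presumably come from the hardware overlapping the four independent fetches, which this sketch does not model.

```python
def first_hit_scalar(height_at_step, step_size, max_steps):
    """One sample per iteration. Returns (first hit step, loop iterations used)."""
    for k in range(1, max_steps + 1):
        if 1.0 - k * step_size < height_at_step(k):
            return k, k
    return -1, max_steps


def first_hit_grouped(height_at_step, step_size, max_steps):
    """Four samples per iteration. Returns (first hit step, loop iterations used).

    May read up to 3 samples past max_steps if max_steps is not a multiple of 4.
    """
    iterations = 0
    for base in range(0, max_steps, 4):
        iterations += 1
        steps = [base + 1, base + 2, base + 3, base + 4]
        # Four independent lookups: none of them depends on another's result.
        tex = [height_at_step(k) for k in steps]
        hit_mask = [1.0 - k * step_size < t for k, t in zip(steps, tex)]
        if any(hit_mask):
            # The earliest step in the group that is below the surface wins.
            return steps[hit_mask.index(True)], iterations
    return -1, iterations
```

For a flat surface at height 0.3 with 16 steps, both report a hit at step 12; the scalar loop needs 12 iterations and the grouped one 3.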
The first step was rewriting it using more of a ray-length approach, but the result is pretty similar to what you have above. In my testing the perf difference from the current version was very minor, but it should be faster once the ‘vectorization’ part is done. Here it is for anybody curious.
Rewritten using a single RayUVz vector (to make it easier to integrate the SSR method):
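```
float SampleDepth, DepthDiff, LastDiff = 0;
float3 RayUVz = float3(UV, 1);                 // xy = current UV, z = current ray height
float3 RayStepUVz = float3(UVDist, -stepsize); // per-step UV offset and height decrement
int i = 0;
while (i < MaxSteps + 1)
{
    // Assumes HeightMapChannel is a float4 channel mask, so dot() extracts the height
    SampleDepth = dot(HeightMapChannel, Tex.SampleLevel(TexSampler, RayUVz.xy, 0));
    DepthDiff = RayUVz.z - SampleDepth;
    if (DepthDiff < 0)
    {
        // Below the surface: back up by the fraction of the last step that lies
        // past the linearly interpolated crossing point
        RayUVz -= RayStepUVz * (DepthDiff / (DepthDiff - LastDiff));
        break;
    }
    LastDiff = DepthDiff;
    RayUVz += RayStepUVz;
    i++;
}
return float3(RayUVz.xy, 1 - RayUVz.z); // 1 - ray height, i.e. depth below the top
```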
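For checking the interpolation step off the GPU, here is a rough CPU sketch of the RayUVz loop in Python (standing in for HLSL; `height(u, v)` is a hypothetical placeholder for the heightmap fetch). On a hit it backs up by DepthDiff / (DepthDiff - LastDiff) of the last step, which is the fraction of that step lying past the linearly interpolated crossing.

```python
def march_ray_uvz(height, uv, uv_dist, step_size, max_steps):
    """Linear search, then one linear-interpolation refinement.

    `height(u, v)` is a hypothetical stand-in for the heightmap sample (values in [0, 1]).
    Returns (u, v, 1 - ray height at the hit), mirroring the shader's float3 result.
    """
    ray = [uv[0], uv[1], 1.0]                    # RayUVz: start at the top of the volume
    step = (uv_dist[0], uv_dist[1], -step_size)  # RayStepUVz
    last_diff = 0.0
    for _ in range(max_steps + 1):
        depth_diff = ray[2] - height(ray[0], ray[1])
        if depth_diff < 0.0:
            # Below the surface: back up to the interpolated crossing point.
            back = depth_diff / (depth_diff - last_diff)
            ray = [r - s * back for r, s in zip(ray, step)]
            break
        last_diff = depth_diff
        ray = [r + s for r, s in zip(ray, step)]
    return ray[0], ray[1], 1.0 - ray[2]
```

On a flat or linearly sloped height field the interpolation lands exactly on the surface: a flat surface at 0.3, stepping 0.01 in u over 16 steps, should give approximately (0.112, 0, 0.7).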
Now here is the very WIP ‘vectorized’ version. It hasn't been tested yet, it still uses the old ray method, and there are a bunch of extra temporaries I haven't removed, but it should show where it's going:
```
float rayheight = 1;
float oldray = 1;
float2 curoffset = 0;
float oldtex = 1;
// Height at the ray origin, used as the "previous" sample if the very first lane hits
float texatray = Tex.SampleLevel(TexSampler, UV, 0).r;
float yintersect = 0;
int i = 0;
float4 offsets1, offsets2 = 0;
float4 raycheckheights = 0;
float4 texheights = 0;
while (i < MaxSteps + 2)
{
    // UV offsets and ray heights for the next four steps
    offsets1 = curoffset.xyxy + float4(1, 1, 2, 2) * UVDist.xyxy;
    offsets2 = curoffset.xyxy + float4(3, 3, 4, 4) * UVDist.xyxy;
    raycheckheights = rayheight - (float4(1, 2, 3, 4) * stepsize);
    // Four independent fetches per iteration
    texheights.x = Tex.SampleLevel(TexSampler, UV + offsets1.xy, 0).r;
    texheights.y = Tex.SampleLevel(TexSampler, UV + offsets1.zw, 0).r;
    texheights.z = Tex.SampleLevel(TexSampler, UV + offsets2.xy, 0).r;
    texheights.w = Tex.SampleLevel(TexSampler, UV + offsets2.zw, 0).r;
    bool4 hitmask = raycheckheights < texheights;
    [branch]
    if (any(hitmask))
    {
        float2 outoffset = 0;
        // Save the last sample of the previous group before the branches below overwrite it
        float prevray = rayheight;
        float prevtex = texatray;
        // Checked from last to first so the earliest hit (x) wins
        [flatten]
        if (hitmask.w)
        {
            outoffset = offsets2.zw;
            rayheight = raycheckheights.w;
            oldray = raycheckheights.z;
            texatray = texheights.w;
            oldtex = texheights.z;
        }
        [flatten]
        if (hitmask.z)
        {
            outoffset = offsets2.xy;
            rayheight = raycheckheights.z;
            oldray = raycheckheights.y;
            texatray = texheights.z;
            oldtex = texheights.y;
        }
        [flatten]
        if (hitmask.y)
        {
            outoffset = offsets1.zw;
            rayheight = raycheckheights.y;
            oldray = raycheckheights.x;
            texatray = texheights.y;
            oldtex = texheights.x;
        }
        [flatten]
        if (hitmask.x)
        {
            outoffset = offsets1.xy;
            // need to use previous set if first value hits
            oldray = prevray;
            rayheight = raycheckheights.x;
            oldtex = prevtex;
            texatray = texheights.x;
        }
        curoffset = outoffset;
        // Fraction of the last step to back up to the interpolated crossing
        float xintersect = (oldray - oldtex) + (texatray - rayheight);
        xintersect = (texatray - rayheight) / xintersect;
        yintersect = (oldray * xintersect) + (rayheight * (1 - xintersect));
        curoffset -= (xintersect * UVDist);
        break;
    }
    // No hit: carry the last lane forward as the "previous" sample
    curoffset = offsets2.zw;
    rayheight = raycheckheights.w;
    texatray = texheights.w;
    i += 4; // four steps are consumed per iteration
}
float3 output;
output.xy = curoffset; // UV offset, not the final UV
output.z = yintersect; // ray height at the hit
return output;
```
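To sanity-check the lane selection off the GPU, here is a Python port of the grouped loop (a sketch, not verified against the shader; `height(u, v)` is a hypothetical stand-in for `Tex.SampleLevel(...).r`). It returns the UV offset and the hit height like the HLSL version. When lane 0 hits, the "previous" sample is the last lane of the previous group, or the ray origin on the first iteration. Here `i` counts samples, four per iteration, which is an assumption about the intended loop bound.

```python
def march_grouped(height, uv, uv_dist, step_size, max_steps):
    """Returns (offset_u, offset_v, ray height at the interpolated hit)."""
    ray_height = 1.0
    cur = (0.0, 0.0)
    tex_at_ray = height(uv[0], uv[1])  # "previous" sample at the ray origin
    y_intersect = 0.0
    i = 0
    while i < max_steps + 2:
        offsets = [(cur[0] + n * uv_dist[0], cur[1] + n * uv_dist[1]) for n in (1, 2, 3, 4)]
        ray_checks = [ray_height - n * step_size for n in (1, 2, 3, 4)]
        tex = [height(uv[0] + ou, uv[1] + ov) for ou, ov in offsets]
        hits = [r < t for r, t in zip(ray_checks, tex)]
        if any(hits):
            lane = hits.index(True)  # earliest hit wins
            if lane > 0:
                old_ray, old_tex = ray_checks[lane - 1], tex[lane - 1]
            else:
                # Lane 0 hit: previous sample comes from the previous group.
                old_ray, old_tex = ray_height, tex_at_ray
            ray_height, tex_at_ray = ray_checks[lane], tex[lane]
            # Fraction of the last step to back up (same interpolation as the scalar loop).
            x = (tex_at_ray - ray_height) / ((old_ray - old_tex) + (tex_at_ray - ray_height))
            y_intersect = old_ray * x + ray_height * (1.0 - x)
            cur = (offsets[lane][0] - x * uv_dist[0], offsets[lane][1] - x * uv_dist[1])
            break
        cur = offsets[3]
        ray_height, tex_at_ray = ray_checks[3], tex[3]
        i += 4
    return cur[0], cur[1], y_intersect
```

With a flat surface at 0.3, stepping 0.01 in u over 16 steps, it should return an offset of about (0.112, 0) and a hit height of 0.3, matching the scalar version.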
Oh, and the MaxSteps+2 is a special case to remove artifacts from heightmaps that contain pure black or pure white. If I didn't let it run an extra step, it would try to divide by 0 in the first case, and then another extra step was needed to keep it from ending exactly at 0. I tried some other methods, but that was easier.
I'm sure a more graceful solution exists.
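A quick check of the "ending exactly at 0" part for pure black, assuming the step is 1/MaxSteps (the post doesn't show how `stepsize` is derived). Because the hit test is a strict `<`, a ray that only descends to height 0 never registers a hit on a height of 0; it takes one sample more than MaxSteps. `Fraction` keeps the arithmetic exact so float rounding doesn't blur the boundary.

```python
from fractions import Fraction


def first_hit_sample(surface_height, num_steps, sample_limit):
    """First 1-based sample k where the ray height 1 - k/num_steps is strictly below a
    constant surface height, searching at most sample_limit samples; None if no hit."""
    step = Fraction(1, num_steps)
    for k in range(1, sample_limit + 1):
        if 1 - k * step < surface_height:
            return k
    return None
```

For pure black with 16 steps, `first_hit_sample(0, 16, 16)` finds nothing, while allowing 17 samples hits at sample 17; pure white hits on the first sample.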