Your thoughts on and comments to Volume Rendering in Unreal Engine 4.

Hmm, that’s odd. I just tried moving the partial step to the front of the loop and didn’t see that artifact. I did see one subtle artifact, which I fixed. I realized that if you do it that way, you don’t need the plane alignment part at all, and you should move the ray advancement to after the first sample, using the partial step.

Box Intersection node:

//object scale, taken from the length of the local X axis in world space (assumes uniform scale)
float scale = length( TransformLocalVectorToWorld(Parameters, float3(1.0, 0.0, 0.0)).xyz);
float localscenedepth = CalcSceneDepth(ScreenAlignedPosition(GetScreenPosition(Parameters)));

float3 camerafwd = mul(float3(0.0, 0.0, 1.0),ResolvedView.ViewToTranslatedWorld);
//convert scene depth into 0-1 box units
localscenedepth /= (Primitive.LocalObjectBoundsMax.x * 2 * scale);
//convert depth along the camera forward axis into distance along the view ray
localscenedepth /= abs( dot( camerafwd, Parameters.CameraVector ) );

//bring vectors into local space to support object transforms
float3 localcampos = mul(float4( ResolvedView.WorldCameraOrigin, 1.0), (Primitive.WorldToLocal)).xyz;
float3 localcamvec = -normalize( mul(Parameters.CameraVector, Primitive.WorldToLocal) );

//make camera position 0-1
localcampos = (localcampos / (Primitive.LocalObjectBoundsMax.x * 2)) + 0.5;

float3 invraydir = 1 / localcamvec;

float3 firstintersections = (0 - localcampos) * invraydir;
float3 secondintersections = (1 - localcampos) * invraydir;
float3 closest = min(firstintersections, secondintersections);
float3 furthest = max(firstintersections, secondintersections);

float t0 = max(closest.x, max(closest.y, closest.z));
float t1 = min(furthest.x, min(furthest.y, furthest.z));

t1 = min(t1, localscenedepth);
t0 = max(0, t0);

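float boxthickness = max(0, t1 - t0);

//entrypos is computed for reference only; the node returns the exit position and thickness
float3 entrypos = localcampos + (max(0,t0) * localcamvec);
float3 exitpos = localcampos + (max(0,t1) * localcamvec);

//xyz = exit position in 0-1 box space, w = distance traveled inside the box
return float4( exitpos , boxthickness );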
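
If you want to sanity check the intersection math outside the material editor, here is a rough CPU-side sketch of the same slab test in Python. It is not the node itself, just the same min/max logic on plain tuples. The function name box_intersect and the scene_depth default are my own for illustration, and it assumes the camera position and ray direction are already in 0-1 box space. Unlike the shader, Python raises on a divide by zero, so every ray component has to be nonzero here.

```python
def box_intersect(campos, raydir, scene_depth=float("inf")):
    """Slab test against the unit box [0,1]^3 (hypothetical CPU-side helper).

    campos, raydir: 3-tuples in 0-1 box space; raydir components must be nonzero.
    scene_depth: distance along the ray to opaque geometry, in the same units.
    Returns (exitpos, thickness).
    """
    inv = [1.0 / d for d in raydir]
    first = [(0.0 - p) * i for p, i in zip(campos, inv)]
    second = [(1.0 - p) * i for p, i in zip(campos, inv)]
    closest = [min(a, b) for a, b in zip(first, second)]
    furthest = [max(a, b) for a, b in zip(first, second)]

    # Entry is the latest of the per-axis entries, exit the earliest of the exits.
    t0 = max(0.0, max(closest))
    t1 = min(min(furthest), scene_depth)

    thickness = max(0.0, t1 - t0)
    exitpos = tuple(p + max(0.0, t1) * d for p, d in zip(campos, raydir))
    return exitpos, thickness
```

For a ray starting at (-1, -1, -1) and heading along the box diagonal, the exit lands on the far corner and the thickness is the length of the diagonal. Passing a smaller scene_depth pulls the exit position back inside the box.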


Raymarch node:



float numFrames = XYFrames * XYFrames;
float accumdist = 0;
float3 localcamvec = normalize( mul(Parameters.CameraVector, Primitive.WorldToLocal) );

//first sample at the start position (box exit or scene depth), weighted by the partial step
float cursample = PseudoVolumeTexture(Tex, TexSampler, saturate(CurPos), XYFrames, numFrames).r;
accumdist += cursample * StepSize * (FinalStep);
//partial step toward the camera so the remaining samples line up with the planes
CurPos += localcamvec * StepSize * (FinalStep);

//remaining full-size steps toward the camera
for (int i = 0; i < MaxSteps; i++)
{
	cursample = PseudoVolumeTexture(Tex, TexSampler, saturate(CurPos), XYFrames, numFrames).r;
	accumdist += cursample * StepSize;
	CurPos += localcamvec * StepSize;
}



return accumdist;



Written this way, it starts the trace at either the box exit or the scene depth, whichever is closer, takes a sample, and then takes a partial step toward the camera to get in line with the planes. You probably don’t even need that part, but it helps with the artifacts. Then it traces as normal.
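
To show why the partial step is worth having, here is a small 1D sketch in Python. FinalStep is just an input to the raymarch node, and how it gets computed is not shown in this post, so the split below (whole steps = floor of thickness / step size, partial step = the leftover fraction) is an assumption on my part. The names split_steps and march are made up for the example, and the "volume" is a plain function of distance along the ray.

```python
import math

def split_steps(thickness, max_steps):
    """Assumed split of the traveled distance into whole steps plus a fraction."""
    step_size = 1.0 / max_steps          # step size in 0-1 box units
    steps = thickness / step_size
    whole = math.floor(steps)
    final_step = steps - whole           # fractional step, in [0, 1)
    return step_size, whole, final_step

def march(density, thickness, max_steps):
    """Accumulate density from the far end of the ray back toward the camera."""
    step_size, whole, final_step = split_steps(thickness, max_steps)
    pos = thickness                      # start at the exit / scene depth
    # First sample is weighted by the partial step, then we advance by it.
    accum = density(pos) * step_size * final_step
    pos -= step_size * final_step
    # Remaining samples use the full step size.
    for _ in range(whole):
        accum += density(pos) * step_size
        pos -= step_size
    return accum
```

With a constant density of 1, march(lambda t: 1.0, 0.7, 8) comes out at the actual thickness instead of being snapped to a multiple of the step size, which is what keeps the result from banding as the thickness changes.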

The optimization gain from going back to front wouldn’t necessarily be noticeable in the instruction count, and it also wouldn’t show up until you start doing a lit volume, since it could allow simplifying the transmission and color to a single lerp per step (as in the first equation shown in my blog post on this) if you seed the color with scene color at the start. Then you have no opacity result and simply return a color. The only way to measure perf with these shaders is profilegpu time with a fullscreen effect. Instruction count is 100% meaningless. This shader is probably more like 5000 instructions if you consider the loop.
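
As a rough illustration of the single lerp idea, here is a Python sketch comparing the two orders on scalar colors. This is my own minimal version and not necessarily the exact form from the blog post. The function names and the (color, alpha) sample layout are assumptions for the example.

```python
def lerp(a, b, t):
    return a + (b - a) * t

def back_to_front(scene_color, samples):
    """samples: list of (color, alpha), ordered farthest to nearest.

    Seeding with scene color means there is no separate opacity to track.
    """
    color = scene_color
    for sample_color, alpha in samples:
        color = lerp(color, sample_color, alpha)   # one lerp per step
    return color

def front_to_back(scene_color, samples):
    """Same samples, walked nearest first with an explicit transmittance."""
    color = 0.0
    transmittance = 1.0
    for sample_color, alpha in reversed(samples):
        color += transmittance * alpha * sample_color
        transmittance *= 1.0 - alpha
    # Scene color is blended in at the end using the leftover transmittance.
    return color + transmittance * scene_color
```

Both functions return the same final color for the same samples. The back to front version just never has to carry transmittance around, which is where the per-step savings would come from.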

If it’s ~200 instructions, doing 32 steps is more like 6400 instructions. That’s overestimating, since not everything is in the loop.

FWIW, I only get 146 instructions when testing the above setup, but that’s with an opacity-only connection and an emissive of 1.

A nested lighting loop of 64 density steps * 64 lighting steps would be doing ~250 instructions 4k times, or over a million instructions. Maybe 1/3 of that once you account for early loop terminations. Shockingly, 980 Ti cards can perform that task in under 10ms. Getting one of these effects to look nice full screen in under 2ms with lighting is the real challenge, and that ends up requiring fancy reprojection and partial update techniques.
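
For anyone who wants to plug in their own numbers, the back-of-the-envelope math above is just a product. A tiny Python sketch, with a made-up helper name, treating every instruction as if it were inside the loop (which overestimates, as noted):

```python
def loop_instruction_estimate(per_step, density_steps, lighting_steps=1):
    """Crude upper bound: assumes all instructions run on every iteration."""
    return per_step * density_steps * lighting_steps

single_loop = loop_instruction_estimate(200, 32)        # 6400
nested_loop = loop_instruction_estimate(250, 64, 64)    # 1024000
with_early_out = nested_loop / 3                        # roughly 341k
```

None of this replaces profiling. It only shows how quickly the loop counts multiply compared to the number the material editor reports.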