Your thoughts on and comments about Volume Rendering in Unreal Engine 4.

Are these purple clouds also made with the same technique?

I would really like to know how they're made.

Nope, those were made by Shane Caudle using a different technique where you make a cloud out of a bunch of random polygon crosses in Max and then run a script that pre-bakes the lighting into the vertex colors. Then regular texture masking is applied. The technique is not completely artifact-free, but the sorting tends to work pretty well since the values for the overlapping faces tend to be similar. The metaballs at the start of that piece have more in common with volume rendering methods.

Hey Ryan, thanks for your answer. Do you know of a good tutorial on YouTube or Vimeo where someone builds a volumetric cloud from scratch?
It would be awesome to add this kind of stuff to my space scene q.q

How can you place a geometric model inside the volume and make it look like it is actually inside? Some of the examples Ryan made show a geometric ball sitting inside the volume. My problem is that the model replaces the pixels I would normally paint the volume onto, so it always appears to be in front of the volume. This is really weird in VR.

Did you read the whole blog post? The solution is described in detail in the box intersection part. The box faces need to be inverted, and Disable Depth Test needs to be off.

You may be able to find a tutorial about making fake cloud cards or the other method I described, but I doubt you will find one for actual volumetrics, because there is code involved and it's a bit too much for just a video. That is why I wrote the blog post with companion videos.

Well, this part I either forgot or didn't read/pick up on. Thanks.

After working with the algorithms presented by Ryan and reversing them to fit into my own system, I am happy to say that it works!

Or so I thought, until I backed off a little.
(screenshot: d5655a9853bd986e2f92f0699303104ddc7f1e6b.jpeg)

If I move even a slight bit away from the object, just outside the box, the depth check seems to fail.
I have no idea why, because the depth buffer is 32-bit and I am hardly moving away from the object.
I am using the simplest code that Ryan provided, so I can't see where the flaw in my logic is. If you have any thoughts, I would love to hear them.

Can you post the whole code you used for the box intersection? That is the same artifact I see if I forget to clamp something to above 0 in that math.


//bring vectors into local space to support object transforms
float3 localcampos = mul(float4(ResolvedView.WorldCameraOrigin, 1), Primitive.WorldToLocal).xyz;
float3 localcamvec = -normalize( mul(Parameters.CameraVector, Primitive.WorldToLocal) );

//remap the camera position so the box spans 0-1 (assumes cube-shaped bounds)
localcampos = (localcampos / (Primitive.LocalObjectBoundsMax.x * 2)) + 0.5;

//slab-method ray vs. unit box intersection
float3 invraydir = 1 / localcamvec;

float3 firstintersections = (0 - localcampos) * invraydir;
float3 secondintersections = (1 - localcampos) * invraydir;
float3 closest = min(firstintersections, secondintersections);
float3 furthest = max(firstintersections, secondintersections);

float t0 = max(closest.x, max(closest.y, closest.z));
float t1 = min(furthest.x, min(furthest.y, furthest.z));

//snap the entry point to view-aligned planes to reduce slicing artifacts
float planeoffset = 1 - frac( (t0 - length(localcampos - 0.5)) * MaxSteps );
t0 += (planeoffset / MaxSteps) * PlaneAlignment;

//convert scene depth into the same normalized box space
float scale = length( TransformLocalVectorToWorld(Parameters, float3(1, 0, 0)).xyz );
float localscenedepth = CalcSceneDepth(ScreenAlignedPosition(GetScreenPosition(Parameters)));

float3 camerafwd = mul(float3(0, 0, 1), ResolvedView.ViewToTranslatedWorld);
localscenedepth /= (Primitive.LocalObjectBoundsMax.x * 2 * scale);
localscenedepth /= abs( dot(camerafwd, Parameters.CameraVector) );

//clamp the exit point to scene depth, then clamp the entry point to the camera
t1 = min(t1, localscenedepth);
t0 = max(0, t0);

float boxthickness = max(0, t1 - t0);
float3 entrypos = localcampos + (max(0, t0) * localcamvec);
float3 exitpos = localcampos + (max(0, t1) * localcamvec);
return float4( exitpos, boxthickness );

///////////////////////////////////////////////////////////////////////////////////////////

float numFrames = XYFrames * XYFrames;
float4 accumdist = float4(0, 0, 0, 0);
float3 localcamvec = normalize( mul(Parameters.CameraVector, Primitive.WorldToLocal) );

for (int i = 0; i < MaxSteps; i++)
{
    //sample the pseudo volume (flipbook) texture and blend back to front
    float4 cursample = PseudoVolumeTextureColour(Tex, TexSampler, saturate(CurPos), XYFrames, numFrames);
    accumdist = lerp(accumdist, cursample, cursample.a);
    //accumdist += cursample * StepSize;
    CurPos += localcamvec * StepSize;
}

return accumdist * Opacity;

I am not using clamp(), but when I calculate the exitpos at the end I use max(0, t1) to make sure it is greater than or equal to 0. I added the ray marcher as well for reference.

EDIT:
I found out what it was. Since the size of the volume can change arbitrarily, I cannot assume that the scale is x = y = z. When I scale the volume uniformly, it works fine.

It is supposed to be the entry pos, not the exit pos, that is returned by the first part of that function, so maybe that's it?

Also, it looks like I may have forgotten that the camera vector needs to be multiplied by -1 in the density-only version, but it's correct in the full code. Try replacing this line:

float3 localcamvec = -normalize( mul(Parameters.CameraVector, Primitive.WorldToLocal) );

No, it was the scaling. I return the exitpos because I want to do back-to-front blending. I just have to find out how I can handle non-uniform scaling somehow.

The main part that isn't handling non-uniform scaling is the box intersection. I am assuming that the box extents are 0 to 1, but you could use the actual box extents. It just gets a bit trickier, since then you have to pick which axis to normalize along, so you may need a few extra lines of code to normalize the steps across the largest axis or something like that, i.e. largestside = max(extents.x, max(extents.y, extents.z));
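For illustration, a rough (untested) sketch of what that could look like in place of the cube-shaped remap at the top of the box intersection code; extents and largestside are names introduced here and assume bounds centered on the origin:

//per-axis box size instead of assuming a cube
float3 extents = Primitive.LocalObjectBoundsMax * 2;
float largestside = max(extents.x, max(extents.y, extents.z));

//remap the camera position so each axis of the box spans 0-1
localcampos = (localcampos / extents) + 0.5;

//scale the ray direction by the same per-axis factor, renormalized against the
//largest side so StepSize still means a fraction of the largest axis
localcamvec = (localcamvec / extents) * largestside;

The raymarch node recomputes localcamvec on its own, so it would need the same adjustment, and presumably so would the scene-depth conversion, since it also divides by LocalObjectBoundsMax.x * 2.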

Also, you would need to invert the other behavior I mentioned about taking a partial final step. Instead, you'd take a partial first step before the main tracing loop, but it should work. It might even be a bit faster. I wanted to try it too but ended up focusing on other things. I am curious whether it ends up being more efficient.

I can take a closer look at non-uniform scale at some point soon, but for now I am trying stuff like baking IBL ambience and experimenting with 4D formats.

Back to front: 196 instructions
Front to back: 195 instructions

I also tried normalizing against the min and the max. It helps, but it is still offset somewhat. I am going to see if I can use the depth buffer to limit the steps more directly: if SceneDepth < CamToEntry + CurPosInsideVol, then the ray has gone too far. I will have to try things; maybe I have to manually send the bounds into the shader as a uniform. But these are just things I have to test.

If I take the first step before the loop, I end up with the circular artifacts turning black: (screenshot: 9a6f4884352ad9f5bef7f40ec15299f8c1e9a462.jpeg)

I have a timeline on my volume, but I am doing that simply by making several textures for the volume over time. I did not need to blend between them, but I am thinking about adding another texture to look up and blending between their timeframes.
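For reference, a minimal sketch of that kind of blend inside the march loop, assuming a second flipbook texture input (TexB here) and a 0-1 blend parameter (TimeFrac), neither of which exists in the code above; the rest of the names come from the raymarch node:

float a = PseudoVolumeTexture(Tex, TexSampler, saturate(CurPos), XYFrames, numFrames).r;
float b = PseudoVolumeTexture(TexB, TexSampler, saturate(CurPos), XYFrames, numFrames).r;
float cursample = lerp(a, b, TimeFrac);   //TimeFrac = 0-1 position between the two baked timeframes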

Hmm, that's odd. I just tried moving the partial step to the front of the loop and didn't have that artifact. I did see one subtle artifact, which I fixed. I realized that if you do it that way, you don't need the plane alignment part at all, and you should move the ray advancement after the first sample using the partial step.

Box Intersection node





//convert scene depth into normalized box space (done up front this time)
float scale = length( TransformLocalVectorToWorld(Parameters, float3(1, 0, 0)).xyz );
float localscenedepth = CalcSceneDepth(ScreenAlignedPosition(GetScreenPosition(Parameters)));

float3 camerafwd = mul(float3(0, 0, 1), ResolvedView.ViewToTranslatedWorld);
localscenedepth /= (Primitive.LocalObjectBoundsMax.x * 2 * scale);
localscenedepth /= abs( dot(camerafwd, Parameters.CameraVector) );

//bring vectors into local space to support object transforms
float3 localcampos = mul(float4(ResolvedView.WorldCameraOrigin, 1), Primitive.WorldToLocal).xyz;
float3 localcamvec = -normalize( mul(Parameters.CameraVector, Primitive.WorldToLocal) );

//make camera position 0-1
localcampos = (localcampos / (Primitive.LocalObjectBoundsMax.x * 2)) + 0.5;

//slab-method ray vs. unit box intersection
float3 invraydir = 1 / localcamvec;

float3 firstintersections = (0 - localcampos) * invraydir;
float3 secondintersections = (1 - localcampos) * invraydir;
float3 closest = min(firstintersections, secondintersections);
float3 furthest = max(firstintersections, secondintersections);

float t0 = max(closest.x, max(closest.y, closest.z));
float t1 = min(furthest.x, min(furthest.y, furthest.z));

//clamp the exit point to scene depth and the entry point to the camera
t1 = min(t1, localscenedepth);
t0 = max(0, t0);

float boxthickness = max(0, t1 - t0);

float3 entrypos = localcampos + (max(0, t0) * localcamvec);
float3 exitpos = localcampos + (max(0, t1) * localcamvec);

return float4( exitpos, boxthickness );


Raymarch node:



float numFrames = XYFrames * XYFrames;
float accumdist = 0;
float3 localcamvec = normalize( mul(Parameters.CameraVector, Primitive.WorldToLocal) );

//partial first step: sample at the box exit (or scene depth), then advance
//toward the camera by the fractional step so the rest of the march lines up
//with the view-aligned planes
float cursample = PseudoVolumeTexture(Tex, TexSampler, saturate(CurPos), XYFrames, numFrames).r;
accumdist += cursample * StepSize * FinalStep;
CurPos += localcamvec * StepSize * FinalStep;

for (int i = 0; i < MaxSteps; i++)
{
    float cursample = PseudoVolumeTexture(Tex, TexSampler, saturate(CurPos), XYFrames, numFrames).r;
    accumdist += cursample * StepSize;
    CurPos += localcamvec * StepSize;
}

return accumdist;



Written this way, it starts the trace right at either the box exit or the scene depth, does a sample, and then does a partial step towards the camera to get in line with the planes. You probably don't even need that part, but it helps with the artifacts. Then it traces as normal.

The optimization gain from going back to front wouldn't necessarily be noticeable in the instruction count, and it also wouldn't show up until you start doing a lit volume, since it could allow simplifying the transmission and color to a single lerp per step (as in the first equation shown in my blog on this) if you seed the color to be the scene color at the start. Then you have no opacity result and simply return a color. The only way to measure perf with these shaders is ProfileGPU time with a fullscreen effect. Instruction count is 100% meaningless; this shader is probably more like 5,000 instructions if you consider the loop.
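As a rough sketch of that idea (not code from the blog; SceneColorBehindVolume is a placeholder for however you feed the scene color into the node, and the other names come from the earlier back-to-front snippet):

//seed the accumulator with the scene color behind the volume
float3 accumcolor = SceneColorBehindVolume;

for (int i = 0; i < MaxSteps; i++)
{
    float4 cursample = PseudoVolumeTextureColour(Tex, TexSampler, saturate(CurPos), XYFrames, numFrames);
    //transmission and color handled in a single lerp per step (back-to-front "over" blend)
    accumcolor = lerp(accumcolor, cursample.rgb, cursample.a);
    CurPos += localcamvec * StepSize;   //marching from the box exit toward the camera
}

return float4(accumcolor, 0);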

If it's ~200 instructions, doing 32 steps is more like 6,400 instructions. That's overestimating, since not everything is in the loop.

FWIW, I only get 146 instructions when testing the above setup, but that's with an opacity-only connection and emissive of 1.

A nested lighting loop of 64 density steps * 64 lighting steps would be doing ~250 instructions roughly 4,000 times (64 * 64 = 4,096), or over a million instructions. Maybe 1/3 of that when you account for early loop terminations. Shockingly, 980 Ti cards can perform that task in under 10 ms. Getting one of these effects to look nice full screen in under 2 ms with lighting is the real challenge, and that ends up requiring fancy reprojection and partial update techniques.

I think I am misunderstanding something here. I have a parameter collection that is updated with the position and orientation of a plane I use to cut the volume. However, when I try to transform the vectors I get nonsensical results.

Since the data I store is saved in world space, I needed to transform it into local space to properly cut the volume.

And then all I do inside the custom node is check whether the dot product is < 0 and, if so, set the opacity to 0 to ignore it:

float clip = dot(normalize(CPPos - CurPos), CPUp);
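For context, one way that check might sit inside the march loop (a sketch, assuming CPPos and CPUp have already been transformed into the same 0-1 space as CurPos; the other names come from the raymarch node above):

float cursample = PseudoVolumeTexture(Tex, TexSampler, saturate(CurPos), XYFrames, numFrames).r;

//signed side test against the cutting plane; < 0 means this sample is discarded
float clip = dot(normalize(CPPos - CurPos), CPUp);
cursample *= (clip < 0) ? 0 : 1;

accumdist += cursample * StepSize;
CurPos += localcamvec * StepSize;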

TransformPosition only supports transforming INTO world space, so I tried changing it, but similar results emerged.
Any idea what I am doing wrong?

  1. Do as much of the transform as you can in the BP to make the shader cheaper; you have lots of unnecessary transforms in the material.

  2. The transform applies the location offset, but it does NOT normalize into a 0-1 range. To do that, you need to know the box diameter, divide the position by that, and then add 0.5, since a position of 0 would map to the centered position 0.5, 0.5, 0.5 in the volume.

A 0-1 transform without rotation is as simple as (WorldPosition - Min) / (Max - Min).

  3. Not sure why you are adding the up vector of the plane to the world location. Just pass in the up vector itself. As long as you aren't rotating the volume, you don't need to transform it or anything.

True, but I find it more expedient to iterate only inside the material. When I get the effect I want, I can simply reimplement it where it is faster.

Yes, and I have been trying to find a node that could give me these bounds, but I can't seem to find one. You have done it in the custom node, so I am considering making a node that simply returns the bounds for me.

That was a quick-fix attempt to see if I could counter the translation part of the transform inside the material. If I transform the normal vector, it will not be a unit vector anymore. I want to enable rotation as well, as that is really useful for my use case.

I will try to apply the transforms inside BPs instead.

Thanks.

Look at the node “Local Bounding Box Based _ 0-1_UVW” for hints. All you need to do is (Position - BoundsMin) / (BoundsMax - BoundsMin).

If there is rotation, a local-space transform may be needed first.
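A small sketch of that remap for the clip-plane data, assuming the plane position arrives in world space; PlaneWorldPos, VolumeBoundsMin, and VolumeBoundsMax are placeholder parameter names, while the local-space variant reuses the remap from the box intersection code above:

//world-space point to 0-1 volume UVW, unrotated volume
float3 uvw = (PlaneWorldPos - VolumeBoundsMin) / (VolumeBoundsMax - VolumeBoundsMin);

//with rotation, go through local space first
float3 localpos = mul(float4(PlaneWorldPos, 1), Primitive.WorldToLocal).xyz;
float3 uvwRotated = (localpos / (Primitive.LocalObjectBoundsMax.x * 2)) + 0.5;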

There are a few steps in this process. Here is a high-level outline.

  1. Create a sequence of OpenVDB grids in Houdini
  2. Execute a Python SOP that writes the voxel data out to an ASCII file. Basically, you iterate over all the voxels and output the values to a text file. Chances are you have to output a bunch of metadata too, such as object-to-world transforms, dimensions, etc. Note that in this process we are also essentially converting a sparse grid to a dense one; read up on OpenVDB if you are curious about sparse volumes.
  3. In Unreal, you need to read this file into a 3D array with the same dimensions as your original grid in Houdini.
  4. The last step is converting the 3D array into a 3D texture that lives on the graphics card. There are some API calls that make this easy.

Note this is a very general overview of how it works. In practice there are a bunch of optimizations you can make.

That was EXACTLY what I was looking for. Now it works like a charm! Thanks a lot. Time to clean up the code.

I must have missed the send button because when I came back here I realized the reply was still in the edit window. :stuck_out_tongue: Sorry for that.