# Pixel Tracking Methods: Optical Flow Methods

Hi All,
I am new to Unreal Engine and I was hoping you could point me in the right direction. I am trying to do something that is not really built into Unreal Engine, but I was hoping someone here could let me know whether it is possible. And if there is a better spot on this forum for me to pose this question, please let me know!

So a little background: I am working on a project to create a tool that obtains optical flow information between frames. Without going too in depth, optical flow is the apparent motion of objects, surfaces, and edges in a visual scene relative to the observer. Essentially, I want to track the world-space location represented by each pixel rendered in a scene from frame to frame and calculate a vector of its movement between frames.

An example: say a scene consists of a spinning cube. The pixel I am interested in represents one corner of the cube (or some specific point on an actor) and has WS coordinates (x, y, z). As the next frame renders, the cube has rotated; the corner of the cube now has WS coordinates (x+dx, y+dy, z+dz), and the pixel that previously represented the corner now represents some other part of the cube or scene. I want to be able to calculate (dx, dy, dz) for each pixel.

Rough Procedure:

Frame1:

1. Get WS location of each pixel rendered -> output to array
2. Create a reference to the specific point on the object that the pixel represents.

Frame2:

1. Get WS location of each previously referenced point on objects in the scene -> output to array
1a. Subtract the WS location (x, y, z) at frame 1 from the WS location (x+dx, y+dy, z+dz) at frame 2 to obtain the differences (dx, dy, dz)
1b. Save the array of differences
2. Get WS Location of each pixel rendered in new frame
3. Reset references to specific points on object that pixel represents

Repeat For all frames
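The frame-to-frame bookkeeping above can be sketched on the CPU side. This is a minimal illustration, not UE4 code: it assumes you already have, for each rendered pixel, the world-space location of its tracked surface point at both frames (the tracking itself is the hard part in-engine).

```python
import numpy as np

def world_space_deltas(frame1_positions, frame2_positions):
    """Per-pixel world-space motion (dx, dy, dz) between two frames.

    Both inputs are H x W x 3 arrays holding the world-space location
    that each pixel's tracked surface point occupies at frame 1 and
    frame 2. Subtracting frame 1 from frame 2 yields the motion vector.
    """
    return frame2_positions - frame1_positions

# A single spinning-cube corner, as in the example above (1x1 "image"):
frame1 = np.array([[[1.0, 0.0, 0.0]]])   # (x, y, z) at frame 1
frame2 = np.array([[[0.9, 0.4, 0.0]]])   # (x+dx, y+dy, z+dz) at frame 2
delta = world_space_deltas(frame1, frame2)
```

Repeating this per frame pair produces the array of differences described in steps 1a/1b.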

Some of that sounds a lot like what motion vectors do for post-process motion blur. If that's the case, then maybe half your work is done for you already.

Do you think you could link me to some documentation describing what you are thinking of? I am having a bit of a hard time finding it.

Are you wanting to do this for content where the “animation” is defined in UE4 such as a flipbook, or are you talking about getting motion vectors for an externally generated image sequence?

There is already a way to generate motion vectors for flipbooks in UE4, but it is not well documented yet since it's a bit tricky. If that is what you want, I can describe the steps to do it. The process is not unlike the steps you mentioned; since the motion is defined in UE4, we have access to the world positions of the current and next frame. The trickiness comes in making the motion vector texture in a way that actually improves the motion. That means all silhouette gaps need to have continuous motion. I accomplished that via 'motion substepping', which involves duplicating each object many times and performing partial rotations between the current frame and next frame, with locked motion vectors for all the 'substeps'.

If you want to do this for externally created sequences, then I am not quite sure how you would be getting world-space locations; you'd instead have to search for similarity and try to use neighbor information that correlates within a threshold, or something along those lines.

Thanks for taking the time to respond to me.

I want to use this for UE4 content, do you think you could help describe the steps to output per-pixel motion vectors/velocities/distances traveled?

Ideally, I want to attach a scene capture to an actor with a custom render target that outputs the motion vectors between frames. I want to be able to place this actor in different levels and capture the scene, as well as the motion vectors for each of the pixels rendered.

I was hoping you would say "from a flipbook", since that is all I have working. You may be able to devise a new system based on how this works, since the basic ideas will be the same. The things you are going to have to figure out for yourself are how to get the object's next-frame position and how to store all of the motion vectors into a texture. In my case those problems were easy since I was dealing with a flipbook: every frame is already part of an atlas, and I took advantage of that by using the same layout.

Here is how I generate motion vectors alongside a flipbook animation:

On the bottom is the array of flipbook meshes, each with a different rotation. On the top, the flipbook array is repeated, but each mesh gets unwrapped using its lightmap UVs and centered inside its local flipbook cell. I replace the material emissive color with local world position, which is world position centered within each flipbook cell and then biased into the 0-1 range so that regular 8-bit textures can be used without requiring float HDR textures.
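The center-and-bias step can be illustrated with a small sketch. This is an assumption-laden model, not the actual material graph: `CELL_HALF_SIZE` is a hypothetical cell extent in world units, and the encode/decode pair just shows how a signed local position gets mapped into 0-1 for an 8-bit texture and back.

```python
import numpy as np

# Hypothetical half-size of a flipbook cell in world units; positions
# outside +/- this range would clamp once stored in an 8-bit texture.
CELL_HALF_SIZE = 64.0

def encode_local_position(world_pos, cell_center):
    """Center a world position in its flipbook cell and bias into 0-1."""
    local = (np.asarray(world_pos) - np.asarray(cell_center)) / CELL_HALF_SIZE  # -1..1
    return local * 0.5 + 0.5                                                   # 0..1

def decode_local_position(encoded, cell_center):
    """Invert the bias to recover the cell-local world position."""
    return (np.asarray(encoded) - 0.5) * 2.0 * CELL_HALF_SIZE + np.asarray(cell_center)
```

The decode step is what a material reading the texture would have to apply before using the stored positions.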

This is all set up and working in the RenderToTexture_LevelBP, under Flipbooks. The associated materials may be useful for you to reverse engineer.

At this stage, motion vectors are calculated by capturing the top UV layout image using a render target, then reading the current frame and frame+1, getting the difference, and transforming that difference from world to screen space. I also expose a "motion vector intensity" parameter which boosts the difference to get better precision with 8-bit textures. I just increase it until it clamps out (which I can tell by using an IF node to make the color red in that case). That means you need to specify the same amount as a divisor in your final material that reads the motion vectors.
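The project-subtract-boost step above can be sketched as follows. This is a simplified stand-in for the material logic, assuming a combined view-projection matrix is available; the intensity constant is illustrative, and the same value must be divided back out on read, as described above.

```python
import numpy as np

MOTION_VECTOR_INTENSITY = 4.0  # boost for 8-bit precision; divide it back out on read

def world_to_screen(world_pos, view_proj):
    """Project a world-space point to normalized screen coordinates."""
    clip = view_proj @ np.append(world_pos, 1.0)   # homogeneous clip space
    return clip[:2] / clip[3]                      # perspective divide -> NDC x, y

def encode_motion_vector(pos_now, pos_next, view_proj):
    """Screen-space delta between frame N and N+1, boosted for precision."""
    delta = world_to_screen(pos_next, view_proj) - world_to_screen(pos_now, view_proj)
    return delta * MOTION_VECTOR_INTENSITY

def decode_motion_vector(encoded):
    """What the final material does: divide the boost back out."""
    return encoded / MOTION_VECTOR_INTENSITY
```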

The 'motion substepping' part helps the shapes be able to blur outside of their original silhouette.

For example, with our current flipbook motion vectors, it will look like this:

Notice that we only have motion vectors within the silhouette of any given frame. That means we will get the expected blending only where the silhouettes of N and N+1 overlap.

In order to fix that, we actually have to render a constant gradient from each pixel's source to its destination in the next frame. That looks like this:

This is accomplished by re-adding all of the flipbook meshes for each "substep", with an offset Z value and with an animation phase that blends between N and N+1.

For example, if we did 4 substeps, it would add 4 new frames after N:

N
N + 0.25
N + 0.5
N + 0.75
N + 1

So we rendered 4 additional meshes to substep 4 times. In my example this is 32 substeps (although it might be 64, I can't remember), and it causes lots of meshes to be added, so it can take a second to refresh when increasing the substeps:
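The phase list above is just an even subdivision of the interval between frame N and N+1. A one-line sketch (my own helper name, not an engine function):

```python
def substep_phases(frame, substeps):
    """Animation phases for the duplicated meshes between frame N and N+1.

    Returns substeps + 1 phases: frame N itself plus `substeps` new
    evenly spaced phases, ending on N + 1.
    """
    return [frame + i / substeps for i in range(substeps + 1)]
```

With `substeps=4` this reproduces the list N, N+0.25, N+0.5, N+0.75, N+1 shown above; at 32 substeps the mesh count grows accordingly, which is why the refresh takes a moment.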

For information on how the motion vectors are applied, there is a material function called “Flipbook_MotionVectors” in engine.

Wow, thanks for the great response. I will definitely take a look at 'RenderToTexture_LevelBP', but before I go deep into that I want to clarify my question, because I just realized the procedure I described above is incorrect, and I think what I want to do may be more integrated into Unreal than what you described above.

Above, I used the world-position domain to calculate a 3D vector field of the motion of each pixel in world space and store the vector information in a texture. What I actually want to do is use the screen-position domain to calculate a 2D vector field of the motion of each pixel in screen space, which should be much easier to store in a texture. A general overview of optical flow may help you understand what I want to do.

To rewrite my example above, but this time correctly:
A scene is comprised of a cube that is spinning. The pixel I am interested in represents one corner of the cube or some specific point on an actor, and has screen-space coordinates (x, y) at time t. As the next frame renders, the cube has rotated; the corner of the cube now has screen-space coordinates (x+dx, y+dy) at time t+dt, and the pixel that previously represented the corner now represents some other part of the cube or scene. I want to be able to calculate (dx, dy) for each pixel.

More succinctly: (x,y) are screen space coordinates and t is time, for a pixel at (x,y,t) which represents a point on an actor, find the vector (dx,dy) such that (x+dx, y+dy, t+dt) corresponds to the same point on the actor at (t+dt).

I was hoping to use a post-processing filter such as the one shown in this example.

Are the vectors they visualize the screen-space motion vectors? And in that example they visualize using a low-resolution map; is there a way to output those vectors in high resolution to a texture?

Tl;dr Is there a way to output high resolution screen space motion vectors to a texture using the built in post processing algorithms for motion blur?

I see more of what you are trying to do now. I will reply in more detail later, after talking to one of our motion blur experts (Brian K). It may be that there is some way to get access to the actual motion blur buffer… but due to the way buffers get used and then immediately re-written, I cannot guarantee that is possible without checking with Brian.

In terms of solving this with some additional math, you may be able to do it using clip-space transforms. You mentioned that you already have a texture that represents the world position on your mesh at screen position (X, Y)… I was not sure from your post, though, whether you already have access to a texture that has the previous world positions for each frame. If you do, you can simply do a World to Clip transform using the 'old' world position, and it will tell you the old screen-space position. Then you subtract that from the current screen-space coordinate to get the delta screen position.
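The world-to-clip-then-subtract idea can be sketched like this. This is a conceptual model, not the engine's material function: it assumes a combined view-projection matrix and uses the common 0.5 bias and Y flip to map NDC into 0-1 UV space (exactly the bias convention that is called out as fragile below).

```python
import numpy as np

def world_to_clip_uv(world_pos, view_proj):
    """Project a world-space point and map NDC x, y into 0-1 UV space.

    NDC y points up while UV v points down, hence the sign flip on v.
    """
    clip = view_proj @ np.append(world_pos, 1.0)   # homogeneous clip space
    ndc = clip[:2] / clip[3]                       # perspective divide -> -1..1
    return np.array([ndc[0] * 0.5 + 0.5,           # u
                     ndc[1] * -0.5 + 0.5])         # v (flipped)

def screen_delta(old_world_pos, new_world_pos, view_proj):
    """Delta screen position: current UV minus reprojected old UV."""
    return world_to_clip_uv(new_world_pos, view_proj) - world_to_clip_uv(old_world_pos, view_proj)
```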

If you do have that ability already, let me know and I can post back more information on how to do the clip-space transformation. There is actually a Clip Space transform material function you can use for inspiration, but be warned that the bias you see in there using 0.5 and 1 is actually wrong and will break if you resize the viewport, among other things; it will appear to work as long as you don't mess with it. That is something on the list for me and another tech artist to fix soon. DanielW mentioned that the code function "ScreenPositionToBufferUV" is what we want to use there, but I have not had time to investigate further yet.

If this isn’t making sense or I’m making some wrong assumptions, sorry about that… I am not exactly an expert yet on the subject of how our various buffers and clip space transforms always work. I can get somebody more knowledgeable to help if so.

Could you talk to your motion blur expert and ask him whether there is a way for me to write the buffer to a texture or save it somehow? Because I think that is exactly what I want to do.

Regarding the clip transformations, I'm going to think about it and see if I can work out the math, but this sounds like another possible way to do it!

Thanks again for being so helpful

Do you have any extra documentation regarding clip-space transforms and the world-to-clip transform? I am having a bit of a hard time understanding them.

I just found some documentation for UE3, and it details very clearly how to get access to the motion blur vectors; see MotionBlur and http://udn.epicgames.com/Three/MotionBlurSkinning.

Do you know of similar functionality for UE4? Because this seems like the most ideal solution to what I am looking for.

I think I may have figured out a potential solution: using a Custom HLSL expression node to access the GBuffer and output screen-space velocity.

In DeferredShadingCommon.usf there is a function called GetScreenSpaceData; is there a way to call that function from within my custom node?

```hlsl
MaterialFloat2 UV = MaterialFloat2(ScreenAlignedPosition(Parameters.ScreenPosition).xy);
FScreenSpaceData ScreenSpaceData = GetScreenSpaceData(UV, false);
return ScreenSpaceData.GBuffer.Velocity.xy;
```

But I get the errors:

```
Error [SM5] error X3000: unrecognized identifier 'FScreenSpaceData'
Error [SM5] error X3000: unrecognized identifier 'ScreenSpaceData'
```

I believe you may be on to something with your approach, but materials do not automatically get access to everything in the deferred rendering pass; the data mostly flows the other way.

It may be possible, but I am not sure. There is another thread around here where somebody added code access to skeletal mesh pre-skinned vertex positions by passing them to the shaders. Maybe you can use some of that as a starting point. It may not be so simple, but I will ping Brian to check it out.

Hey Brian suggested this:

Try placing a Scene Texture node somewhere in your material node network. You can multiply it down to something almost nonexistent, like *0.00001, and then add it to any input, as long as it's plugged in. That flips a switch that passes more GBuffer access to the material.

You will also need to make sure the project setting "support accurate velocities for vertex deformation" is enabled. Not sure if this will work, but it may.

Is this what you are talking about?

It seems to recognize the functions now, or maybe a new error is preventing the compiler from reaching that one.

New errors:

```
Error [SM5] (Node SceneTexture) Coercion failed: MaterialFloat3 Local0 = TransformViewVectorToWorld(MaterialFloat3(0.00000000,0.00000000,1.00000000));
: float3 -> float2
Error [SM5] SceneTexture expressions cannot be used in opaque materials
```

p.s. I turned on accurate velocities in the project settings.

Your material will always have to be translucent in order to read any of the GBuffer data, including Scene Texture.

The other error is that you are trying to use a 3D vector to specify the 2-coordinate UVs for the scene texture. That should just be ScreenPosition there.

My code is compiling with no errors, but I am only getting 0s (a black screen) for the output.

I noticed in DeferredShadingCommon.usf that it will only write velocity to the GBuffer if WRITES_VELOCITY_TO_GBUFFER is enabled. How do I make sure that setting is enabled?

That is under Project Settings -> Rendering, called "support accurate velocities for vertex deformation". Turn it on.

And yes, my bad: I meant translucent or post-process materials. The log saying "opaque" threw me; that warning could be improved.

Ok, I already had "support accurate velocities for vertex deformation" enabled. Can you think of any other reason why I am not getting any useful output?

Maybe you have no dynamic moving objects?

The GBuffer velocity channel only includes dynamic objects that have motion. Static world motion (which only blurs when the camera moves) is calculated using some math involving the previous camera transform. You'd currently have to calculate that part manually. It doesn't seem terribly crazy, but maybe a bit tricky.
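The "math involving the previous camera transform" amounts to projecting the same static world position with both this frame's and last frame's camera matrices and taking the screen-space difference. A minimal sketch, assuming combined view-projection matrices are available (not engine code):

```python
import numpy as np

def static_world_velocity(world_pos, view_proj_now, view_proj_prev):
    """Screen-space velocity of a static point caused purely by camera motion.

    Project the unchanged world position with the current and previous
    view-projection matrices, then subtract the resulting NDC positions.
    """
    def project(vp):
        clip = vp @ np.append(world_pos, 1.0)   # homogeneous clip space
        return clip[:2] / clip[3]               # perspective divide -> NDC x, y
    return project(view_proj_now) - project(view_proj_prev)
```

Combining this camera-induced term with the GBuffer velocity for dynamic objects would give full-screen motion vectors.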