You will need to literally execute the same code on the CPU, including the randoms for the spectrum, with only one exception: instead of a 256x256 cascade, you should be doing 32x32. The rest remains the same. It takes about 2-3 hours to set up. Once you have it running, you can do optimization passes to make the code more CPU-friendly.
Hello, thank you for sharing your ocean simulation. It is by far the best I have seen. I would like to ask if it is possible to simulate buoyancy, and whether it is possible to apply any underwater effects? (I am new to UE)
Do you also need to reduce the BUTTERFLY_COUNT to 5 for the CPU logic?
Edit: I believe the answer is: yes. The butterfly algorithm takes 5 passes (log2(32)) to cover the 32 inputs to the PingPongArray.
Also, beware the GroupMemoryBarrierWithGroupSync call in the shader. The CPU logic will need to emulate this behavior, which ensures the PingPongArray is initialized before the butterfly passes occur. It also ensures all the butterfly passes have completed before the final butterfly pass on the 4 cascades.
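In loop form, the emulation is just sequential ordering; something like this (the function names are placeholders for the ported shader code):

```cpp
#define LENGTH 32
#define BUTTERFLY_COUNT 5 // log2(LENGTH)

// Placeholders for the ported shader logic.
void InitPingPongArray(int i);
void ButterflyPass(int Pass, int i);

void RunLine()
{
    // The GPU barrier guarantees the PingPongArray is fully initialized before
    // any butterfly pass reads it; on the CPU, finishing each loop completely
    // before starting the next one gives the same guarantee.
    for (int i = 0; i < LENGTH; ++i)
        InitPingPongArray(i);

    for (int Pass = 0; Pass < BUTTERFLY_COUNT; ++Pass)
        for (int i = 0; i < LENGTH; ++i)
            ButterflyPass(Pass, i);
}
```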
Yes, it is absolutely possible. There are two main approaches to simulating buoyancy: GPU readback from the Niagara System, and running the simulation on the CPU as described by DeathreyCG in a previous comment.
Underwater effects would normally be done using a separate material in a Post Process Volume. This material could use the same render targets as the ocean material to access the surface displacement to determine whether the camera is underwater or not.
There is no need for a sync on the CPU, as you can execute the whole thing with 32 parallel fors, each doing 32 elements. Last but not least, doing the inverse Fourier transform is not mandatory at all either. You can straight up evaluate individual wave signals instead. More so, it gives you a valuable advantage for server-side logic or any game logic where you need displacements at a time other than the current frame time. Of course, you will be limited to about 16-30 wave signals, as opposed to 1024, but what you can do is importance sample the spectrum, evaluating the complex amplitude (including current wind settings and randoms) and picking the 16 most contributing ones. Such an approach is faster when you need only a few dozen height queries per frame.
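As a rough sketch of what direct evaluation looks like (all names here are made up for illustration; deep-water dispersion is assumed):

```cpp
#include "CoreMinimal.h"

// One importance-sampled wave signal: wavevector, angular frequency from the
// deep-water dispersion relation, and the sampled complex amplitude folded
// into a magnitude and phase.
struct FWaveComponent
{
    FVector2D K;
    float Omega;  // sqrt(9.81 * |K|) for deep water
    float Amp;
    float Phase;
};

// Evaluate the height contribution of a few dozen components at an arbitrary
// position and time, e.g. for server-side logic or non-frame-time queries.
float SampleHeight(const TArray<FWaveComponent>& Waves, const FVector2D& Pos, float Time)
{
    float Height = 0.0f;
    for (const FWaveComponent& W : Waves)
    {
        Height += W.Amp * FMath::Cos(FVector2D::DotProduct(W.K, Pos) - W.Omega * Time + W.Phase);
    }
    return Height;
}
```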
But for starters, mindlessly copy-pasting the code for each pass, wrapping it in a for x / for y, and resolving any issues on the way will get you working water height queries in several hours. From there, one can decide what the next step could be.
Hi did you manage to get the buoyancy system to work?
No, not with the buoyancy system based on Niagara. This ocean is more complex to apply than the examples in the video, and I haven't even tried other modes. I don't have that much experience with materials and Niagara; it would take hours to configure.
Thanks for the reply. There's a lot to unpack there.
I was able to take the brute force approach and replicate the code on the CPU using a 256 x 256 grid. The results were accurate to within a few hundredths of a cm.
However, when I change the grid to 32 x 32, the results aren't even close. The spectrum grid ends up with much higher numbers due to the k^4 denominator. I could start reducing the PatchLength and/or Amplitudes to accommodate, but that doesn't seem like the right way to go about it.
Can you think of anything offhand that would need to be adjusted besides the GridSize, HalfGridSize, PingPongArray length, and butterfly count constants when reducing the size from 256 to 32?
If you get different magnitudes of wavevector k, you are doing something wrong.
Each point of your 32x32 grid maps EXACTLY to the same point of the original 256x256 grid. The complex amplitude on the downsampled grid should correspond to the complex amplitude on the original-size grid.
The downsampled grid is centered on the spectrum center, so the downsampled spectrum will contain the same wavevector magnitudes as the original grid, but only those smaller than 2*Pi*sqrt(16^2 + 16^2) divided by the patch length (while the original grid goes beyond that, reaching as high as 2*Pi*sqrt(128^2 + 128^2) divided by the patch length).
That, in the end, effectively gets you an exact copy of the displacement field, but at lower resolution and without the majority of the short waves.
No spectrum logic or other settings need to be touched at all.
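To illustrate the point (a hypothetical helper using Unreal types): the wavevector depends only on the centered grid index and the patch length, so the indices shared by a 32x32 grid (covering [-16, 16)) and a 256x256 grid (covering [-128, 128)) produce identical wavevectors.

```cpp
#include "CoreMinimal.h"

// Sketch: the wavevector is a function of the centered grid index and the
// patch length only; grid resolution never enters the formula.
FVector2D WaveVector(int32 IndexX, int32 IndexY, float PatchLength)
{
    const float TwoPi = 2.0f * PI;
    return FVector2D(TwoPi * IndexX / PatchLength, TwoPi * IndexY / PatchLength);
}
```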
I think the issue that I'm having is that in my logic the downsampled 32x32 grid is aligned to the 256x256 grid at (0, 0) based on the grid index. So, I've been looking at the WaveVector generated at (-128, -128) for the 256x256 case and comparing that to the WaveVector generated at (-16, -16) for the 32x32 case and assuming they should be the same. Looks like I'll have to refactor my code so that the downsampled grids are centered on the original grids.
Thanks for your help.
I've got the algorithm working so that the FFTGrid in the 256x256 and 32x32 cases are initialized with the same values. After completion of the iFFT, the 32x32 DisplacementGrid contains values that are much larger than the 256x256 grid. My assumption is that the displacement grids after the iFFT represent the same area and the 256x256 grid is just higher resolution.
A question that I have is related to the inverse DFT:
I can't seem to find where the 1/N term is applied. I assume I'm missing something obvious, but I'm suspicious that this might be causing some of my issues.
There is no 1/N term involved.
If you implemented the CPU-side code as a direct copy of the GPU code, as I mentioned already, then you do not need to add or alter anything.
Check every step carefully for equality between the CPU and GPU first: h0(k), h(t,k).
CPU iFFT Calculation UE 5.3 (requires C++)
I have implemented the CPU logic that matches the GPU logic in this tutorial. This allows the user to sample the surface displacement at a specific location and can be used to implement things like buoyancy calculations.
This is purely in C++ and I haven't implemented any hooks to allow it to be called from Blueprints, but it shouldn't be hard for someone to add themselves.
First, I'd like to thank DeathreyCG for their effort on the ocean tutorial. It's a great piece of work and very well laid out. The addition of the links to the background material and theory behind the implementation was very useful.
There were 4 main hurdles to address when building the CPU logic:
- Synchronizing random number generation between GPU and CPU.
- Matching displacement magnitude at different grid sizes.
- Achieving an acceptable accuracy of displacement between GPU and CPU.
- Achieving an acceptable level of performance on the game thread.
I'll break down each of these issues in separate sections.
Random Number Generation
The Random node used in the ocean tutorial is non-deterministic. I replaced it with the deterministic version named "Seeded Float Random". This requires using an identical seed on both the GPU and the CPU. The DispatchThreadId on the GPU and the iteration indices on the CPU are used to generate the same random value.
It's worth noting that "Seeded Float Random" also relies on a 4th seed, implemented as a static variable in the shader, which is incremented each time a random number is generated. This behavior was replicated in the CPU logic as well.
I left Seed 2 and Seed 3 set to 0. I'm not sure if this will affect the entropy of the random numbers generated, but I didn't get a chance to dig into that.
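For illustration, the call pattern looks roughly like this (the hash below is just a placeholder, not Niagara's actual implementation; the real CPU code has to bit-match the HLSL node for results to agree):

```cpp
#include "CoreMinimal.h"

static uint32 GRandomCallCounter = 0; // mirrors the shader's static 4th seed

// Placeholder hash standing in for Niagara's "Seeded Float Random".
float SeededFloatRandom(uint32 Seed1, uint32 Seed2, uint32 Seed3, uint32 StaticSeed)
{
    uint32 H = Seed1 * 1664525u + Seed2 * 1013904223u + Seed3 * 69069u + StaticSeed;
    H ^= H >> 16;
    H *= 2246822519u;
    H ^= H >> 13;
    return float(H & 0x00FFFFFFu) / float(0x01000000u);
}

float NextRandom(int32 X, int32 Y)
{
    // The iteration indices stand in for the GPU's DispatchThreadId (packed
    // into Seed 1 here, which is an assumption of this sketch); Seed 2 and
    // Seed 3 stay at 0, and the static counter increments per random number.
    const uint32 Seed1 = uint32(X) | (uint32(Y) << 16);
    return SeededFloatRandom(Seed1, 0u, 0u, GRandomCallCounter++);
}
```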
Matching Displacement Magnitude
The Niagara System will generate different displacement magnitudes for different grid sizes. For example, dropping the grid size from 256 to 64 results in the following:
Dividing the displacements by 16 results in values that match the 256 grid size. I believe this is due to the fact that the Phillips spectrum model used has a 1/k^4 term, where k is the length of the wave vector used to populate the spectrum. The k value depends on the grid size.
I added a displacement factor to the Niagara System as a user parameter and modified the GPU code to divide the displacement by this factor. Alternatively, you could adjust the Amplitude values when modifying the grid size, but I found the factor made it easier to play with different grid sizes. Here is the grid size at 64 with a displacement factor of 16 and the original parameters from the tutorial:
Displacement Accuracy
Originally, I was following the advice of the tutorial and running a grid size of 32 on the CPU and 256 on the GPU. It became obvious that the displacements calculated at the same location resulted in a material margin of error. I was seeing differences as large as a couple of meters using the parameter values from the tutorial.
For use cases with large floating objects this might be acceptable, but my use case requires calculating the surface of the water to allow a player to swim in high seas. So this wasn't going to work for me.
In order to achieve an accurate displacement at a specific location the GPU and CPU were going to have to run with matching grid sizes.
It is also worth keeping in mind that getting the actual height of the surface at a specific point is not trivial. My implementation will generate the displacement (horizontal and vertical) at a given location; additional logic is required to find the height of the surface at a specific XY position. This video describes the issue at around 13:40.
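For reference, one common approach is a small fixed-point iteration over the displacement query, roughly like this (GetDisplacement is a hypothetical stand-in for the CPU sampling function; a few iterations usually suffice for moderate wave steepness):

```cpp
#include "CoreMinimal.h"

// Provided by the CPU FFT implementation: returns (dx, dy, dz) for an
// undisplaced grid position.
FVector GetDisplacement(const FVector2D& Pos);

// Sketch: find the surface height above a world XY position by iteratively
// searching for the undisplaced point whose horizontal displacement lands
// on the target location.
float GetSurfaceHeight(const FVector2D& Target, int32 Iterations = 4)
{
    FVector2D Guess = Target;
    FVector Disp = FVector::ZeroVector;
    for (int32 It = 0; It < Iterations; ++It)
    {
        Disp = GetDisplacement(Guess);
        Guess = Target - FVector2D(Disp.X, Disp.Y);
    }
    return Disp.Z;
}
```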
Performance
This brings us to the meat of the implementation. I tried increasing the grid size on the CPU to 64 without any optimizations. The performance was awful: it was consuming about 20ms on the game thread.
Addressing this required leveraging task parallelism and vector parallelism. Unreal has mechanisms that make this fairly trivial: a ParallelFor handled the task parallelism and ISPC handled the vector parallelism.
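For example, the row passes parallelize naturally, since each row's butterfly chain is independent (RowPass here is a stand-in for the ported pass logic):

```cpp
#include "CoreMinimal.h"
#include "Async/ParallelFor.h"

// Hypothetical ported shader pass operating on one row of the grid.
void RowPass(int32 PassIndex, int32 Row);

// Run one butterfly pass over all rows concurrently; rows are independent
// within a pass, so no synchronization between them is needed.
void RunRowPass(int32 PassIndex, int32 GridSize)
{
    ParallelFor(GridSize, [PassIndex](int32 Row)
    {
        RowPass(PassIndex, Row);
    });
}
```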
The OceanCalculate counter measures the impact to the game thread. It runs at about 0.7ms on my AMD Ryzen 9 5900X 12-Core processor. The VectorRowPass and VectorColPass iterations run on multiple threads so their total time is higher but it is a concurrent workload.
Using the Project
In order to change the grid size of the GPU logic, you need to:
- Modify the GridSize and HalfGridSize in the FX_OceanWater_SetInitials module
- Modify the size of all the render targets in the FX Ocean Water Set Render Targets section of the WaterSim Emitter
- Modify the Dispatch parameters in the RowPass and ColPass stages of the WaterSim Emitter
- Modify the LENGTH and BUTTERFLY_COUNT defines (BUTTERFLY_COUNT should be equal to log2(LENGTH)) in OceanWaterFFT.ush
In order to change the grid size of the CPU logic, you need to:
- Modify the GPU_GRID_SIZE, GRID_SIZE and BUTTERFLY_COUNT defines in OceanFFTData.h
- Modify the GRID_SIZE define in OceanFFTCalculator.ispc
The GPU_GRID_SIZE define needs to match the GridSize used in the GPU calculations. This allows the random numbers generated on the CPU to match those on the GPU.
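For example, the configuration discussed in this thread (CPU downsampled to 32 against a 256 GPU grid) would look like this:

```cpp
// OceanFFTData.h -- example values for a 32x32 CPU grid against a 256x256 GPU grid
#define GPU_GRID_SIZE   256 // must match the GridSize in the Niagara System
#define GRID_SIZE       32  // downsampled CPU grid
#define BUTTERFLY_COUNT 5   // log2(GRID_SIZE)
```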
The project contains a reference to the Unreal Water plugin. It's not technically required, but it was an easy way to get a surface on which to apply the Ocean Material.
The project was created based on the empty game template in the Unreal Editor. I replaced the default map with a new map. It is a C++ project and was set up to use VSCode as the IDE. You can use the OceanSampleEditor (DebugGame) configuration to launch the editor.
The ispc compile step generates some warnings about performance that can be ignored.
I added a shader directory to the primary game module in order to allow relative references to the shader files in the Niagara System.
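For anyone replicating this setup, the mapping looks roughly like this in the game module's StartupModule (the module and virtual path names here are illustrative):

```cpp
#include "Misc/Paths.h"
#include "ShaderCore.h"

void FOceanSampleModule::StartupModule()
{
    // Map a virtual shader path to the project's Shaders directory so the
    // Niagara scripts can reference the .ush files with relative paths.
    const FString ShaderDir = FPaths::Combine(FPaths::ProjectDir(), TEXT("Shaders"));
    AddShaderSourceDirectoryMapping(TEXT("/OceanSample"), ShaderDir);
}
```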
Console Commands
Enter the "ocean.ShowDisplacement 1" console command to show the surface points calculated on the CPU in the editor.
Enter the "stat ocean" console command to show the performance counters associated with the CPU calculations.
Here is a link to the project files:
OceanSample.zip (3.9 MB)
@Deathrey - Feel free to use this to supplement your tutorial if you feel it would add value.
Great job and a good start. But, as mentioned before, you made a wrong assumption about the wavevector K being dependent on the grid size. It never was. Grid size never participates in that calculation, only the constant Pi and the patch size. Therefore, the factor you added is unnecessary and problematic, especially if you want to move to more advanced spectrums and spreading functions that give direct displacement in meters.
Think about it this way: if you ran two calculations at the same grid size, your downsampled grid calculation should be the same as running the full grid size calculation, but with all amplitudes that lie outside of the downsampled grid multiplied by zero.
Now, when you reduce the CPU-side grid size, what happens? Exactly the same. If your code is correct, the randoms will be the same, the wave vector directions will be the same, and the magnitudes of the wavevectors will be the same. The only place in the code which will yield a different result is the output of a butterfly pass. Why? Because there is a multiplication factor of 0.5 at every operation there.
If you calculate how many times you multiply by 0.5, you will see that it is fully equivalent to dividing by N squared (which is exactly the normalization factor of the inverse DFT you were asking about earlier).
You can adjust the 0.5 factor in the butterfly pass by the ratio between the GPU and CPU grid sizes, or you can remove the 0.5 factor completely and divide the resulting displacements by the CPU grid size squared.
Or, well, just don't touch anything at all, and multiply the final CPU-side displacements by the squared ratio of the CPU grid size to the GPU grid size. There is no need to adjust anything on the GPU side at all.
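Written out as a quick check (using the grid sizes from this thread):

```latex
% Each 1D butterfly pass multiplies by 1/2; there are \log_2 N passes per
% dimension and two dimensions (rows, then columns), so the total factor is
\left(\frac{1}{2}\right)^{2\log_2 N} = \frac{1}{N^2}
% Two grid sizes therefore differ by the squared ratio of their sizes:
\frac{1/N_{\mathrm{cpu}}^2}{1/N_{\mathrm{gpu}}^2}
  = \left(\frac{N_{\mathrm{gpu}}}{N_{\mathrm{cpu}}}\right)^2
% e.g. (256/64)^2 = 16, matching the divide-by-16 observed earlier, so the
% fix is to multiply the final CPU displacements by (N_{cpu}/N_{gpu})^2.
```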
Having this multiplication in the butterfly pass adds an extra instruction, but ensures that iFFT values do not skyrocket outside of float ranges. This is important if you are intending to optimize the iFFT by storing at lower precision than 32 bits, or if you are simulating a large grid size where 32 bits is not enough at all.
Minding that one would really want either a 256x4 cascade setup for an ocean, or 3x512, or something alike, running an exact copy on the CPU is completely out of the question. You have to run downsampled. The accuracy of running 2x32 cascades as compared to 4x256 cascades is somewhat as depicted:
Up to 2 meters for a 1.5 km sized largest patch.
Now, a first-person character swimming in 500-meter-long waves is of course a bit of a complicated case. You can time-slice generating the CPU-side displacements by performing a temporal upsample: instead of picking wave vectors from the center of the spectrum, cover the whole range, but at each time step jitter the position to pick one from a number of samples, so that you can cover all waves of the spectrum in N frames. Then temporally amortize the displacement values over those N frames.
There is a breakpoint beyond which spreading the temporal accumulation will result in a loss of accuracy rather than an increase. But 4-16 time slices work quite well. There is another method of generating very accurate displacement from a downsampled displacement grid, but I cannot share it for the time being.
Last but not least, turning displacement queries into height (or, even worse, intersection) queries normally involves doing a few ray marches (2-5 steps, typically). It quickly gets out of hand, especially in server-side code.
There is an alternative. After obtaining the displacements... simply perform a software rasterization of the CPU-side grid, rasterizing height and XY offset with a depth test. It adds a flat cost, but turns your heavy water height queries into trivial one-taps. Again, a breakpoint exists where running individual queries is faster than a raster pass over the whole grid. Another benefit is that you can now build a min/max height acceleration structure, turning processing-heavy raycasts into pretty fast things.
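A crude sketch of the idea (point-splatting rather than true triangle rasterization, all names made up for illustration):

```cpp
#include "CoreMinimal.h"

// Splat the displaced CPU grid into a regular height buffer with a "depth
// test" (keep the highest sample per cell), so subsequent height queries
// become single array reads instead of iterative searches.
void RasterizeHeights(const TArray<FVector>& DisplacedPoints,
                      TArray<float>& HeightBuffer,
                      int32 BufferSize, float WorldSize)
{
    HeightBuffer.Init(TNumericLimits<float>::Lowest(), BufferSize * BufferSize);
    for (const FVector& P : DisplacedPoints)
    {
        // Map the displaced XY position into a buffer cell.
        const int32 X = FMath::Clamp(int32((P.X / WorldSize + 0.5f) * BufferSize), 0, BufferSize - 1);
        const int32 Y = FMath::Clamp(int32((P.Y / WorldSize + 0.5f) * BufferSize), 0, BufferSize - 1);
        float& Cell = HeightBuffer[Y * BufferSize + X];
        Cell = FMath::Max(Cell, P.Z);
    }
}
```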
I'd say there is likely something wrong in my GPU implementation then, because changing the grid size from 256x256 to 32x32 results in the same issues I was seeing in my CPU implementation.
I assumed that my GPU implementation was correct, but it appears that something is not working correctly.
Hello. I followed the tutorial, but I don't know what I missed or did wrong. My render targets (RT_OceanPixelAttribsB_Casc0, 1, 2, 3) are missing the blue color. Can anyone give me a clue as to where I should debug?
The PixelAttribsB_Casc RenderTargets contain the Foam value in the B channel. This value is calculated in the FoamAndExport stage of the WaterSim emitter. It is written to the RenderTarget in the ExportPixelCascade0 through ExportPixelCascade4 stages.
Your screenshot doesn't necessarily look wrong for what you would expect to see for Cascade0. It depends on the foam parameter values.
Thank you very much for your answer. You're right. I set the wrong value of PerCascadeFoamParameters. I messed up the values because I set them by following the order of the variables without reading their names. Thank you.
If I am understanding you correctly, then what you are saying is that if you drop the grid size from 256x256 to 128x128 in the Niagara System, you would need to divide the displacements by 4 in the 128x128 case to match the displacements in the 256x256 case (ignoring the differences created by the higher-frequency spectrum outside the [-64,-64] to [64,64] range).
If that is true, then I think my GPU implementation is producing the expected results. I only introduced that factor to make it easier to change grid size while testing. Although, a less confusing option may have been to just multiply the displacements by (GridSize / 256) squared for testing purposes, which would eliminate my extra user parameter.
Hi, thank you for sharing your solution. Do you know how to get this working in 5.2? The project wouldn't compile.