Community Tutorial: Ocean Simulation

Great job and a good start. But, as mentioned before, you made a wrong assumption: the wavevector K does not depend on the grid size, and it never did. Grid size never participates in that calculation; only the constant Pi and the patch size do. Therefore, the factor you added is unnecessary and problematic, especially if you want to move to more advanced spectra and spreading functions that give displacement directly in meters.
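For reference, this is the usual shape of that calculation; the names and the centered indexing convention below are mine, a sketch rather than anyone's exact code:

```cpp
struct Float2 { float x, y; };

// Wavevector of spectrum texel (n, m). Only 2*pi and the patch size enter
// the formula; the grid size N merely decides which integer offsets exist,
// so a wave present on two different grid sizes has the same K on both.
Float2 waveVector(int n, int m, int N, float patchSizeMeters)
{
    const float twoPi = 6.28318530718f;
    Float2 k;
    k.x = twoPi * float(n - N / 2) / patchSizeMeters; // centered indexing
    k.y = twoPi * float(m - N / 2) / patchSizeMeters;
    return k;
}
```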

Think about it this way: your downsampled grid calculation should produce the same result as a full grid size calculation in which every amplitude that lies outside the downsampled band is multiplied by zero.
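If that sounds too abstract, here is a tiny 1D numeric check of the same statement, using an arbitrary made-up spectrum (everything here is hypothetical; it only demonstrates the equality, and both sums are deliberately left unnormalized):

```cpp
#include <cmath>
#include <complex>
#include <cstdio>

int main()
{
    const int Nfull = 16, Nsmall = 4;
    const double twoPi = 6.28318530718;
    using cd = std::complex<double>;

    // arbitrary spectrum, indexed by centered wave number k
    auto amp = [](int k) { return cd(std::cos(1.7 * k), std::sin(0.9 * k)); };

    for (int j = 0; j < Nsmall; ++j)
    {
        const double x = double(j) / Nsmall; // sample point shared by both grids

        cd full(0); // full grid, amplitudes outside the small band zeroed
        for (int k = -Nfull / 2; k < Nfull / 2; ++k)
        {
            const bool inSmallBand = (k >= -Nsmall / 2 && k < Nsmall / 2);
            full += (inSmallBand ? amp(k) : cd(0)) * std::exp(cd(0.0, twoPi * k * x));
        }

        cd small(0); // downsampled grid over the same band
        for (int k = -Nsmall / 2; k < Nsmall / 2; ++k)
            small += amp(k) * std::exp(cd(0.0, twoPi * k * x));

        std::printf("x = %.2f   difference = %g\n", x, std::abs(full - small));
    }
}
```

The differences print as zero. And since neither sum is normalized here, any mismatch you do see between grid sizes has to come from the normalization, which is the next point.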

Now, what happens when you reduce the CPU side grid size? Exactly the same thing. If your code is correct, the randoms are the same, the wavevector directions are the same, and the wavevector magnitudes are the same. The only place in the code that yields a different result is the output of the butterfly pass. Why? Because there is a multiplication by 0.5 at every operation there.

If you count how many times you multiply by 0.5, you will see that it is fully equivalent to dividing by N squared (which is exactly the inverse DFT normalization factor you were asking about earlier).
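In numbers: a radix-2 pass over N points runs log2(N) butterfly stages, and a 2D iFFT does one horizontal and one vertical pass, so a 0.5 at every butterfly accumulates to 0.5^(2·log2(N)) = 1/N². A sketch of one such butterfly (my naming):

```cpp
#include <complex>

// One radix-2 butterfly with the 0.5 folded in. Every output of a length-N
// pass goes through log2(N) of these, picking up 0.5^log2(N) = 1/N; the
// second pass of the 2D iFFT applies another 1/N, for 1/N^2 in total --
// exactly the inverse DFT normalization.
void butterfly(std::complex<float>& a, std::complex<float>& b,
               std::complex<float> twiddle)
{
    const std::complex<float> t = b * twiddle;
    const std::complex<float> top    = a + t;
    const std::complex<float> bottom = a - t;
    a = 0.5f * top;
    b = 0.5f * bottom;
}
```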

You can adjust the 0.5 factor in the butterfly pass by the ratio between the GPU and CPU grid sizes, or you can take the 0.5 factor out completely and divide the resulting displacements by the GPU grid size squared.

Or, well, just don't touch anything at all and multiply the final CPU side displacements by the squared ratio of the CPU grid size to the GPU grid size. There is no need to adjust anything on the GPU side at all.
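All three options land on the same value. With, say, a 256 GPU grid and a 32 CPU grid (hypothetical sizes), the last option boils down to a one-liner:

```cpp
// CPU iFFT keeps its 0.5 factors, i.e. a baked-in 1/(Ncpu*Ncpu); this
// rescales the result to the GPU's 1/(Ngpu*Ngpu) normalization:
//   S/Ncpu^2 * (Ncpu/Ngpu)^2 = S/Ngpu^2
inline float matchGpuNormalization(float cpuDisplacement, int Ncpu, int Ngpu)
{
    const float r = float(Ncpu) / float(Ngpu); // e.g. 32 / 256
    return cpuDisplacement * r * r;
}
```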

Having this multiplication in the butterfly pass adds an extra instruction, but it ensures that iFFT values do not skyrocket outside of float ranges. This is important if you are intending to optimize the iFFT by storing at lower precision than 32 bits, or if you are simulating a grid size so large that even 32 bits are not enough.
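To put rough numbers on that: half precision tops out at 65504, while an unnormalized sum over a 512x512 spectrum of order-one amplitudes can in the worst case reach the order of 512 x 512 = 262144, already past the representable range. With the 0.5 folded into every butterfly, each intermediate stays near the magnitude of the final, normalized result instead.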

Mind that one would really want either a 4x256 cascade setup for an ocean, or a 3x512 one, or something alike, so running an exact copy on the CPU is completely out of the question. You have to run downsampled. The accuracy of running 2x32 cascades compared to 4x256 cascades is roughly as depicted below:


[Figure: accuracy comparison; up to 2 meters of error for the 1.5 km sized largest patch]

Now, a first person character swimming in 500 meter long waves is of course a more complicated case. You can time slice the generation of CPU side displacements by performing a temporal upsample: instead of picking wavevectors from the center of the spectrum, cover the whole range, but at each time step jitter the sampling position to pick one of a number of candidate samples, so that you cover all waves of the spectrum in N frames. Then temporally amortize the displacement values over those N frames (see the sketch below).
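A very loose sketch of that sampling pattern, under my own assumptions about the layout (the small CPU spectrum strided across the full GPU band, one jittered pick per block per frame, plain moving-average accumulation); none of the names come from any real code, and the slice count and the weighting are free parameters:

```cpp
#include <complex>

// Fill the small CPU spectrum from the full-range GPU spectrum. Each CPU
// texel covers a block of GPU texels; the pick inside the block changes
// every frame, so over `slices` frames different waves get their turn.
void fillCpuSpectrumSlice(std::complex<float>* cpuSpec, int Ncpu,
                          const std::complex<float>* gpuSpec, int Ngpu,
                          int frame, int slices)
{
    const int block = Ngpu / Ncpu;     // GPU texels per CPU texel, per axis
    const int j = frame % slices;
    const int jx = j % block;          // jittered offset inside the block
    const int jy = (j / block) % block;

    for (int y = 0; y < Ncpu; ++y)
        for (int x = 0; x < Ncpu; ++x)
            cpuSpec[y * Ncpu + x] =
                gpuSpec[(y * block + jy) * Ngpu + (x * block + jx)];
}

// After the CPU iFFT each frame: amortize the displacements over the window.
void accumulate(float* accumulated, const float* sliceResult,
                int texelCount, int slices)
{
    const float w = 1.0f / float(slices);
    for (int i = 0; i < texelCount; ++i)
        accumulated[i] = accumulated[i] * (1.0f - w) + sliceResult[i] * w;
}
```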

There is a breakpoint beyond which spreading the temporal accumulation further results in a loss of accuracy rather than a gain. But 4-16 time slices work quite well. There is another method of generating very accurate displacement from a downsampled displacement grid, but I cannot share it for the time being.

Last but not least, turning displacement queries into height (or, even worse, intersection) queries normally involves doing a few ray marches (2-5 steps, normally). It quickly gets out of hand, especially in server side code.
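For context, that search is usually a fixed point iteration over the displacement map, something like the sketch below; `sampleDisplacement` is an assumed bilinear fetch of the CPU displacement grid, and the step count is the 2-5 mentioned above:

```cpp
struct Float3 { float x, y, z; };

// Assumed to exist: bilinear fetch of the displacement (XY offset plus
// height) of the undisplaced grid point at (x, y).
Float3 sampleDisplacement(float x, float y);

// Height of the displaced surface at horizontal position (px, py): find the
// undisplaced point whose horizontal displacement lands it on (px, py).
float waterHeightAt(float px, float py, int steps /* 2-5 */)
{
    float gx = px, gy = py; // initial guess: the undisplaced point itself
    for (int i = 0; i < steps; ++i)
    {
        const Float3 d = sampleDisplacement(gx, gy);
        gx = px - d.x; // pull the guess back by the horizontal offset
        gy = py - d.y;
    }
    return sampleDisplacement(gx, gy).z;
}
```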

There is an alternative. After obtaining the displacements… simply perform a software rasterization of the CPU side grid, rasterizing height and XY offset with a depth test. It adds a flat cost, but it turns your heavy water height queries into trivial single taps. Again, a breakpoint exists where running individual queries is faster than a raster pass over the whole grid. Another benefit is that you can now build a min/max height acceleration structure, turning processing-heavy raycasts into pretty fast ones.
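A sketch of that raster pass, under my own assumptions (vertex splatting instead of proper triangle rasterization, to keep it short; a real pass would rasterize the grid triangles the same way, and would write the XY offsets to a second target):

```cpp
#include <algorithm>
#include <cfloat>
#include <vector>

struct Vertex { float x, y, h; }; // displaced world position and height

// Splat the displaced CPU grid into a regular height map. The max() is the
// "depth test" on height. A height query afterwards is a single (bilinear)
// tap, and a min/max mip chain built over the map is the acceleration
// structure for raycasts.
void rasterizeHeights(const std::vector<Vertex>& displacedGrid,
                      std::vector<float>& heightMap, int mapSize,
                      float worldOrigin, float worldSize)
{
    std::fill(heightMap.begin(), heightMap.end(), -FLT_MAX);
    for (const Vertex& v : displacedGrid)
    {
        const int px = int((v.x - worldOrigin) / worldSize * float(mapSize));
        const int py = int((v.y - worldOrigin) / worldSize * float(mapSize));
        if (px < 0 || px >= mapSize || py < 0 || py >= mapSize)
            continue;
        float& cell = heightMap[py * mapSize + px];
        cell = std::max(cell, v.h);
    }
}
```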
