I also did a fair bit of research into real-time PT a while ago (before NVIDIA released their RTX Remix toolkit and similar). I have good news, and I have bad news.
Good news: real-time path-tracing is viable, and NVIDIA maintains a branch of UE that supports it. Depending on hardware and scene configuration, it can actually be faster than Lumen, but that really depends on the scene and which shaders are being called.
Bad news: path-tracing, by and large, won't be faster than rasterization (saying this with many, many caveats) for a few different reasons, and one of them is occupancy.
(Super technical here, also very simplified):
Summary
When rendering a scene (and if I misunderstand a computer graphics concept, someone please call me out), the pixel shaders in your graphics card are provided information about the scene from buffers (normal, albedo, etc.) and perform math to calculate a lighting value. The color of the metal, its specularity, and its normal are combined with a lighting vector to generate a specular highlight, just as an example. Every pixel shader is working simultaneously in some functional block, and every pixel shader has to finish its task before that functional block can receive new instructions (again, a massive simplification by someone who isn't a graphics programmer). When the pixel shaders are performing simple lighting math, like an N·L term, they all generally finish at roughly the same time, which means the entire functional block isn't stalled out waiting for just a few pixel shaders to finish up.
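To make that lockstep idea concrete, here's a minimal CUDA sketch of a uniform shading pass (CUDA as a stand-in for pixel shaders; the kernel and buffer names are made up for illustration):

```cuda
#include <cuda_runtime.h>

// Hypothetical G-buffer shading pass. Every thread runs the exact same
// short instruction sequence, so all threads in a warp finish together
// and the hardware stays busy. (Real pixel shaders are written in
// HLSL/GLSL, not CUDA; this is just an analogy for the lockstep behavior.)
__global__ void shadeDiffuse(const float3* normals,  // per-pixel normals
                             const float3* albedo,   // per-pixel base color
                             float3 lightDir,        // one directional light
                             float3* out, int numPixels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPixels) return;

    float3 n = normals[i];
    // The N·L term: identical math for every pixel, so no thread is left
    // waiting on a slow neighbor.
    float ndotl = fmaxf(0.0f, n.x * lightDir.x + n.y * lightDir.y + n.z * lightDir.z);
    out[i] = make_float3(albedo[i].x * ndotl,
                         albedo[i].y * ndotl,
                         albedo[i].z * ndotl);
}
```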
Path-tracing, however, isn’t like that.
If you were to path-trace a scene, the visibility and direct lighting portion of the rendering (direct light and shadow) wouldn’t be too different from rasterization in either the look or the performance. But once you start tracing rays for indirect lighting, things get very messy. Maybe a ray will bounce into a light source and resolve with very little computational time. Maybe a ray will shoot off into a distant portion of the scene and take forever to finish. Since all the rays are traversing the scene in very different ways, suddenly your occupancy gets really wonky, and most of that functional block could be stalled out waiting for those long rays to resolve themselves.
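If it helps, here's a toy CUDA sketch of that divergence, with every probability invented just to show the shape of the control flow:

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Toy model of path divergence: each thread's ray terminates after a
// different number of bounces, so the whole warp is held hostage by its
// longest path. The probabilities and the fake "trace" step are invented
// purely to show the control-flow shape.
__global__ void tracePaths(float3* radiance, int numRays, unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numRays) return;

    curandState rng;
    curand_init(seed, i, 0, &rng);

    float throughput = 1.0f;  // energy the path is still carrying
    float result = 0.0f;

    for (int bounce = 0; bounce < 16; ++bounce) {
        // Stand-in for a real scene trace: some rays "hit a light" at once,
        // others keep wandering.
        if (curand_uniform(&rng) < 0.2f) {
            result = throughput;      // lucky ray, done after a bounce or two
            break;
        }
        throughput *= 0.7f;           // the surface absorbed some energy
        // Russian roulette: probabilistically kill low-energy paths.
        if (curand_uniform(&rng) > throughput) break;
        // Surviving threads keep looping while their warp-mates sit idle,
        // which is exactly the occupancy problem described above.
    }
    radiance[i] = make_float3(result, result, result);
}
```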
And there’s also the issue of coherence: ray-tracing a scene requires a lot of data, more than can fit into your GPU’s cache at any one time. If a ray is traversing the scene and suddenly finds it doesn’t have the data to figure out what it actually hit, it has to pull that data from the (much slower) GPU memory, which, again, stalls out the GPU and creates what’s known as ‘cache thrashing’ as data is shuffled in and out of the GPU’s cache just to render a scene.
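A rough sketch of the memory side (made-up node layout): the two kernels below do identical work, but the scattered one produces the thrashing access pattern that divergent rays generate:

```cuda
#include <cuda_runtime.h>

// Made-up BVH-node-sized record; the layout is arbitrary.
struct Node { float bounds[6]; int left, right; };

// Two gathers doing identical math; only the access pattern differs.
__global__ void gatherCoherent(const Node* nodes, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Neighboring threads read neighboring nodes: one cache line
    // services many threads at once.
    out[i] = nodes[i].bounds[0];
}

__global__ void gatherScattered(const Node* nodes, const int* rayNodeIdx,
                                float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Each ray is deep in a different part of the tree, so every thread
    // pulls a different cache line, evicting data its neighbors are about
    // to need: the cache thrashing described above.
    out[i] = nodes[rayNodeIdx[i]].bounds[0];
}
```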
And then there are still the costs and issues of shader coherence, which I won’t get into, but that’s another can of worms.
That demo running on Shadertoy basically runs into none of these issues because it’s small. Don’t get me wrong, it’s definitely using some inventive and intelligent sampling algorithms, but if all you have in your scene is a light, three spheres, and a cube, you can easily fit that into cache, and it won’t cost much of anything to traverse. But when you have massive scenes with billions of triangles, that cost explodes. You suddenly need really powerful algorithms to cut those costs down, and even they can only do so much. Then there’s a conversation about culling and BVH construction methods and skinned geometry support and a whole lot else.
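To give a sense of just how cheap a Shadertoy-scale scene is to traverse, here's a rough CUDA sketch (arbitrary scene values, spheres only to keep it short) where the whole scene lives in constant memory and "traversal" is a three-iteration loop:

```cuda
#include <cuda_runtime.h>
#include <math.h>

// A "Shadertoy-sized" scene: the whole thing fits in constant memory, so
// there's essentially zero cache pressure and no BVH is needed.
struct Sphere { float3 center; float radius; };

__constant__ Sphere g_spheres[3] = {
    {{-1.f, 0.f, 5.f}, 1.f},
    {{ 1.f, 0.f, 5.f}, 1.f},
    {{ 0.f, 2.f, 6.f}, 0.5f},
};

// Nearest hit along a (normalized) ray direction; brute force over spheres.
__device__ float hitScene(float3 ro, float3 rd)
{
    float tMin = 1e30f;  // 1e30f means "missed everything"
    for (int s = 0; s < 3; ++s) {
        float3 oc = {ro.x - g_spheres[s].center.x,
                     ro.y - g_spheres[s].center.y,
                     ro.z - g_spheres[s].center.z};
        float b = oc.x * rd.x + oc.y * rd.y + oc.z * rd.z;
        float c = oc.x * oc.x + oc.y * oc.y + oc.z * oc.z
                - g_spheres[s].radius * g_spheres[s].radius;
        float disc = b * b - c;
        if (disc > 0.f) {
            float t = -b - sqrtf(disc);  // nearest intersection distance
            if (t > 0.f && t < tMin) tMin = t;
        }
    }
    return tMin;
}
```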
Lumen does everything it can to avoid the problems real-time PT suffers from. Tracing rays from probes keeps traversal costs predictable and low, caching GI to textures lets it reuse data and dodge the crazy occupancy issues, and denoising means it can do a lot with very little. Even NVIDIA’s real-time path-tracer still leans on plenty of caches and denoising trickery.
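Here's the general shape of that caching idea as a hedged sketch (this is not Lumen's actual data structure, just the pattern of reading interpolated irradiance from a probe grid instead of tracing per pixel):

```cuda
#include <cuda_runtime.h>

// The general caching pattern (NOT Lumen's real internals): shading reads
// interpolated irradiance out of a small probe grid that was updated over
// earlier frames, instead of tracing indirect rays per pixel. The grid
// layout and parameter names are invented; filtering edge cases are
// glossed over.
__device__ float3 sampleProbeGrid(cudaTextureObject_t probeGrid,  // 3D float4 texture
                                  float3 worldPos, float3 gridOrigin,
                                  float invCellSize)
{
    // Map the world position into grid texel coordinates; hardware
    // trilinear filtering blends the eight surrounding probes for free.
    float u = (worldPos.x - gridOrigin.x) * invCellSize;
    float v = (worldPos.y - gridOrigin.y) * invCellSize;
    float w = (worldPos.z - gridOrigin.z) * invCellSize;
    float4 irr = tex3D<float4>(probeGrid, u, v, w);
    return make_float3(irr.x, irr.y, irr.z);  // cached indirect light, no rays traced
}
```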
You could probably make a 60 FPS path-traced game on console pretty easily, if the scene were trivial. Game scenes are complex, though, which generally puts them beyond what the current technology can handle. With more progress in denoising, intelligent sampling, and memory management, who knows, but that’s the current state of the art.
None of this is meant as a criticism of what you’re hoping for; I still really want real-time PT, and I can’t wait until everyone has it. These are just the current roadblocks keeping us from getting there.