Oh I see. Hmm, maybe. I'm not really sure how to add it to raymarching though; it seems like a good fit for cone tracing, but strange for rays. Will give it some thought.
I do need to finish the changes I'm currently making and evaluate performance to get some up-to-date numbers and see how fast AHR really is (in millions of rays/second).
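A minimal sketch of how a millions-of-rays-per-second figure could be computed, assuming the ray count and the elapsed time in milliseconds have already been measured elsewhere. `mraysPerSecond` is a hypothetical helper name for illustration, not part of AHR:

```cpp
#include <cstdint>

// Hypothetical helper (not from AHR): converts a ray count and an elapsed
// time in milliseconds into millions of rays per second (Mrays/s).
double mraysPerSecond(std::uint64_t rayCount, double elapsedMs) {
    if (elapsedMs <= 0.0) {
        return 0.0; // guard against empty or invalid timings
    }
    const double seconds = elapsedMs / 1000.0;
    return static_cast<double>(rayCount) / seconds / 1.0e6;
}
```

For example, 2,000,000 rays traced in 1000 ms works out to 2 Mrays/s. Averaging over several frames would probably give steadier numbers than timing a single one.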
EDIT: Maybe a buffer in shared memory? Not completely sure. You need to be really careful with this stuff, as you might spend a lot of time on something that on paper reduces the access count, but in the end is slower because the access pattern is worse, or because of warp divergence or lower occupancy, or you just realize that what you wrote does the same thing the cache already does. Been there, GPUs can be ******* D:
Will give it some more thought anyway.