Hello,
I am trying to write a new set of buffers that manage memory in chunks, similar to how GPUScene wraps a structured buffer for all the primitives in the scene. I wrote my own wrapper for my API that follows the GPUScene pattern: I use a span allocator to track free ranges and use the uploader to upload memory on demand.
I’ve got a few instances of the wrapper buffers with different strides, and this is exposing some problems that I’m not sure how best to deal with. Currently I’m putting workarounds into FRDGAsyncScatterUploadBuffer, but that feels extremely fragile and wrong to me.
Problems:
- ResizeBufferIfNeeded() doesn’t work properly with typed buffers because MemcpyResource() in UnifiedBuffer attempts to create an SRV without specifying a type. It seems this helper expects us to work only on structured buffers, which makes it hard to use for things like index buffers. I’m trying to hack around that by explicitly creating structured buffers myself with appropriate strides.
- If I upload a large chunk of memory (currently I upload blocks of 1k bytes per scatter upload), the permutation selected in GetScatterCopyParamsAndPermutation() is EByteBufferStructuredSize::Uint8. However, the config that was decided on earlier assumed an element stride of 16 bytes, not 32 bytes. We therefore end up writing twice as much data per dispatch as we should, which corrupts data.
- FRDGAsyncScatterUploadBuffer::Begin() does not consider the underlying buffer; it simply assumes 16 bytes. I added a workaround that passes a bool to this function so that it reads the buffer’s desc and uses that size to load data properly (maybe this workaround is not needed given the other hacks I’ve put in?).
- The type of the underlying buffer that I am uploading to is not considered when selecting shader permutations. So if I have a buffer of 4-byte elements (e.g. index buffers), it can choose a shader permutation where the structured buffer stride is larger than 4 bytes, and it skips elements.
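To make the second problem concrete, here is a simplified CPU model of the stride mismatch. It is not the engine's shader or uploader code: ModelScatterCopy and CountClobberedBytes are hypothetical names, and the model assumes the destination offset is counted in the config's stride while the copy width comes from the selected permutation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified CPU model of one scatter copy (NOT engine code).
// DstElementIndex is counted in ConfigStrideBytes units, which is what the
// upload config planned for. CopyStrideBytes is what the permutation moves.
void ModelScatterCopy(std::vector<uint8_t>& Dst,
                      const std::vector<uint8_t>& Staging,
                      uint32_t DstElementIndex,
                      uint32_t ConfigStrideBytes,
                      uint32_t CopyStrideBytes)
{
    const std::size_t DstByte = std::size_t(DstElementIndex) * ConfigStrideBytes;
    assert(DstByte + CopyStrideBytes <= Dst.size());
    assert(CopyStrideBytes <= Staging.size());
    for (uint32_t i = 0; i < CopyStrideBytes; ++i)
    {
        Dst[DstByte + i] = Staging[i];
    }
}

// Counts destination bytes that changed outside the range the config
// intended to write for a single element.
std::size_t CountClobberedBytes(uint32_t ConfigStrideBytes, uint32_t CopyStrideBytes)
{
    const uint8_t Untouched = 0xAA;
    std::vector<uint8_t> Dst(64, Untouched);   // small destination buffer
    std::vector<uint8_t> Staging(32, 0x11);    // staging holds more than one element
    const uint32_t DstElementIndex = 1;

    ModelScatterCopy(Dst, Staging, DstElementIndex, ConfigStrideBytes, CopyStrideBytes);

    const std::size_t Begin = std::size_t(DstElementIndex) * ConfigStrideBytes;
    const std::size_t End = Begin + ConfigStrideBytes;
    std::size_t Clobbered = 0;
    for (std::size_t i = 0; i < Dst.size(); ++i)
    {
        const bool bInsideIntended = (i >= Begin && i < End);
        if (!bInsideIntended && Dst[i] != Untouched)
        {
            ++Clobbered;
        }
    }
    return Clobbered;
}
```

With matching strides (16 and 16) nothing outside the intended element changes. With a 16-byte config and a 32-byte copy, the 16 bytes of the neighbouring element are overwritten, which matches the corruption described above.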
It seems like a problem to me that the config is decided before the shader permutation is known. The offset that I hand over for the scatter destination also needs to be aware of the thread count and the permutation.
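One way to picture the ordering I would expect is a sketch where the copy width is fixed from the destination buffer's stride first, and the destination offset and element count are derived from it afterwards. Everything here is hypothetical: FScatterPlan, PlanScatterUpload and the set of supported widths (4, 8, 16, 32 bytes) are assumptions for illustration, not engine types or the engine's actual permutation list.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type; none of these names exist in the engine.
struct FScatterPlan
{
    bool     bValid = false;
    uint32_t CopyStrideBytes = 0;  // width the copy permutation must use
    uint32_t FirstDstElement = 0;  // destination offset in CopyStrideBytes units
    uint32_t NumElements = 0;      // number of elements to scatter
};

// Assumed set of copy widths a permutation could support (assumption for this sketch).
inline bool IsSupportedCopyStride(uint32_t StrideBytes)
{
    return StrideBytes == 4 || StrideBytes == 8 || StrideBytes == 16 || StrideBytes == 32;
}

// Derives the whole upload plan from the destination stride, so the offset
// and element count can never disagree with the copy width.
FScatterPlan PlanScatterUpload(uint32_t DstStrideBytes, uint32_t DstOffsetBytes, uint32_t UploadBytes)
{
    FScatterPlan Plan;
    if (!IsSupportedCopyStride(DstStrideBytes))
    {
        return Plan; // no permutation matches this buffer's stride
    }
    if (DstOffsetBytes % DstStrideBytes != 0 || UploadBytes % DstStrideBytes != 0)
    {
        return Plan; // upload does not cover whole destination elements
    }
    Plan.bValid = true;
    Plan.CopyStrideBytes = DstStrideBytes;
    Plan.FirstDstElement = DstOffsetBytes / DstStrideBytes;
    Plan.NumElements = UploadBytes / DstStrideBytes;
    assert(Plan.NumElements * Plan.CopyStrideBytes == UploadBytes);
    return Plan;
}
```

For a 4-byte index buffer and a 1024-byte upload at byte offset 1024, this gives a 4-byte copy width, a first destination element of 256, and 256 elements, so no elements are skipped.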
The other approach, which I am starting to look into, is to give all my buffers a unified stride that works well with the scatter uploader. That would require updating RDG to support creating SRV/UAV objects with an explicitly specified structured buffer stride instead of inferring it from the underlying buffer. Alternatively, I might just roll my own version of FRDGAsyncScatterUploadBuffer, but I would prefer not to have to do that.
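For the unified-stride idea, a rough sketch of the bookkeeping on the allocator side, assuming a 16-byte upload stride. The helper names are hypothetical, and this only covers span sizing and offset conversion, not the RDG view changes mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Assumed stride that every upload uses (assumption for this sketch).
constexpr uint32_t UnifiedStrideBytes = 16;

uint32_t Gcd(uint32_t A, uint32_t B)
{
    while (B != 0)
    {
        const uint32_t T = A % B;
        A = B;
        B = T;
    }
    return A;
}

// Smallest span size that holds a whole number of logical elements AND a
// whole number of unified upload elements (the least common multiple).
uint32_t SpanGranularityBytes(uint32_t LogicalStrideBytes)
{
    assert(LogicalStrideBytes > 0);
    return LogicalStrideBytes / Gcd(LogicalStrideBytes, UnifiedStrideBytes) * UnifiedStrideBytes;
}

// Rounds a requested allocation up to the span granularity.
uint32_t RoundUpSpanBytes(uint32_t RequestedBytes, uint32_t LogicalStrideBytes)
{
    const uint32_t Granularity = SpanGranularityBytes(LogicalStrideBytes);
    return (RequestedBytes + Granularity - 1) / Granularity * Granularity;
}

// Converts a span's byte offset into an index in unified upload elements.
uint32_t ToUnifiedElement(uint32_t ByteOffset)
{
    assert(ByteOffset % UnifiedStrideBytes == 0);
    return ByteOffset / UnifiedStrideBytes;
}
```

If every span starts at a multiple of the granularity, the uploader only ever sees whole 16-byte elements, while readers can still index the same memory with the logical stride.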