Hey @gabaly92
Nice work! You are right about the HLSL runtime: it is a work in progress and is missing most of the operators. With RDG you also took on the biggest challenge, as it is the most complex one to get running.
So first of all: if you have data on the CPU, want to run a model on it on the GPU, and then need the results back on the CPU, I would recommend using the INNERuntimeGPU interface (and probably the ORTDml runtime), as it handles the upload and download for you.
RuntimeRDG is meant to be used when you want to run a network as part of rendering a frame (e.g. post processing): the network consumes a resource that is generated during frame rendering, and its output is consumed in turn and contributes to the final output.
However, for the fun of the exercise you can indeed do what you did and handle the upload and download manually. Unfortunately there are a couple of issues with the code, so I recommend reading about the Render Dependency Graph first, especially the parts on uploading buffers and buffer extraction.
For example, you create the buffers on one GraphBuilder but then use them on another, which only works if you register them or convert them to external. You would typically move this code into the dispatch function, which also allows RDG to reuse resources. You also use the upload mechanism on the output buffer, but what you want there is to download the data from the buffer into your array.
So:
1. Create the buffers inside ENQUEUE_RENDER_COMMAND on the same builder and set the input bindings there.
2. Looks correct.
3. It crashes because you use buffers that are no longer valid, as they belong to another graph builder.
4. You will also not get any results back, as you are uploading your output array instead of downloading into it.
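To make the two main problems above a bit more concrete (buffers shared across graph builders, and uploading instead of downloading the output), here is a small toy model in plain C++. It is deliberately NOT the Unreal RDG or NNE API: `ToyGraphBuilder`, `ToyBufferHandle`, `EnqueueToyModel` and the other names are all made up for illustration, the "model" just doubles its input, and everything runs immediately, whereas the real RDG defers work until the graph is executed. It only sketches the ownership rule and the data direction.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy stand-in for a render graph builder. This is NOT the Unreal RDG API;
// every name here is hypothetical and only models buffer ownership.
class ToyGraphBuilder;

struct ToyBufferHandle {
    const ToyGraphBuilder* Owner; // builder that created the buffer
    std::size_t Index;            // slot inside that builder
};

class ToyGraphBuilder {
public:
    ToyBufferHandle CreateBuffer(std::size_t NumFloats) {
        Buffers.emplace_back(NumFloats, 0.0f);
        return ToyBufferHandle{this, Buffers.size() - 1};
    }

    // CPU -> "GPU": the direction you want for the input buffer.
    void Upload(ToyBufferHandle Handle, const std::vector<float>& Source) {
        std::vector<float>& Target = Resolve(Handle);
        if (Source.size() != Target.size()) {
            throw std::invalid_argument("size mismatch");
        }
        Target = Source;
    }

    // "GPU" -> CPU: the direction you want for the output buffer.
    void Download(ToyBufferHandle Handle, std::vector<float>& Target) {
        Target = Resolve(Handle);
    }

    // Stand-in for enqueueing the network: doubles every input element.
    void EnqueueToyModel(ToyBufferHandle Input, ToyBufferHandle Output) {
        const std::vector<float>& In = Resolve(Input);
        std::vector<float>& Out = Resolve(Output);
        Out.resize(In.size());
        for (std::size_t i = 0; i < In.size(); ++i) {
            Out[i] = 2.0f * In[i];
        }
    }

private:
    // A handle is only valid on the builder that created it.
    std::vector<float>& Resolve(ToyBufferHandle Handle) {
        if (Handle.Owner != this) {
            throw std::logic_error("buffer belongs to another graph builder");
        }
        return Buffers.at(Handle.Index);
    }

    std::vector<std::vector<float>> Buffers;
};

// Create, upload, run and download all on the same builder.
std::vector<float> RunOnOneBuilder(const std::vector<float>& InputData) {
    ToyGraphBuilder GraphBuilder;
    ToyBufferHandle Input = GraphBuilder.CreateBuffer(InputData.size());
    ToyBufferHandle Output = GraphBuilder.CreateBuffer(InputData.size());
    GraphBuilder.Upload(Input, InputData);       // input: upload
    GraphBuilder.EnqueueToyModel(Input, Output);
    std::vector<float> Result;
    GraphBuilder.Download(Output, Result);       // output: download, not upload
    return Result;
}

// Using a handle from one builder on another is rejected in this toy model.
bool UsingForeignBufferFails() {
    ToyGraphBuilder First;
    ToyGraphBuilder Second;
    ToyBufferHandle Stale = First.CreateBuffer(4);
    try {
        Second.Upload(Stale, std::vector<float>(4, 1.0f));
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

In this sketch the foreign handle is caught with an exception; in the real engine you get a crash instead, unless the buffer was registered or converted to external as mentioned above.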
Hope that helps! As mentioned, it is difficult, so don't be demotivated!