On reference hardware, there is no measurable game thread performance difference between running 2x32 grids on nativized Blueprints and the reference C++ implementation.
On reference hardware, running 4x256 grids via Blueprint render-to-texture costs on average 0.14 ms more GPU time than the reference compute shader implementation. There are also measurable but insignificant differences on both threads, likewise in favor of the compute shader.
I have no idea how you concluded that the overhead is not affordable, because it clearly is. Overall, I'm pretty comfy with this overhead, and I have no doubt that a good share of engine users would gladly accept that cost in exchange for the option to stay away from plugins and/or source builds.