I'm also interested in this. Fortunately, in my case I was instancing hundreds of thousands of quads for runtime procedural foliage growth, so I was able to fake transform scale changes by using per-instance custom data and modifying the UV scale.
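As a rough illustration of the UV trick, here is a minimal C++ sketch of the math a material might do per pixel. Everything in it is an assumption for illustration: the names `ScaleUVAroundPivot` and `IsOutsideTexture` are hypothetical, `growth` stands in for a single per-instance custom data float, and the pivot is assumed to be the bottom centre of the quad with v = 1 at the bottom.

```cpp
#include <cassert>

struct Vec2 { float x; float y; };

// Hypothetical helper: rescale a UV coordinate around a pivot.
// `growth` stands in for a per-instance custom data value in (0, 1].
// Dividing the UV distance from the pivot by `growth` makes the texture
// image cover only a `growth`-sized fraction of the quad, so the plant
// looks smaller even though the instance transform never changes.
Vec2 ScaleUVAroundPivot(Vec2 uv, Vec2 pivot, float growth)
{
    assert(growth > 0.0f);
    const float inv = 1.0f / growth;
    return { pivot.x + (uv.x - pivot.x) * inv,
             pivot.y + (uv.y - pivot.y) * inv };
}

// True when the rescaled UV falls outside the [0, 1] texture range.
// A masked material would be expected to discard these pixels.
bool IsOutsideTexture(Vec2 uv)
{
    return uv.x < 0.0f || uv.x > 1.0f || uv.y < 0.0f || uv.y > 1.0f;
}
```

With `growth = 0.5` and the pivot at (0.5, 1.0), the middle of the quad samples the top edge of the texture, so the plant fills only the lower half of the quad and everything above it falls outside the texture.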
This obviously wouldn’t work the same for meshes that aren’t planes, or for changes to transform position. However, I’m playing with the idea of using world position offset to see whether that could be a viable workaround to, again, “fake” it.
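To make the world position offset idea concrete, here is a small C++ sketch of the per-vertex math, not a tested material. The function name `FakeTransformOffset` is hypothetical, `scale` and `translation` are assumed to come from per-instance custom data, and instance rotation is ignored for simplicity.

```cpp
#include <cassert>

struct Vec3 { float x; float y; float z; };

// Hypothetical helper: the offset to add to a vertex so that it ends up
// where a uniform scale about `pivot` followed by a translation would
// have placed it. Scaling p about the pivot moves it by
// (p - pivot) * (scale - 1); the translation is then added on top.
Vec3 FakeTransformOffset(Vec3 localPos, Vec3 pivot, float scale, Vec3 translation)
{
    const float k = scale - 1.0f;
    return { (localPos.x - pivot.x) * k + translation.x,
             (localPos.y - pivot.y) * k + translation.y,
             (localPos.z - pivot.z) * k + translation.z };
}
```

One likely caveat: an offset applied in the vertex stage usually does not update the bounds used for culling or collision, so large offsets may cause instances to pop in and out unless the bounds are enlarged.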
Another idea I’m playing with is to apply translations on the CPU, but only mark the renderer dirty once all instances have been modified. It doesn’t solve the broader issue of the CPU->GPU->CPU round trip, but it could work for some cases.
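The batching idea can be sketched in plain C++ as below. This is a toy model, not engine code: `InstanceBatch`, `Translate`, `MarkDirty`, and `TranslateAll` are hypothetical names, and the dirty-mark counter simply stands in for whatever per-call cost the real renderer notification has.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Transform { float x = 0.0f; float y = 0.0f; float z = 0.0f; };

// Hypothetical stand-in for an instanced mesh component.
class InstanceBatch {
public:
    explicit InstanceBatch(std::size_t count) : transforms_(count) {}

    // Move one instance on the CPU. The renderer is only notified
    // when the caller explicitly asks for it.
    void Translate(std::size_t index, float dx, float dy, float dz, bool markDirty)
    {
        assert(index < transforms_.size());
        Transform& t = transforms_[index];
        t.x += dx;
        t.y += dy;
        t.z += dz;
        if (markDirty) MarkDirty();
    }

    // Stand-in for the renderer notification; counts how often it runs.
    void MarkDirty() { ++dirtyMarkCount_; }

    std::size_t Size() const { return transforms_.size(); }
    std::size_t DirtyMarkCount() const { return dirtyMarkCount_; }
    const Transform& Get(std::size_t index) const { return transforms_[index]; }

private:
    std::vector<Transform> transforms_;
    std::size_t dirtyMarkCount_ = 0;
};

// Translate every instance, then notify the renderer exactly once.
void TranslateAll(InstanceBatch& batch, float dx, float dy, float dz)
{
    for (std::size_t i = 0; i < batch.Size(); ++i) {
        batch.Translate(i, dx, dy, dz, /*markDirty=*/false);
    }
    batch.MarkDirty();
}
```

If the engine in question is Unreal, `UInstancedStaticMeshComponent::UpdateInstanceTransform` takes a `bMarkRenderStateDirty` argument that looks like a fit for this pattern: pass `false` for each update and call `MarkRenderStateDirty()` once at the end.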