Now, I could be off base here, as I’m no expert on the subject. However, I believe the entire list of vertices gets pushed to VRAM regardless of how many of them you actually change. If so, it makes sense that a list of ~700k vertices takes longer to send across than a list of ~200k. My own testing bears this out, so that’s the assumption I’m rolling with until someone corrects me.
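To put rough numbers on that assumption: if the whole buffer really is re-sent on every update, the bytes transferred scale with the total vertex count, not with how many vertices you touched. Here’s a quick C++ sketch of that arithmetic. The `ExampleVertex` layout and both helper names are made up for illustration; your actual vertex format (and whatever RMC does internally) will differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical vertex layout (assumption, for illustration only):
// position (3 floats), normal (3 floats), UV (2 floats), RGBA color (4 bytes).
struct ExampleVertex {
    float Position[3];
    float Normal[3];
    float UV[2];
    std::uint8_t Color[4];
};

// Assumes the whole vertex list is re-sent on every update, so the cost
// depends on the total vertex count, not on how many vertices changed.
std::size_t FullUploadBytes(std::size_t vertexCount) {
    return vertexCount * sizeof(ExampleVertex);
}

// How many times more data the larger list sends per update.
double UploadRatio(std::size_t largeCount, std::size_t smallCount) {
    assert(smallCount > 0 && "smallCount must be non-zero");
    return static_cast<double>(FullUploadBytes(largeCount)) /
           static_cast<double>(FullUploadBytes(smallCount));
}
```

Assuming that made-up struct packs to 36 bytes, a full 700k-vertex upload is about 25 MB versus about 7 MB for 200k. The 3.5x ratio holds whatever the real stride is.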
So to get around the problem, I broke my world up into multiple RMCs. I used a quadtree approach to split the world into evenly sized square chunks, and each vertex is assigned to a chunk based on its initial world location. So instead of one RMC with ~700k vertices, you end up with, say, 16 RMCs with ~40k vertices each. When you update a triangle, you only update (and re-upload) the one chunk it belongs to. I can now update multiple chunks every frame with no noticeable hit on frame time.
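Here’s a minimal C++ sketch of the chunking idea, assuming the world is a square on the XY plane. A quadtree subdivided evenly to a fixed depth gives the same layout as a uniform grid (depth 2 gives the 4x4 = 16 chunks mentioned above), so the sketch just computes grid cells. `ChunkedMesh`, `AddTriangle`, and `MarkDirty` are hypothetical names, not RMC API. One assumption of mine: triangles are stored as three unshared vertices and binned by their centroid, so a triangle never gets split across two chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float X, Y, Z; };

// Hypothetical chunked mesh (not RMC API). The square world bounds on the XY
// plane are split into ChunksPerSide x ChunksPerSide equal cells; an evenly
// subdivided quadtree of depth d gives 2^d cells per side.
struct ChunkedMesh {
    float MinX, MinY;   // lower corner of the world bounds
    float Size;         // side length of the square world bounds
    int ChunksPerSide;
    std::vector<std::vector<Vec3>> ChunkVertices;  // one vertex list per chunk (one RMC each)
    std::vector<bool> Dirty;                       // chunks that need re-uploading

    static std::size_t CellCount(int chunksPerSide) {
        assert(chunksPerSide > 0);
        return static_cast<std::size_t>(chunksPerSide) * static_cast<std::size_t>(chunksPerSide);
    }

    ChunkedMesh(float minX, float minY, float size, int chunksPerSide)
        : MinX(minX), MinY(minY), Size(size), ChunksPerSide(chunksPerSide),
          ChunkVertices(CellCount(chunksPerSide)),
          Dirty(CellCount(chunksPerSide), false) {
        assert(size > 0.0f);
    }

    // Chunk index for a position, clamped so out-of-bounds points land in an edge chunk.
    int ChunkIndexFor(const Vec3& p) const {
        const float cell = Size / static_cast<float>(ChunksPerSide);
        const float maxCell = static_cast<float>(ChunksPerSide - 1);
        const int cx = static_cast<int>(std::clamp((p.X - MinX) / cell, 0.0f, maxCell));
        const int cy = static_cast<int>(std::clamp((p.Y - MinY) / cell, 0.0f, maxCell));
        return cy * ChunksPerSide + cx;
    }

    // Assumption: triangles are unshared vertex triples, binned by centroid so
    // all three vertices land in the same chunk. Returns the chunk index.
    int AddTriangle(const Vec3& a, const Vec3& b, const Vec3& c) {
        const Vec3 centroid{(a.X + b.X + c.X) / 3.0f, (a.Y + b.Y + c.Y) / 3.0f,
                            (a.Z + b.Z + c.Z) / 3.0f};
        const int idx = ChunkIndexFor(centroid);
        std::vector<Vec3>& verts = ChunkVertices[static_cast<std::size_t>(idx)];
        verts.push_back(a);
        verts.push_back(b);
        verts.push_back(c);
        return idx;
    }

    // Editing a triangle only flags its own chunk; other chunks are left alone.
    void MarkDirty(int chunkIndex) { Dirty[static_cast<std::size_t>(chunkIndex)] = true; }
};
```

In an actual project, each entry in `ChunkVertices` would back its own RMC, and each frame you’d push only the chunks flagged in `Dirty`, then clear the flags.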
Bear in mind that this approach results in 16 draw calls instead of 1. That was an acceptable trade-off for me because I update the chunks frequently. You’ll need to decide what’s acceptable for you based on how often you update.
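To weigh that trade-off, it can help to tabulate how chunk count and per-chunk size move together as you subdivide further. This sketch assumes one draw call per chunk (as in the 16-chunk example) and an even spread of vertices across chunks, which real terrain won’t have exactly; `TradeoffForDepth` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// One RMC per chunk, so one draw call per chunk (simplified, as in the 16-chunk example).
struct ChunkTradeoff {
    std::size_t DrawCalls;         // number of chunks / RMCs
    std::size_t VerticesPerChunk;  // approx. vertices re-sent when one chunk changes
};

// Hypothetical helper: an evenly subdivided quadtree of `depth` levels has 4^depth leaves.
// Assumes vertices are spread evenly across chunks (real terrain won't be exactly even).
ChunkTradeoff TradeoffForDepth(std::size_t totalVertices, unsigned depth) {
    std::size_t chunks = 1;
    for (unsigned i = 0; i < depth; ++i) {
        chunks *= 4;
    }
    assert(chunks > 0 && "depth too large: chunk count overflowed");
    return {chunks, (totalVertices + chunks - 1) / chunks};  // rounded up
}
```

For ~700k vertices, depth 2 gives 16 draw calls with roughly 43,750 vertices re-sent per chunk update, while depth 3 gives 64 draw calls with roughly 11k. Where the sweet spot lies depends on how often you update.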
Oh, and one more thing: don’t try any of this in Blueprints (BP). Stick to C++ for this kind of work, not just for the performance, but because the math and whatnot would quickly get out of control in BP.