Updating instance custom data each frame is slow. Any way to make it faster?

I have a crowd system with ~20k people. It’s based on a Hierarchical Instanced Static Mesh component and uses bone animation baked to textures.
It runs quite fast, but I want to update the Custom Float Data of every instance each frame, and that causes a significant performance drop. I need to update a couple of dynamic parameters each frame, which I use in the vertex shader for animation blending/playback.
I don’t exactly understand how it works at a lower level. I understand that the data needs to be copied to the GPU when it changes, which is probably not very fast, but I was hoping the performance hit would not be so big (from 85 FPS to 45 FPS).
I tried profiling with Unreal Insights, but the only slow function I found was PrepareDistanceFieldScene. I don’t have any distance fields in the scene, and the scene is empty (just the crowd), so it’s probably waiting on something else.

So my question is: is there anything that can be done to make the custom data update faster? How do I profile it properly? Is there a faster way to pass this data to the shader?
I also tried doing this through Niagara using CustomMaterialParameters, and I think it ran faster.