Volume-preserving soft bodies, which are what you’d use in your case, are nearly ready. I still need to add the rendering and component pieces on the plugin side.
All VD components support multi-threaded simulation. This is a tough one to answer without running a test, but I’d ballpark it at 500-1000, depending on how you want collision done and how many simulation sub-steps run each frame. I’ll try to set up a sphere test and get you more accurate numbers, but as a general rule particle count doesn’t affect performance as much as collision checks and constraint solver iteration counts do.
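To give a feel for that rule of thumb, here is a rough back-of-the-envelope sketch in Python. It is not VD code and not a measurement: the `frame_work` function, the cost formula, and every number in it are made up for illustration. It assumes each sub-step integrates every particle once, relaxes every constraint once per solver iteration, and runs a fixed number of collision checks.

```python
def frame_work(particles, constraints, collision_checks, substeps, solver_iterations):
    """Rough operation count for one frame, in arbitrary units (hypothetical model)."""
    per_substep = (
        particles                           # integrate each particle once
        + constraints * solver_iterations   # each constraint is relaxed on every iteration
        + collision_checks                  # candidate pairs tested this sub-step
    )
    return substeps * per_substep


# Made-up baseline scene, purely for comparison.
base = dict(particles=1000, constraints=3000, collision_checks=5000,
            substeps=2, solver_iterations=4)

baseline = frame_work(**base)
double_particles = frame_work(**{**base, "particles": 2000})
double_iterations = frame_work(**{**base, "solver_iterations": 8})
double_substeps = frame_work(**{**base, "substeps": 4})

for label, work in [("baseline", baseline),
                    ("2x particles", double_particles),
                    ("2x solver iterations", double_iterations),
                    ("2x sub-steps", double_substeps)]:
    print(f"{label:22s} {work:6d}  ({work / baseline:.2f}x)")
```

In this toy model, doubling the particle count alone barely moves the total, while doubling solver iterations or sub-steps raises it substantially. In a real scene, adding particles usually adds constraints and collision checks too, so treat this as intuition only; the sphere test will give the real numbers.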