I thought a bit further about your scenarios and here are some more considerations.
Using hard hits for the cymbals as well may sound like a good idea at first, but in practice it may have more drawbacks than advantages.
Say that you attach the virtual cymbal meshes to the touch controllers. When the user wants to play them, she is going to bring the two controllers quickly toward each other in the typical play-the-cymbals motion. In VR, the only visual indication the user has of where the controllers are in real space is the cymbal meshes attached to them.
If you detach the cymbal meshes when the cymbals touch and hold them still at the collision position, the user will very likely lose track of where the controllers are and keep moving them toward each other, eventually colliding them in real life. She could even hurt herself by smashing the controllers hard into each other. You don’t want that.
To prevent that, you have to keep providing some visual feedback on where the controllers are after the hit while, at the same time, holding the cymbals at the collision position. You can use phantom/semi-transparent copies of the cymbals for that, or another visual cue. Then, when the user reverses the movement and the controllers move apart again, you have to re-attach the cymbals to the controllers so that the playing cycle can repeat.
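To make that cycle a bit more concrete, here is a rough, engine-agnostic sketch of the attach / hold / re-attach logic in Python. It is only an illustration, not engine code: `CymbalPair`, `Frame`, `contact_distance` and `release_distance` are names I made up, and I am assuming the cymbals count as "touching" when the controllers are closer than a small threshold, and that they re-attach once the controllers are a bit further apart than that.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Frame:
    left_cymbal: Vec3    # where to draw the solid left cymbal this frame
    right_cymbal: Vec3   # where to draw the solid right cymbal this frame
    show_ghosts: bool    # draw semi-transparent copies at the controllers
    hit: bool            # True only on the frame the cymbals first touch


class CymbalPair:
    """Hypothetical helper: hard-hit cymbals with ghost feedback."""

    def __init__(self, contact_distance: float = 0.05,
                 release_distance: float = 0.08) -> None:
        # Assumed thresholds in meters; release > contact avoids flicker
        # between the two states when the controllers hover near contact.
        self.contact_distance = contact_distance
        self.release_distance = release_distance
        self.held = False
        self._frozen: Optional[Tuple[Vec3, Vec3]] = None

    def update(self, left: Vec3, right: Vec3) -> Frame:
        """Call once per frame with the two controller positions."""
        gap = math.dist(left, right)
        hit = False
        if not self.held and gap <= self.contact_distance:
            # Collision: detach the cymbals and freeze them where they met.
            self.held = True
            self._frozen = (left, right)
            hit = True
        elif self.held and gap >= self.release_distance:
            # The user reversed the motion: re-attach to the controllers.
            self.held = False
            self._frozen = None

        if self.held and self._frozen is not None:
            frozen_left, frozen_right = self._frozen
            return Frame(frozen_left, frozen_right, show_ghosts=True, hit=hit)
        return Frame(left, right, show_ghosts=False, hit=hit)
```

You would call `update` every tick with the tracked controller positions, play the sound when `hit` is true, and draw the ghost meshes at the controller positions whenever `show_ghosts` is true.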
I find that using overlaps, while not being entirely realistic, is much simpler to handle and provides a better user experience.
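For comparison, the overlap approach needs almost no state. The sketch below is again plain Python rather than engine code, with made-up names (`OverlapCymbals`, `overlap_distance`), and it assumes a simple distance check stands in for whatever overlap event your engine gives you. The cymbals stay attached to the controllers the whole time; you only need to notice the frame on which the overlap begins.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


class OverlapCymbals:
    """Hypothetical helper: cymbals pass through each other on contact."""

    def __init__(self, overlap_distance: float = 0.05) -> None:
        # Assumed threshold in meters below which the cymbals overlap.
        self.overlap_distance = overlap_distance
        self.overlapping = False

    def update(self, left: Vec3, right: Vec3) -> bool:
        """Return True only on the frame the overlap begins."""
        now = math.dist(left, right) <= self.overlap_distance
        began = now and not self.overlapping
        self.overlapping = now
        return began
```

Play the sound whenever `update` returns true. There is nothing to detach, freeze or re-attach, which is most of why I find this option easier to handle.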
Anyway, since we are all still learning about what works and what doesn’t in VR, I would say try both options with a number of test users and see which one they like best.