Well, if you only need this for one clip and have no experience with it, hiring someone would probably be the easiest way ^.^ If you plan to make more videos like this, it would be better to learn it yourself.
For a single video, some Blender gurus could do the camera tracking for you, provided the footage is suitable for tracking.
You can then feed the tracked camera movement into UE for compositing, if you don't want to do the compositing in Blender as well.
Unreal Engine Compositing documentation: https://docs.unrealengine.com/en-US/Engine/Composure/index.html
If you want to make more videos later:
Save up a bit more money and buy an HTC Vive Tracker and two Base Stations*. The tracker is attached to your camera and, together with the base stations, determines the position and rotation of your real camera. That data is sent to Unreal Engine, where it moves the virtual camera so that your virtual scene follows the movements of your real-world camera.
Then set up that scene for compositing, so that you get your mixed-reality shot rendered/composited.
*As far as I could find out, that is the minimum requirement for camera tracking, but HTC Vive support can help you more with that. You should not need a complete VR set, since you only want to track one camera, and that is exactly what the tracker is for. You do need at least two base stations, so that the Tracker can actually be tracked and its position/rotation data fed into your computer.
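To make "reading the position and rotation" a bit more concrete: tracking software typically delivers each pose as a matrix. The sketch below assumes a 3x4 row-major matrix with the 3x3 rotation on the left and the translation in the last column (as far as I know, this is the layout SteamVR/OpenVR uses for its pose matrices), and splits it into a position and a (w, x, y, z) quaternion. The function name `pose_to_position_quaternion` is hypothetical and not part of any tracking SDK.

```python
import math
from typing import List, Tuple

# Assumed layout: 3 rows x 4 columns, [R | t] (rotation 3x3, translation column).
Matrix3x4 = List[List[float]]


def pose_to_position_quaternion(
    m: Matrix3x4,
) -> Tuple[Tuple[float, float, float], Tuple[float, float, float, float]]:
    """Split a 3x4 pose matrix into a position and a (w, x, y, z) quaternion."""
    # Translation is assumed to live in the last column.
    position = (m[0][3], m[1][3], m[2][3])

    r00, r01, r02 = m[0][0], m[0][1], m[0][2]
    r10, r11, r12 = m[1][0], m[1][1], m[1][2]
    r20, r21, r22 = m[2][0], m[2][1], m[2][2]

    # Standard rotation-matrix-to-quaternion conversion, branching on the
    # largest diagonal term to keep the square root numerically stable.
    trace = r00 + r11 + r22
    if trace > 0.0:
        s = math.sqrt(trace + 1.0) * 2.0  # s = 4 * w
        w = 0.25 * s
        x = (r21 - r12) / s
        y = (r02 - r20) / s
        z = (r10 - r01) / s
    elif r00 > r11 and r00 > r22:
        s = math.sqrt(1.0 + r00 - r11 - r22) * 2.0  # s = 4 * x
        w = (r21 - r12) / s
        x = 0.25 * s
        y = (r01 + r10) / s
        z = (r02 + r20) / s
    elif r11 > r22:
        s = math.sqrt(1.0 + r11 - r00 - r22) * 2.0  # s = 4 * y
        w = (r02 - r20) / s
        x = (r01 + r10) / s
        y = 0.25 * s
        z = (r12 + r21) / s
    else:
        s = math.sqrt(1.0 + r22 - r00 - r11) * 2.0  # s = 4 * z
        w = (r10 - r01) / s
        x = (r02 + r20) / s
        y = (r12 + r21) / s
        z = 0.25 * s
    return position, (w, x, y, z)
```

Engines use different conventions (for example, SteamVR reports positions in meters while Unreal Engine works in centimeters), so you may still need to scale and remap axes before using the result to drive the virtual camera.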
About camera tracking with the HTC Vive Tracker:
A short overview from an artist who already uses this successfully:
And a longer, in-depth video on this topic from the same artist. Note that he uses OBS instead of UE4 for compositing, which makes his setup more complex (he has to feed the tracking data into OBS). You would probably only need to record the camera movement and the video, and then use that video with the compositing setup mentioned above:
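If you record the camera movement and the video separately, you have to line the two up afterwards. One simple way, assuming the pose samples are stored as (timestamp in seconds, pose) pairs sorted by time, both recordings share the same clock, and the video has a constant frame rate, is to pick the nearest pose sample for each video frame. The function `poses_per_frame` and its parameters are hypothetical, just to illustrate the idea.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def poses_per_frame(
    samples: Sequence[Tuple[float, T]],
    fps: float,
    frame_count: int,
    video_start: float = 0.0,
) -> List[T]:
    """For each video frame, return the pose sample closest in time."""
    if not samples:
        raise ValueError("no pose samples recorded")
    times = [t for t, _ in samples]
    result: List[T] = []
    for frame in range(frame_count):
        # Time of this frame on the shared clock (constant frame rate assumed).
        frame_time = video_start + frame / fps
        i = bisect_left(times, frame_time)
        if i == 0:
            best = 0  # frame is before (or at) the first sample
        elif i == len(times):
            best = len(times) - 1  # frame is after the last sample
        else:
            # Choose whichever neighbouring sample is closer; ties go to the earlier one.
            after = times[i] - frame_time
            before = frame_time - times[i - 1]
            best = i if after < before else i - 1
        result.append(samples[best][1])
    return result
```

Nearest-sample matching is the simplest option; interpolating between the two neighbouring samples would give smoother camera motion if the tracker samples less often than the video frame rate.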
Have a nice day