In Unreal you'd likely have to capture separate images and assemble the final result in an image DCC.
Alternatively, a SceneCapture2D should be able to output depth.
The thing about depth is that it isn't normalized - at least within post process materials - so you have to figure out the min/max in order to remap it into something that fits an image channel.
I'm fairly sure the scene capture can't do this on its own, but check that first.
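The remap itself is trivial to sketch in a Custom node inside a post process material. Here `SceneDepth` would come from a SceneTexture:SceneDepth node wired into the Custom node's input, and `MinDepth`/`MaxDepth` are hypothetical scalar parameters you'd have to measure or pick for your own scene:

```hlsl
// SceneDepth: input pin fed by a SceneTexture:SceneDepth node.
// MinDepth / MaxDepth: placeholder scalar parameters for your scene's depth range.
// saturate clamps the result to 0..1 so it fits a regular image channel.
return saturate((SceneDepth - MinDepth) / (MaxDepth - MinDepth));
```

This is just the linear-remap idea, not a drop-in material; you still have to decide where the min/max values come from.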
Say it doesn't work:
then your path is to have multiple scene captures write to the same render target (which is a challenge, if I recall correctly), and the one in charge of depth has to render through a purpose-built post process material.
The resulting RT will essentially be a realtime capture too, so the texture you produce could potentially end up being usable in-engine as well.
Not sure this helps exactly, but at least you have a couple things to poke around with.