Is SceneCapture's performance dependent on FOV and resolution?

I am a bit confused as to why you do not simply use a “Scene Capture Cube” to bake a static texture; that would do pretty much the same thing. Are you trying to do it dynamically? I would never expect that to be fast. More than a handful of dynamic captures is a bad idea.

What you are seeing is that each scene capture carries a fixed overhead cost. Yes, the smaller view sizes technically have fewer pixels to process, but that saving is probably dwarfed by the overhead of calculating scene visibility and occlusion for each render target.
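
To make that trade-off concrete, here is a rough back-of-envelope sketch in Python. It is only a toy model: the function names and both cost constants (`fixed_overhead_ms`, `ms_per_megapixel`) are made-up placeholders, not measurements from the engine. The point is only the shape of the result: once a fixed per-capture cost exists, several small captures can cost more than one large one even though they cover fewer pixels in total.

```python
# Toy cost model for scene captures.
# ASSUMPTION: both constants below are illustrative placeholders,
# not numbers measured in Unreal. Profile your own scene for real values.

def capture_cost_ms(width, height, fixed_overhead_ms=1.0, ms_per_megapixel=0.5):
    """Estimated cost of one capture: fixed per-view work plus per-pixel work."""
    megapixels = (width * height) / 1_000_000
    return fixed_overhead_ms + ms_per_megapixel * megapixels


def total_cost_ms(num_captures, width, height, **kwargs):
    """Estimated cost of several identical captures in one frame."""
    return num_captures * capture_cost_ms(width, height, **kwargs)


# One 1024x1024 capture versus six 256x256 captures.
one_large = total_cost_ms(1, 1024, 1024)
six_small = total_cost_ms(6, 256, 256)

print(f"one 1024x1024 capture: {one_large:.2f} ms")
print(f"six 256x256 captures:  {six_small:.2f} ms")
```

With these placeholder constants the six small captures come out more expensive than the single large one, despite processing well under half the pixels. If the fixed overhead in your scene is actually small relative to the per-pixel work, the comparison flips, so treat this as a way to reason about the numbers you get from profiling rather than a prediction.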