Matching RealityCapture Camera Calibration with OpenCV for Rendering a 3D Model onto an Image

I have a pipeline that classifies and generates improved synthetic 3D building models from a photogrammetry scene, exporting each building as an .fbx.synthetic file.

I’m currently working on a texture-generation pipeline for these synthetic building models using LLMs and generative AI.

For each building, the workflow is:

  1. Find the original drone images in which the building appears
  2. Crop the building region from those images
  3. Feed the cropped images into an LLM/VLM
  4. Generate a material and architectural description of the building
  5. Use that description to generate textures
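
For step 1, one rough way to pick candidate images is to project the building's bounding-box corners into every calibrated camera. You then keep the cameras where at least one corner lands in front of the camera and inside the frame. The sketch below assumes OpenCV-style world-to-camera extrinsics (`R`, `t`) and a pinhole intrinsic matrix `K`. `bbox_corners` and `building_visible` are hypothetical helpers. Distortion is ignored (the `margin` compensates loosely), and occlusion by other buildings is not checked.

```python
import numpy as np

def bbox_corners(vertices):
    """Return the 8 corners (8, 3) of the axis-aligned box around (N, 3) vertices."""
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    return np.array([[x, y, z]
                     for x in (lo[0], hi[0])
                     for y in (lo[1], hi[1])
                     for z in (lo[2], hi[2])])

def building_visible(points_w, R, t, K, width, height, margin=0.1):
    """Rough test: does any point in front of the camera project inside the
    image, enlarged by `margin` (a fraction of width/height)?

    ASSUMPTION: R, t are world-to-camera (X_cam = R @ X + t) with OpenCV axes
    (x right, y down, z forward). Distortion and occlusion are ignored."""
    p_cam = points_w @ R.T + t                 # world -> camera frame
    p_cam = p_cam[p_cam[:, 2] > 1e-6]          # drop points behind the camera
    if len(p_cam) == 0:
        return False
    uvw = p_cam @ K.T                          # pinhole projection (homogeneous)
    uv = uvw[:, :2] / uvw[:, 2:3]
    mx, my = margin * width, margin * height
    inside = ((uv[:, 0] >= -mx) & (uv[:, 0] < width + mx) &
              (uv[:, 1] >= -my) & (uv[:, 1] < height + my))
    return bool(inside.any())
```

Passing all mesh vertices instead of the eight corners reduces, but does not eliminate, misses when the building fills most of the frame.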

To extract the building regions from the images, I render each building into a mask using an OpenCV camera whose parameters are taken from the RealityCapture camera calibration export.

Current workflow:

  • Render the building model into a binary mask using OpenCV camera intrinsics
  • Apply inverse distortion to the resulting mask
  • Use the mask to crop the original drone image
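
Rendering with an ideal pinhole camera produces a mask in undistorted geometry. Each pixel of the original photo therefore has to look up where it falls in that render, which means inverting the distortion model. The sketch below does this in NumPy. It assumes a Brown-Conrady model with polynomial radial terms k1..k4 and tangential terms t1, t2 written in OpenCV's p1/p2 form. Whether this matches RealityCapture's definition is exactly the open question, so treat the formula as an assumption. The function names are hypothetical.

```python
import numpy as np

def distort_normalized(x, y, k, t):
    """Forward Brown-Conrady model (ASSUMED to match RealityCapture):
    radial k = (k1, k2, k3, k4), tangential t = (t1, t2) in OpenCV p1/p2 form."""
    r2 = x * x + y * y
    radial = 1 + r2 * (k[0] + r2 * (k[1] + r2 * (k[2] + r2 * k[3])))
    xd = x * radial + 2 * t[0] * x * y + t[1] * (r2 + 2 * x * x)
    yd = y * radial + t[0] * (r2 + 2 * y * y) + 2 * t[1] * x * y
    return xd, yd

def undistort_normalized(xd, yd, k, t, iters=20):
    """Invert distort_normalized by fixed-point iteration (the same idea
    cv2.undistortPoints uses); adequate for moderate distortion."""
    x, y = xd.copy(), yd.copy()
    for _ in range(iters):
        r2 = x * x + y * y
        radial = 1 + r2 * (k[0] + r2 * (k[1] + r2 * (k[2] + r2 * k[3])))
        dx = 2 * t[0] * x * y + t[1] * (r2 + 2 * x * x)
        dy = t[0] * (r2 + 2 * y * y) + 2 * t[1] * x * y
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y

def distort_mask(mask_undist, K, k, t):
    """Warp a mask rendered with ideal pinhole camera K into the geometry of the
    original (distorted) photo, using nearest-neighbour sampling.
    Assumes the render and the photo share the same K and image size."""
    h, w = mask_undist.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)   # photo pixel grid
    xu, yu = undistort_normalized((u - cx) / fx, (v - cy) / fy, k, t)
    us = np.rint(xu * fx + cx).astype(int)          # position in the render
    vs = np.rint(yu * fy + cy).astype(int)
    valid = (us >= 0) & (us < w) & (vs >= 0) & (vs < h)
    out = np.zeros_like(mask_undist)
    out[valid] = mask_undist[vs[valid], us[valid]]
    return out
```

Photo pixels whose undistorted position falls outside the render come out as 0, so strong barrel distortion can clip the corners. In OpenCV, `cv2.undistortPoints(..., P=K)` followed by `cv2.remap` performs the same inversion, as long as the coefficients fit OpenCV's own model.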

From the RealityCapture calibration export, I’m using the following camera properties:

  • Camera position/orientation
  • Focal length
  • Principal point
  • k1, k2, k3, k4
  • t1, t2
  • cx, cy
  • focalLength35mmEq
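
One possible mapping of these values into OpenCV's `cameraMatrix` and `distCoeffs` is sketched below. The code flags two assumptions. First, that focalLength35mmEq is relative to a 36 mm side matched to the longer image dimension. Second, that the principal point is stored as an offset from the image centre, normalised by that same dimension. If the export already gives cx, cy in pixels, use those directly. A third point is a property of OpenCV rather than an assumption: its coefficient vector is ordered (k1, k2, p1, p2, k3, k4, k5, k6, ...), and its k4..k6 sit in the denominator of the rational radial model. So if RealityCapture's k4 is a polynomial r⁸ term (again an assumption), it cannot simply be placed in OpenCV's k4 slot. Both helper names are hypothetical.

```python
import numpy as np

def intrinsics_from_35mm(f35, width, height, pp_u=0.0, pp_v=0.0):
    """Build a 3x3 camera matrix.
    ASSUMPTIONS: f35 refers to a 36 mm side mapped onto the longer image side,
    and (pp_u, pp_v) are principal-point offsets from the image centre,
    normalised by that same side. Verify both against your export."""
    s = max(width, height)
    f_px = f35 / 36.0 * s
    cx = width / 2.0 + pp_u * s
    cy = height / 2.0 + pp_v * s
    return np.array([[f_px, 0.0, cx], [0.0, f_px, cy], [0.0, 0.0, 1.0]])

def opencv_dist_coeffs(k1, k2, k3, k4, t1, t2):
    """Return OpenCV's (k1, k2, p1, p2, k3) vector.
    OpenCV's k4..k6 are denominator terms of its rational model, so a
    polynomial r^8 coefficient has no slot; refuse rather than misplace it."""
    if k4 != 0.0:
        raise ValueError("k4 has no polynomial equivalent in OpenCV's model; "
                         "use a custom projection instead")
    # ASSUMPTION: t1 -> p1 and t2 -> p2 with unchanged signs; check this by
    # comparing projected points with RealityCapture's own reprojections.
    return np.array([k1, k2, t1, t2, k3])
```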

The problem is that even after applying distortion correction, the rendered mask does not line up with the building in the original image, and the offset is largest near the image edges.

My guess is that I am interpreting part of the RealityCapture camera calibration incorrectly in OpenCV, possibly even the camera field of view, which I derived from focalLength35mmEq.
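
A worked check of the focal-length conversion follows. It assumes focalLength35mmEq is defined against a 36 mm side matched to the longer image side. Some tools use the 43.27 mm diagonal instead, which gives a different pixel focal length. The 5472 × 3648 image with f35 = 24 mm is an illustrative example, not a value from this project:

```latex
f_{px} = f_{35} \cdot \frac{\max(W, H)}{36\,\mathrm{mm}}, \qquad
\mathrm{FOV}_h = 2 \arctan\!\left(\frac{W}{2 f_{px}}\right)

f_{px} = 24 \cdot \frac{5472}{36} = 3648\ \mathrm{px}, \qquad
\mathrm{FOV}_h = 2 \arctan\!\left(\frac{5472}{7296}\right) = 2 \arctan(0.75) \approx 73.74^\circ
```

An error in f_px scales every projected point about the principal point. The resulting misalignment therefore grows linearly with distance from the image centre, which is hard to tell apart from a distortion error by eye.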

Could you share the exact math RealityCapture uses to project a 3D point into final image pixel coordinates, including distortion?
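
I can't confirm RealityCapture's internal formula. The chain below is the standard pinhole plus Brown-Conrady projection that the listed parameters suggest. Writing it out explicitly lets you test each convention on its own: rotation direction, camera centre vs. translation, radial polynomial order, and t1/t2 order and sign. Every convention in the docstring is an assumption to verify against RealityCapture's reprojections. `project_points` is a hypothetical name.

```python
import numpy as np

def project_points(X_world, R, C, fx, fy, cx, cy, k, t):
    """Project (N, 3) world points to (N, 2) pixel coordinates (sketch).

    ASSUMPTIONS (not confirmed RealityCapture conventions):
      R : 3x3 world-to-camera rotation; camera looks along +z, y points down
      C : camera centre in world coordinates, so X_cam = R @ (X - C)
      k : radial (k1, k2, k3, k4) of a polynomial Brown model
      t : tangential (t1, t2) in OpenCV's p1/p2 form
    Points behind the camera come out as NaN."""
    X_cam = (np.asarray(X_world, dtype=float) - C) @ R.T   # world -> camera
    z = X_cam[:, 2]
    with np.errstate(divide="ignore", invalid="ignore"):
        x = np.where(z > 0, X_cam[:, 0] / z, np.nan)       # normalised coords
        y = np.where(z > 0, X_cam[:, 1] / z, np.nan)
    r2 = x * x + y * y
    radial = 1 + r2 * (k[0] + r2 * (k[1] + r2 * (k[2] + r2 * k[3])))
    xd = x * radial + 2 * t[0] * x * y + t[1] * (r2 + 2 * x * x)
    yd = y * radial + t[0] * (r2 + 2 * y * y) + 2 * t[1] * x * y
    return np.stack([fx * xd + cx, fy * yd + cy], axis=1)  # to pixels
```

With k4 = 0, the result should agree with the first output of `cv2.projectPoints(X, cv2.Rodrigues(R)[0], -R @ C, K, np.array([k1, k2, t1, t2, k3]))` if the t1 → p1, t2 → p2 mapping holds. A disagreement in one of the two helps isolate which convention differs.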

Also, is there a better or recommended way to accurately extract the visible building region from the original drone images?
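
If the VLM only needs a crop rather than a pixel-exact mask, a simpler option may be to project all mesh vertices with the full distortion model and crop a padded bounding box around them. This avoids rendering and warping a mask altogether. It is an alternative technique, not a RealityCapture feature. `crop_box` is a hypothetical helper that takes the (N, 2) pixel coordinates from any projection function, and it does not handle occlusion by other buildings.

```python
import numpy as np

def crop_box(uv, width, height, pad=0.05):
    """Padded, clipped bounding box (x0, y0, x1, y1) around projected vertices.
    NaN rows (points behind the camera) are ignored; returns None when the
    box does not overlap the image."""
    uv = uv[~np.isnan(uv).any(axis=1)]
    if len(uv) == 0:
        return None
    x0, y0 = uv.min(axis=0)
    x1, y1 = uv.max(axis=0)
    px, py = pad * (x1 - x0), pad * (y1 - y0)       # padding relative to box size
    x0, x1 = max(0, int(np.floor(x0 - px))), min(width, int(np.ceil(x1 + px)))
    y0, y1 = max(0, int(np.floor(y0 - py))), min(height, int(np.ceil(y1 + py)))
    if x0 >= x1 or y0 >= y1:
        return None
    return x0, y0, x1, y1
```

For a NumPy image array, the crop is then `image[y0:y1, x0:x1]`.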