How to compute accurate WGS84 coordinates from a pixel using RealityScan v2.1 exports (intrinsics, extrinsics, depth)

Hello — I’m using RealityScan v2.1 and need help computing an accurate georeferenced coordinate (EPSG:4326 / WGS84) for a given pixel in one of the aligned images.

Background / what I did

  • I reconstructed a model from a UAV image set (images contain GPS). The reconstruction looks correct.
  • In the 3Ds window I used Scene 3D → Add Control Points (fig.1) and picked a control point on the reconstructed mesh (fig.2). RealityScan automatically finds the corresponding pixel(s) in the source images, and if I select two images the software computes the model point’s coordinate automatically (fig.3). That built-in control-point workflow is great.
  • RealityScan’s exported coordinate system is EPSG:4326 (GPS / WGS84).
  • I then exported the aligned camera intrinsics and extrinsics and exported a depth map (Maps & Masks) for the aligned image.
  • Example: image img_012168.JPG; the pixel coordinate I want to convert is (712.43, 423.96) (sub-pixel, so I sample the depth map as sketched below).
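
Since the example coordinate is sub-pixel, I sample the exported depth map with bilinear interpolation before reprojecting. A minimal sketch, assuming the depth map loads as a float NumPy array at the same resolution as the image and that pixel centers sit at integer coordinates (which may itself need confirming):

```python
import numpy as np

def sample_depth(depth, u, v):
    """Bilinearly interpolate a float depth map at sub-pixel (u, v),
    e.g. (712.43, 423.96). ASSUMES pixel centers sit at integer
    coordinates and (u, v) is not on the last row/column."""
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    du, dv = u - u0, v - v0
    patch = depth[v0:v0 + 2, u0:u0 + 2]          # 2x2 neighborhood
    w = np.array([[(1 - dv) * (1 - du), (1 - dv) * du],
                  [dv * (1 - du),       dv * du]])
    return float((patch * w).sum())
```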

Goal
Compute the actual geographic position (latitude, longitude, and altitude in EPSG:4326 / WGS84) of that pixel from:

  1. the image pixel coordinate,
  2. exported camera intrinsics & extrinsics,
  3. the exported depth map (or other data RealityScan can export).

What I tried and the problem

  • I used the depth value from the exported depth map together with the intrinsics/extrinsics to back-project the pixel into 3D and then transform it into the exported world coordinate system. The result is off by roughly 5 meters compared with RealityScan’s own control-point result.
  • I suspect a units or convention mismatch (depth units, focal-length units, camera/world transform convention, georeference transform), but I’m not sure which convention RealityScan uses in each export.
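
One specific suspicion: the exported depth might store Euclidean distance along the viewing ray rather than z-depth along the optical axis; away from the image center that difference alone reaches the meter scale. Whether the conversion below applies depends on what RealityScan actually writes into the depth map, which is part of what I am asking:

```python
import numpy as np

def ray_to_z_depth(d_ray, u, v, fx, fy, cx, cy):
    """Convert distance along the viewing ray at pixel (u, v) into
    z-depth along the optical axis. Only valid if the depth map stores
    ray distances; applying it to a map that already stores z-depth
    would itself introduce a meter-scale error."""
    x = (u - cx) / fx  # normalized image coordinates
    y = (v - cy) / fy
    return d_ray / np.sqrt(x * x + y * y + 1.0)
```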

What I need from the community

  1. Confirmation of the correct reprojection pipeline and coordinate conversions for RealityScan v2.1 (concise, canonical steps). My understanding of the pipeline is as follows (a Python sketch of these steps follows the list):

     a. Read the pixel (u, v) and obtain the depth d at that pixel (confirming that d is in meters and measured along the camera optical axis).

     b. Convert the pixel to camera coordinates (pinhole model):
        X_c = (u - c_x) * d / f_x
        Y_c = (v - c_y) * d / f_y
        Z_c = d
        where f_x, f_y are the focal lengths in pixels and (c_x, c_y) is the principal point in pixels.

     c. Transform camera coordinates to world/model coordinates:
        X_w = R_cam_to_world * X_c + t_cam_to_world
        (confirming whether RealityScan provides camera-to-world or world-to-camera extrinsics, and the rotation convention / handedness).

     d. If the world/model coordinate system is not yet EPSG:4326, apply the model → EPSG:4326 georeference transform exported by RealityScan (confirming whether that transform is a Helmert / affine transform, or direct lat/lon degrees).

     e. Output latitude, longitude, and altitude in WGS84.
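
To make the question concrete, here is a minimal Python sketch of steps a–e as I currently implement them. It uses pyproj for the geodetic conversion, and every "ASSUMES" comment marks exactly a convention I need confirmed (z-depth in meters, camera-to-world extrinsics, and a world frame that is either geocentric ECEF or a local model frame plus an exported Helmert transform):

```python
import numpy as np
from pyproj import Transformer

# Geocentric WGS84 (EPSG:4978, meters) -> geographic WGS84 with
# ellipsoidal height (EPSG:4979, the 3D counterpart of EPSG:4326).
ECEF_TO_WGS84 = Transformer.from_crs("EPSG:4978", "EPSG:4979", always_xy=True)

def pixel_to_camera(u, v, d, fx, fy, cx, cy):
    """Step b: pinhole back-projection. ASSUMES d is in meters and is
    the z-depth along the optical axis, not distance along the ray."""
    return np.array([(u - cx) * d / fx,
                     (v - cy) * d / fy,
                     d])

def camera_to_world(X_c, R, t):
    """Step c: ASSUMES the export is camera-to-world, X_w = R @ X_c + t.
    If RealityScan exports world-to-camera, use X_w = R.T @ (X_c - t)."""
    return R @ X_c + t

def model_to_ecef(X_m, s, R_g, t_g):
    """Step d: 7-parameter similarity (Helmert) transform from a local
    model frame to ECEF; s, R_g, t_g would come from the georeference
    export. Skip this if the model frame is already ECEF."""
    return s * (R_g @ X_m) + t_g

def pixel_to_wgs84(u, v, d, intrinsics, R, t, helmert=None):
    """Steps a-e combined: pixel + depth -> (lat, lon, alt)."""
    X_c = pixel_to_camera(u, v, d, *intrinsics)
    X_w = camera_to_world(X_c, R, t)
    if helmert is not None:
        X_w = model_to_ecef(X_w, *helmert)
    lon, lat, alt = ECEF_TO_WGS84.transform(*X_w)  # step e
    return lat, lon, alt
```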

Is this pipeline correct for RealityScan? Are there RealityScan-specific conventions I should be aware of?

If anyone can confirm RealityScan v2.1 conventions for the items above (especially depth units, extrinsic convention, and model→EPSG:4326 mapping), or can point to a short example (Python snippet) that reproduces a correct pixel→lat/lon workflow using RealityScan exports, I would really appreciate it.
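
In the meantime, this is how I call the sketch above on my example pixel (every numeric value below is a placeholder, not a real export value):

```python
import numpy as np  # pixel_to_wgs84 as defined in the sketch above

# Placeholder values -- substitute the real numbers from the exports.
intrinsics = (3600.0, 3600.0, 2736.0, 1824.0)   # fx, fy, cx, cy (pixels)
R = np.eye(3)      # camera-to-world rotation from the extrinsics export
t = np.zeros(3)    # camera position in the world frame (meters)
d = 57.3           # depth sampled at (712.43, 423.96), placeholder

lat, lon, alt = pixel_to_wgs84(712.43, 423.96, d, intrinsics, R, t)
print(f"{lat:.8f}, {lon:.8f}, {alt:.3f} m")
```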

If helpful, I can attach:

  • one aligned image filename and the pixel coordinate (example: img_012168.JPG, (712.43, 423.96)),
  • the depth value at that pixel,
  • the exported camera intrinsics/extrinsics file (JSON/CSV),
  • the exported model georeference / control-point CSV that RealityScan generated,
    so that someone can inspect the exact numbers and spot the mismatch.

Thank you.
