How to compute accurate WGS84 coordinates from a pixel using RealityScan v2.1 exports (intrinsics, extrinsics, depth)

Hello — I’m using RealityScan v2.1 and need help computing an accurate georeferenced coordinate (EPSG:4326 / WGS84) for a given pixel in one of the aligned images.

Background / what I did

  • I reconstructed a model from a UAV image set (images contain GPS). The reconstruction looks correct.
  • In the 3Ds window I used Scene 3D → Add Control Points (fig.1) and picked a control point on the reconstructed mesh (fig.2). RealityScan automatically finds the corresponding pixel(s) in the source images, and if I select two images the software computes the model point’s coordinate automatically (fig.3). That built-in control-point workflow is great.
  • RealityScan’s exported coordinate system is EPSG:4326 (GPS / WGS84).
  • I then exported the aligned camera intrinsics and extrinsics and exported a depth map (Maps & Masks) for the aligned image.
  • Example: image img_012168.JPG; the pixel coordinate I want to convert is (712.43, 423.96) (sub-pixel, so I sample the depth map as sketched below).
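
Since the example coordinate is sub-pixel, I sample the exported depth map with bilinear interpolation before reprojecting. A minimal sketch, assuming the depth map loads as a float NumPy array at the same resolution as the image and that pixel centers sit at integer coordinates (which may itself need confirming):

```python
import numpy as np

def sample_depth(depth, u, v):
    """Bilinearly interpolate a float depth map at sub-pixel (u, v),
    e.g. (712.43, 423.96). ASSUMES pixel centers sit at integer
    coordinates and (u, v) is not on the last row/column."""
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    du, dv = u - u0, v - v0
    patch = depth[v0:v0 + 2, u0:u0 + 2]          # 2x2 neighborhood
    w = np.array([[(1 - dv) * (1 - du), (1 - dv) * du],
                  [dv * (1 - du),       dv * du]])
    return float((patch * w).sum())
```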

Goal
Compute the actual geographic position (latitude, longitude, and altitude in EPSG:4326 / WGS84) of that pixel from:

  1. the image pixel coordinate,
  2. exported camera intrinsics & extrinsics,
  3. the exported depth map (or other data RealityScan can export).

What I tried and the problem

  • I used the depth value from the exported depth map together with the intrinsics/extrinsics to back-project the pixel into 3D and then transform it into the exported world coordinate system. The result is off by roughly 5 meters compared with RealityScan’s own control-point result.
  • I suspect a units or convention mismatch (depth units, focal-length units, camera/world transform convention, georeference transform), but I’m not sure which convention RealityScan uses in each export.
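
One specific suspicion: the exported depth might store Euclidean distance along the viewing ray rather than z-depth along the optical axis; away from the image center that difference alone reaches the meter scale. Whether the conversion below applies depends on what RealityScan actually writes into the depth map, which is part of what I am asking:

```python
import numpy as np

def ray_to_z_depth(d_ray, u, v, fx, fy, cx, cy):
    """Convert distance along the viewing ray at pixel (u, v) into
    z-depth along the optical axis. Only valid if the depth map stores
    ray distances; applying it to a map that already stores z-depth
    would itself introduce a meter-scale error."""
    x = (u - cx) / fx  # normalized image coordinates
    y = (v - cy) / fy
    return d_ray / np.sqrt(x * x + y * y + 1.0)
```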

What I need from the community

  1. Confirmation of the correct reprojection pipeline and coordinate conversions for RealityScan v2.1 (concise, canonical steps). My understanding of the pipeline is as follows (a Python sketch of these steps follows the list):

     a. Read the pixel (u, v) and obtain the depth d at that pixel (confirming that d is in meters and measured along the camera optical axis).

     b. Convert the pixel to camera coordinates (pinhole model):
        X_c = (u - c_x) * d / f_x
        Y_c = (v - c_y) * d / f_y
        Z_c = d
        where f_x, f_y are the focal lengths in pixels and (c_x, c_y) is the principal point in pixels.

     c. Transform camera coordinates to world/model coordinates:
        X_w = R_cam_to_world * X_c + t_cam_to_world
        (confirming whether RealityScan provides camera-to-world or world-to-camera extrinsics, and the rotation convention / handedness).

     d. If the world/model coordinate system is not yet EPSG:4326, apply the model → EPSG:4326 georeference transform exported by RealityScan (confirming whether that transform is a Helmert / affine transform, or direct lat/lon degrees).

     e. Output latitude, longitude, and altitude in WGS84.
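
To make the question concrete, here is a minimal Python sketch of steps a–e as I currently implement them. It uses pyproj for the geodetic conversion, and every "ASSUMES" comment marks exactly a convention I need confirmed (z-depth in meters, camera-to-world extrinsics, and a world frame that is either geocentric ECEF or a local model frame plus an exported Helmert transform):

```python
import numpy as np
from pyproj import Transformer

# Geocentric WGS84 (EPSG:4978, meters) -> geographic WGS84 with
# ellipsoidal height (EPSG:4979, the 3D counterpart of EPSG:4326).
ECEF_TO_WGS84 = Transformer.from_crs("EPSG:4978", "EPSG:4979", always_xy=True)

def pixel_to_camera(u, v, d, fx, fy, cx, cy):
    """Step b: pinhole back-projection. ASSUMES d is in meters and is
    the z-depth along the optical axis, not distance along the ray."""
    return np.array([(u - cx) * d / fx,
                     (v - cy) * d / fy,
                     d])

def camera_to_world(X_c, R, t):
    """Step c: ASSUMES the export is camera-to-world, X_w = R @ X_c + t.
    If RealityScan exports world-to-camera, use X_w = R.T @ (X_c - t)."""
    return R @ X_c + t

def model_to_ecef(X_m, s, R_g, t_g):
    """Step d: 7-parameter similarity (Helmert) transform from a local
    model frame to ECEF; s, R_g, t_g would come from the georeference
    export. Skip this if the model frame is already ECEF."""
    return s * (R_g @ X_m) + t_g

def pixel_to_wgs84(u, v, d, intrinsics, R, t, helmert=None):
    """Steps a-e combined: pixel + depth -> (lat, lon, alt)."""
    X_c = pixel_to_camera(u, v, d, *intrinsics)
    X_w = camera_to_world(X_c, R, t)
    if helmert is not None:
        X_w = model_to_ecef(X_w, *helmert)
    lon, lat, alt = ECEF_TO_WGS84.transform(*X_w)  # step e
    return lat, lon, alt
```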

Is this pipeline correct for RealityScan? Are there RealityScan-specific conventions I should be aware of?

If anyone can confirm RealityScan v2.1 conventions for the items above (especially depth units, extrinsic convention, and model→EPSG:4326 mapping), or can point to a short example (Python snippet) that reproduces a correct pixel→lat/lon workflow using RealityScan exports, I would really appreciate it.
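
In the meantime, this is how I call the sketch above on my example pixel (every numeric value below is a placeholder, not a real export value):

```python
import numpy as np  # pixel_to_wgs84 as defined in the sketch above

# Placeholder values -- substitute the real numbers from the exports.
intrinsics = (3600.0, 3600.0, 2736.0, 1824.0)   # fx, fy, cx, cy (pixels)
R = np.eye(3)      # camera-to-world rotation from the extrinsics export
t = np.zeros(3)    # camera position in the world frame (meters)
d = 57.3           # depth sampled at (712.43, 423.96), placeholder

lat, lon, alt = pixel_to_wgs84(712.43, 423.96, d, intrinsics, R, t)
print(f"{lat:.8f}, {lon:.8f}, {alt:.3f} m")
```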

If helpful, I can attach:

  • one aligned image filename and the pixel coordinate (example: img_012168.JPG, (712.43, 423.96)),
  • the depth value at that pixel,
  • the exported camera intrinsics/extrinsics file (JSON/CSV),
  • the exported model georeference / control-point CSV that RealityScan generated,
    so that someone can inspect the exact numbers and spot the mismatch.

Thank you.
