Hello — I’m using RealityScan v2.1 and need help computing an accurate georeferenced coordinate (EPSG:4326 / WGS84) for a given pixel in one of the aligned images.
Background / what I did
- I reconstructed a model from a UAV image set (images contain GPS). The reconstruction looks correct.
- In the 3Ds window I used Scene 3D → Add Control Points (fig. 1) and picked a control point on the reconstructed mesh (fig. 2). RealityScan automatically finds the corresponding pixel(s) in the source images, and if I mark the point in two images it computes the model point's coordinate automatically (fig. 3). That built-in control-point workflow is great.
- RealityScan’s exported coordinate system is EPSG:4326 (GPS / WGS84).
- I then exported the aligned cameras' intrinsics and extrinsics, plus a depth map (Maps & Masks) for the aligned image.
- Example: image img_012168.JPG; pixel coordinate I want to convert is (712.43, 423.96).
Goal
Compute the actual geographic position (latitude, longitude, and altitude in EPSG:4326 / WGS84) of that pixel from:
- the image pixel coordinate,
- exported camera intrinsics & extrinsics,
- the exported depth map (or other data RealityScan can export).
What I tried and the problem
- I used the depth value from the exported depth map together with the intrinsics/extrinsics to reproject the pixel into 3D, then transformed it into the exported world coordinates. The result differs from RealityScan's own control-point result by roughly 5 meters.
- I suspect a units/convention mismatch (depth units, focal-length units, camera/world transform convention, georeference transform), but I'm not sure which convention RealityScan uses in each export. A quick check for the most likely culprit is sketched below.
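To illustrate one likely culprit: some tools store depth as Z-depth along the optical axis, while others store Euclidean distance along the viewing ray. The discrepancy between the two grows toward the image corners, which could look exactly like a position-dependent error of a few meters. Here is a minimal check; all numeric values are placeholders, not my real export:

```python
import numpy as np

# Placeholder intrinsics and depth sample; substitute the exported values.
fx, fy = 3648.0, 3648.0          # focal lengths in pixels (hypothetical)
cx, cy = 2736.0, 1824.0          # principal point in pixels (hypothetical)
u, v, d = 712.43, 423.96, 37.2   # pixel of interest and its raw depth value

# Viewing ray through the pixel in camera coordinates (pinhole model).
ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])

# Interpretation A: d is depth along the optical axis (Z-depth).
p_zdepth = ray * d

# Interpretation B: d is Euclidean distance along the viewing ray.
p_raydist = (ray / np.linalg.norm(ray)) * d

# If the reprojection error tracks this difference across the image,
# the depth convention is the mismatch.
print(p_zdepth, p_raydist, np.linalg.norm(p_zdepth - p_raydist))
```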
What I need from the community
- Confirmation of the correct reprojection pipeline and coordinate conversions for RealityScan v2.1 (concise, canonical steps). My understanding of the mathematical pipeline is (a Python sketch of these steps follows after the list):
  1. Read pixel (u, v) and obtain depth d for that pixel (ensure d is in meters and represents distance along the camera optical axis).
  2. Convert the pixel to camera coordinates (assuming a pinhole model):
     X_c = (u - c_x) * d / f_x
     Y_c = (v - c_y) * d / f_y
     Z_c = d
     where f_x, f_y are focal lengths in pixels and (c_x, c_y) is the principal point in pixels.
  3. Transform camera coordinates to world/model coordinates:
     X_w = R_cam_to_world * X_c + t_cam_to_world
     (confirm whether RealityScan provides camera-to-world or world-to-camera, and the rotation convention / handedness).
  4. If the world/model coordinate system is not yet EPSG:4326, apply the model→EPSG:4326 georeference transform exported by RealityScan (confirm whether that transform is a Helmert / affine transform, or direct lat/lon degrees).
  5. Final output: latitude, longitude, and altitude in WGS84.
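For concreteness, here is how I would code steps 1–3. This is only a sketch under the assumptions stated above (metric Z-depth, camera-to-world extrinsics), and every numeric value is a placeholder rather than a confirmed RealityScan convention:

```python
import numpy as np

def pixel_to_world(u, v, d, K, R_cw, t_cw):
    """Back-project pixel (u, v) with depth d into world/model coordinates.

    Assumes d is metric depth along the optical axis and that (R_cw, t_cw)
    map camera coordinates to world coordinates; both assumptions are
    exactly what I need confirmed for RealityScan's exports.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    p_cam = np.array([(u - cx) * d / fx,
                      (v - cy) * d / fy,
                      d])
    return R_cw @ p_cam + t_cw

# Placeholder intrinsics/extrinsics for illustration only.
K = np.array([[3648.0,    0.0, 2736.0],
              [   0.0, 3648.0, 1824.0],
              [   0.0,    0.0,    1.0]])
R_cw = np.eye(3)      # substitute the exported rotation
t_cw = np.zeros(3)    # substitute the exported camera position
print(pixel_to_world(712.43, 423.96, 37.2, K, R_cw, t_cw))
```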
Is this pipeline correct for RealityScan? Are there RealityScan-specific conventions I should be aware of?
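For step 4, in case the exported model frame is geocentric WGS84 (ECEF, EPSG:4978) rather than already geodetic (which is one of the conventions I am asking about), the final conversion could be done with pyproj; the input point below is a placeholder:

```python
from pyproj import Transformer

# ECEF (EPSG:4978) -> geodetic lon/lat/ellipsoidal height (EPSG:4979).
ecef_to_lla = Transformer.from_crs("EPSG:4978", "EPSG:4979", always_xy=True)

# Hypothetical model-space point in meters; substitute the reprojected point.
x, y, z = 3_912_960.0, 300_130.0, 5_012_720.0
lon, lat, alt = ecef_to_lla.transform(x, y, z)
print(lat, lon, alt)
```

Note that the altitude here would be ellipsoidal height; if my altitudes disagree with RealityScan's by a roughly constant amount, an ellipsoidal-vs-orthometric height mismatch would be my next suspect.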
If anyone can confirm RealityScan v2.1 conventions for the items above (especially depth units, extrinsic convention, and model→EPSG:4326 mapping), or can point to a short example (Python snippet) that reproduces a correct pixel→lat/lon workflow using RealityScan exports, I would really appreciate it.
If helpful, I can attach:
- one aligned image filename and the pixel coordinate (example: img_012168.JPG, (712.43, 423.96)),
- the depth value at that pixel,
- the exported camera intrinsics/extrinsics file (JSON/CSV),
- the exported model georeference / control-point CSV that RealityScan generated,
so someone can inspect exact numbers and spot the mismatch.
Thank you.