Question about the pose and depth correspondence.

Objective

Use the rendered RGB images, depth maps, and pose data to generate a complete reconstructed point cloud of the virtual scene.

Steps

  1. Generate the image, depth, and pose data. For convenience, I give some samples below.
  2. Compute the transformation matrix from the camera actor to the world. This step yields a 4x4 matrix containing rotation and translation.
  3. Compute the point cloud from the depth and image data using the camera's intrinsic parameters.
  4. Apply the transformation matrix to the computed point cloud to move it from the camera frame to the world frame (see the sketch after this list).
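
A minimal sketch of steps 3 and 4, assuming OpenCV-style pinhole camera axes (x right, y down, z forward) and planar Z-depth; `fx, fy, cx, cy` are placeholders for the actual rendering intrinsics, and `depth_to_world_points` is just my own helper name:

```python
import numpy as np

def depth_to_world_points(depth, T_cam_to_world, fx, fy, cx, cy):
    """Back-project a depth map and move the points into the world frame.

    depth:          (H, W) float array, planar Z-depth in the same unit
                    as the pose translation
    T_cam_to_world: (4, 4) camera-to-world transformation matrix
    fx, fy, cx, cy: pinhole intrinsics (placeholders for the real camera)
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Pinhole back-projection: pixel (u, v) at depth z -> camera-frame point
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)

    # Step 4: camera frame -> world frame
    return (T_cam_to_world @ pts_cam.T).T[:, :3]
```

If the renderer uses a different camera axis convention than OpenCV, an extra fixed rotation between the optical frame and the pose frame would be needed before applying `T_cam_to_world`.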

Results & Questions

After several tries, the point clouds from the individual frames still do not merge into one consistent cloud; instead they come out as layered, offset copies of the scene (see below).

Question: I guess the reason lies in the pose transformation or in the depth data. Is there a way to align and merge these synthesized point clouds correctly?

Sample data

Transformation matrices

```
# frame 1
 0.1515966672668435   -0.9877408181587248     0.03723609293036194  -4073.306396
 0.9592288209060744    0.15610277429648933    0.23560983213796713    986.939026
-0.23853410577258688   2.681893238876132e-07  0.9711341206976513     972.654785
 0.0                   0.0                    0.0                      1.0
# frame 2
 0.1515966672668435   -0.9877408181587248     0.03723609293036194  -4109.792969
 0.9592288209060744    0.15610277429648933    0.23560983213796713    993.744629
-0.23853410577258688   2.681893238876132e-07  0.9711341206976513     972.654785
 0.0                   0.0                    0.0                      1.0
```
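
For reference, this is roughly how I read such a dump back into a 4x4 NumPy array (the `load_pose` helper and the one-matrix-per-file layout are my own conventions, not from any library):

```python
import numpy as np

def load_pose(path):
    """Parse one whitespace-separated 4x4 pose matrix from a text file,
    skipping blank lines and '# frame ...' comment lines."""
    rows = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            rows.append([float(v) for v in line.split()])
    return np.asarray(rows, dtype=np.float64)  # shape (4, 4)
```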

Images

image 1

image 2
Depth images

depth 1

depth.zip

Read the EXR depth data as follows:

```python
import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # must be set before importing cv2

import cv2
import numpy as np

# The factor 1000 * 100 matches the depth range used when rendering the data
depth = cv2.imread(depth_path, cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)[:, :, 0].astype(np.float32) * 1000 * 100
```
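
Putting the pieces together, here is a sketch of how I merge the frames (`depth_to_world_points` and `load_pose` are the helpers sketched above; the file names and intrinsic values are placeholders):

```python
import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # must be set before importing cv2

import cv2
import numpy as np

# Placeholder intrinsics; the real values come from the rendering camera
fx = fy = 320.0
cx, cy = 320.0, 240.0

clouds = []
# Placeholder file names for the two sample frames above
for depth_path, pose_path in [("depth_0001.exr", "pose_0001.txt"),
                              ("depth_0002.exr", "pose_0002.txt")]:
    depth = cv2.imread(depth_path,
                       cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)[:, :, 0]
    depth = depth.astype(np.float32) * 1000 * 100
    T = load_pose(pose_path)  # 4x4 camera-to-world matrix
    clouds.append(depth_to_world_points(depth, T, fx, fy, cx, cy))

# After the world transform the per-frame clouds should overlap,
# but in my case they come out layered instead.
merged = np.vstack(clouds)
```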