Stitching point clouds from multiple cameras and generating multiview 3D clouds

GL
May 30, 2022

Updated: Jun 7, 2022

This article shows you how to create one combined multiview point cloud from the data of many cameras. We build on what we learned in our previous article and generalize the stereo camera concept so that every unique camera pairing acts as a stereo camera. Each stereo pairing then contributes data from its own perspective and fills in gaps that other camera pairs cannot physically observe.

This article will cover the following topics:

Rectification and stereo matching of multiview camera pairings.
Triangulating point clouds from each multiview camera pairing.
Using the extrinsic camera poses to stitch all the point clouds together.
Reviewing the structural changes between our stereo and multiview example code.

Warning: this article builds on our previous article. We strongly recommend you read that article first if you have not yet done so.

Download our examples source code

You don’t need to copy and paste the code snippets shown here and piece them together. Just check out the example code from our examples repository and follow the instructions in the accompanying README.md to get up and running with ease.

Introduction

The following sketch outlines how multiple camera pairs each capture their own point cloud, and how these individual point clouds are collected into one global point cloud. Note that every point in a point cloud must be visible to both cameras of a pairing simultaneously; otherwise, it would be impossible to triangulate it.

Sketch of overlapping stereo-visible regions. The resulting point clouds are marked in thick black and only exist for points that are visible to both cameras of a pairing.

The input data we will use for this example is shown in the following figure.

The screenshot below is the result of the current article and its accompanying example code. The image depicts the fused point clouds created by the camera pairings 1+2, 2+3, and 4+2. The cameras’ locations and orientations are also rendered as a visual aid.


Visualization of a stitched point cloud from the camera pairings 1+2, 2+3, and 4+2. In red, green, and blue, the edges of the individually observed point clouds are highlighted and correspond to the projected boundaries of cam1, cam2, and cam4, respectively.

A few notes on what the screenshot depicts:

The cameras 1-4 create the unique stereo camera pairings: 1+2, 1+3, 1+4, 2+3, 2+4, and 3+4.
Many stereo matchers have difficulty with camera pairs where baselines and/or camera-to-camera rotations are large. This is why the image only depicts the camera pairings 1+2, 2+3, and 4+2.
Take note of the rear wall and floor on the left side of the image. That part of the scene is only visible to the camera pairings 2+3 and 4+2. This, combined with slightly different exposure settings for pairing 1+2 compared with the others, causes a visible brightness seam in the texture of the floor and wall.
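The six unique pairings listed above are simply all 2-element combinations of the four cameras. A minimal sketch using only Python's standard library (the camera names come from the example data; the helper name `unique_pairs` is ours and not part of the example code):

```python
from itertools import combinations

def unique_pairs(cameras):
    """Return every unordered camera pairing exactly once, ordered by input order."""
    return list(combinations(cameras, 2))

pairs = unique_pairs(["cam1", "cam2", "cam3", "cam4"])
print(pairs)
# [('cam1', 'cam2'), ('cam1', 'cam3'), ('cam1', 'cam4'),
#  ('cam2', 'cam3'), ('cam2', 'cam4'), ('cam3', 'cam4')]
```

Note that `combinations()` orders each pair by input order only; it cannot know which camera is physically on the left. That is one reason to specify left-to-right pairs such as ("cam4", "cam2") by hand when a matcher expects a left image followed by a right image.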

Example overview

With the teaser out of the way, let’s dive into the example and get our bearings. Constructing a multiview point cloud differs from constructing a single stereo point cloud in one significant way: after stereo matching and triangulation, each pairing’s point cloud is expressed in the coordinate frame of that pairing’s rectified left camera. Consequently, we must apply two pose transforms to each cloud to obtain a consistent multiview point cloud.

This is the result we get if we do not apply the pose corrections.

To efficiently handle any multiview calibration configuration, we restructured the stereo example codebase in a few key ways that we will discuss next.

Example code modules

When handling multiview point clouds, we must repeat the following steps for each camera pairing:

Load the calibration data.
Construct and cache the undistort rectify maps.
Load the image pair.
Rectify the image pair.
Stereo match the image pair.
Triangulate the disparity map, yielding a point cloud.

In addition, to create multiview point clouds, we need to carry out the following steps:

Apply the inverse rectification rotation to the current camera’s point cloud. Remember that the rectification process virtually rotates the camera images so that the cameras’ optical axes become parallel and pixel rows become aligned between both images.
Apply the inverse extrinsic pose transform to the point cloud.
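Both corrections are rigid-body inversions. A minimal NumPy sketch, assuming (as in OpenCV's `stereoRectify`) that `R1` maps unrectified to rectified camera coordinates, and that the extrinsic pose maps world points into the camera as `p_cam = R @ p_world + t`. The function name and the one-point-per-row array layout are illustrative assumptions, not the example's API:

```python
import numpy as np

def rect_points_to_world(points_rect, R1, R, t):
    """Map an (N, 3) array of points from the rectified camera frame to the world frame.

    Assumed conventions: p_rect = R1 @ p_cam and p_cam = R @ p_world + t.
    """
    # Undo the rectification: R1 is a rotation, so its inverse is its transpose.
    # For row vectors, p_cam = R1.T @ p_rect becomes points_rect @ R1.
    points_cam = points_rect @ R1
    # Undo the extrinsic pose: p_world = R.T @ (p_cam - t), in row-vector form.
    return (points_cam - t) @ R

# Round-trip check with arbitrary rotations and a small offset.
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])   # rotation about z
R1 = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])  # rotation about x
t = np.array([0.1, -0.2, 0.5])
points_world = np.array([[0.0, 0.0, 2.0], [1.0, -0.5, 3.0]])
points_rect = (points_world @ R.T + t) @ R1.T  # world -> cam -> rect
recovered = rect_points_to_world(points_rect, R1, R, t)
```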

These steps will give us as many point clouds as we have camera pairings. We will also have placed each of these point clouds in the correct coordinate frame, which leaves us with the final task of visualizing everything.

As a consequence, we decided to structure the codebase as follows:

camcalib_multiview_pointcloud_example
├── camcalib_tutorial_data
│   ├──     ...
├── modules
│   ├── Pose.py
│   ├── calib_viz_utils.py
│   ├── camcalib_loader.py
│   ├── opencv_matcher.py
│   └── raft_matcher.py
└── main.py

camcalib_tutorial_data contains the raw input images and calibration_result.yaml required to execute this example. You can use your own data here instead if you like.
modules contains Pose.py and *_matcher.py that we are already familiar with from our previous stereo view example. In addition, we have two new modules:
- calib_viz_utils.py helps us visualize the intrinsic and extrinsic calibration alongside our multiview point clouds. Consider this a simple helper utility for now. We will dive into its details in a future article.
- camcalib_loader.py will aid us with loading the YAML file and constructing undistort-rectify maps for all camera pairings.
main.py, when run, launches our example. Check out the README.md to see how to set everything up and run the example.

Steps 1 and 2: loading calibration data and constructing undistort-rectify maps

This is where we make use of the camcalib_loader.py module.

# 1. import CamcalibLoader module.
from modules.camcalib_loader import CamcalibLoader

# 2. specify calib file.
calibration_file_name = "camcalib_tutorial_data/calibration_result.yaml"
# 3. specify camera pairs in left to right sequence.
camera_pairs = [("cam1","cam2"),
                ("cam2","cam3"),
                ("cam4","cam2")
                ]
# 4. load the calib data and construct the undistort-rectify maps.
calibration = CamcalibLoader(calibration_file_name, camera_pairs)

With that, the calibration data is loaded, and undistort-rectify maps for the camera pairings we specified are created. If you set camera_pairs=None, the module automatically creates its own list of all possible unique camera pairings.

We specify which camera pairs to use for two reasons. First, the stereo matchers we use have difficulty with large baselines and camera-to-camera rotations. Second, RAFT Stereo expects a left followed by a right camera image. So by manually specifying the camera pairs, we ensure that only good pairs in the correct sequence are used.

To make use of the calibration object we created, let’s discuss its member variables:

.cameras is a list of all camera names contained within the YAML file.
.camera_pairs either contains
- the list of the camera pairs we specified in camera_pairs or
- if we specify camera_pairs=None, .camera_pairs contains a list of all unique pairings of the cameras listed in the .cameras member variable.
.camera_poses contains the extrinsic pose for each camera listed in .cameras.
.camera_pair_undistort_rectify_maps is a dictionary that contains, for each pair in the member variable .camera_pairs, the corresponding undistort-rectify maps and rectification data.

Example for .camera_pair_undistort_rectify_maps contents

Step 3: loading image data

We use Python dictionaries indexed by camera names or camera pairs to make our lives easier. As we already have a list of camera names, and our raw image files live in one sub-folder per camera, loading all the images becomes a one-liner:

images_raw = {cam:cv2.imread("camcalib_tutorial_data/{cam}/001.png"\
              .format(cam=cam)) for cam in calibration.cameras}

You can read the above line as: for every camera name in the list of camera names, load 001.png from the folder named after that camera and store it in the Python dictionary under the camera’s name. This results in the following dictionary object:

images_raw = {
 'cam3': array(...), 
 'cam4': array(...), 
 'cam1': array(...), 
 'cam2': array(...)
}
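One caveat of the one-liner: `cv2.imread` does not raise on a missing or unreadable file; it returns `None`, which only surfaces later as a confusing error in `cv2.remap`. A slightly more defensive variant, sketched with a hypothetical helper name and the same folder layout as above (the `reader` argument is an addition of ours so the function can be exercised without image files):

```python
from pathlib import Path

def load_raw_images(cameras, root="camcalib_tutorial_data", frame="001.png", reader=None):
    """Load one frame per camera into a dict keyed by camera name (hypothetical helper).

    Raises FileNotFoundError instead of silently storing None.
    """
    if reader is None:
        import cv2  # imported lazily so the helper can be tested with a stub reader
        reader = cv2.imread
    images = {}
    for cam in cameras:
        path = Path(root) / cam / frame
        image = reader(str(path))
        if image is None:
            raise FileNotFoundError(f"could not read image for {cam}: {path}")
        images[cam] = image
    return images
```

In the example, this would be called as `images_raw = load_raw_images(calibration.cameras)`.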

Step 4: rectify all image pairs

We want to apply the undistort-rectify maps for every camera pairing we specified in camera_pairs. Note that each rectification map is specific to a camera pair. This means that if you rectify cam1 for the pair cam1+cam2, the rectified cam1 image will differ from the rectified cam1 image of the cam1+cam3 pair. For this reason, we store the rectified images in pairs, as they now inseparably belong together.

# 1. set up a container for rectified image pairs.
imgages_undistorted_rectified = {}

# 2. iterate over all specified camera pairs for which we 
#    constructed undistort-rectify maps.
for pair in calibration.camera_pairs:
    # 3. extract the undistort-rectify maps.
    rect_map1 = calibration.camera_pair_undistort_rectify_maps[pair]["rect_map1"]
    rect_map2 = calibration.camera_pair_undistort_rectify_maps[pair]["rect_map2"]
    
    # 4. apply undistortion and rectification to raw images.
    img1_rect = cv2.remap(images_raw[pair[0]], *rect_map1, cv2.INTER_LANCZOS4)
    img2_rect = cv2.remap(images_raw[pair[1]], *rect_map2, cv2.INTER_LANCZOS4)
    
    # 5. store the undistorted and rectified image pair.
    imgages_undistorted_rectified[pair] = (img1_rect, img2_rect)

When the loop terminates, the dictionary imgages_undistorted_rectified will contain the rectified image pairs for every corresponding item in camera_pairs.

The rectification results for all possible camera pairings are shown in the following figure.

Visualization of all rectified pairings

Steps 5 and 6: stereo matching the image pairs and triangulating 3D points

As in our previous example, we construct a matcher object from either of our wrapper classes RaftStereoMatcher or StereoMatcherSGBM and use its .match() and .reconstruct() member functions. In our code snippet here, we will use the RAFT Stereo wrapper.

# 1. Set up the stereo matcher.
matcher = RaftStereoMatcher()

# 2. Run the matcher for each undistorted and rectified image pair
#    and keep one point cloud per pair.
pointclouds = {}
for pair in calibration.camera_pairs:
    disparity = matcher.match(*imgages_undistorted_rectified[pair])
    Q = calibration.camera_pair_undistort_rectify_maps[pair]["Q"]
    rect_img = imgages_undistorted_rectified[pair][0]
    pointclouds[pair] = matcher.reconstruct(disparity, rect_img, Q)

This will give us point clouds as seen by the rectified left camera of each pair. Next, we will correct the extrinsic and rectification poses to stitch the individual clouds into one big consistent cloud.
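Conceptually, triangulating a rectified disparity map is one matrix product per pixel: OpenCV's Q matrix maps (x, y, disparity, 1) to homogeneous 3D coordinates in the rectified left camera frame, which is what `cv2.reprojectImageTo3D` computes. We do not reproduce the wrapper's `.reconstruct()` here; this NumPy sketch, with a hypothetical function name and made-up Q values, only illustrates the underlying math:

```python
import numpy as np

def disparity_to_points(disparity, Q, min_disparity=0.0):
    """Reproject an (H, W) disparity map to an (N, 3) array in the rectified left frame.

    Pixels with disparity <= min_disparity are treated as invalid and dropped.
    Returns the points and the boolean validity mask (useful to look up colors).
    """
    h, w = disparity.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    valid = disparity > min_disparity
    # Homogeneous pixel coordinates (x, y, d, 1) stacked as columns: shape (4, N).
    pixels = np.stack([xs[valid], ys[valid],
                       disparity[valid].astype(np.float64),
                       np.ones(int(valid.sum()))])
    homogeneous = Q @ pixels
    points = (homogeneous[:3] / homogeneous[3]).T
    return points, valid

# Made-up Q for f = 500 px, principal point (2, 1) and a 0.1 m baseline.
Q = np.array([[1.0, 0.0, 0.0, -2.0],
              [0.0, 1.0, 0.0, -1.0],
              [0.0, 0.0, 0.0, 500.0],
              [0.0, 0.0, 10.0, 0.0]])
disparity = np.full((3, 4), 50.0)
disparity[0, 0] = 0.0  # one invalid pixel
points, valid = disparity_to_points(disparity, Q)
```

With f = 500 px, a 0.1 m baseline and 50 px disparity, every point lands at depth f · b / d = 1 m. The mask can index the rectified left image (`rect_img[valid]`) to obtain one color per point.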

Multiview steps 1 and 2: inverse rectification and inverse extrinsic pose

Recall the Rectification section of our previous article, or see the following figure for a quick visual refresher. To understand the next steps, it is essential to know what we have done to our virtual cameras so far.

Reminder: rectification transformation

To rectify the image pairs, we had to apply a virtual rotation to both cameras of each pair. This virtual rotation changes both the image and the extrinsic pose of the resulting virtual camera. Because we triangulate our point cloud from what this virtual camera sees, the pose correction we compute must undo this rotation as well.

Put differently, we are looking for a pose transformation that takes 3D points observed in the rectified camera frame into a single user-designated coordinate frame. This involves the following steps:

Transform the 3D points from the rectified to the unrectified camera frame (extrinsic camera pose frame).
Then from the unrectified frame to a reference camera frame designated by camcalib. Camcalib selects the camera that has the most shared observed features with all other cameras.
Finally, from the camcalib-designated frame to a user-designated coordinate frame.
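These three steps can be collapsed into one 4×4 homogeneous matrix per pairing. The example uses the Pose class from our previous article (with `@` and `.I`) for this; the sketch below uses plain NumPy instead, with hypothetical helper names, and assumes the same conventions as the code that follows: camera poses are world-to-camera transforms and `R1` maps unrectified to rectified coordinates.

```python
import numpy as np

def make_pose(R, t):
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert_pose(T):
    """Invert a rigid transform in closed form: [R | t] -> [R.T | -R.T @ t]."""
    R, t = T[:3, :3], T[:3, 3]
    return make_pose(R.T, -R.T @ t)

def rect_to_reference(R1, P_cam_world, P_ref_world):
    """Compose ref <- world <- cam <- rect into a single 4x4 transform."""
    P_rect_cam = make_pose(R1, np.zeros(3))
    return P_ref_world @ invert_pose(P_cam_world) @ invert_pose(P_rect_cam)

def apply_pose(T, points):
    """Apply a 4x4 transform to an (N, 3) array of points."""
    return points @ T[:3, :3].T + T[:3, 3]

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Made-up poses: a small rectification rotation and two world-to-camera poses.
R1 = rot_z(0.02)
P_cam_world = make_pose(rot_z(0.4), [0.3, 0.0, 0.1])     # world -> left camera of the pair
P_ref_world = make_pose(rot_z(-0.2), [-0.1, 0.05, 0.0])  # world -> reference camera
P_ref_rect = rect_to_reference(R1, P_cam_world, P_ref_world)
```

Composing once per pairing means each point is multiplied by a single matrix, rather than going through three separate transforms.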

The following graph shows the transformation sequence required to build a consistent accumulated point cloud expressed in camera frame 2 (specified by the user), from all the point clouds currently defined in their respective rectified camera frames.

The transformation graph that transforms the points of a point cloud from the rectified camera frames (rcam) to one user-specified coordinate frame (cam2).

Detailed explanation of the transformation sequence graph

Let’s check out the code required to do this for any multiview camera setup you may have.

# 0. Specify which camera we want the world frame to be centered on.
#    Feel free to change this to any other cam if you like.
reference_camera = "cam2"

# 1. Set up the stereo matcher.
matcher = RaftStereoMatcher()

# 2. Create a point cloud for each undistorted and rectified image pair
#    and place it in the reference_camera coordinate frame.
pointclouds = []
for pair in calibration.camera_pairs:
    # 2.1 Build the pose transformation that takes points from
    #    rect1 -> reference_camera.

    # 2.1.1 Construct the rectification pose transform.
    #     This is the R1 result parameter of stereoRectify() for
    #     the current camera pair.
    R1 = calibration.camera_pair_undistort_rectify_maps[pair]["R1"]
    P_rect_cam = Pose(R1, np.zeros(3))

    # 2.1.2 Fetch the world-to-cam pose transform of the pair's left
    #     camera from the extrinsic calibration data.
    P_cam_world = calibration.camera_poses[pair[0]]

    # 2.1.3 Build the rect-to-world frame pose-transform.
    #     Note: P_cam_rect  = P_rect_cam.I
    #           P_world_cam = P_cam_world.I
    #     Thus you can read:
    #           world <- rect = world <- cam <- rect
    P_world_rect = P_cam_world.I @ P_rect_cam.I

    # 2.1.4 Build the rect-to-reference_camera pose-transform.
    #     You can read:
    #           ref <- rect = ref <- world <- cam <- rect
    P_ref_world = calibration.camera_poses[reference_camera]
    P_ref_rect = P_ref_world @ P_world_rect

    # 2.2. Triangulate the disparity map and transform the point cloud
    #    to the reference frame.

    # 2.2.1 Prepare parameters and color data.
    Q = calibration.camera_pair_undistort_rectify_maps[pair]["Q"]
    rect_img = imgages_undistorted_rectified[pair][0]

    # 2.2.2 Compute the disparity map.
    disparity = matcher.match(*imgages_undistorted_rectified[pair])

    # 2.2.3 Reconstruct the 3D point cloud, apply the rect->ref pose
    #     transform, and keep the result.
    pointclouds.append(matcher.reconstruct(disparity, rect_img, Q, P_ref_rect))

The main difference from our stereo view example is that we must compute the extrinsic pose correction for each camera pairing’s point cloud and apply it individually.
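Once every pairing has its own rect-to-reference transform, stitching reduces to transforming each cloud and concatenating the results. If you keep clouds as NumPy arrays rather than open3d objects, a minimal sketch could look like this; the dictionary layout and function name are assumptions for illustration, not the example's data structures:

```python
import numpy as np

def stitch_clouds(clouds, transforms):
    """Merge per-pair point clouds into one cloud in the reference frame.

    clouds:     {pair: (points (N, 3), colors (N, 3))} in each pair's rectified frame
    transforms: {pair: 4x4 rect-to-reference transform}
    """
    all_points, all_colors = [], []
    for pair, (points, colors) in clouds.items():
        T = transforms[pair]
        all_points.append(points @ T[:3, :3].T + T[:3, 3])
        all_colors.append(colors)
    return np.concatenate(all_points), np.concatenate(all_colors)

# Toy input: the second pair's rectified frame sits 1 m along x from the reference.
shift = np.eye(4)
shift[0, 3] = 1.0
clouds = {
    ("cam1", "cam2"): (np.zeros((2, 3)), np.ones((2, 3))),
    ("cam2", "cam3"): (np.zeros((1, 3)), np.zeros((1, 3))),
}
transforms = {("cam1", "cam2"): np.eye(4), ("cam2", "cam3"): shift}
points, colors = stitch_clouds(clouds, transforms)
```

With open3d point clouds, the equivalent is to call `transform()` on each `PointCloud` and combine them with the `+` operator.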

Visualizing the results

With the code so far, you will get one large point cloud that lives in any coordinate frame you like (cam2 in our example). In the stereo view example, we used open3d to display our point cloud.

Let’s take that a step further and visualize what this multiview example does under the hood by expanding the last code listing a bit. For brevity, we will leave out the details required to create the pointcloud variable.

# Enable horizontal splitting of each camera pairing's
# point cloud.
HORIZONTAL_SPLIT_CLOUDS = True

# ...

# Construct a pose offset increment. This will shift every new
# point cloud we add to the geometry by 2 meters along the
# x-axis so we can see each camera pairings resulting cloud.
cloud_offset_pose_increment = Pose.from_axis_angle(np.zeros(3), 
                                    np.array([0, 0, 0]))

# If HORIZONTAL_SPLIT_CLOUDS==False the displacement will be 0.
if HORIZONTAL_SPLIT_CLOUDS:
    cloud_offset_pose_increment = Pose.from_axis_angle(np.zeros(3), 
                                    np.array([2, 0, 0]))

# Generate geometry for point cloud visualization.
# The first element we add is the coordinate frame mesh
# to show the frame origin and orientation.
scene_geometry = [o3d.geometry.TriangleMesh.create_coordinate_frame(
    size=0.15, origin=[0, 0, 0])]

# Prepare the pose accumulator that tracks the current point
# cloud displacement.
cloud_offset_pose_accumulator = Pose.from_axis_angle(np.zeros(3), 
                                    np.array([0, 0, 0]))

# Run the matcher for each undistorted and rectified image pair
for pair in calibration.camera_pairs:
    # ... setup code for matcher.reconstruct() inputs removed for brevity
    
    # Add extra displacement for the debug visualization
    P_ref_rect_moved = cloud_offset_pose_accumulator @ P_ref_rect
    
    # Reconstruct 3D point cloud and apply rect->ref pose transform
    pointcloud = matcher.reconstruct(disparity, rect_img, Q, P_ref_rect_moved)
    
    # Add the new point cloud to the scene visualization geometry
    scene_geometry.append(pointcloud)

    # Generate scene geometry for each camera of the current pairing
    for cam in pair:
        # Get the extrinsic pose (world-to-cam pose) for camera from calib data,
        # then convert it from world-to-cam into cam-to-world transformation
        # by inverting the world-to-cam pose.
        P_world_cam = calibration.camera_poses[cam].I

        # Apply reference camera world transform
        P_ref_world = calibration.camera_poses[reference_camera]
        P_world_cam = P_ref_world @ P_world_cam

        # Add extra displacement for the debug visualization
        P_world_cam_moved = cloud_offset_pose_accumulator @ P_world_cam

        # Fetch intrinsic parameters so we can properly render the 3D
        # representation of the cameras.
        sensor = calibration.calibration_parameters["sensors"][cam]
        intrinsics = sensor["intrinsics"]["parameters"]

        # Generate and append camera and name text geometry
        scene_geometry.append(construct_camera(size=0.175, 
              intrinsics=intrinsics, extrinsic_pose=P_world_cam_moved))
        scene_geometry.append(text_3d(text=cam, scale=10, 
              extrinsic_pose=P_world_cam_moved))
        
    # Increment the pose offset for the split-cloud debug visualization.
    cloud_offset_pose_accumulator =\
       cloud_offset_pose_increment @ cloud_offset_pose_accumulator

# Display the open3d geometry and point clouds.
print("Loading geometry into visualizer...")
o3d.visualization.draw_geometries(scene_geometry)

That’s it! It’s basically a few pose transformations applied to scene geometry and point clouds. If you enable the HORIZONTAL_SPLIT_CLOUDS feature, you should get the following image.

Only the pairings with good matcher results.

Visualization of all camera pairings’ point clouds

If you take a closer look at the 3D visualization, you will see each camera pair next to the point cloud created from it. Note that we deliberately added a 2 m displacement between the point clouds of the camera pairings so you can see which cameras contribute to which parts of the overall point cloud.
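The split view relies on the accumulator update in the visualization listing: each pass left-multiplies the accumulator by the same 2 m translation, so the k-th pairing ends up shifted by 2k m along x. A stripped-down sketch of that bookkeeping, using plain 4×4 NumPy matrices instead of the example's Pose class (the function name is hypothetical):

```python
import numpy as np

def split_offsets(num_pairs, step=2.0):
    """Return the 4x4 display offset applied to each pairing's geometry."""
    increment = np.eye(4)
    increment[0, 3] = step   # translate by `step` meters along x
    accumulator = np.eye(4)  # the first pairing is not shifted
    offsets = []
    for _ in range(num_pairs):
        offsets.append(accumulator.copy())
        accumulator = increment @ accumulator
    return offsets

offsets = split_offsets(3)
print([float(T[0, 3]) for T in offsets])  # [0.0, 2.0, 4.0]
```

Passing `step=0.0` reproduces the `HORIZONTAL_SPLIT_CLOUDS = False` case, where every offset is the identity.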

We get one stitched point cloud when we disable the debug feature by setting HORIZONTAL_SPLIT_CLOUDS = False. In addition, all cameras are rendered in their correct positions, which correspond to the extrinsic calibration of each camera. The snapshot below shows the output with the horizontal split debug feature disabled.

Fused point cloud using extrinsic camera poses. Note that the floor and calibration pattern form one continuous surface.

To make it completely obvious how the individual point clouds contribute to the stitched point cloud, we generated the same result with the 50 border pixels of each point cloud recolored, as depicted in the next screenshot.

Illustrating the borders of the individual point clouds and their corresponding camera views. The red border corresponds to the edge of cam1’s image. Green and blue correspond to the visible image boundaries of cam2 and cam4.
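A border highlight like this can be produced by recoloring a frame of pixels in each pairing's rectified left image before it is passed to reconstruction, so the colored pixels travel with their 3D points. We are not certain this is exactly how the screenshot was made; the NumPy sketch below is one plausible way, with a hypothetical function name and BGR channel order as returned by `cv2.imread`:

```python
import numpy as np

def mark_border(image, color, width=50):
    """Return a copy of an (H, W, 3) image with a `width`-pixel frame set to `color`."""
    marked = image.copy()
    marked[:width, :] = color   # top rows
    marked[-width:, :] = color  # bottom rows
    marked[:, :width] = color   # left columns
    marked[:, -width:] = color  # right columns
    return marked

RED_BGR = (0, 0, 255)  # OpenCV images are stored in BGR order
img = np.zeros((200, 300, 3), dtype=np.uint8)
marked = mark_border(img, RED_BGR)
```

Because the left camera of the pairings 1+2, 2+3, and 4+2 is cam1, cam2, and cam4 respectively, marking the left image of each pair yields one colored outline per left camera.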

Zooming into the point cloud of the calibration pattern, we see a nice, flat surface, as we should expect.

Let’s look at the floor more closely. The texture of the fishbone parquet matches up very well between the point clouds. The bright-to-dark transition line along the floor results from the different image brightness observed by the different camera pairings.

View of the fishbone parquet texture overlap of the three floor point clouds.

A side view of 3 meters of the floor is also remarkably flat, especially considering that it is the result of fusing three individual point clouds via the extrinsic camera poses. Unfortunately, the wall on the left of the image did not fare as well. Because of its weak texture, the stereo matcher made large errors that show up as a warped point cloud.
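Flatness can be quantified by cropping a region of the stitched cloud, fitting a plane to it, and looking at the residuals. A least-squares plane fit via SVD is one standard way to do this; the sketch below is our own illustration, not part of the example code, and uses a synthetic floor patch instead of real data:

```python
import numpy as np

def plane_fit_rms(points):
    """Fit a plane to (N, 3) points by SVD; return (normal, centroid, RMS distance)."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    distances = (points - centroid) @ normal
    return normal, centroid, float(np.sqrt(np.mean(distances ** 2)))

# Synthetic 3 m x 1 m floor patch on the horizontal plane y = 1.5.
xs, zs = np.meshgrid(np.linspace(-1.5, 1.5, 31), np.linspace(2.0, 3.0, 11))
floor = np.column_stack([xs.ravel(), np.full(xs.size, 1.5), zs.ravel()])
normal, centroid, rms = plane_fit_rms(floor)
```

On a real cloud, a low RMS on the floor crop and a much larger one on the wall crop would express the difference described above in numbers.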

Putting it all together

For your convenience, we provide this example in full in our examples repository. Check it out and simply follow the instructions in the accompanying README.md to get up and running with ease.

Conclusions

With this article, we have shown you how to build a multiview point cloud from many camera pairings and transform all of its components into one consistent coordinate frame. We hope it helps you push the boundaries of stereo vision and gives you insight into the various moving parts of a computer vision application that relies on precise camera calibration. We are constantly adding new pieces to help you get familiar with camcalib and get started with your applications faster. Please return to check out our other articles that will take you even further.