Search
  • GL

Beautiful 3D Point Clouds with camcalib, OpenCV & RAFT Stereo

This article will show you how to use camcalib YAML files to calculate beautiful dense 3D point clouds with a stereo-camera image pair.


Camcalib is just the first step for most computer vision applications. It is an essential step, yet only a means to your ends. With this article, we want to take you on the next steps of the journey – the steps after you have used camcalib – and show you how easy it is to compute precise 3D point clouds. We aim to get you started fast and with as little trouble as possible.


We will cover the following topics:

  1. Loading and parsing the camcalib YAML result file

  2. Removing lens distortion

  3. Rectification of a stereo image pair

  4. Converting the rectified image pair into a dense 3D point cloud with stereo matching

To keep the examples simple and the corresponding code short, we will be using Python and OpenCV to do the heavy lifting.


Download our examples source code

You don't need to copy and paste the code snippets we show here and puzzle them all together. Just check out the code for the entire example from our examples repository and follow the instructions in the accompanying README.md to get up and running with ease.


Loading the Camera Calibration YAML

Let’s get started! Using camcalib in a previous blog post, we calibrated a stereo camera pair and saved the result as a YAML file.

Tip: installing the PyYAML package

With Python, we import the yaml package to read the result file.

import yaml

filename = "calibration_result.yaml"
with open(filename,"r") as f:
  calibration_parameters = yaml.load(f, Loader=yaml.FullLoader)

We print the loaded data to the console to verify that everything works.

print(type(calibration_parameters))
$ <class 'dict'>

print(calibration_parameters)
$ {'sensors': {'cam1': {'extrinsics': {'axis_angle': [0.0, 0.0, 0.0], 'translation': [0.0, 0.0, 0.0]}, 'intrinsics': {'parameters': {'cx': 631.2595405625116, 'cy': 491.5258233725439, 'fx': 2282.3920867826905, 'fy': 2282.5710507986964, 'image_size': [1280, 1024], 'k1': 0.19408319800216575, 'k2': -0.0032982721692749804, 'k3': -0.018097059724866766, 'k4': -0.2839958241290381}, 'type': 'KannalaBrandt'}}, 'cam2': {'extrinsics': {'axis_angle': [-0.0017098846275279435, 0.1681972700321447, 0.04623455960511724], 'translation': [-0.19068022336063817, -0.0017535410879024739, 0.004499676517813966]}, 'intrinsics': {'parameters': {'cx': 615.0347480681874, 'cy': 472.525057978465, 'fx': 2301.3314301312444, 'fy': 2301.140570503219, 'image_size': [1280, 1024], 'k1': 0.18254782022559218, 'k2': 0.4203174919560141, 'k3': -5.89114449668284, 'k4': 25.664870652441312}, 'type': 'KannalaBrandt'}}}}

The second command shows that the calibration data is loaded into a Python dictionary, enabling us to access any part of the loaded object we need quickly. For later reference, the YAML file contains the following basic structure.

sensors:
  cam1:
    extrinsics: ...
    intrinsics: ...
  cam2:
    extrinsics: ...
    intrinsics: ...

Before we begin processing the loaded data, let's visualize the result to get a better understanding of our extrinsics. Check out the visualization below.

Visualization of "calibration_result.yaml"

If you look closely, you will notice several details reflected in the YAML result:

  1. cam2 is rotated ~10° countcam2er-clockwise around the y-axis.

  2. The 31° FOV is evident from the frustum visualization.

  3. The image center coordinate offset in cam1 and cam2 is not zero. It is subtle in the visualization, but it is there.

  4. Cam1s coordinate system has the x-axis directed right, y-axis oriented down, and z-axis pointing forward. The coordinate system xy-orientation matches the xy-orientation of the image plane coordinates and is a computer vision convention.

Stay tuned for our follow-up article, where we show you how to visualize the extrinsics of stereo and complex multi-view constructions with ease.


Undistortion

With camcalib, we have determined the distortion parameters and offset errors of the lens and sensor geometry. But how can we use the camcalib results to remove the lens distortion from our images?


OpenCV provides a vast library of well-tested computer vision algorithms that will help us accelerate the implementation process and reach our goals with only a few lines of code. So let's get started!


To undistort an image efficiently, we will take the following steps:

  1. Fetch and restructure the intrinsic parameters from the YAML we loaded before.

  2. Use the intrinsic parameters to construct a 2D look-up table – a so-called undistortion map.

  3. Apply a remap function with the undistortion map on every image the camera captures during the capture loop.

It may sound more logical to pack this into a single undistort image function, but you would repeat large amounts of the same computations every time you take a new snapshot. It is better to do the heavy calculations – computing the undistortion map – only once and store the result in memory for repeated later use. This way, you have more CPU resources left to do more valuable computations, like computing 3D point clouds.

Tip: installing the NumPy package.

To create the undistortion map, we will need to fetch the intrinsic parameters and restructure them into a camera matrix and a list of distortion parameters.

import numpy as np

# Extract the calibration data for camera 1 from the
# calibration_parameters variable
cam1_parameters = calibration_parameters['sensors']['cam1']

# Extract camera 1 intrinsic parameters from cam1_parameters
cam1_intrinsics = cam1_parameters['intrinsics']['parameters']

# Prepare variables map creation
image_size = cam1_intrinsics['image_size']
cam1_camera_matrix = np.array(
   [[cam1_intrinsics['fx'], 0, cam1_intrinsics['cx']],
    [0, cam1_intrinsics['fy'], cam1_intrinsics['cy']],
    [0, 0, 1]])
cam1_distortion = np.array(
    [cam1_intrinsics[k] for k in ['k1', 'k2', 'k3', 'k4']])

A detailed review of what the code does

After executing the code, we have all the parameters needed to create our undistort map – image_size, cam1_camera_matrix, and cam1_distortion. Now let's build the undistortion map.

Tip: installing the OpenCV package


import cv2
cv = cv2.fisheye
undistort_map1 = cv.initUndistortRectifyMap(cam1_camera_matrix, 
                                            cam1_distortion, 
                                            np.eye(3), 
                                            cam1_camera_matrix, 
                                            image_size, 
                                            cv2.CV_32F)

That's it! The variable undistort_map1 will contain a tuple. The first element of the tuple will contain all x-coordinates of the look-up table, and the second element will contain all y-coordinates.

More details on what the undistortion map contains, how it is used, and how remap works.

The plot below shows the magnitude of the distortion vectors for cam1 and cam2. The distortion magnitudes can reach hundreds or thousands of pixels in some cases. In our case, the distortion is almost invisible but will cause significant errors if left uncorrected. Note that the lenses used in this example are low-distortion lenses.

Magnitude plots of undistortion maps

How to plot the distortion map's magnitudes

Now we apply the undistortion map:

# Load raw cam 1 image data from png file
img1_raw = cv2.imread("camcalib_tutorial_data/cam1/001.png")

# undistort raw cam 1 image data using undistort_map1
# img1_undist = cv2.remap(img1_raw, *undistort_map1, 
#                         cv2.INTER_LANCZOS4) #short version
img1_undist = cv2.remap(img1_raw, 
                        undistort_map1[0], 
                        undistort_map1[1], 
                        cv2.INTER_LANCZOS4)

The steps are short and straightforward. We use the OpenCV imread function to load cam1/001.png and apply the previously computed undistortion map using OpenCV's remap function. The remap function is a powerful resampling function that can take any type of 2D lookup table and resample image data with the most common sub-pixel interpolation types (we use Lanczos resampling here). You can see the results of remapping our left and right images with the undistortion maps below. Note, that you will probably not be able to tell the difference as the distortion magnitude, as mentioned before, is below 16 pixels at worst.

Example of a strongly distorted fish-eye image

Undistorted images (bottom row) are indistinguishable from inputs (top row) because the distortion magnitude is at most 16 pixels.

If, however, we subtract the raw and undistorted images, we can see a difference, which will lead to significant 3D errors later on if we leave it unchecked. Imperceptible distortion errors are why we calibrate our cameras.

Difference images between distorted (raw input) and undistorted images of cam1 (left) and cam2 (right)

Rectification

For computational efficiency, we carry out undistortion and rectification at once in the same look-up table (the so-called undistort-rectify map). Still, it is more convenient to think of undistortion and rectification as separate processes carried out one after the other. With distortion errors taken care of, we can focus on rectification alone. Thus, for discussion, we will now assume that undistortion has been applied and perform rectification next.


Recall the visualization of our extrinsic calibration earlier; we found that cam2 is tilted (rotated) towards cam1. cam2 is also slightly pitched and rolled w.r.t. cam1, but that is more subtle and difficult to read from the extrinsic parameters. To triangulate 3D points for all pixels of cam1, we will need to find their corresponding pixels in cam2. Finding all correspondences presents us with a 2D search problem that is computationally intense. This is where rectification helps us out big time! The following two sketches, visually illustrate why rectification is useful. In case you are not familiar with the visualizations, check out the following “Visualization hint” expand where we describe the abstract representation of a pinhole camera that is widespread in the computer vision industry.

Visualization hint: abstract representation of the pinhole camera


Triangulation with unaligned (unrectified) image pairs. Finding the correspondences (ex. mountain top) is a 2D search problem.

Triangulation with aligned (rectified) image pairs. Finding the correspondences (ex. mountain top) is a 1D search along a pixel row.

To understand why, think of the undistorted camera image as an image created by a pinhole camera. Each pixel's horizontal and vertical ray-angle is precisely known. Consequently, we can virtually rotate our camera simply by moving the pixels around. Virtually rotating our camera allows us to remove the yaw, pitch, and roll of cam2 w.r.t. cam1. By doing this, we effectively make both cameras' optical axes perfectly parallel. Perfectly parallel optical axes imply that pixels in any row of cam1 will correspond only to pixels of the same row in cam2. Consequently, our 2D search problem becomes a 1D search problem that we can compute faster or even at live-video speeds. Thus, undistorting and rectifying our images leads to many simultaneous efficiency gains!


Let's get to the implementation! To speed up things, we will repeat a few things for cam2 that we left out for the undistortion before without going into too much detail and only discuss the new stuff.

# Extract the calibration data for camera 1 & 2 from calibration_parameters variable
cam1_parameters = calibration_parameters['sensors']['cam1']
cam2_parameters = calibration_parameters['sensors']['cam2']

# Extract camera 1 & 2 intrinsic parameters from cam1_parameters
cam1_intrinsics = cam1_parameters['intrinsics']['parameters']
cam2_intrinsics = cam2_parameters['intrinsics']['parameters']

# Prepare variables map creation
image_size = cam1_intrinsics['image_size']
cam1_camera_matrix = np.array([
    [cam1_intrinsics['fx'], 0, cam1_intrinsics['cx']],
    [0, cam1_intrinsics['fy'], cam1_intrinsics['cy']],
    [0, 0, 1]])
cam2_camera_matrix = np.array([
    [cam2_intrinsics['fx'], 0, cam2_intrinsics['cx']],
    [0, cam2_intrinsics['fy'], cam2_intrinsics['cy']],
    [0, 0, 1]])

cam1_distortion = np.array(
    [cam1_intrinsics[k] for k in ['k1', 'k2', 'k3', 'k4']])
cam2_distortion = np.array(
    [cam2_intrinsics[k] for k in ['k1', 'k2', 'k3', 'k4']])

That takes care of the intrinsic parameters. Now, let's extract the extrinsic parameters

# Extract camera 1 & 2 extrinsic parameters from cam1_parameters
P_c1 = Pose.from_axis_angle(
    cam1_parameters['extrinsics']['axis_angle'], 
    cam1_parameters['extrinsics']['translation'])
P_c2 = Pose.from_axis_angle(
    cam2_parameters['extrinsics']['axis_angle'],
    cam2_parameters['extrinsics']['translation'])

# Compute pose difference between camera 1 and 2
P_c2_c1 = P_c2 @ P_c1.I

The Pose class

There are a few things of note here.

  1. The Pose class takes the extrinsic data and allows you to treat it as a pose matrix with little effort. You can think of the pose as a transformation that applies rotation and translation to a set of 3D points. It can also be chained with other poses to represent a sequence of camera movements.

  2. Cameras 1 and 2 each have an extrinsic pose assigned to them. The extrinsic pose is the position and orientation that we determined earlier during the extrinsic calibration.

  3. For our rectification task, we only care about the position and orientation of cam2 w.r.t. cam1. This is why we subtract the pose of cam1 from cam2 by performing P_c2_c1 = P_c2 @ P_c1.I . Note that we mean the abstract sense of subtraction for poses.

Now we are prepared for rectification.

# Compute rectification parameters and undistort-rectify maps
cv = cv2.fisheye
R1, R2, P1, P2, Q = cv.stereoRectify(cam1_camera_matrix, 
                                     cam1_distortion,
                                     cam2_camera_matrix, 
                                     cam2_distortion,
                                     image_size, 
                                     P_c2_c1.r, 
                                     P_c2_c1.t, 
                                     cv2.CALIB_ZERO_DISPARITY)

# Compute undistort-rectify maps
rect_map1 = cv.initUndistortRectifyMap(cam1_camera_matrix, 
                                       cam1_distortion, 
                                       R1, 
                                       P1, 
                                       image_size, 
                                       cv2.CV_32F)
rect_map2 = cv.initUndistortRectifyMap(cam2_camera_matrix, 
                                       cam2_distortion, 
                                       R2, 
                                       P2, 
                                       image_size, 
                                       cv2.CV_32F)

If you recall, the undistort-rectify map construction is very much like the undistort map construction process. The only difference is that we also take the extrinsic data into account. We only have to compute the pose change from cam1 to cam2 with P_c2_c1. We let OpenCV do the heavy lifting by using the stereoRectify function. stereoRectify takes all the intrinsics and extrinsics we prepared and gives us the transformation parameters we need to construct the undistort-rectify maps. In particular, stereoRectify provides us with the rotation matrices R1 and R2. R1 and R2 are the rotations that orient cam1 and cam2 in the same direction. stereoRectify also provides P1 and P2, which are our new projection matrices. When applied, the new projection matrices determine the virtual camera parameters that the undistort-rectify maps cause. In other words, P1 and P2 describe how our virtual pinhole cameras (undistorted and rectified cameras) project 3D points onto the image plane – or "see" the world.


Put yet another way, R1 and P1 (and corresponding intrinsics) define a map that converts the distorted and rotated view of cam1 into a pinhole view of the world. R2, P2, and corresponding intrinsics do the same for cam2. And as a bonus, our virtual pinhole views will also have parallel optical axes and aligned rows that allow us to perform 1D searches to find correspondences.


With the rect_maps we can compute our virtual aligned (rectified) pinhole views

# load a stereo image pair
img1_raw = cv2.imread("camcalib_tutorial_data/cam1/001.png")
img2_raw = cv2.imread("camcalib_tutorial_data/cam2/001.png")

# Apply the undistort-rectify maps to the stereo image pair
img1_rect = cv2.remap(img1_raw, *rect_map1, cv2.INTER_LANCZOS4)
img2_rect = cv2.remap(img2_raw, *rect_map2, cv2.INTER_LANCZOS4)

That's it. We have rectified views that we can use to compute 3D point clouds. Before we triangulate, though, let's review our results so far to build intuition.


Check out the difference between the raw image pair and the rectified image pair in the plots below. The black lines in the raw image pair do not connect pixel correspondences, while the black lines do so for all pixels of the rectified pair.

Image pairs before and after undistortion and rectification

Also, check out the zoomed-in plot, where the red bars show the errors of the raw image pair.

Vertical errors in input images (before undistortion and rectification)

The red bars vanish in the rectified image pair.

No vertical errors after undistortion and rectification

Now that we have prepared our image data for efficient 1D correspondence search, let's do what we all came here for – computing 3D point clouds!


3D Point Clouds

We have two steps left until we have useful point clouds.

  1. Run a stereo matcher on the rectified image pair. Stereo matching is the 1D correspondence search we mentioned before.

  2. Triangulate pixel correspondences on the left and right images to construct the 3D point cloud.

For the first step, there are tons of algorithms to choose from. Your choice will depend on the trade-offs of your application. For example,

  • if you target a low-power ARM device, you might go with a simple block matcher algorithm.

  • If you have more power, you can go with a Semi-Global Block Matcher (SGBM) implementation.

  • The sky is the limit if your budget and application allow for a powerful GPU and longer run times. Check out the Middlebury Stereo Benchmark to see how different algorithms perform and what they require.

We compute our results using RAFT stereo, but we also will provide a minimal SGBM wrapper around the OpenCV SGBM implementation to get you started.

Tip: installing the Open3D package to visualize point clouds.

The SGBM wrapper

The following code will take you the rest of the way and take care of stereo matching, triangulation, and visualization.

import open3d as o3d

# Setup the stereo matcher
matcher = StereoMatcherSGBM()

# Compute the pixel correspondences - called the disparity map
disparity = matcher.match(img1_rect, img2_rect)

# Construct open3d geometry
pointcloud = matcher.reconstruct(disparity, img1_rect, Q)
# Perform a statistical outlier removal to make
# the results even prettier.
pointcloud, _ = pointcloud.remove_statistical_outlier(
    nb_neighbors=20, std_ratio=1.5)
# Create a coordinate frame to visualize the camera
# coordinate frame
origin_frame = o3d.geometry.TriangleMesh\
    .create_coordinate_frame(size=0.10, origin=[0, 0, 0])

# display open3d geometry
o3d.visualization.draw_geometries([pointcloud, origin_frame])

Let’s take a closer look at what is happening here:

  1. We imported open3d as o3d for short to help with visualizing the point cloud.

  2. matcher.match(img1_rect, img2_rect) uses the rectified images as input to find pixel correspondences. The correspondence is encoded in the form of a disparity. The disparity is the distance between the left and right images correspondences measured in pixels. You can think of the pixel location and disparity as encoding the two angles we need to triangulate a 3D point.

  3. reconstruct(disparity, img1_rect, Q) is provided by our wrapped matcher to compute the 3D point cloud. We will discuss this in more detail below. Just note the parameters. disparity contains the disparity map shown in the image below, img1_rect holds corresponding color values, and Q is the 4x4 perspective-transform or disparity-to-depth matrix used to perform the triangulation step.

  4. Note that the perspective-transform matrix Q was the 5th parameter of the stereoRectify function we called during rectification.

The result of stereo-matching rectified image pairs is the disparity map.

Before we go into the details of matcher.reconstruct(), check out the point clouds below. The high-quality point clouds result from excellent camera calibration and the RAFT-stereo correspondences.



This last image also shows the camera's coordinate system at the origin. Again note the direction of the x-axis and y-axis. The orientation of the coordinate system matches the image-planes coordinate system.

3D point cloud with camera coordinate system origin and orientation

Let’s close this article with a final discussion on the reconstruct function.

def reconstruct(self, img_disp0, img_rect0, Q):
  # reproject disparity to 3D
  xyz = cv2.reprojectImageTo3D(img_disp0, Q)
  
  # construct validity masks based on distance and brightness
  mask_depth = (xyz[:, :, 2] < 5.0) & (xyz[:, :, 2] > 0.1)
  mask_brght = (img_rect0[:, :, 0] > 30) & \
               (img_rect0[:, :, 0] < 250)
  
  # create linear point and color lists
  xyz_linear = xyz.reshape((-1, 3))
  colors_linear = img_rect0.reshape((-1, 3))
  mask_linear = (mask_brght & mask_depth).flatten()
  
  # create open3d geometry
  pcd = o3d.geometry.PointCloud()
  pcd.points = o3d.utility.Vector3dVector(
                   xyz_linear[mask_linear])
  pcd.colors = o3d.utility.Vector3dVector(
                   colors_linear[mask_linear] / 255.0)
  
  return pcd
  1. We use OpenCVs reprojectImageTo3D function to keep the code light and convert the disparity map into 3D coordinates. In principle, we could also do this by hand, and it would take us four lines of code with NumPy. Check out the expand below if you would like to see it.

  2. We build a simple distance and brightness filter mask_brght & mask_depth to remove errors.

  3. The open3d PointCloud() object pcd has the properties points and colors. points is a Vector3dVector constructed with an Nx3 NumPy array containing our triangulated valid 3D points. colors is also a Vector3dVector constructed with an Nx3 NumPy array. colors columns represent red, green, and blue with values ranging from 0.0 for darkest and 1.0 for brightest, so we normalize it with 255.

NumPy implementation of reprojectImageTo3D


Putting it all together

For your convenience, we provide this example in full in our examples repository. Check it out and simply follow the instructions in the accompanying README.md to get up and running with ease.


Conclusions

This article has taken us quite far beyond the scope of camcalib. But we hope that it helped you get going and gave you an insight into the various moving parts of a computer vision application that relies on precise camera calibration. We are constantly adding new pieces to help you get familiar with camcalib as well as help you get started with your applications faster. Please also return to check out our other articles that will take you even further.

75 views0 comments

Recent Posts

See All