This article will show you how to use camcalib YAML files to calculate beautiful dense 3D point clouds with a stereo-camera image pair.

Camcalib is just the first step for most computer vision applications. It is an essential step, yet only a means to your ends. With this article, we want to take you on the next steps of the journey – the steps after you have used camcalib – and show you how easy it is to compute precise 3D point clouds. We aim to get you started fast and with as little trouble as possible.

We will cover the following topics:

Loading and parsing the camcalib YAML result file

Removing lens distortion

Rectification of a stereo image pair

Converting the rectified image pair into a dense 3D point cloud with stereo matching

To keep the examples simple and the corresponding code short, we will be using Python and OpenCV to do the heavy lifting.

## Download our examples source code

You don't need to copy and paste the code snippets we show here and puzzle them all together. Just check out the code for the entire example from our examples __repository__ and follow the instructions in the accompanying __README.md__ to get up and running with ease.

## Loading the Camera Calibration YAML

Let’s get started! Using __camcalib__ in a __previous blog post__, we calibrated a stereo camera pair and saved the result as a YAML file.

Tip: installing the PyYAML package

With Python, we import the **yaml** package to read the result file.

```
import yaml
filename = "calibration_result.yaml"
with open(filename,"r") as f:
calibration_parameters = yaml.load(f, Loader=yaml.FullLoader)
```

We print the loaded data to the console to verify that everything works.

```
print(type(calibration_parameters))
$ <class 'dict'>
print(calibration_parameters)
$ {'sensors': {'cam1': {'extrinsics': {'axis_angle': [0.0, 0.0, 0.0], 'translation': [0.0, 0.0, 0.0]}, 'intrinsics': {'parameters': {'cx': 631.2595405625116, 'cy': 491.5258233725439, 'fx': 2282.3920867826905, 'fy': 2282.5710507986964, 'image_size': [1280, 1024], 'k1': 0.19408319800216575, 'k2': -0.0032982721692749804, 'k3': -0.018097059724866766, 'k4': -0.2839958241290381}, 'type': 'KannalaBrandt'}}, 'cam2': {'extrinsics': {'axis_angle': [-0.0017098846275279435, 0.1681972700321447, 0.04623455960511724], 'translation': [-0.19068022336063817, -0.0017535410879024739, 0.004499676517813966]}, 'intrinsics': {'parameters': {'cx': 615.0347480681874, 'cy': 472.525057978465, 'fx': 2301.3314301312444, 'fy': 2301.140570503219, 'image_size': [1280, 1024], 'k1': 0.18254782022559218, 'k2': 0.4203174919560141, 'k3': -5.89114449668284, 'k4': 25.664870652441312}, 'type': 'KannalaBrandt'}}}}
```

The second command shows that the calibration data is loaded into a Python dictionary, enabling us to access any part of the loaded object we need quickly. For later reference, the YAML file contains the following basic structure.

```
sensors:
cam1:
extrinsics: ...
intrinsics: ...
cam2:
extrinsics: ...
intrinsics: ...
```

Before we begin processing the loaded data, let's visualize the result to get a better understanding of our extrinsics. Check out the visualization below.

If you look closely, you will notice several details reflected in the YAML result:

**cam2**is rotated ~10° countcam2er-clockwise around the**y-axis**.The 31° FOV is evident from the frustum visualization.

The image center coordinate offset in

**cam1**and**cam2**is not zero. It is subtle in the visualization, but it is there.Cam1s coordinate system has the

**x-axis****right**,**y-axis****down**, and**z-axis****forward**. The coordinate system xy-orientation matches the xy-orientation of the image plane coordinates and is a computer vision convention.

Stay tuned for our follow-up article, where we show you how to visualize the extrinsics of stereo and complex multi-view constructions with ease.

## Undistortion

With camcalib, we have determined the distortion parameters and offset errors of the lens and sensor geometry. But how can we use the camcalib results to remove the lens distortion from our images?

OpenCV provides a vast library of well-tested computer vision algorithms that will help us accelerate the implementation process and reach our goals with only a few lines of code. So let's get started!

To undistort an image efficiently, we will take the following steps:

Fetch and restructure the intrinsic parameters from the YAML we loaded before.

Use the intrinsic parameters to construct a 2D look-up table – a so-called undistortion map.

Apply a remap function with the undistortion map on every image the camera captures during the capture loop.

It may sound more logical to pack this into a single undistort image function, but you would repeat large amounts of the same computations every time you take a new snapshot. It is better to do the heavy calculations – computing the undistortion map – only once and store the result in memory for repeated later use. This way, you have more CPU resources left to do more valuable computations, like computing 3D point clouds.

Tip: installing the NumPy package.

To create the undistortion map, we will need to fetch the intrinsic parameters and restructure them into a camera matrix and a list of distortion parameters.

```
import numpy as np
# Extract the calibration data for camera 1 from the
# calibration_parameters variable
cam1_parameters = calibration_parameters['sensors']['cam1']
# Extract camera 1 intrinsic parameters from cam1_parameters
cam1_intrinsics = cam1_parameters['intrinsics']['parameters']
# Prepare variables map creation
image_size = cam1_intrinsics['image_size']
cam1_camera_matrix = np.array(
[[cam1_intrinsics['fx'], 0, cam1_intrinsics['cx']],
[0, cam1_intrinsics['fy'], cam1_intrinsics['cy']],
[0, 0, 1]])
cam1_distortion = np.array(
[cam1_intrinsics[k] for k in ['k1', 'k2', 'k3', 'k4']])
```

A detailed review of what the code does

After executing the code, we have all the parameters needed to create our undistort map – **image_size**, **cam1_camera_matrix**, and **cam1_distortion**. Now let's build the undistortion map.

Tip: installing the OpenCV package

```
import cv2
cv = cv2.fisheye
undistort_map1 = cv.initUndistortRectifyMap(cam1_camera_matrix,
cam1_distortion,
np.eye(3),
cam1_camera_matrix,
image_size,
cv2.CV_32F)
```

That's it! The variable **undistort_map1** will contain a tuple. The first element of the tuple will contain all x-coordinates of the look-up table, and the second element will contain all y-coordinates.

More details on what the undistortion map contains, how it is used, and how remap works.

The plot below shows the magnitude of the distortion vectors for **cam1** and **cam2**. The distortion magnitudes can reach hundreds or thousands of pixels in some cases. In our case, the distortion is almost invisible but will cause significant errors if left uncorrected. Note that the lenses used in this example are low-distortion lenses.

How to plot the distortion map's magnitudes

Now we apply the undistortion map:

```
# Load raw cam 1 image data from png file
img1_raw = cv2.imread("camcalib_tutorial_data/cam1/001.png")
# undistort raw cam 1 image data using undistort_map1
# img1_undist = cv2.remap(img1_raw, *undistort_map1,
# cv2.INTER_LANCZOS4) #short version
img1_undist = cv2.remap(img1_raw,
undistort_map1[0],
undistort_map1[1],
cv2.INTER_LANCZOS4)
```

The steps are short and straightforward. We use the OpenCV **imread** function to load **cam1/001.png** and apply the previously computed undistortion map using OpenCV's **remap** function. The remap function is a powerful resampling function that can take any type of 2D lookup table and resample image data with the most common sub-pixel interpolation types (we use Lanczos resampling here). You can see the results of remapping our left and right images with the undistortion maps below. Note, that you will probably not be able to tell the difference as the distortion magnitude, as mentioned before, is below 16 pixels at worst.

Example of a strongly distorted fish-eye image

If, however, we subtract the raw and undistorted images, we can see a difference, which will lead to significant 3D errors later on if we leave it unchecked. Imperceptible distortion errors are why we calibrate our cameras.

## Rectification

For computational efficiency, we carry out undistortion and rectification at once in the same look-up table (the so-called undistort-rectify map). Still, it is more convenient to think of undistortion and rectification as separate processes carried out one after the other. With distortion errors taken care of, we can focus on rectification alone. Thus, for discussion, we will now assume that undistortion has been applied and perform rectification next.

Recall the visualization of our extrinsic calibration earlier; we found that **cam2** is tilted (rotated) towards **cam1**. **cam2** is also slightly pitched and rolled w.r.t. **cam1**, but that is more subtle and difficult to read from the extrinsic parameters. To triangulate 3D points for all pixels of **cam1**, we will need to find their corresponding pixels in **cam2**. Finding all correspondences presents us with a 2D search problem that is computationally intense. This is where rectification helps us out big time! The following two sketches, visually illustrate why rectification is useful. In case you are not familiar with the visualizations, check out the following “Visualization hint” expand where we describe the abstract representation of a pinhole camera that is widespread in the computer vision industry.

Visualization hint: abstract representation of the pinhole camera

To understand why, think of the undistorted camera image as an image created by a pinhole camera. Each pixel's horizontal and vertical ray-angle is precisely known. Consequently, we can virtually rotate our camera simply by moving the pixels around. Virtually rotating our camera allows us to remove the yaw, pitch, and roll of **cam2** w.r.t. **cam1**. By doing this, we effectively make both cameras' optical axes perfectly parallel. Perfectly parallel optical axes imply that pixels in any row of **cam1** will correspond only to pixels of the same row in **cam2**. Consequently, our 2D search problem becomes a 1D search problem that we can compute faster or even at live-video speeds. Thus, undistorting and rectifying our images leads to many simultaneous efficiency gains!

Let's get to the implementation! To speed up things, we will repeat a few things for cam2 that we left out for the undistortion before without going into too much detail and only discuss the new stuff.

```
# Extract the calibration data for camera 1 & 2 from calibration_parameters variable
cam1_parameters = calibration_parameters['sensors']['cam1']
cam2_parameters = calibration_parameters['sensors']['cam2']
# Extract camera 1 & 2 intrinsic parameters from cam1_parameters
cam1_intrinsics = cam1_parameters['intrinsics']['parameters']
cam2_intrinsics = cam2_parameters['intrinsics']['parameters']
# Prepare variables map creation
image_size = cam1_intrinsics['image_size']
cam1_camera_matrix = np.array([
[cam1_intrinsics['fx'], 0, cam1_intrinsics['cx']],
[0, cam1_intrinsics['fy'], cam1_intrinsics['cy']],
[0, 0, 1]])
cam2_camera_matrix = np.array([
[cam2_intrinsics['fx'], 0, cam2_intrinsics['cx']],
[0, cam2_intrinsics['fy'], cam2_intrinsics['cy']],
[0, 0, 1]])
cam1_distortion = np.array(
[cam1_intrinsics[k] for k in ['k1', 'k2', 'k3', 'k4']])
cam2_distortion = np.array(
[cam2_intrinsics[k] for k in ['k1', 'k2', 'k3', 'k4']])
```

That takes care of the intrinsic parameters. Now, let's extract the extrinsic parameters

```
# Extract camera 1 & 2 extrinsic parameters from cam1_parameters
P_c1 = Pose.from_axis_angle(
cam1_parameters['extrinsics']['axis_angle'],
cam1_parameters['extrinsics']['translation'])
P_c2 = Pose.from_axis_angle(
cam2_parameters['extrinsics']['axis_angle'],
cam2_parameters['extrinsics']['translation'])
# Compute pose difference between camera 1 and 2
P_c2_c1 = P_c2 @ P_c1.I
```

The Pose class

There are a few things of note here.

The Pose class takes the extrinsic data and allows you to treat it as a pose matrix with little effort. You can think of the pose as a transformation that applies rotation and translation to a set of 3D points. It can also be chained with other poses to represent a sequence of camera movements.

Cameras 1 and 2 each have an extrinsic pose assigned to them. The extrinsic pose is the position and orientation that we

__determined earlier__during the extrinsic calibration.For our rectification task, we only care about the position and orientation of

**cam2**w.r.t.**cam1**. This is why we subtract the pose of**cam1**from**cam2****P_c2_c1 = P_c2 @ P_c1.I**. Note that we mean the abstract sense of subtraction for poses.

Now we are prepared for rectification.

```
# Compute rectification parameters and undistort-rectify maps
cv = cv2.fisheye
R1, R2, P1, P2, Q = cv.stereoRectify(cam1_camera_matrix,
cam1_distortion,
cam2_camera_matrix,
cam2_distortion,
image_size,
P_c2_c1.r,
P_c2_c1.t,
cv2.CALIB_ZERO_DISPARITY)
# Compute undistort-rectify maps
rect_map1 = cv.initUndistortRectifyMap(cam1_camera_matrix,
cam1_distortion,
R1,
P1,
image_size,
cv2.CV_32F)
rect_map2 = cv.initUndistortRectifyMap(cam2_camera_matrix,
cam2_distortion,
R2,
P2,
image_size,
cv2.CV_32F)
```

If you recall, the undistort-rectify map construction is very much like the undistort map construction process. The only difference is that we also take the extrinsic data into account. We only have to compute the pose change from **cam1** to **cam2** with **P_c2_c1**. We let OpenCV do the heavy lifting by using the **stereoRectify** function. **stereoRectify** takes all the intrinsics and extrinsics we prepared and gives us the transformation parameters we need to construct the undistort-rectify maps. In particular, **stereoRectify** provides us with the rotation matrices **R1** and **R2**. **R1** and **R2** are the rotations that orient cam1 and cam2 in the same direction. **stereoRectify** also provides **P1** and **P2**, which are our new projection matrices. When applied, the new projection matrices determine the virtual camera parameters that the undistort-rectify maps cause. In other words, **P1** and **P2** describe how our virtual pinhole cameras (undistorted and rectified cameras) project 3D points onto the image plane – or "see" the world.

Put yet another way, **R1** and **P1** (and corresponding intrinsics) define a map that converts the distorted and rotated view of cam1 into a pinhole view of the world. **R2**, **P2**, and corresponding intrinsics do the same for **cam2**. And as a bonus, our virtual pinhole views will also have parallel optical axes and aligned rows that allow us to perform 1D searches to find correspondences.

With the **rect_maps** we can compute our virtual aligned (rectified) pinhole views

```
# load a stereo image pair
img1_raw = cv2.imread("camcalib_tutorial_data/cam1/001.png")
img2_raw = cv2.imread("camcalib_tutorial_data/cam2/001.png")
# Apply the undistort-rectify maps to the stereo image pair
img1_rect = cv2.remap(img1_raw, *rect_map1, cv2.INTER_LANCZOS4)
img2_rect = cv2.remap(img2_raw, *rect_map2, cv2.INTER_LANCZOS4)
```

That's it. We have rectified views that we can use to compute 3D point clouds. Before we triangulate, though, let's review our results so far to build intuition.

Check out the difference between the raw image pair and the rectified image pair in the plots below. The black lines in the raw image pair do not connect pixel correspondences, while the black lines do so for all pixels of the rectified pair.

Also, check out the zoomed-in plot, where the red bars show the errors of the raw image pair.

The red bars vanish in the rectified image pair.

Now that we have prepared our image data for efficient 1D correspondence search, let's do what we all came here for – computing 3D point clouds!

## 3D Point Clouds

We have two steps left until we have useful point clouds.

Run a stereo matcher on the rectified image pair. Stereo matching is the 1D correspondence search we mentioned before.

Triangulate pixel correspondences on the left and right images to construct the 3D point cloud.

For the first step, there are tons of algorithms to choose from. Your choice will depend on the trade-offs of your application. For example,

if you target a low-power ARM device, you might go with a simple block matcher algorithm.

If you have more power, you can go with a Semi-Global Block Matcher (SGBM) implementation.

The sky is the limit if your budget and application allow for a powerful GPU and longer run times. Check out the

__Middlebury Stereo Benchmark__to see how different algorithms perform and what they require.

We compute our results using __RAFT stereo__, but we also will provide a minimal SGBM wrapper around the OpenCV SGBM implementation to get you started.

Tip: installing the Open3D package to visualize point clouds.

The SGBM wrapper

The following code will take you the rest of the way and take care of stereo matching, triangulation, and visualization.

```
import open3d as o3d
# Setup the stereo matcher
matcher = StereoMatcherSGBM()
# Compute the pixel correspondences - called the disparity map
disparity = matcher.match(img1_rect, img2_rect)
# Construct open3d geometry
pointcloud = matcher.reconstruct(disparity, img1_rect, Q)
# Perform a statistical outlier removal to make
# the results even prettier.
pointcloud, _ = pointcloud.remove_statistical_outlier(
nb_neighbors=20, std_ratio=1.5)
# Create a coordinate frame to visualize the camera
# coordinate frame
origin_frame = o3d.geometry.TriangleMesh\
.create_coordinate_frame(size=0.10, origin=[0, 0, 0])
# display open3d geometry
o3d.visualization.draw_geometries([pointcloud, origin_frame])
```

Let’s take a closer look at what is happening here:

We imported

**open3d as o3d**for short to help with visualizing the point cloud.**matcher.match(img1_rect, img2_rect)**uses the rectified images as input to find pixel correspondences. The correspondence is encoded in the form of a disparity. The disparity is the distance between the left and right images correspondences measured in pixels. You can think of the pixel location and disparity as encoding the two angles we need to triangulate a 3D point.**reconstruct(disparity, img1_rect, Q)**is provided by our wrapped**matcher**to compute the 3D point cloud. We will discuss this in more detail below. Just note the parameters.**disparity**contains the disparity map shown in the image below,**img1_rect**holds corresponding color values, and**Q**is the 4x4 perspective-transform or disparity-to-depth matrix used to perform the triangulation step.Note that the perspective-transform matrix

**Q**was the 5th parameter of the**stereoRectify**function we called during rectification.

Before we go into the details of **matcher.reconstruct()**, check out the point clouds below. The high-quality point clouds result from excellent camera calibration and the RAFT-stereo correspondences.

This last image also shows the camera's coordinate system at the origin. Again note the direction of the **x-axis** and **y-axis**. The orientation of the coordinate system matches the image-planes coordinate system.

Let’s close this article with a final discussion on the **reconstruct** function.

```
def reconstruct(self, img_disp0, img_rect0, Q):
# reproject disparity to 3D
xyz = cv2.reprojectImageTo3D(img_disp0, Q)
# construct validity masks based on distance and brightness
mask_depth = (xyz[:, :, 2] < 5.0) & (xyz[:, :, 2] > 0.1)
mask_brght = (img_rect0[:, :, 0] > 30) & \
(img_rect0[:, :, 0] < 250)
# create linear point and color lists
xyz_linear = xyz.reshape((-1, 3))
colors_linear = img_rect0.reshape((-1, 3))
mask_linear = (mask_brght & mask_depth).flatten()
# create open3d geometry
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(
xyz_linear[mask_linear])
pcd.colors = o3d.utility.Vector3dVector(
colors_linear[mask_linear] / 255.0)
return pcd
```

We use OpenCVs

**reprojectImageTo3D**function to keep the code light and convert the disparity map into 3D coordinates. In principle, we could also do this by hand, and it would take us four lines of code with NumPy. Check out the expand below if you would like to see it.We build a simple distance and brightness filter

**mask_brght & mask_depth**to remove errors.The open3d

**PointCloud()**object**pcd**has the properties**points**and**colors**.**points**is a**Vector3dVector**constructed with an Nx3 NumPy array containing our triangulated valid 3D points.**colors**is also a**Vector3dVector**constructed with an Nx3 NumPy array.**colors**columns represent**red**,**green****,**and**blue**with values ranging from 0.0 for darkest and 1.0 for brightest, so we normalize it with 255.

NumPy implementation of **reprojectImageTo3D**

## Putting it all together

For your convenience, we provide this example in full in our __examples repository__. Check it out and simply follow the instructions in the accompanying __README.md__ to get up and running with ease.

## Conclusions

This article has taken us quite far beyond the scope of camcalib. But we hope that it helped you get going and gave you an insight into the various moving parts of a computer vision application that relies on precise camera calibration. We are constantly adding new pieces to help you get familiar with camcalib as well as help you get started with your applications faster. Please also return to check out our other articles that will take you even further.

## コメント