REP: | 118 |
---|---|
Title: | Depth Images |
Author: | Patrick Mihelich <mihelich at willowgarage.com> |
Status: | Final |
Type: | Standards Track |
Content-Type: | text/x-rst |
Created: | 01-Dec-2011 |
ROS-Version: | Fuerte |
Post-History: | 06-Dec-2011 |
This REP defines a representation for depth images in ROS. Depth images may be produced by a variety of camera technologies, including stereo, structured light and time-of-flight.
Depth images are published as sensor_msgs/Image encoded as 32-bit float. Each pixel is a depth (along the camera Z axis) in meters.
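As a rough illustration of the layout described above, the following sketch packs rows of depth values into fields that mirror sensor_msgs/Image. It is a minimal sketch, not a ROS API: `pack_depth_image` and the returned dict are hypothetical stand-ins for the real message type, and `"32FC1"` is assumed here as the encoding string for single-channel 32-bit float data.

```python
import struct


def pack_depth_image(depth_rows):
    """Pack rows of depth values (meters) into Image-like fields.

    Hypothetical helper: the returned dict only mirrors the field names of
    sensor_msgs/Image, it is not the actual message class.
    """
    height = len(depth_rows)
    width = len(depth_rows[0]) if height else 0
    if any(len(row) != width for row in depth_rows):
        raise ValueError("all rows must have the same width")

    # Flatten row-major and pack as little-endian 32-bit floats.
    flat = [float(z) for row in depth_rows for z in row]
    data = struct.pack("<%df" % len(flat), *flat)

    return {
        "height": height,
        "width": width,
        "encoding": "32FC1",   # assumed encoding string for float depth
        "is_bigendian": 0,
        "step": width * 4,     # bytes per row: 4 bytes per pixel, no padding
        "data": data,
    }
```

With this layout, a 2x2 depth image occupies 16 bytes and each row spans 8 bytes.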
The non-finite values NaN, +Inf and -Inf have special meanings as defined by REP 117.
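A consumer might separate these cases before using a pixel. The sketch below assumes the REP 117 reading that NaN marks an invalid or missing measurement, +Inf means no object was detected within range, and -Inf means the object was too close to measure. The function name `classify_depth` and the label strings are hypothetical.

```python
import math


def classify_depth(z):
    """Classify one depth value (meters) using the REP 117 conventions.

    Hypothetical helper; the label strings are illustrative only.
    """
    if math.isnan(z):
        return "invalid"        # erroneous or missing measurement
    if z == math.inf:
        return "out_of_range"   # nothing detected within sensor range
    if z == -math.inf:
        return "too_close"      # object closer than the minimum range
    return "valid"              # finite depth along the camera Z axis
```

Note that NaN must be tested with `math.isnan`, since NaN compares unequal to every value, including itself.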
The ROS API for producers of depth images follows the standard camera driver API. Depth images are published on the image topic. The camera_info topic describes how to interpret the depth image geometrically. Whereas each pixel in a standard image can only be projected to a 3D ray, the depth image can (given the camera calibration) be converted to a 3D point cloud.
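The conversion to 3D points can be sketched with a pinhole camera model. This is an illustrative sketch under stated assumptions, not the image_pipeline implementation: the image is assumed to be rectified (no lens distortion), the intrinsics `fx`, `fy`, `cx`, `cy` are assumed to have been read from the camera_info topic, and `depth_to_points` is a hypothetical name.

```python
import math


def depth_to_points(depth_rows, fx, fy, cx, cy):
    """Back-project a float depth image (meters) to a list of (X, Y, Z).

    Hypothetical helper assuming a rectified pinhole camera with focal
    lengths (fx, fy) and principal point (cx, cy), all in pixels.
    """
    points = []
    for v, row in enumerate(depth_rows):
        for u, z in enumerate(row):
            # Skip NaN, +Inf and -Inf: they carry no usable 3D position.
            if not math.isfinite(z):
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Because the depth is measured along the camera Z axis, Z is taken directly from the pixel value and only X and Y need to be computed from the pixel coordinates.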
Alternatively, a device driver may publish depth images encoded as 16-bit unsigned integer, where each pixel is depth in millimeters. This differs from the standard units recommended in REP 103, which specifies meters for distances.
The value 0 denotes an invalid depth, equivalent to a NaN floating point distance.
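The mapping from the raw integer format to the canonical one is simple enough to sketch. The example below is a minimal sketch and not the image_pipeline nodelet: `raw_mm_to_meters` is a hypothetical name, and the buffer is assumed to hold exactly `width * height` tightly packed 16-bit values with no row padding.

```python
import struct


def raw_mm_to_meters(data, width, height, big_endian=False):
    """Convert packed uint16 millimeter depths to rows of float meters.

    Hypothetical helper; assumes no padding between rows (step == width * 2).
    """
    fmt = (">" if big_endian else "<") + "%dH" % (width * height)
    raw = struct.unpack(fmt, data)

    # 0 marks an invalid depth and maps to NaN; everything else is scaled
    # from millimeters to meters.
    meters = [float("nan") if mm == 0 else mm / 1000.0 for mm in raw]

    # Regroup the flat list into image rows.
    return [meters[r * width:(r + 1) * width] for r in range(height)]
```

The largest value a 16-bit pixel can hold, 65535, corresponds to 65.535 m.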
Raw depth images are published on the image_raw topic. The image_pipeline stack will provide a nodelet to convert the image_raw topic to the canonical image topic.
Consumers of depth images are only required to support the canonical floating point representation.
With the addition of depth images, ROS now has three messages suitable for representing dense depth data: sensor_msgs/Image, sensor_msgs/DisparityImage, and sensor_msgs/PointCloud2. PointCloud2 is more general than a depth image, but also more verbose. The DisparityImage representation, however, is very similar.
The DisparityImage message exists for historical reasons: stereo cameras were used with ROS long before any other type of depth sensor, and disparity images are the natural "raw" output of stereo correlation algorithms. For some vision algorithms (e.g. VSLAM), disparities are a convenient input to error metrics with pixel units.
In practice, the DisparityImage message also has drawbacks.
sensor_msgs/DisparityImage will continue to exist for backwards compatibility and for applications where it truly is the better representation. The image_pipeline stack will provide a nodelet for converting depth images to disparity images. Producers of dense depth data are encouraged to use sensor_msgs/Image instead of sensor_msgs/DisparityImage.
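The relationship between the two representations follows the standard stereo relation Z = fT / d, where f is the focal length in pixels, T is the stereo baseline in meters and d is the disparity in pixels. The sketch below inverts it per pixel. It is an illustration only and not the image_pipeline nodelet: `depth_to_disparity` is a hypothetical name, and returning NaN for depths that are non-finite or not positive is an assumption of this sketch.

```python
import math


def depth_to_disparity(z, f, T):
    """Convert one depth value (meters) to a disparity (pixels).

    Hypothetical helper. f is the focal length in pixels and T the
    baseline in meters. Depths that are non-finite or not positive are
    mapped to NaN, which is a choice made for this sketch.
    """
    if math.isfinite(z) and z > 0.0:
        return f * T / z
    return float("nan")
```

For example, with f = 500 pixels and T = 0.125 m, a depth of 2.5 m corresponds to a disparity of 25 pixels.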
Disparity images are represented by a distinct sensor_msgs/DisparityImage type, so why not define a sensor_msgs/DepthImage?
Defining a new image-like message incurs significant tooling costs. A new message type would be incompatible with image_transport, standard image viewers, and various utilities such as converters between bags and images/video.
On the other hand, perhaps there is additional metadata that a depth image ought to include. Let's consider the fields added by sensor_msgs/DisparityImage:
The main information we are unable to capture with an (Image, CameraInfo) pair is the min/max range. That does not seem to justify breaking from the established camera driver API. If necessary, the min/max range and other metadata could be published as another side channel, similar to the camera_info topic.
Including the uint16 OpenNI format is unfortunate in some ways. It adds complexity, is tied to a particular family of hardware, and uses different units from the rest of ROS. There are, nevertheless, some compelling reasons:
This REP codifies existing behavior in the openni_kinect stack, so backwards compatibility is not expected to be an issue.
This document has been placed in the public domain.