Motion Analysis

Our eyes see movies, and much of our perception is based on motion. Motion gives computer vision many cues for recovering scene information. These cues support tasks such as segmentation, 3D reconstruction, and compression.

Ego Motion

Motion observed in an image sequence is usually ego-motion, the motion of the camera. Knowledge of the camera rotation and translation between frames is crucial input to many vision applications. Finding reliable methods that give precise results is still an open problem.
Our approach is to calculate the rotation first and then the translation. Iterating between the rotation and translation estimates makes the method more robust and improves the results further. Our methods either make weak assumptions about the 3D structure (the existence of a planar surface in the image) or need three images as input. We assume small rotations, which is reasonable because in most image sequences the rotation between frames is relatively small.
An earlier method for calculating ego-motion between two images uses Plane + Parallax. The method detects a single planar surface in the scene directly from image intensities, using temporal integration of registered images. Next, the parametric 2D image motion of this plane is computed. We use this 2D motion to register the images, which leaves a displacement field affected only by camera translation. The 3D camera translation is computed by finding the focus of expansion (the intersection of all displacement vectors) in the registered frames. Finally, the ego-motion calculation is completed by computing the camera rotation from a set of linear equations.
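The focus-of-expansion step can be illustrated with a short sketch. It assumes the images are already registered, so that every displacement vector lies on a line through the focus of expansion, and it solves for the common intersection by least squares. The function name and the least-squares formulation are illustrative assumptions, not the exact procedure of the method described above.

```python
import numpy as np

def focus_of_expansion(points, displacements):
    """Least-squares intersection of the lines carrying the displacement vectors.

    points:        (N, 2) image positions (x, y)
    displacements: (N, 2) displacement vectors (dx, dy) after registration
    """
    p = np.asarray(points, dtype=float)
    d = np.asarray(displacements, dtype=float)
    # A point e lies on the line through p with direction d when
    # dy * (ex - px) - dx * (ey - py) = 0, which is linear in e.
    A = np.column_stack([d[:, 1], -d[:, 0]])
    b = d[:, 1] * p[:, 0] - d[:, 0] * p[:, 1]
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe
```

With noisy displacements the lines no longer meet in a single point, and the least-squares solution gives the point that best satisfies all the line constraints at once. At least two non-parallel displacement vectors are needed.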
By using homography matrices (a homography is a transformation, linear in homogeneous coordinates, that registers a 3D plane between two images) we refined our approach and obtained an even more robust method that makes no assumptions about 3D structure. The method is based on a theoretical result: the homography matrices between two images span a linear space of rank four. Assuming small rotations, and given three homography matrices between the images, we can compute the rotational component of the camera motion as a linear combination of these homographies. Given three views (instead of two), we calculate the Trilinear Tensor, which directly provides three homography matrices. Using three images gives numerical redundancy and therefore more numerical stability. Furthermore, the homography matrices calculated by this method use correspondences from the whole image, not just from the plane associated with each homography.
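The rank-four property can be checked numerically. The sketch below assumes calibrated cameras and the common parameterization H = R + t n^T, where R is the rotation, t the translation, and n the plane normal divided by the plane's distance (sign conventions vary between texts). Under that assumption every homography is a combination of R and the three matrices t e_k^T, so any collection of them, each with an arbitrary scale, should span at most four dimensions. All names and values are illustrative.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues formula for a rotation of `angle` radians about `axis`."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def plane_homography(R, t, n):
    """Homography induced by the plane with scaled normal n (assumed model)."""
    return R + np.outer(t, n)

rng = np.random.default_rng(0)
R = rotation_matrix([0.2, -0.5, 1.0], 0.05)   # a small rotation
t = np.array([0.3, -0.1, 1.0])                # camera translation
normals = rng.normal(size=(8, 3))             # eight arbitrary planes
scales = rng.uniform(0.5, 2.0, size=8)        # homographies are known up to scale

# Each homography becomes one row of a 8 x 9 matrix.
H_stack = np.array([s * plane_homography(R, t, n).ravel()
                    for s, n in zip(scales, normals)])
rank = np.linalg.matrix_rank(H_stack)         # expected to be 4 for generic planes
```

Because the rotation matrix itself lies in this four-dimensional space, it is plausible that it can be recovered as a linear combination of a few homographies, which is the idea the method builds on.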

Selected Papers

Michal Irani, Benny Rousso, Shmuel Peleg. Robust Recovery of Ego-Motion. CAIP 1993, Budapest, 13-15 Sept 1993.

Benny Rousso, Shai Avidan, Amnon Shashua, Shmuel Peleg. Robust Recovery of Camera Rotation from Three Frames. ARPA Image Understanding Workshop, February 1996.

Image Stabilization

Many video systems receive movie input from an unstable source. Consider a camera mounted on the back of a jeep driving on a bumpy dirt road, or a camera held by a shaking hand. In image stabilization the input is a jittery, jumpy sequence of images. The output is a smooth sequence of images displaying the same motion.
In most cases it is correct to assume that the camera translation is the intended motion (the movement of the jeep, for example), while the camera rotation results from jitter about some rotational axis and causes the instability. By calculating the camera rotation we can register the images and eliminate the rotational component between them. The result is a stabilized image sequence.
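Cancelling a known camera rotation can be sketched as a planar warp. For a pure rotation R and a pinhole camera with intrinsic matrix K, pixels of the rotated frame are related to the reference frame by the homography K R K^-1, so warping with its inverse removes the rotational component. The pinhole model, the known K, and all function names below are assumptions made for illustration; they are not details taken from the method described above.

```python
import numpy as np

def project(K, points_3d):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    homogeneous = np.asarray(points_3d, dtype=float) @ K.T
    return homogeneous[:, :2] / homogeneous[:, 2:3]

def derotation_homography(K, R):
    """Homography mapping pixels of the rotated frame back to the reference frame.

    Assumes R is an exact rotation matrix, so its inverse is its transpose.
    """
    return K @ R.T @ np.linalg.inv(K)

def apply_homography(H, pixels):
    """Apply a 3x3 homography to (N, 2) pixel coordinates."""
    pixels = np.asarray(pixels, dtype=float)
    homogeneous = np.column_stack([pixels, np.ones(len(pixels))]) @ H.T
    return homogeneous[:, :2] / homogeneous[:, 2:3]
```

In a full stabilizer the same homography would be used to resample every frame, for example with an image-warping routine. Any translation of the camera is left untouched, which matches the assumption that translation is the intended motion.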

Benny's Stabilization

Selected Papers

Michal Irani, Benny Rousso, Shmuel Peleg. Recovery of Ego-Motion Using Image Stabilization. CVPR-94, Seattle, June 1994.

B. Rousso, S. Avidan, A. Shashua and S. Peleg. Robust Recovery of Camera Rotation from Three Frames. CVPR, June 1996.

Motion Segmentation

Almost any scene is composed of a few distinct objects and a background. A small child looking at such a scene can segment it easily, yet segmentation is one of the hardest problems in computer vision. Segmenting an image is important for object recognition and image analysis. When the input is a movie containing differently moving objects, we can use their optical flow to identify and track them. Using the differences between the motion fields we can separate a moving object from the rest of the image. In turn, we can use this segmentation to improve our motion analysis.
Our method for detecting and tracking multiple moving objects uses both a large spatial region and a large temporal region. Because of the large amount of information available, we do not need to assume temporal motion constancy, and we can handle regions that contain more than one motion. The algorithm can handle the difficult cases of transparent and occluding objects.
The idea is to detect one object at a time. The object that constitutes the dominant motion in the scene is detected by temporal integration of images registered with this motion. We do this iteratively, refining the result with each iteration: first we use the dominant motion to refine the set of pixels in the object (the pixels moving with those motion parameters), then we refine the motion parameters using only the pixels in the object. Once the object has been detected we can exclude it and move on to the next object.
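The alternation between selecting the object's pixels and refining its motion parameters can be illustrated with a toy sketch. It assumes that a displacement vector is already available for every pixel and that each object moves by a pure 2D translation. Neither assumption comes from the method described above, which works on image intensities with temporal integration. The function names and the threshold are hypothetical.

```python
import numpy as np

def dominant_translation(flow, threshold=0.5, max_iterations=20):
    """Alternate between picking supporting pixels and refitting the motion.

    flow: (N, 2) displacement vectors, one per pixel.
    Returns the estimated translation and a boolean support mask.
    """
    motion = np.median(flow, axis=0)           # robust initial guess
    support = np.ones(len(flow), dtype=bool)
    for _ in range(max_iterations):
        residual = np.linalg.norm(flow - motion, axis=1)
        new_support = residual < threshold     # pixels moving with this motion
        if not new_support.any():
            break
        motion = flow[new_support].mean(axis=0)  # refit using the object only
        if np.array_equal(new_support, support):
            break                              # pixel set stopped changing
        support = new_support
    return motion, support

def detect_objects(flow, count, threshold=0.5):
    """Detect up to `count` motions, excluding each object once it is found."""
    remaining = np.arange(len(flow))
    objects = []
    for _ in range(count):
        if remaining.size == 0:
            break
        motion, support = dominant_translation(flow[remaining], threshold)
        objects.append((motion, remaining[support]))
        remaining = remaining[~support]        # move on to the next object
    return objects
```

Each entry of the result pairs a translation with the indices of the pixels assigned to it, so later objects are estimated only from pixels that earlier objects did not claim.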

The Vanishing Lady

We improve the segmentation by alternating between segmenting the region and calculating its motion parameters until the iterations converge. For this segmentation we have developed a method that relaxes the need for accurate motion models. Given some motion parameters of an object, we register the images using those parameters. We expect pixels that belong to the object to register correctly and pixels that do not belong to it to register incorrectly. How well a pixel registers is its prediction error. When deciding whether a pixel belongs to the object, we consider the convergence of the prediction error. As the value to threshold, we use the magnitude of the optical flow vectors between the first image and the registered image.
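A per-pixel registration measure of this kind can be sketched as follows. As a stand-in for the magnitude of the residual optical flow, the sketch uses a normal-flow style estimate: the intensity difference between the first image and the registered image, normalized by the spatial gradient. This substitution, the function names, and the threshold value are assumptions made for illustration, and the sketch does not model the convergence test over iterations.

```python
import numpy as np

def residual_motion(first, registered, eps=1e-3):
    """Approximate residual motion magnitude per pixel after registration.

    Uses |intensity difference| * |gradient| / (|gradient|^2 + eps), which is
    close to the normal-flow magnitude where the gradient is strong and goes
    to zero where the gradient gives no reliable information.
    """
    first = np.asarray(first, dtype=float)
    registered = np.asarray(registered, dtype=float)
    gy, gx = np.gradient(first)                # derivatives along rows, columns
    grad_mag = np.hypot(gx, gy)
    diff = np.abs(registered - first)
    return diff * grad_mag / (grad_mag ** 2 + eps)

def belongs_to_object(first, registered, threshold=0.3):
    """Mark pixels whose residual motion is below the threshold."""
    return residual_motion(first, registered) < threshold
```

Pixels in flat image regions always pass this test because they carry no motion information, so in practice such a mask would need support from neighboring pixels or from several frames.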

Selected Papers

Michal Irani, Benny Rousso, Shmuel Peleg. Computing Occluding and Transparent Motions. IJCV, January 1994.

Moshe Ben-Ezra, Shmuel Peleg, Benny Rousso. Motion Segmentation Using Convergence Properties. ARPA Image Understanding Workshop, November 1994.

Michal Irani, Shmuel Peleg. Motion Analysis for Image Enhancement: Resolution, Occlusion, and Transparency. VCIR, Vol 4 No. 4, December 1993.
