Our approach is to calculate first the rotation and then the translation. By iterating between the rotation and translation estimates we achieve more robustness and further improve our results. Our methods either make weak assumptions about the 3D structure (the existence of a planar surface in the image) or require three images as input. We assume small rotations; this is reasonable, since in most image sequences the rotation between consecutive frames is relatively small.

An earlier method for calculating ego-motion between two images uses Plane + Parallax. The method detects a single planar surface in the scene directly from image intensities, using temporal integration of registered images. Next, the parametric 2D image motion of this plane is computed. This 2D motion is used to register the images, yielding a displacement field affected only by the camera translation. The 3D camera translation is computed by finding the focus of expansion (the intersection of all displacement vectors) in the registered frames. Finally, the ego-motion computation is completed by recovering the camera rotation from a set of linear equations.
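
The focus-of-expansion step can be sketched as a small least-squares problem. The sketch below is not the paper's implementation; it assumes a displacement field is already given and uses the constraint that, under pure translation, each displacement vector (u, v) at pixel (x, y) lies on the line through the FOE:

```python
import numpy as np

def focus_of_expansion(points, flow):
    """Least-squares intersection of displacement vectors.

    Under pure camera translation every displacement vector (u, v) at
    pixel (x, y) points along the line through the FOE (x0, y0), i.e.
    u*(y - y0) - v*(x - x0) = 0.  Stacking one such equation per pixel
    gives an overdetermined linear system in (x0, y0).
    """
    x, y = points[:, 0], points[:, 1]
    u, v = flow[:, 0], flow[:, 1]
    A = np.column_stack([v, -u])   # v*x0 - u*y0 = v*x - u*y
    b = v * x - u * y
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe
```

With noisy real displacement fields the overdetermined system is what provides robustness: every pixel of the registered frames contributes one equation.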

By using homography matrices (linear transformations that register a 3D plane between two images) we refined our approach and obtained an even more robust method that makes no assumptions about the 3D structure. This method is based on a theoretical result: the homography matrices between two images form a linear space of rank four. Assuming small rotations and given three homography matrices between the images, we can compute the rotational component of the camera motion as a linear combination of these homographies. Given three views (instead of two) we calculate the trilinear tensor, which directly provides three homography matrices. Using three images gives numerical redundancy and thus more numerical stability. Furthermore, the homography matrices computed by this method use correspondences from the whole image, not just from the plane associated with each homography.
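
The rank-four result can be illustrated numerically. The sketch below works in calibrated coordinates, where a plane with normal n induces the homography H = R + t nᵀ (a simplification; the papers handle the general uncalibrated setting): for one fixed camera motion (R, t) and many different planes, the stacked 9-vectors of the homographies span a space of rank four.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation about unit `axis` by `angle`."""
    axis = np.asarray(axis, float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

# One rigid camera motion (R, t), many different scene planes n_i:
R = rotation_matrix([0.0, 1.0, 0.0], 0.02)     # small rotation
t = np.array([[0.5], [0.1], [0.0]])            # translation (column vector)
rng = np.random.default_rng(1)
homographies = [R + t @ rng.normal(size=(1, 3)) for _ in range(10)]

# Stack each 3x3 homography as a 9-vector; despite 10 samples the
# stack has rank 4: vec(R) plus the 3-dimensional space {vec(t n^T)}.
H_stack = np.array([H.ravel() for H in homographies])
rank = np.linalg.matrix_rank(H_stack)
```

This is why three independent homographies (plus the small-rotation assumption) suffice to isolate the rotational component.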

Michal Irani, Benny Rousso, Shmuel Peleg,
*Robust Recovery of Ego-Motion*,
CAIP 1993, Budapest, 13-15 Sept 1993.

Benny Rousso, Shai Avidan, Amnon Shashua, Shmuel Peleg,
*Robust Recovery of Camera Rotation from Three Frames*,
ARPA Image Understanding Workshop, February 1996.

In most cases it is correct to assume that the camera translation is the intended motion (the movement of the jeep, for example), while the camera rotation results from the camera jittering about some rotational axis and is the cause of the destabilization. By calculating the camera rotation we can register the images, eliminating the rotational component between them. The result is a stabilized image sequence.
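
Eliminating the rotational component can be sketched as warping each frame by a derotation homography. This is an illustrative simplification (pinhole camera with known intrinsics K, applied here to point coordinates rather than full image resampling):

```python
import numpy as np

def derotation_homography(K, R):
    """Homography that cancels the image motion of a pure camera
    rotation R, for a camera with intrinsics K.

    A rotation-only motion maps image points by H = K R K^-1; warping
    the new frame by the inverse registers it to the old frame, so any
    remaining image motion is due to translation alone.
    """
    return K @ np.linalg.inv(R) @ np.linalg.inv(K)

def apply_homography(H, pts):
    """Apply a 3x3 homography to an Nx2 array of pixel coordinates."""
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

In practice one would resample the whole frame with this homography (e.g. bilinear warping) to produce the stabilized sequence.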

Michal Irani, Benny Rousso, Shmuel Peleg,
*Recovery of Ego-Motion Using Image Stabilization*,
CVPR-94, Seattle, June 1994.

B. Rousso, S. Avidan, A. Shashua and S. Peleg,
*Robust Recovery of Camera Rotation from Three Frames*,
CVPR, June 1996.

Our method for detecting and tracking multiple moving objects uses both a large spatial and a large temporal region. Because of this large amount of information we do not need to assume temporal motion constancy, and we can handle regions containing more than one motion. Our algorithm handles the difficult cases of transparent and occluding objects.

The idea is to detect one object at a time. The object constituting the dominant motion in the scene is detected by temporal integration of images registered with this motion. We do this iteratively, refining the results with each iteration: the dominant motion is used to refine the set of pixels belonging to the object (the pixels moving with those motion parameters), and the motion parameters are then refined using only the pixels in the object. Once an object has been detected we exclude it and move on to the next object.
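
The detect-one-object-at-a-time loop can be sketched on a toy problem. The sketch below uses 1-D translational displacements as a stand-in for the parametric 2-D motion models and temporal integration of the actual method; all names and thresholds are illustrative:

```python
import numpy as np

def detect_dominant_motions(flow, tol=0.05, n_iters=5):
    """Toy sketch: repeatedly estimate the dominant motion robustly,
    keep the pixels consistent with it, refine the estimate from those
    pixels only, then exclude them and detect the next motion."""
    remaining = np.ones(len(flow), dtype=bool)
    motions, labels = [], np.full(len(flow), -1)
    while remaining.any():
        active = np.flatnonzero(remaining)
        motion = np.median(flow[active])          # robust initial estimate
        for _ in range(n_iters):                  # iterate: segment, re-fit
            inliers = active[np.abs(flow[active] - motion) < tol]
            if len(inliers) == 0:
                break
            motion = flow[inliers].mean()
        inliers = active[np.abs(flow[active] - motion) < tol]
        if len(inliers) == 0:                     # leftovers fit no motion
            break
        labels[inliers] = len(motions)
        motions.append(motion)
        remaining[inliers] = False                # exclude detected object
    return motions, labels
```

The median makes the initial estimate follow the dominant motion even when a second object is present; the inner loop is the segment-then-refine iteration described above.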

We improve the segmentation by iterating between segmenting the region and calculating its motion parameters until the iterations converge. For this segmentation we have developed a method that relaxes the need for accurate motion models. Given some motion parameters of an object, we register the image using those parameters. We expect pixels belonging to the object to register correctly, and pixels not belonging to it to register incorrectly. How well a pixel registers is its prediction error. When deciding whether a pixel belongs to the object we consider the convergence of this prediction error: the magnitude of the optical-flow vectors between the first image and the registered image is the value to which a threshold is applied.
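
A minimal sketch of the prediction-error test, using intensity differences after integer-shift registration as a stand-in for the residual optical-flow magnitude the method actually thresholds (function names and the threshold are illustrative):

```python
import numpy as np

def prediction_error(img1, img2, shift):
    """Per-pixel prediction error after registering img2 to img1 with a
    candidate integer translation.  Pixels that truly move with this
    motion register well (small error); pixels of other objects do not."""
    registered = np.roll(img2, -shift, axis=1)
    return np.abs(img1.astype(float) - registered.astype(float))

def segment_object(img1, img2, shift, threshold):
    """Label a pixel as part of the object iff its prediction error
    under the object's motion falls below the threshold."""
    return prediction_error(img1, img2, shift) < threshold
```

Iterating this segmentation with re-estimation of the motion parameters on the segmented pixels is what drives the convergence described above.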

Michal Irani, Benny Rousso, Shmuel Peleg,
*Computing Occluding and Transparent Motions*,
IJCV, January 1994.

Moshe Ben-Ezra, Shmuel Peleg, Benny Rousso,
*Motion Segmentation Using Convergence Properties*,
ARPA Image Understanding Workshop, November 1994.

Michal Irani, Shmuel Peleg,
*Motion Analysis for Image Enhancement: Resolution, Occlusion, and Transparency*,
JVCIR, Vol 4, No. 4, December 1993.