Temporal Segmentation and Indexing of Egocentric Videos

Abstract

The use of wearable cameras makes it possible to record life-logging egocentric videos. Browsing such long, unstructured videos is time consuming and tedious. Segmentation into meaningful chapters is an important first step towards adding structure to egocentric videos, enabling efficient browsing, indexing and summarization of the long videos. Two sources of information for video segmentation are (i) the motion of the camera wearer, and (ii) the objects and activities recorded in the video. In the works below we address the motion cues for video segmentation.

In our most recent paper, Compact CNN for Indexing Egocentric Videos, we present a compact 3D Convolutional Neural Network (CNN) architecture for long-term activity recognition in egocentric videos. Recognizing long-term activities enables us to temporally segment (index) long, unstructured egocentric videos. Given a sparse optical flow volume as input, our CNN classifies the camera wearer's activity. We obtain a classification accuracy of 89%, outperforming our previous work by 19%. Additional evaluation is performed on an extended egocentric video dataset, classifying twice as many categories as our previous work. Furthermore, our CNN recognizes whether a video is egocentric or not with 99.2% accuracy, up by 24% from the current state of the art. To better understand what the network actually learns, we propose a novel visualization of CNN kernels as flow fields.
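To make the input/output relationship concrete, here is a minimal sketch (written in PyTorch purely for illustration; it is not the framework or the exact architecture used in the paper) of a small 3D CNN that maps a sparse optical flow volume to activity logits. The grid resolution, temporal depth, layer widths and number of classes below are placeholder assumptions; see the paper for the actual compact architecture.

# Minimal sketch of a compact 3D CNN over a sparse optical flow volume.
# The 32x32 flow grid, 64-frame temporal depth, channel widths and the
# 7 activity classes are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

class FlowVolumeCNN(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            # input: (batch, 2 flow channels, 64 frames, 32, 32)
            nn.Conv3d(2, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),          # -> (8, 32, 16, 16)
            nn.Conv3d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),          # -> (16, 16, 8, 8)
        )
        self.classifier = nn.Linear(16 * 16 * 8 * 8, num_classes)

    def forward(self, flow_volume):
        x = self.features(flow_volume)
        return self.classifier(x.flatten(1))

# Example: one flow volume of 64 frames on a 32x32 grid with (dx, dy) channels.
logits = FlowVolumeCNN()(torch.randn(1, 2, 64, 32, 32))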

In our original work, Temporal Segmentation of Egocentric Videos, we propose a robust temporal segmentation of egocentric videos into a hierarchy of motion classes using new Cumulative Displacement Curves. Unlike instantaneous motion vectors, segmentation using integrated motion vectors performs well even in dynamic and crowded scenes. No assumptions are made on the underlying scene structure, and the algorithm works indoors as well as outdoors. We demonstrate the effectiveness of our approach on publicly available videos as well as on choreographed videos. We also suggest an approach to detect the fixation of the wearer's gaze in the walking portions of egocentric videos.
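As a rough illustration of the idea of accumulating motion over time (not the released code, which is linked under the CVPR 2014 paper below), the following sketch sums dense optical flow at a fixed grid of points across frames. The grid layout and the Farneback flow estimator are assumptions made for brevity.

# Minimal sketch of Cumulative Displacement Curves: at each point of a fixed
# grid, instantaneous optical flow vectors are summed over time, so the curves
# reflect long-term camera motion rather than per-frame jitter. The 10x5 grid
# and the OpenCV Farneback flow are illustrative choices, not the paper's
# exact implementation.
import cv2
import numpy as np

def cumulative_displacement_curves(gray_frames, grid=(10, 5)):
    """gray_frames: list of grayscale frames; returns array (num_frames-1, gy, gx, 2)."""
    h, w = gray_frames[0].shape
    gx, gy = grid
    ys = np.linspace(0, h - 1, gy).astype(int)
    xs = np.linspace(0, w - 1, gx).astype(int)
    curves = []
    total = np.zeros((gy, gx, 2), dtype=np.float64)
    for prev, curr in zip(gray_frames, gray_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        total += flow[np.ix_(ys, xs)]   # sample flow at grid points, accumulate
        curves.append(total.copy())
    return np.stack(curves)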

WACV 2016 Paper - Compact CNN for Indexing Egocentric Videos

Overview

PDF

@inproceedings{poleg_wacv16_compactcnn,
  title     = {Compact CNN for Indexing Egocentric Videos},
  author    = {Yair Poleg and Ariel Ephrat and Shmuel Peleg and Chetan Arora},
  year      = {2016},
  booktitle = {WACV}
}

Code

Coming soon...

CVPR 2014 Paper - Temporal Segmentation of Egocentric Videos

Overview

PDF

@inproceedings{poleg_cvpr14_egoseg,
  title     = {Temporal Segmentation of Egocentric Videos},
  author    = {Yair Poleg and Chetan Arora and Shmuel Peleg},
  year      = {2014},
  booktitle = {CVPR}
}

Code

Download Code (Matlab & C++)

HUJI EgoSeg Dataset

Our original egocentric video dataset contains 122 videos, combining footage shot by us with curated videos from YouTube.