Abstract:
Our goal is to recover the temporal trajectories of all pixels of a reference image across a given image sequence, and to segment the image based on motion similarities. These trajectories can be visualized in the 3D (x, y, t) spatiotemporal volume. The mathematical formalism that describes the evolution of pixels over time is that of fiber bundles, but it is difficult to implement directly. Instead, we express the problem in a higher-dimensional 5D space, in which pixels with coherent apparent motion produce smooth 3D layers. The coordinates in this 5D space are (x, y, t, vx, vy). The space is initially populated with correlation peaks. We then enforce smoothness in both the spatial and temporal domains simultaneously, using the tensor voting framework. Unlike the previous 4D approach, which uses only two frames, we fully exploit the temporal information available across multiple images, which significantly improves the motion analysis results. The approach is generic, in the sense that it makes no restrictive assumptions about the observed scene or the camera motion. We present results on real data sets, and the method performs well even on challenging image sequences involving severe occlusion.
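As a rough illustration of the first step described above, the following Python sketch shows one possible way to populate the 5D (x, y, t, vx, vy) space with correlation peaks. It is not the authors' implementation; the patch size, search radius, and number of peaks kept per pixel (win, search, peaks_per_pixel) are illustrative assumptions, and the brute-force matching loop is deliberately unoptimized for clarity.

import numpy as np

def correlation_peaks_5d(frames, t_ref=0, win=3, search=5, peaks_per_pixel=2):
    """Return candidate 5D tokens (x, y, t, vx, vy) for a reference frame.

    For every pixel of the reference frame and every other frame t, small
    patches are compared by normalized cross-correlation over a search
    window; the best-scoring displacements become candidate tokens.
    (Hypothetical sketch: parameters and structure are assumptions.)
    """
    ref = frames[t_ref].astype(np.float64)
    h, w = ref.shape
    tokens = []
    for t, frame in enumerate(frames):
        if t == t_ref:
            continue
        frame = frame.astype(np.float64)
        for y in range(win, h - win):
            for x in range(win, w - win):
                # Zero-mean reference patch around (x, y)
                patch = ref[y - win:y + win + 1, x - win:x + win + 1]
                patch = patch - patch.mean()
                scores = []
                for vy in range(-search, search + 1):
                    for vx in range(-search, search + 1):
                        yy, xx = y + vy, x + vx
                        if not (win <= yy < h - win and win <= xx < w - win):
                            continue
                        cand = frame[yy - win:yy + win + 1, xx - win:xx + win + 1]
                        cand = cand - cand.mean()
                        denom = np.linalg.norm(patch) * np.linalg.norm(cand)
                        ncc = float((patch * cand).sum() / denom) if denom > 0 else 0.0
                        scores.append((ncc, vx, vy))
                # Keep the strongest correlation peaks as 5D candidate tokens
                scores.sort(reverse=True)
                for ncc, vx, vy in scores[:peaks_per_pixel]:
                    tokens.append((x, y, t, vx, vy))
    return np.array(tokens)

In such a sketch, the returned tokens would then serve as the initial, noisy population of the 5D space, on which a smoothness-enforcing stage (here, tensor voting) would operate; the voting stage itself is not shown.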