Abstract:
Signal processing has traditionally been regarded as purely low-level processing. Over the past decade, however, it has grown into an area that produces tools for high-level problems that were conventionally studied almost exclusively by computer vision and machine learning researchers. For example, multiresolution analysis gave rise to popular image features such as SIFT (scale-invariant feature transform), and statistical analysis gave birth to graphical models such as HMMs (hidden Markov models) and topic models. In this talk, we use one application to illustrate this growth: object discovery, i.e., extracting the "object of interest" from a set of images in a completely unsupervised manner. Often built on image features such as SIFT and on topic models, object discovery has recently attracted considerable attention in video content extraction. We will outline this approach and extend it from still images to motion videos. We propose a novel spatial-temporal framework that applies statistical models to both appearance modeling and motion modeling. The spatial and temporal models are integrated so that motion ambiguities can be resolved by appearance, and appearance ambiguities by motion. In addition, the framework can extract hierarchical relationships among objects, driven entirely by data without any manual labeling. The framework finds application in video retrieval (e.g., for YouTube or Google Video) and video surveillance.