• DocumentCode
    491680
  • Title

    Unsupervised object of interest discovery in multi-view video sequence

  • Author

    Thummanuntawat, Thanaphat; Kumwilaisak, Wuttipong; Chinrungrueng, Jatuporn

  • Author_Institution
    Electron. & Telecommun. Dept., King Mongkut's Univ. of Technol., Bangkok
  • Volume
    03
  • fYear
    2009
  • fDate
    15-18 Feb. 2009
  • Firstpage
    1622
  • Lastpage
    1627
  • Abstract
    This paper presents a novel algorithm for unsupervised object-of-interest discovery in multi-view video sequences. We classify a multi-view video sequence based on its degree of movement. For a sequence with movement, we first group video frames along and across views into a group of pictures (GOP). Key points, i.e., feature vectors representing the textures in the GOP's video frames, are extracted with the Scale-Invariant Feature Transform (SIFT) and clustered with the K-means algorithm. Each key point is assigned a visual word according to its cluster. Patches representing small textured areas are generated with the Maximally Stable Extremal Regions (MSER) operator. One patch can contain more than one key point, and hence more than one visual word, so a patch can be represented by different visual words to different degrees. A motion detection algorithm determines the movement regions in the video frames; patches in these regions are more likely to be parts of the object of interest. Combining the developed spatial and appearance models with motion detection, we compute the likelihood that each patch belongs to the object of interest. The group of patches with high likelihoods is clustered and labeled as the object of interest. When there is no significant movement, we assume that human subjects are the most important objects in the sequence, and a face detection algorithm locates the object of interest. When there are no human subjects in the sequence, the frequencies of visual words occurring in the sequence are used to identify the object of interest; this is possible because the patches that form the object of interest can be derived from the visual words.
    Experimental results on various types of multi-view video sequences show that the proposed algorithm discovers the objects of interest correctly more than 80% of the time on average.
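    The abstract's visual-word step (SIFT key points clustered with K-means, then each key point labeled by its nearest cluster) can be illustrated with a minimal sketch. This is not the authors' code: real systems would cluster 128-dimensional SIFT descriptors, while here tiny hypothetical 2-D toy vectors stand in for them, and the K-means loop below is a bare-bones reimplementation rather than a library call.

    ```python
    # Toy sketch of the visual-word assignment described in the abstract:
    # cluster descriptors with K-means, then label each descriptor with the
    # index of its nearest cluster centre (its "visual word").

    def nearest(point, centres):
        """Index of the centre closest to `point` (squared Euclidean)."""
        return min(range(len(centres)),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centres[c])))

    def kmeans(points, k, iters=20):
        """Plain Lloyd's K-means; initial centres picked deterministically."""
        centres = [points[0], points[-1]][:k]  # assumes k == 2 for this toy
        for _ in range(iters):
            buckets = [[] for _ in range(k)]
            for p in points:
                buckets[nearest(p, centres)].append(p)
            for i, b in enumerate(buckets):
                if b:  # recompute each centre as the mean of its bucket
                    centres[i] = tuple(sum(col) / len(b) for col in zip(*b))
        return centres

    # Two well-separated "texture" clusters of toy 2-D descriptors.
    pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
           (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
    centres = kmeans(pts, 2)
    words = [nearest(p, centres) for p in pts]  # visual word per descriptor
    ```

    After this step, a patch containing several key points collects the visual words of all of them, which is why (as the abstract notes) one patch can be represented by different visual words to different degrees.
    
    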
  • Keywords
    image motion analysis; image sequences; image texture; object detection; K-mean algorithm; appearance modeling; face detection algorithm; group of picture; group video frames; maximally stable extremal regions operator; motion detection algorithm; multiview video sequence; scale-invariant feature transform; spatial modeling; unsupervised object; visual words; Cameras; Clustering algorithms; Face detection; Humans; Laboratories; Layout; Motion detection; Multimedia communication; Resource management; Video sequences; Multi-view video; maximally stable extremal regions; motion detection; scale-invariant feature transform; spatial and appearance modeling;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    11th International Conference on Advanced Communication Technology (ICACT 2009)
  • Conference_Location
    Phoenix Park
  • ISSN
    1738-9445
  • Print_ISBN
    978-89-5519-138-7
  • Electronic_ISBN
    1738-9445
  • Type

    conf

  • Filename
    4809383