• DocumentCode
    1138025
  • Title
    Computable scenes and structures in films
  • Author
    Sundaram, Hari ; Chang, Shih-Fu
  • Author_Institution
    Dept. of Electr. Eng., Columbia Univ., New York, NY, USA
  • Volume
    4
  • Issue
    4
  • fYear
    2002
  • fDate
    12/1/2002 12:00:00 AM
  • Firstpage
    482
  • Lastpage
    491
  • Abstract
    We present a computational scene model and derive novel algorithms for computing audio and visual scenes and within-scene structures in films. Our computational scene model uses constraints derived from film-making rules and from experimental results in the psychology of audition. Central to the computational model is the notion of a causal, finite-memory viewer model. We segment the audio and video data separately. In each case, we determine the degree of correlation of the most recent data in the memory with the past. The audio and video scene boundaries are determined using local maxima and minima, respectively. We derive four types of computable scenes that arise from different kinds of audio and video scene boundary synchronizations. We show how to exploit the local topology of an image sequence, in conjunction with statistical tests, to determine dialogues. We also derive a simple algorithm to detect silences in audio. An important feature of our work is the introduction of semantic constraints based on structure and silence into our computational model. This results in computable scenes that are more consistent with human observations. The algorithms were tested on a difficult data set: the first hour of data from each of three commercial films. The best results: computational scene detection: 94%; dialogue detection: 91% recall and 100% precision.
  • Keywords
    audio signal processing; edge detection; image segmentation; video signal processing; audio scenes; audition; causal finite-memory viewer model; commercial films; computable scenes; computational scene model; data set; dialogue detection; film-making production rules; film-making rules; films; human observations; image sequence; joint audio-visual segmentation; local maxima; local minima; local topology; psychology; semantic constraints; silence detection; statistical tests; structure discovery; visual scenes; within-scene structures; Computational modeling; Detectors; Humans; Image sequences; Layout; Navigation; Production; Psychology; Testing; Topology;
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2002.802017
  • Filename
    1176946