• DocumentCode
    1017478
  • Title
    Audio Keywords Discovery for Text-Like Audio Content Analysis and Retrieval
  • Author
    Lu, Lie ; Hanjalic, Alan
  • Author_Institution
    Microsoft Res. Asia, Beijing
  • Volume
    10
  • Issue
    1
  • fYear
    2008
  • Firstpage
    74
  • Lastpage
    85
  • Abstract
    Inspired by classical text document analysis employing the concept of (key) words, this paper presents an unsupervised approach to discover (key) audio elements in general audio documents. The (key) audio elements can be considered the equivalents of text (key) words, and they enable content-based audio analysis and retrieval by analogy with proven text analysis theories and methods. Since general audio signals usually show complicated and strongly varying distribution and density in the feature space, we propose an iterative spectral clustering method with context-dependent scaling factors to decompose an audio data stream into audio elements. Using this clustering method, temporal signal segments with similar low-level features are grouped into natural clusters that we adopt as audio elements. To detect the audio elements that are most representative of the semantic content, that is, the key audio elements, two cases are considered. First, if only one audio document is available for analysis, a number of heuristic importance indicators are defined and employed to detect the key audio elements. Second, if multiple audio documents are available, more sophisticated measures of audio element importance are proposed, including expected term frequency (ETF), expected inverse document frequency (EIDF), expected term duration (ETD) and expected inverse document duration (EIDD). Our experiments showed encouraging results regarding the quality of the obtained (key) audio elements and their potential applicability for content-based audio document analysis and retrieval.
  • Keywords
    audio signal processing; content-based retrieval; data mining; iterative methods; pattern clustering; text analysis; audio data stream; content-based audio document analysis; content-based audio document retrieval; context-dependent scaling factor; expected inverse document duration; expected inverse document frequency; expected term duration; expected term frequency; iterative spectral clustering method; temporal signal segment; text document analysis; unsupervised audio keywords discovery; Clustering methods; Content based retrieval; Frequency measurement; Hidden Markov models; Iterative methods; Signal analysis; Streaming media; Support vector machine classification; Support vector machines; Text analysis; Audio content mining; audio element; audio keywords; content-based audio analysis; key audio element; knowledge discovery;
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2007.911304
  • Filename
    4407815