• DocumentCode
    1063214
  • Title
    Prominence Detection Using Auditory Attention Cues and Task-Dependent High Level Information
  • Author
    Kalinli, Ozlem ; Narayanan, Shrikanth
  • Author_Institution
    Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA
  • Volume
    17
  • Issue
    5
  • fYear
    2009
  • fDate
    7/1/2009
  • Firstpage
    1009
  • Lastpage
    1024
  • Abstract
    Auditory attention is a complex mechanism that involves the processing of low-level acoustic cues together with higher level cognitive cues. In this paper, a novel method is proposed that combines biologically inspired auditory attention cues with higher level lexical and syntactic information to model task-dependent influences on a given spoken language processing task. A set of low-level multiscale features (intensity, frequency contrast, temporal contrast, orientation, and pitch) is extracted in parallel from the auditory spectrum of the sound based on the processing stages in the central auditory system to create feature maps that are converted to auditory gist features that capture the essence of a sound scene. The auditory attention model biases the gist features in a task-dependent way to maximize target detection in a given scene. Furthermore, the top-down task-dependent influence of lexical and syntactic information is incorporated into the model using a probabilistic approach. The lexical information is incorporated by using a probabilistic language model, and the syntactic knowledge is modeled using part-of-speech (POS) tags. The combined model is tested on automatically detecting prominent syllables in speech using the BU Radio News Corpus. The model achieves 88.33% prominence detection accuracy at the syllable level and 85.71% accuracy at the word level. These results compare well with reported human performance on this task.
  • Keywords
    acoustic signal processing; audio signal processing; feature extraction; probability; auditory attention cues; auditory attention model; central auditory system; efficiency 85.71 percent; efficiency 88.33 percent; lexical information; part-of-speech tags; probabilistic language model; prominence detection; spoken language processing; target detection; task-dependent high level information; Acoustic signal detection; Auditory system; Automatic testing; Biological system modeling; Data mining; Frequency conversion; Layout; Natural languages; Object detection; Speech; Accent; auditory attention; auditory gist; lexical rules; prominence; stress; syntax; task-dependent
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2009.2014795
  • Filename
    5067419