  • DocumentCode
    1070929
  • Title
    Exploiting Acoustic and Syntactic Features for Automatic Prosody Labeling in a Maximum Entropy Framework
  • Author
    Sridhar, Vivek Kumar Rangarajan; Bangalore, Srinivas; Narayanan, Shrikanth S.
  • Author_Institution
    Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA
  • Volume
    16
  • Issue
    4
  • fYear
    2008
  • fDate
    5/1/2008 12:00:00 AM
  • Firstpage
    797
  • Lastpage
    811
  • Abstract
    In this paper, we describe a maximum entropy-based automatic prosody labeling framework that exploits both language and speech information. We apply the proposed framework to both prominence and phrase structure detection within the Tones and Break Indices (ToBI) annotation scheme. Our framework utilizes novel syntactic features in the form of supertags and a quantized acoustic-prosodic feature representation that is similar to linear parameterizations of the prosodic contour. The proposed model is trained discriminatively and is robust in the selection of appropriate features for the task of prosody detection. The proposed maximum entropy acoustic-syntactic model achieves pitch accent and boundary tone detection accuracies of 86.0% and 93.1% on the Boston University Radio News corpus, and 79.8% and 90.3% on the Boston Directions corpus. Phrase structure detection through prosodic break index labeling achieves accuracies of 84% and 87% on the two corpora, respectively. These results are significantly better than previously reported results and demonstrate the strength of the maximum entropy model in jointly modeling simple lexical, syntactic, and acoustic features for automatic prosody labeling.
  • Keywords
    natural language processing; Boston Directions corpus; Boston University Radio News corpus; automatic prosody labeling; language information; linear parameterizations; maximum entropy framework; phrase structure detection; prosodic break index labeling; speech information; Acoustic–prosodic representation; ToBI annotation; maximum entropy model; phrasing; prominence; spoken language processing; supertags; suprasegmental information
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2008.917071
  • Filename
    4453862