• DocumentCode
    741014
  • Title

    An Information-Extraction Approach to Speech Processing: Analysis, Detection, Verification, and Recognition

  • Author

    Chin-Hui Lee ; Siniscalchi, Sabato Marco

  • Author_Institution
    Sch. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
  • Volume
    101
  • Issue
    5
  • fYear
    2013
  • fDate
    5/1/2013 12:00:00 AM
  • Firstpage
    1089
  • Lastpage
    1115
  • Abstract
    The field of automatic speech recognition (ASR) has enjoyed more than 30 years of technology advances due to the extensive utilization of the hidden Markov model (HMM) framework and a concentrated effort by the speech community to make available a vast amount of speech and language resources, known today as the Big Data Paradigm. State-of-the-art ASR systems achieve a high recognition accuracy for well-formed utterances of a variety of languages by decoding speech into the most likely sequence of words among all possible sentences represented by a finite-state network (FSN) approximation of all the knowledge sources required by the ASR task. However, the ASR problem is still far from being solved because not all information available in the speech knowledge hierarchy can be directly integrated into the FSN to improve the ASR performance and enhance system robustness. It is believed that some of the current issues of integrating various knowledge sources in top-down integrated search can be partially addressed by processing techniques that take advantage of the full set of acoustic and language information in speech. It has long been postulated that human speech recognition (HSR) determines the linguistic identity of a sound based on detected evidence that exists at various levels of the speech knowledge hierarchy, ranging from acoustic phonetics to syntax and semantics. This calls for a bottom-up attribute detection and knowledge integration framework that links speech processing with information extraction, by spotting speech cues with a bank of attribute detectors, weighting and combining acoustic evidence to form cognitive hypotheses, and verifying these theories until a consistent recognition decision can be reached. The recently proposed automatic speech attribute transcription (ASAT) framework is an attempt to mimic some HSR capabilities with asynchronous speech event detection followed by bottom-up knowledge integration and verification. In the last few years, ASAT has demonstrated good potential and has been applied to a variety of existing applications in speech processing and information extraction.
  • Keywords
    decoding; hidden Markov models; speech recognition; ASAT; FSN; HMM; HSR capabilities; acoustic information; acoustic phonetics; automatic speech attribute transcription framework; automatic speech recognition; big data paradigm; bottom-up attribute detection; bottom-up knowledge integration; finite-state network approximation; hidden Markov model; information-extraction approach; knowledge integration framework; language information; speech decoding; speech knowledge hierarchy; speech processing; state-of-the-art ASR systems; top-down integrated search; Acoustic signal processing; Data models; Hidden Markov models; Information processing; Knowledge management; Natural language processing; Speech processing; Speech recognition; Acoustic phonetics; automatic speech attribute transcription (ASAT); automatic speech recognition (ASR); cross-language phone recognition; knowledge integration; lattice rescoring; place and manner of articulation; speech attribute detection;
  • fLanguage
    English
  • Journal_Title
    Proceedings of the IEEE
  • Publisher
    IEEE
  • ISSN
    0018-9219
  • Type

    jour

  • DOI
    10.1109/JPROC.2013.2238591
  • Filename
    6457407