An Information-Extraction Approach to Speech Processing: Analysis, Detection, Verification, and Recognition

Author

Lee, Chin-Hui; Siniscalchi, Sabato Marco

Author_Institution

Sch. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA

Volume

101

Issue

fYear

2013

fDate

5/1/2013

Firstpage

1089

Lastpage

1115

Abstract

The field of automatic speech recognition (ASR) has enjoyed more than 30 years of technology advances due to the extensive utilization of the hidden Markov model (HMM) framework and a concentrated effort by the speech community to make available a vast amount of speech and language resources, known today as the Big Data Paradigm. State-of-the-art ASR systems achieve a high recognition accuracy for well-formed utterances of a variety of languages by decoding speech into the most likely sequence of words among all possible sentences represented by a finite-state network (FSN) approximation of all the knowledge sources required by the ASR task. However, the ASR problem is still far from being solved because not all information available in the speech knowledge hierarchy can be directly integrated into the FSN to improve the ASR performance and enhance system robustness. It is believed that some of the current issues of integrating various knowledge sources in top-down integrated search can be partially addressed by processing techniques that take advantage of the full set of acoustic and language information in speech. It has long been postulated that human speech recognition (HSR) determines the linguistic identity of a sound based on detected evidence that exists at various levels of the speech knowledge hierarchy, ranging from acoustic phonetics to syntax and semantics. This calls for a bottom-up attribute detection and knowledge integration framework that links speech processing with information extraction, by spotting speech cues with a bank of attribute detectors, weighting and combining acoustic evidence to form cognitive hypotheses, and verifying these theories until a consistent recognition decision can be reached. The recently proposed automatic speech attribute transcription (ASAT) framework is an attempt to mimic some HSR capabilities with asynchronous speech event detection followed by bottom-up knowledge integration and verification. 
In the last few years, ASAT has demonstrated good potential and has been applied to a variety of existing applications in speech processing and information extraction.

Keywords

decoding; hidden Markov models; speech recognition; ASAT; FSN; HMM; HSR capabilities; acoustic information; acoustic phonetics; automatic speech attribute transcription framework; automatic speech recognition; big data paradigm; bottom-up attribute detection; bottom-up knowledge integration; finite-state network approximation; hidden Markov model; information-extraction approach; knowledge integration framework; language information; speech decoding; speech knowledge hierarchy; speech processing; state-of-the-art ASR systems; top-down integrated search; Acoustic signal processing; Data models; Hidden Markov models; Information processing; Knowledge management; Natural language processing; Speech processing; Speech recognition; Acoustic phonetics; automatic speech attribute transcription (ASAT); automatic speech recognition (ASR); cross-language phone recognition; knowledge integration; lattice rescoring; place and manner of articulation; speech attribute detection;

fLanguage

English

Journal_Title

Proceedings of the IEEE

Publisher

IEEE

ISSN

0018-9219

Type

jour

DOI

10.1109/JPROC.2013.2238591

Filename

6457407

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=741014