DocumentCode :
20661
Title :
Generalized Hough Transform for Speech Pattern Classification
Author :
Dennis, Jonathan ; Tran, Huy Dat ; Li, Haizhou
Author_Institution :
Inst. for Infocomm Res., A*STAR, Singapore, Singapore
Volume :
23
Issue :
11
fYear :
2015
fDate :
Nov. 2015
Firstpage :
1963
Lastpage :
1972
Abstract :
Typical hybrid neural network architectures for automatic speech recognition (ASR) use a context window of frame-based features, but this may not be the best way to capture the wider temporal context, which carries equally important phonetic and linguistic information. In this paper, inspired by research in machine vision, we introduce a system that integrates both spectral and geometrical shape information from the acoustic spectrum. In particular, we focus on the Generalized Hough Transform (GHT), a technique that can model the geometrical distribution of speech information over the wider temporal context. To integrate the GHT into a hybrid ASR system, we propose a neural network that takes features derived from the probabilistic Hough voting step of the GHT as input, implementing an improved version of the GHT in which the network output represents the conventional target class posteriors. A major advantage of our approach is that each step of the GHT is highly interpretable, particularly compared with deep neural network (DNN) systems, which are commonly treated as powerful black-box classifiers that give little insight into how their output is obtained. Experiments are carried out on two speech pattern classification tasks. The first is TIMIT phoneme classification, which demonstrates the performance of the approach on a standard ASR task. The second is a spoken word recognition challenge, which highlights the flexibility of the approach in capturing phonetic information within a longer temporal context.
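The probabilistic Hough voting step described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes speech has already been quantised into codebook activations, each a hypothetical (codeword_id, frame) pair, and that each codeword stores a learned distribution over (class, offset to the pattern centre). The names train_offset_tables, hough_vote and classify are hypothetical, and only the time axis is modelled.

```python
from collections import defaultdict


def train_offset_tables(examples):
    """Estimate P(class, offset | codeword) from labelled training segments.

    Each example is (class_id, center_frame, activations), where activations
    is a list of hypothetical (codeword_id, frame) pairs. The stored offset is
    center_frame - frame, i.e. where the pattern centre lies relative to the
    activation.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for cls, center, activations in examples:
        for codeword, frame in activations:
            counts[codeword][(cls, center - frame)] += 1.0
    tables = {}
    for codeword, table in counts.items():
        total = sum(table.values())
        # Normalise so each codeword's votes form a probability distribution.
        tables[codeword] = {key: c / total for key, c in table.items()}
    return tables


def hough_vote(activations, tables, num_classes, num_frames):
    """Accumulate probabilistic votes in a (class, centre frame) Hough space."""
    acc = [[0.0] * num_frames for _ in range(num_classes)]
    for codeword, frame in activations:
        # Codewords never seen in training cast no votes.
        for (cls, offset), prob in tables.get(codeword, {}).items():
            center = frame + offset
            if 0 <= center < num_frames:
                acc[cls][center] += prob
    return acc


def classify(activations, tables, num_classes, num_frames):
    """Pick the class whose accumulator contains the highest peak."""
    acc = hough_vote(activations, tables, num_classes, num_frames)
    scores = [max(row) for row in acc]
    best = max(range(num_classes), key=lambda c: scores[c])
    return best, scores


if __name__ == "__main__":
    # Two toy classes that share codeword 2 at their centre frame.
    training = [
        (0, 5, [(1, 3), (2, 5), (3, 7)]),
        (1, 5, [(4, 4), (2, 5), (5, 6)]),
    ]
    tables = train_offset_tables(training)
    # A time-shifted instance of class 0: its votes converge on frame 12.
    print(classify([(1, 10), (2, 12), (3, 14)], tables, 2, 20))
    # prints (0, [2.5, 0.5])
```

In the approach summarised in the abstract, features derived from this voting step feed a neural network that outputs class posteriors; the max-picking rule in classify is only a simple placeholder for that stage.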
Keywords :
Hough transforms; computational geometry; neural nets; probability; signal classification; speech recognition; GHT; TIMIT phoneme classification; automatic speech recognition; context window; frame-based features; generalized Hough transform; geometrical shape information; geometrical speech information distribution; hybrid neural network architectures; hybrid-ASR system; linguistic information; phonetic information; probabilistic Hough voting step; spectral shape information; speech pattern classification; spoken word recognition challenge; Context; Hidden Markov models; Neural networks; Speech; Speech processing; Speech recognition; Transforms; Codebook activation map; TIMIT; generalized Hough transform; speech pattern classification;
fLanguage :
English
Journal_Title :
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2015.2459599
Filename :
7163536