DocumentCode
20661
Title
Generalized Hough Transform for Speech Pattern Classification
Author
Dennis, Jonathan; Tran, Huy Dat; Li, Haizhou
Author_Institution
Inst. for Infocomm Res., A*STAR, Singapore, Singapore
Volume
23
Issue
11
fYear
2015
fDate
Nov. 2015
Firstpage
1963
Lastpage
1972
Abstract
While typical hybrid neural network architectures for automatic speech recognition (ASR) use a context window of frame-based features, this may not be the best way to capture the wider temporal context, which contains phonetic and linguistic information that is equally important. In this paper, we introduce a system that integrates both spectral and geometrical shape information from the acoustic spectrum, inspired by research in machine vision. In particular, we focus on the Generalized Hough Transform (GHT), a technique that can model the geometrical distribution of speech information over the wider temporal context. To integrate the GHT into a hybrid-ASR system, we propose a neural network that takes features derived from the probabilistic Hough voting step of the GHT as input and outputs the conventional target class posteriors, thereby implementing an improved version of the GHT. A major advantage of our approach is that each step of the GHT is highly interpretable, particularly compared with deep neural network (DNN) systems, which are commonly treated as powerful black-box classifiers that give little insight into how the output is obtained. Experiments are carried out on two speech pattern classification tasks. The first is TIMIT phoneme classification, which demonstrates the performance of the approach on a standard ASR task. The second is a spoken word recognition challenge, which highlights the flexibility of the approach in capturing phonetic information within a longer temporal context.
Keywords
Hough transforms; computational geometry; neural nets; probability; signal classification; speech recognition; GHT; TIMIT phoneme classification; automatic speech recognition; context window; frame-based features; generalized Hough transform; geometrical shape information; geometrical speech information distribution; hybrid neural network architectures; hybrid-ASR system; linguistic information; phonetic information; probabilistic Hough voting step; spectral shape information; speech pattern classification; spoken word recognition challenge; Context; Hidden Markov models; Neural networks; Speech; Speech processing; Transforms; Codebook activation map; TIMIT
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2015.2459599
Filename
7163536