Generalized Hough Transform for Speech Pattern Classification

Author

Dennis, Jonathan; Tran, Huy Dat; Li, Haizhou

Author_Institution

Inst. for Infocomm Res., A*STAR, Singapore, Singapore

Volume

23

Issue

11

fYear

2015

fDate

Nov. 2015

Firstpage

1963

Lastpage

1972

Abstract

While typical hybrid neural network architectures for automatic speech recognition (ASR) use a context window of frame-based features, this may not be the best approach for capturing the wider temporal context, which contains phonetic and linguistic information that is equally important. In this paper, we introduce a system that integrates both the spectral and geometrical shape information from the acoustic spectrum, inspired by research in the field of machine vision. In particular, we focus on the Generalized Hough Transform (GHT), a technique that can model the geometrical distribution of speech information over the wider temporal context. To integrate the GHT into a hybrid-ASR system, we propose using a neural network, with features derived from the probabilistic Hough voting step of the GHT, to implement an improved version of the GHT in which the network output represents the conventional target class posteriors. A major advantage of our approach is that each step of the GHT is highly interpretable, particularly compared to deep neural network (DNN) systems, which are commonly treated as powerful black-box classifiers that give little insight into how the output is achieved. Experiments are carried out on two speech pattern classification tasks. The first is TIMIT phoneme classification, which demonstrates the performance of the approach on a standard ASR task. The second is a spoken word recognition challenge, which highlights the flexibility of the approach in capturing phonetic information within a longer temporal context.
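To make the "probabilistic Hough voting step" more concrete, the sketch below shows a generic implicit-shape-model style of Hough voting along the time axis of a spectrogram. It is an illustration under stated assumptions, not the authors' method: the codebook, the per-entry occurrence lists of (class, time offset, weight), hard nearest-neighbour matching, and the function names `hough_vote_map` and `classify_by_peak` are all hypothetical. Each local feature is matched to a codebook entry and casts weighted votes for where a class "centre" would lie; the resulting class-by-time accumulator is the kind of vote map from which features for a classifier could be derived.

```python
import numpy as np


def hough_vote_map(positions, descriptors, codebook, occurrences, num_classes, num_bins):
    """Hypothetical probabilistic Hough voting over time.

    positions:   (N,) int array, time-frame index of each local feature
    descriptors: (N, D) float array, one descriptor per local feature
    codebook:    (K, D) float array of codebook centres
    occurrences: list of length K; entry k is a list of (class_id, time_offset, weight)
    Returns a (num_classes, num_bins) accumulator of votes.
    """
    acc = np.zeros((num_classes, num_bins))
    # Hard assignment: each descriptor activates its nearest codebook entry.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    matches = dists.argmin(axis=1)
    for t, k in zip(positions, matches):
        occ = occurrences[k]
        total = sum(w for _, _, w in occ)
        if total == 0:
            continue
        for cls, offset, w in occ:
            centre = int(t) + offset
            # Normalised weights: votes from one feature sum to 1, read as
            # p(class, centre | codebook entry), ignoring out-of-range centres.
            if 0 <= centre < num_bins:
                acc[cls, centre] += w / total
    return acc


def classify_by_peak(acc):
    """Baseline decision: class whose accumulator has the highest peak."""
    return int(acc.max(axis=1).argmax())


codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
occurrences = [[(0, 2, 1.0)], [(1, -1, 1.0), (0, 0, 1.0)]]
positions = np.array([1, 3])
descriptors = np.array([[0.1, 0.0], [0.9, 1.0]])

acc = hough_vote_map(positions, descriptors, codebook, occurrences, num_classes=2, num_bins=5)
print(acc)
print(classify_by_peak(acc))
```

In this toy example both features agree that a class-0 centre lies at frame 3, so that bin collects 1.5 votes and the peak rule picks class 0. A trained neural network taking features from such a vote map, as the abstract describes, would replace the simple peak rule.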

Keywords

Hough transforms; computational geometry; neural nets; probability; signal classification; speech recognition; GHT; TIMIT phoneme classification; automatic speech recognition; context window; frame-based features; generalized Hough transform; geometrical shape information; geometrical speech information distribution; hybrid neural network architectures; hybrid-ASR system; linguistic information; phonetic information; probabilistic Hough voting step; spectral shape information; speech pattern classification; spoken word recognition challenge; Context; Hidden Markov models; Neural networks; Speech; Speech processing; Speech recognition; Transforms; Codebook activation map; TIMIT; generalized Hough transform; speech pattern classification

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher

IEEE

ISSN

2329-9290

Type

jour

DOI

10.1109/TASLP.2015.2459599

Filename

7163536