DocumentCode
2970927
Title
From speech to letters - using a novel neural network architecture for grapheme based ASR
Author
Eyben, Florian ; Wöllmer, Martin ; Schuller, Björn ; Graves, Alex
Author_Institution
Inst. for Human-Machine Commun., Tech. Univ. München, Munich, Germany
fYear
2009
fDate
Dec. 13 2009-Dec. 17 2009
Firstpage
376
Lastpage
380
Abstract
Mainstream automatic speech recognition systems are based on modelling acoustic sub-word units such as phonemes. Phonemisation dictionaries and language-model-based decoding techniques are applied to transform the phoneme hypotheses into orthographic transcriptions. Direct modelling of graphemes as sub-word units using HMMs has not been successful. We investigate a novel ASR approach using Bidirectional Long Short-Term Memory Recurrent Neural Networks and Connectionist Temporal Classification, which is capable of transcribing graphemes directly and yields results highly competitive with phoneme transcription. In the design of such a grapheme-based speech recognition system, phonemisation dictionaries are no longer required. All that is needed is text transcribed at the sentence level, which greatly simplifies the training procedure. The novel approach is evaluated extensively on the Wall Street Journal 1 corpus.
Keywords
recurrent neural nets; speech coding; speech recognition; bidirectional long short-term memory recurrent neural networks; connectionist temporal classification; decoding techniques; grapheme based speech recognition system phonemisation dictionaries; language model; main-stream automatic speech recognition systems; orthographic transcriptions; Automatic speech recognition; Computer science; Context modeling; Dictionaries; Hidden Markov models; Man machine systems; Natural languages; Neural networks; Recurrent neural networks; Speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location
Merano
Print_ISBN
978-1-4244-5478-5
Electronic_ISBN
978-1-4244-5479-2
Type
conf
DOI
10.1109/ASRU.2009.5373257
Filename
5373257