DocumentCode :
653727
Title :
Text spotting in large speech databases for under-resourced languages
Author :
Buzo, Andi ; Cucu, H. ; Burileanu, C.
Author_Institution :
Speech & Dialogue (SpeeD) Research Laboratory, University “Politehnica” of Bucharest, Bucharest, Romania
fYear :
2013
fDate :
16-19 Oct. 2013
Firstpage :
1
Lastpage :
6
Abstract :
Lightly supervised acoustic modeling in under-resourced languages raises new issues due to the poor accuracy of Automatic Speech Recognition (ASR) systems for such languages and the quality of the speech transcriptions that can be found. Under these conditions, common alignment techniques are not always capable of aligning the ASR output with the approximate transcription. We propose two alignment methods that overcome these issues. The first approach applies an image processing algorithm to the matching matrix of the two texts to be aligned, while the second is based on segmental DTW. Both approaches outperform the currently used Dynamic Time Warping (DTW) technique, extracting on average 29% and 27% more speech data, respectively.
Keywords :
acoustic signal processing; audio databases; image matching; matrix algebra; speech recognition; text analysis; ASR output; ASR systems; aligning method; automatic speech recognition systems; image processing algorithm; large speech databases; lightly supervised acoustic modeling; matching matrix; speech data extraction; speech transcriptions quality; text spotting; under-resourced languages; Accuracy; Databases; Speech; Speech recognition; Training; Wiener filters; lightly supervised acoustic modeling; text alignment; under-resourced languages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Speech Technology and Human-Computer Dialogue (SpeD), 2013 7th Conference on
Conference_Location :
Cluj-Napoca
Type :
conf
DOI :
10.1109/SpeD.2013.6682654
Filename :
6682654