Title :
Automatic Speech Sentence Segmentation from Multi-paragraph Databases
Author :
Zhang Wei ; Pang Minhui ; Du Ranran ; Liu Yayu
Author_Institution :
Dept. of Comput. Sci. & Technol., Ocean Univ. of China, Qingdao, China
Abstract :
Speech sentences are the input to automatic phonetic segmentation or transcription. This paper describes our work on automatically segmenting speech sentences from multi-paragraph speech databases, with the goal of building a Text-To-Speech (TTS) speech corpus automatically. We present (a) a system for automatic speech sentence segmentation from broadcast audio based on forced alignment, with a checking mechanism based on speech recognition; (b) an iterative algorithm that improves the system; and (c) a music detector that combines a Variable Duration Hidden Markov Model (VDHMM) with a Gaussian Mixture Model (GMM). Experiments show that the improved system achieves a Sentence Accurate Rate (SAR) of 98.93% and generates 646 correct sentences, compared with an SAR of 97.85% and 155 correct sentences for the original system.
Keywords :
Gaussian processes; hidden Markov models; speech processing; speech recognition; speech synthesis; Gaussian mixture model; automatic phonetic segmentation; automatic speech sentence segmentation; broadcasting audio; forced alignment technique; iterative algorithm; multiparagraph speech databases; music detector; sentence accurate rate; speech recognition technique; text-to-speech system speech corpus; variable duration hidden Markov model; Broadcasting; Databases; Detectors; Hidden Markov models; Iterative algorithms; Marine technology; Sea measurements; Speech processing; Speech synthesis; Viterbi algorithm; automatic speech sentence segmentation; multi-paragraph speech database; speech process;
Conference_Title :
Measuring Technology and Mechatronics Automation (ICMTMA), 2010 International Conference on
Conference_Location :
Changsha City
Print_ISBN :
978-1-4244-5001-5
Electronic_ISBN :
978-1-4244-5739-7
DOI :
10.1109/ICMTMA.2010.784