DocumentCode :
118043
Title :
An inter-speaker evaluation through simulation of electrolarynx control based on statistical F0 prediction
Author :
Tanaka, Kou ; Toda, Tomoki ; Neubig, Graham ; Sakti, Sakriani ; Nakamura, Satoshi
Author_Institution :
Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Nara, Japan
fYear :
2014
fDate :
9-12 Dec. 2014
Firstpage :
1
Lastpage :
4
Abstract :
An electrolarynx is a device that artificially generates excitation sounds to produce electrolaryngeal (EL) speech. Although proficient laryngectomees can produce intelligible EL speech with this device, it sounds quite unnatural due to the mechanical excitation. To address this issue, we have proposed several EL speech enhancement methods using statistical voice conversion and showed that statistical prediction of excitation parameters, such as F0 patterns, was essential for significantly improving the naturalness of EL speech. Based on this result, we have also proposed a method for directly controlling the F0 patterns of the excitation sounds generated by the electrolarynx based on statistical excitation prediction, which may allow EL speech enhancement to be applied to face-to-face conversation. In our previous work, this direct control method was evaluated through simulation using the EL speech of only a single laryngectomee, and it was demonstrated that the method improves the naturalness of EL speech while preserving listenability. However, because the quality of EL speech depends highly on the proficiency of each laryngectomee, it is still not clear whether these methods generalize to other speakers. In addition, while previous work evaluated only naturalness and listenability, intelligibility is also an important factor that has not yet been evaluated. In this paper, we apply the direct control method to multiple speakers, consisting of two real laryngectomees and one non-laryngectomee, and evaluate its performance through simulation in terms of naturalness, listenability, and intelligibility. The experimental results demonstrate that the proposed method yields significant improvements in the naturalness of EL speech for multiple laryngectomees while maintaining listenability and intelligibility.
Keywords :
medical signal processing; prosthetics; speech enhancement; speech intelligibility; statistical analysis; EL speech enhancement method; EL speech intelligibility; EL speech listenability; EL speech naturalness; direct control method; electrolaryngeal speech; electrolarynx control; excitation sound; face-to-face conversation; intelligible EL speech; inter-speaker evaluation; laryngectomees; mechanical excitation; statistical F0 prediction; statistical excitation prediction; statistical voice conversion; Accuracy; Delays; Feature extraction; Real-time systems; Speech; Speech enhancement;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
Conference_Location :
Siem Reap
Type :
conf
DOI :
10.1109/APSIPA.2014.7041593
Filename :
7041593