DocumentCode :
179469
Title :
Vocal timbre analysis using latent Dirichlet allocation and cross-gender vocal timbre similarity
Author :
Nakano, T. ; Yoshii, Kazuyoshi ; Goto, Masataka
Author_Institution :
Nat. Inst. of Adv. Ind. Sci. & Technol. (AIST), Tsukuba, Japan
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
5202
Lastpage :
5206
Abstract :
This paper presents a vocal timbre analysis method based on topic modeling using latent Dirichlet allocation (LDA). Although many works have focused on analyzing characteristics of singing voices, none have dealt with “latent” characteristics (topics) of vocal timbre, which are shared by multiple singing voices. In the work described in this paper, we first automatically extracted vocal timbre features from polyphonic musical audio signals including vocal sounds. The extracted features were used as observed data, and mixing weights of multiple topics were estimated by LDA. Finally, the semantics of each topic were visualized by using a word-cloud-based approach. Experimental results for a singer identification task using 36 songs sung by 12 singers showed that our method achieved a mean reciprocal rank of 0.86. We also proposed a method for estimating cross-gender vocal timbre similarity by generating pitch-shifted (frequency-warped) signals of every singing voice. Experimental results for a cross-gender singer retrieval task showed that our method discovered interesting similar pitch-shifted singers.
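The mean reciprocal rank (MRR) reported in the abstract can be illustrated with a minimal sketch. For each query song, the reciprocal rank is 1 divided by the position of the correct singer in the ranked candidate list, and the MRR is the average over all queries. The function name, singer labels, and rankings below are hypothetical toy values chosen for illustration; they are not data or code from the paper.

```python
def mean_reciprocal_rank(ranked_lists, correct_labels):
    """Average of 1/rank of the correct label over all queries.

    A query whose correct label never appears contributes 0.
    """
    total = 0.0
    for ranking, truth in zip(ranked_lists, correct_labels):
        reciprocal_rank = 0.0
        for position, candidate in enumerate(ranking, start=1):
            if candidate == truth:
                reciprocal_rank = 1.0 / position
                break
        total += reciprocal_rank
    return total / len(correct_labels)


# Hypothetical toy example: three query songs, three candidate singers.
rankings = [
    ["A", "B", "C"],  # correct singer "A" ranked 1st -> 1
    ["B", "A", "C"],  # correct singer "A" ranked 2nd -> 1/2
    ["C", "A", "B"],  # correct singer "B" ranked 3rd -> 1/3
]
truths = ["A", "A", "B"]

mrr = mean_reciprocal_rank(rankings, truths)
print(round(mrr, 4))
```

An MRR of 1.0 would mean the correct singer is always ranked first, so a value of 0.86 indicates that the correct singer usually appears at or near the top of the list.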
Keywords :
audio signals; feature extraction; speech processing; LDA; automatically extracted vocal timbre features; cross-gender singer retrieval task; cross-gender vocal timbre similarity; frequency-warped signals; latent Dirichlet allocation; latent characteristics; mean reciprocal rank; mixing weights; multiple singing voices; observed data; pitch-shifted signals; polyphonic musical audio signals; singer identification task; vocal sounds; vocal timbre analysis; word cloud; Estimation; Feature extraction; Resource management; Timbre; Vectors; Visualization; cross-gender similarity; latent Dirichlet allocation; music information retrieval; vocal timbre; word cloud;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854595
Filename :
6854595