Vocal timbre analysis using latent Dirichlet allocation and cross-gender vocal timbre similarity

Author

Nakano, T. ; Yoshii, Kazuyoshi ; Goto, Masataka

Author_Institution

National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan

fYear

2014

fDate

4-9 May 2014

Firstpage

5202

Lastpage

5206

Abstract

This paper presents a vocal timbre analysis method based on topic modeling using latent Dirichlet allocation (LDA). Although many works have focused on analyzing characteristics of singing voices, none have dealt with “latent” characteristics (topics) of vocal timbre, which are shared by multiple singing voices. In the work described in this paper, we first automatically extracted vocal timbre features from polyphonic musical audio signals including vocal sounds. The extracted features were used as observed data, and mixing weights of multiple topics were estimated by LDA. Finally, the semantics of each topic were visualized by using a word-cloud-based approach. Experimental results for a singer identification task using 36 songs sung by 12 singers showed that our method achieved a mean reciprocal rank of 0.86. We also proposed a method for estimating cross-gender vocal timbre similarity by generating pitch-shifted (frequency-warped) signals of every singing voice. Experimental results for a cross-gender singer retrieval task showed that our method discovered interesting similar pitch-shifted singers.
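The singer identification result above is reported as a mean reciprocal rank (MRR). As a rough illustration of how such a score can be computed from per-song topic mixing weights, here is a minimal sketch. It assumes cosine similarity between topic-weight vectors, which is an illustrative choice and not necessarily the similarity measure used in the paper; the function names and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_reciprocal_rank(mixtures, singers):
    """Average reciprocal rank of the first same-singer song per query.

    mixtures: list of topic-weight vectors, one per song (hypothetical input).
    singers:  parallel list of singer labels.
    Queries with no other song by the same singer are skipped
    (an assumption of this sketch).
    """
    total = 0.0
    counted = 0
    for q, (q_mix, q_singer) in enumerate(zip(mixtures, singers)):
        # Rank every other song by descending similarity to the query song.
        ranked = sorted(
            (i for i in range(len(mixtures)) if i != q),
            key=lambda i: cosine(q_mix, mixtures[i]),
            reverse=True,
        )
        # Take the reciprocal rank of the first song by the same singer.
        for rank, i in enumerate(ranked, start=1):
            if singers[i] == q_singer:
                total += 1.0 / rank
                counted += 1
                break
    return total / counted if counted else 0.0


# Toy data: two singers, two songs each, three topics (made-up weights).
mixtures = [
    [0.8, 0.1, 0.1],  # singer A
    [0.7, 0.2, 0.1],  # singer A
    [0.1, 0.8, 0.1],  # singer B
    [0.2, 0.7, 0.1],  # singer B
]
singers = ["A", "A", "B", "B"]
mrr = mean_reciprocal_rank(mixtures, singers)
```

An MRR of 1.0 means the top-ranked song always belongs to the query's singer; lower values mean the first correct match tends to appear further down the ranking.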

Keywords

audio signals; feature extraction; speech processing; LDA; automatically extracted vocal timbre features; cross-gender singer retrieval task; cross-gender vocal timbre similarity; frequency-warped signals; latent Dirichlet allocation; latent characteristics; mean reciprocal rank; mixing weights; multiple singing voices; observed data; pitch-shifted signals; polyphonic musical audio signals; singer identification task; vocal sounds; vocal timbre analysis; word cloud; Estimation; Resource management; Timbre; Vectors; Visualization; cross-gender similarity; music information retrieval; vocal timbre

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6854595

Filename

6854595