Transcribing vocal expression from polyphonic music

Author

Ikemiya, Yukara ; Itoyama, Katsutoshi ; Okuno, Hiroshi G.

Author_Institution

Grad. Sch. of Inf., Kyoto Univ., Kyoto, Japan

fYear

2014

fDate

4-9 May 2014

Firstpage

3127

Lastpage

3131

Abstract

A method for transcribing vocal expressions such as vibrato, glissando, and kobushi separately from polyphonic music is described. The expressions appear as fluctuations in the fundamental frequency contour of the singing voice. Because they strongly reflect the individuality of the singer, they can be used for search and retrieval of music and for expressive singing voice synthesis based on singing style. First, the fundamental frequency contour of the singing voice is estimated using the Viterbi algorithm, constrained by a corresponding note sequence. Next, the notes are temporally aligned with the fundamental frequency sequence. Finally, each expression is identified and parameterized in accordance with designed rules. Experiments demonstrated that this method can transcribe expressions in the singing voice from commercial recordings.
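
The first step in the abstract, Viterbi-based F0 estimation constrained by a note sequence, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `viterbi_f0`, the salience matrix input, the allowed deviation `max_dev`, and the linear `jump_penalty` are all assumptions made for illustration, and the paper's actual cost function is not given in this record.

```python
import numpy as np

def viterbi_f0(salience, bin_cents, note_cents, max_dev=300.0, jump_penalty=0.01):
    """Hypothetical sketch: choose one pitch bin per frame with Viterbi search.

    salience:   (T, K) non-negative pitch salience per frame and bin (assumed input)
    bin_cents:  (K,) pitch of each bin in cents
    note_cents: (T,) pitch of the aligned score note in each frame, in cents
    """
    T, K = salience.shape
    # Emission score: log salience; bins too far from the note are forbidden.
    emit = np.log(salience + 1e-12)
    allowed = np.abs(bin_cents[None, :] - note_cents[:, None]) <= max_dev
    emit = np.where(allowed, emit, -np.inf)
    # Transition score: penalize large pitch jumps between adjacent frames.
    trans = -jump_penalty * np.abs(bin_cents[:, None] - bin_cents[None, :])

    score = emit[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans          # cand[i, j]: move from bin i to bin j
        back[t] = np.argmax(cand, axis=0)      # best predecessor for each bin j
        score = cand[back[t], np.arange(K)] + emit[t]

    # Backtrack from the best final bin.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return bin_cents[path]
```

In this sketch the note constraint acts as a hard mask, so a louder accompaniment peak outside the allowed range around the note cannot be selected; a soft penalty on the distance from the note would be an equally plausible design.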

Keywords

information retrieval; music; speech synthesis; Viterbi algorithm; expressive singing voice synthesis; fundamental frequency contour; music information retrieval; note sequence; polyphonic music; vocal expressions; Accuracy; Cost function; Estimation; Frequency estimation; Hidden Markov models; Speech; Time-frequency analysis; F0 estimation; Singing voice analysis; Vocal expression identification / transcription

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6854176

Filename

6854176