DocumentCode :
1554130
Title :
Integrating Additional Chord Information Into HMM-Based Lyrics-to-Audio Alignment
Author :
Mauch, Matthias ; Fujihara, Hiromasa ; Goto, Masataka
Author_Institution :
Nat. Inst. of Adv. Ind. Sci. & Technol. (AIST), Tsukuba, Japan
Volume :
20
Issue :
1
Year :
2012
Firstpage :
200
Lastpage :
210
Abstract :
Aligning lyrics to audio has a wide range of applications such as the automatic generation of karaoke scores, song-browsing by lyrics, and the generation of audio thumbnails. Existing methods are restricted to using only lyrics and match them to phoneme features extracted from the audio (usually mel-frequency cepstral coefficients). Our novel idea is to integrate the textual chord information provided in the paired chords-lyrics format known from song books and Internet sites into the inference procedure. We propose two novel methods that implement this idea: First, assuming that all chords of a song are known, we extend a hidden Markov model (HMM) framework by including chord changes in the Markov chain and an additional audio feature (chroma) in the emission vector; second, for the more realistic case in which some chord information is missing, we present a method that recovers the missing chord information by exploiting repetition in the song. We conducted experiments with five changing parameters and show that with accuracies of 87.5% and 76.7%, respectively, both methods perform better than the baseline with statistical significance. We introduce the new accompaniment interface Song Prompter, which uses the automatically aligned lyrics to guide musicians through a song. It demonstrates that the automatic alignment is accurate enough to be used in a musical performance.
Keywords :
audio signal processing; hidden Markov models; inference mechanisms; information retrieval; music; speech processing; HMM; Markov chain; audio thumbnails generation; chord information; emission vector; hidden Markov model; inference procedure; karaoke scores; lyrics-to-audio alignment; mel-frequency cepstral coefficient; phoneme feature; song prompter; song-browsing; Feature extraction; Hidden Markov models; Instruments; Internet; Materials; Mel frequency cepstral coefficient; Audio user interfaces; hidden Markov models (HMMs); music; music information retrieval; speech processing
Language :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2011.2159595
Filename :
5876304