Title :
Three techniques for improving automatic synchronization between music and lyrics: Fricative detection, filler model, and novel feature vectors for vocal activity detection
Author :
Fujihara, Hiromasa ; Goto, Masataka
Author_Institution :
National Institute of Advanced Industrial Science and Technology (AIST), Ikeda
fDate :
March 31 - April 4, 2008
Abstract :
Three techniques are described that improve a previously developed system for automatically synchronizing lyrics with musical audio signals. Although that system achieves state-of-the-art accuracy by extracting vocal vowels from polyphonic sound mixtures and using forced alignment between those vowels and a phoneme network of the lyrics, there is still room for improvement. The first technique detects fricative nonexistence regions, in which no fricative consonant sounds occur (a cue the previous system did not use), and prohibits fricative phonemes from being aligned to those regions. The second technique inserts a filler model between phrases of the phoneme network; this model improves the accuracy of the forced alignment by letting vowel utterances between phrases that are not part of the lyrics be ignored. The third technique introduces novel feature vectors for vocal activity detection that allow the distance between two harmonic structures to be calculated without estimating their spectral envelopes. Experimental results showed that all three techniques contribute to improved synchronization accuracy.
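The first two techniques both act on the forced alignment between frames and the lyrics' phoneme network. As a minimal, hypothetical sketch (not the authors' implementation), the code below shows one way such constraints can enter a left-to-right Viterbi alignment: fricative phonemes receive a score of minus infinity in frames a detector has marked as fricative-free, and filler states placed between phrases may be occupied or skipped. It assumes per-frame log-likelihoods are already computed, uses uniform transition scores, assumes the first and last states are regular phonemes, and the function and parameter names are invented for illustration. The third technique (feature vectors for vocal activity detection) is not modeled here.

```python
import math
from typing import List, Sequence

NEG_INF = -math.inf


def constrained_forced_alignment(
    log_lik: Sequence[Sequence[float]],
    is_fricative: Sequence[bool],
    is_filler: Sequence[bool],
    fricative_free: Sequence[bool],
) -> List[int]:
    """Hypothetical left-to-right Viterbi forced alignment with two constraints.

    log_lik[t][s]     : log-likelihood of frame t under state s (T x S)
    is_fricative[s]   : True if state s is a fricative phoneme
    is_filler[s]      : True if state s is an optional filler between phrases
    fricative_free[t] : True if a detector judged frame t to contain no fricative
    Returns the best state index for every frame (uniform transition scores).
    """
    T, S = len(log_lik), len(is_fricative)
    if T == 0 or S == 0:
        raise ValueError("need at least one frame and one state")

    def emit(t: int, s: int) -> float:
        # Hard constraint: a fricative state may not occupy a fricative-free frame.
        if fricative_free[t] and is_fricative[s]:
            return NEG_INF
        return log_lik[t][s]

    score = [[NEG_INF] * S for _ in range(T)]
    back = [[-1] * S for _ in range(T)]
    score[0][0] = emit(0, 0)  # alignment must start in the first state
    for t in range(1, T):
        for s in range(S):
            # Predecessors: stay, advance by one, or jump over an optional filler.
            cands = [s, s - 1]
            if s >= 2 and is_filler[s - 1]:
                cands.append(s - 2)
            best_prev, best = -1, NEG_INF
            for p in cands:
                if p >= 0 and score[t - 1][p] > best:
                    best_prev, best = p, score[t - 1][p]
            if best_prev >= 0:
                score[t][s] = best + emit(t, s)
                back[t][s] = best_prev

    if score[T - 1][S - 1] == NEG_INF:
        raise ValueError("no alignment satisfies the constraints")
    # Backtrack from the last state in the last frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy states: vowel "a", fricative "s", vowel "o" (no filler here).
    ll = [[0, -5, -5], [-5, 0, -5], [-5, -1, -5], [-5, -5, 0]]
    fric = [False, True, False]
    filler = [False, False, False]
    print(constrained_forced_alignment(ll, fric, filler, [False] * 4))  # [0, 1, 1, 2]
    # Marking frame 1 as fricative-free pushes "s" to frame 2.
    print(constrained_forced_alignment(ll, fric, filler, [False, True, False, False]))  # [0, 0, 1, 2]
```

Treating the fricative constraint as a hard mask rather than a soft penalty mirrors the abstract's wording ("prohibits the alignment"); making the filler skippable is an assumption chosen so that phrases without extra inter-phrase vowels are not forced through the filler.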
Keywords :
acoustic signal detection; acoustic signal processing; audio signal processing; feature extraction; music; speech; synchronisation; automatic synchronization; feature vectors; filler model; fricative detection; harmonic structure; music-lyrics synchronization; musical audio signals; phoneme network; polyphonic sound mixtures; singing voices; vocal activity detection; vocal vowel extraction; Automatic speech recognition; Envelope detectors; Frequency synchronization; Multiple signal classification; Music; Power system harmonics; Signal processing; Speech recognition; Vectors; Viterbi algorithm; Filler model; Fricative sounds; Lyrics; Music; Spectral representation;
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-1483-3
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2008.4517548