Regional Information Center for Science and Technology (RICEST) - Three techniques for improving automatic synchronization between music and lyrics: Fricative detection, filler model, and novel feature vectors for vocal activity detection

DocumentCode :

3403852

Title :

Three techniques for improving automatic synchronization between music and lyrics: Fricative detection, filler model, and novel feature vectors for vocal activity detection

Author :

Fujihara, Hiromasa ; Goto, Masataka

Author_Institution :

Nat. Inst. of Adv. Ind. Sci. & Technol., Ikeda

fYear :

2008

fDate :

March 31 2008-April 4 2008

Firstpage :

Lastpage :

Abstract :

Three techniques are described that improve a previously developed system for automatically synchronizing lyrics with musical audio signals. Although this system achieves state-of-the-art accuracy by extracting vocal vowels from polyphonic sound mixtures and using forced alignment between those vowels and a phoneme network of the lyrics, there is still room for improvement. The first technique detects regions in which fricative consonant sounds, which the previous system did not utilize, are absent, and prohibits the alignment of fricative phonemes to those regions. The second technique inserts a filler model between phrases of the phoneme network. This model improves the accuracy of the forced alignment by ignoring inter-phrase vowel utterances not included in the lyrics. The third technique introduces novel feature vectors for vocal activity detection that enable a distance calculation between two sets of harmonic structure without estimating their spectral envelopes. Experimental results showed that all three techniques contribute to improved synchronization.

Keywords :

acoustic signal detection; acoustic signal processing; audio signal processing; feature extraction; music; speech; synchronisation; automatic synchronization; feature vectors; filler model; fricative detection; harmonic structure; music-lyrics synchronization; musical audio signals; phoneme network; polyphonic sound mixtures; singing voices; vocal activity detection; vocal vowel extraction; Automatic speech recognition; Envelope detectors; Frequency synchronization; Multiple signal classification; Power system harmonics; Signal processing; Speech recognition; Vectors; Viterbi algorithm; Fricative sounds; Lyrics; Spectral representation

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on

Conference_Location :

Las Vegas, NV

ISSN :

1520-6149

Print_ISBN :

978-1-4244-1483-3

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2008.4517548

Filename :

4517548

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3403852