Singing voice identification and lyrics transcription for music information retrieval (invited paper)

Author

Mesaros, Annamaria

Author_Institution

Dept. of Signal Process. & Acoust., Aalto Univ., Espoo, Finland

fYear

2013

fDate

16-19 Oct. 2013

Firstpage

1

Lastpage

10

Abstract

This paper presents an overview of methods and applications dealing with analysis of singing voice audio signals, related to singer identity and lyrics content of the singing. Singer identification in polyphonic music is based on general audio classification methods. The presence of instruments is detrimental to voice identification performance, and eliminating the effect of instrumental accompaniment is an important aspect of the problem. The results show that classification of singing voices can be done robustly in polyphonic music when using source separation. Lyrics transcription is approached as a speech recognition problem, with specific elements for dealing with singing voice. The variability of phonation in singing poses a significant challenge to the speech recognition approach. The word recognition accuracy of the lyrics transcription from singing is quite low, but it is shown to be useful in a query-by-singing application, for performing a textual search based on the words recognized from the query. A system for automatic alignment of lyrics and audio is also presented, with sufficient performance for facilitating applications such as automatic karaoke annotation or song browsing.

Keywords

audio signal processing; music; query processing; signal classification; source separation; speech recognition; automatic karaoke annotation; general audio classification methods; instrumental accompaniment; lyrics automatic alignment; lyrics transcription; music information retrieval; phonation variability; polyphonic music; query-by-singing application; singer identity; singing voice audio signal analysis; singing voice classification; singing voice identification; song browsing; speech recognition problem; textual search; word recognition accuracy; Databases; Hidden Markov models; Instruments; Multiple signal classification; Music; Speech; Speech recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

Speech Technology and Human-Computer Dialogue (SpeD), 2013 7th Conference on

Conference_Location

Cluj-Napoca

Type

conf

DOI

10.1109/SpeD.2013.6682644

Filename

6682644