• DocumentCode
    653717
  • Title

    Singing voice identification and lyrics transcription for music information retrieval (invited paper)

  • Author

    Mesaros, Annamaria

  • Author_Institution
    Dept. of Signal Process. & Acoust., Aalto Univ., Espoo, Finland
  • fYear
    2013
  • fDate
    16-19 Oct. 2013
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    This paper presents an overview of methods and applications dealing with analysis of singing voice audio signals, related to singer identity and lyrics content of the singing. Singer identification in polyphonic music is based on general audio classification methods. The presence of instruments is detrimental to voice identification performance, and eliminating the effect of instrumental accompaniment is an important aspect of the problem. The results show that classification of singing voices can be done robustly in polyphonic music when using source separation. Lyrics transcription is approached as a speech recognition problem, with specific elements for dealing with singing voice. The variability of phonation in singing poses a significant challenge to the speech recognition approach. The word recognition accuracy of the lyrics transcription from singing is quite low, but it is shown to be useful in a query-by-singing application, for performing a textual search based on the words recognized from the query. A system for automatic alignment of lyrics and audio is also presented, with sufficient performance for facilitating applications such as automatic karaoke annotation or song browsing.
  • Keywords
    audio signal processing; music; query processing; signal classification; source separation; speech recognition; automatic karaoke annotation; general audio classification methods; instrumental accompaniment; lyrics automatic alignment; lyrics transcription; music information retrieval; phonation variability; polyphonic music; query-by-singing application; singer identity; singing voice audio signal analysis; singing voice classification; singing voice identification; song browsing; source separation; speech recognition problem; textual search; word recognition accuracy; Databases; Hidden Markov models; Instruments; Multiple signal classification; Music; Speech; Speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Speech Technology and Human-Computer Dialogue (SpeD), 2013 7th Conference on
  • Conference_Location
    Cluj-Napoca
  • Type

    conf

  • DOI
    10.1109/SpeD.2013.6682644
  • Filename
    6682644