• DocumentCode
    2799366
  • Title
    Recognition of phonemes and words in singing
  • Author
    Mesaros, Annamaria; Virtanen, Tuomas
  • Author_Institution
    Dept. of Signal Process., Tampere Univ. of Technol., Tampere, Finland
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    2146
  • Lastpage
    2149
  • Abstract
    This paper studies the influence of n-gram language models on the recognition of sung phonemes and words. We train unigram, bigram, and trigram language models for phonemes, and bigram and trigram models for words. The word-level language model is estimated from a textual lyrics database. For recognition, we use a hidden Markov model (HMM) based phonetic recognizer adapted to the singing voice. The models were tested on monophonic singing and on vocal lines separated from polyphonic music. On clean singing, phoneme recognition accuracy ranged from 20% (no language model) to 39% (bigram); on polyphonic music, it ranged from 6% (no language model) to 20% (bigram). In word recognition, one fifth of the words were recognized correctly in clean singing, with lower performance on polyphonic music. We also study the use of the recognition results in a query-by-singing application: using the recognized words, we retrieve songs by searching for the recognized text in a textual lyrics database. Although the word recognition system has a correct recognition rate of only 24%, the first retrieved song is correct in 57% of the test cases.
  • Keywords
    Markov processes; grammars; information analysis; information retrieval systems; musical acoustics; speech recognition; Markov model; n-gram language model; phoneme recognition; phonetic recognizer; polyphonic music; query-by-singing application; singing voice; song retrieval; textual lyrics database; word recognition; word-level language model; Automatic speech recognition; Databases; Hidden Markov models; Music information retrieval; Natural languages; Signal processing; System testing; Text recognition; query-by-singing; singing recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2010.5495585
  • Filename
    5495585
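The abstract compares recognition with and without n-gram language models; on clean singing, phoneme accuracy rises from 20% with no language model to 39% with a bigram model. A minimal Python sketch of a bigram language model follows. It is an illustration under stated assumptions, not the authors' implementation: add-k smoothing and the `<s>`/`</s>` boundary markers are choices made here for brevity, and `train_bigram` and `sequence_logprob` are hypothetical names. The same code applies to phoneme transcriptions or to lyric words of the kind found in a textual lyrics database.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple

BOS, EOS = "<s>", "</s>"  # assumed sequence-boundary markers


def train_bigram(
    sequences: Iterable[List[str]], k: float = 1.0
) -> Tuple[Callable[[str, str], float], Set[str]]:
    """Estimate an add-k smoothed bigram model P(cur | prev).

    `sequences` are token lists: phoneme transcriptions or lyric words.
    Returns the conditional-probability function and the vocabulary.
    Assumption: add-k smoothing; the paper's smoothing is not stated here.
    """
    context_counts: Counter = Counter()  # how often each token precedes another
    pair_counts: Counter = Counter()     # how often each (prev, cur) pair occurs
    vocab: Set[str] = set()
    for seq in sequences:
        tokens = [BOS, *seq, EOS]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            pair_counts[(prev, cur)] += 1
    v = len(vocab)

    def prob(prev: str, cur: str) -> float:
        # Adding k to every pair count keeps unseen pairs non-zero while
        # the probabilities over the vocabulary still sum to one.
        return (pair_counts[(prev, cur)] + k) / (context_counts[prev] + k * v)

    return prob, vocab


def sequence_logprob(prob: Callable[[str, str], float], seq: List[str]) -> float:
    """Natural-log probability of a full sequence, including boundary markers."""
    tokens = [BOS, *seq, EOS]
    return sum(math.log(prob(p, c)) for p, c in zip(tokens, tokens[1:]))


if __name__ == "__main__":
    lyrics = [["i", "love", "you"], ["i", "need", "you"], ["love", "me", "do"]]
    prob, vocab = train_bigram(lyrics)
    # A word order seen in the training lyrics scores higher than a scrambled one.
    print(sequence_logprob(prob, ["i", "love", "you"]))
    print(sequence_logprob(prob, ["you", "love", "i"]))
```

Add-k is the simplest smoothing that keeps every pair non-zero; speech toolkits typically use backoff schemes such as Witten-Bell or Kneser-Ney instead. In an HMM decoder, such language-model log-probabilities are usually scaled by a weight and added to the acoustic scores, which is how a bigram model can steer the recognizer toward plausible phoneme or word sequences.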