• DocumentCode
    1872787
  • Title

    Multidimensional humming transcription using a statistical approach for query by humming systems

  • Author

    Shih, Hsuan-Huei ; Narayanan, Shrikanth S. ; Kuo, C.-C. Jay

  • Author_Institution
    Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
  • Volume
    3
  • fYear
    2003
  • fDate
    6-9 July 2003
  • Abstract
    A new statistical pattern recognition approach to human humming transcription is proposed. A music note has two important attributes: pitch and duration. The proposed algorithm generates multidimensional humming transcriptions, which contain both pitch and duration information. Query by humming provides a natural means for content-based retrieval from music databases, and this research provides a robust front end for such applications. Each note segment in the humming waveform is modeled by a hidden Markov model (HMM), while the pitch of the note is modeled by a Gaussian mixture model. Preliminary real-time recognition experiments are carried out with models trained on data obtained from eight human subjects, and an overall correct recognition rate of around 80% is demonstrated.
  • Keywords
    Gaussian processes; content-based retrieval; hidden Markov models; learning (artificial intelligence); speech recognition; statistical analysis; Gaussian mixture model; duration information; hidden Markov model; human humming transcription; humming systems; humming waveform; multidimensional humming transcription; music databases; music note; pitch information; pitch model; real-time recognition; statistical pattern recognition; Content based retrieval; Decoding; Hidden Markov models; Humans; Indexing; Multidimensional systems; Multimedia databases; Music information retrieval; Robustness; Signal processing algorithms
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
  • Print_ISBN
    0-7803-7965-9
  • Type

    conf

  • DOI
    10.1109/ICME.2003.1221329
  • Filename
    1221329