  • DocumentCode
    1400930
  • Title
    Multiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-Peak Regions
  • Author
    Duan, Zhiyao ; Pardo, Bryan ; Zhang, Changshui
  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Northwestern Univ., Evanston, IL, USA
  • Volume
    18
  • Issue
    8
  • fYear
    2010
  • Firstpage
    2121
  • Lastpage
    2133
  • Abstract
    This paper presents a maximum-likelihood approach to multiple fundamental frequency (F0) estimation for a mixture of harmonic sound sources, where the power spectrum of a time frame is the observation and the F0s are the parameters to be estimated. When defining the likelihood model, the proposed method models both spectral peaks and non-peak regions (frequencies further than a musical quarter tone from all observed peaks). It is shown that the peak likelihood and the non-peak region likelihood act as a complementary pair. The former helps find F0s that have harmonics that explain peaks, while the latter helps avoid F0s that have harmonics in non-peak regions. Parameters of these models are learned from monophonic and polyphonic training data. This paper proposes an iterative greedy search strategy to estimate F0s one by one, to avoid the combinatorial problem of concurrent F0 estimation. It also proposes a polyphony estimation method to terminate the iterative process. Finally, this paper proposes a postprocessing method to refine polyphony and F0 estimates using neighboring frames. This paper also analyzes the relative contributions of different components of the proposed method. It is shown that the refinement component eliminates many inconsistent estimation errors. Evaluations are done on ten recorded four-part J. S. Bach chorales. Results show that the proposed method achieves superior F0 estimation and polyphony estimation compared to two state-of-the-art algorithms.
  • Keywords
    frequency estimation; greedy algorithms; iterative methods; maximum likelihood detection; maximum likelihood estimation; signal processing; four-part J. S. Bach chorales; harmonic sound source; iterative greedy search strategy; maximum-likelihood approach; monophonic training data; multiple fundamental frequency estimation; nonpeak regions; polyphonic training data; polyphony estimation; power spectrum; spectral peaks; Frequency estimation; Humans; Maximum likelihood estimation; Multiple signal classification; Music; Parameter estimation; Power harmonic filters; Power system harmonics; State estimation; Training data; Fundamental frequency; maximum likelihood; pitch estimation; spectral peaks;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2010.2042119
  • Filename
    5404324
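
The abstract defines a non-peak region as the set of frequencies further than a musical quarter tone from all observed peaks. A minimal sketch of that definition follows, assuming a quarter tone equals 50 cents and that peak frequencies are already available in Hz. The function names (`cents_between`, `in_non_peak_region`, `harmonic_frequencies`) are hypothetical and are not taken from the paper; the paper's likelihood models, greedy search, and polyphony estimation are not reproduced here.

```python
import math

QUARTER_TONE_CENTS = 50.0  # assumption: a quarter tone is half a semitone (100 cents)


def cents_between(f1_hz, f2_hz):
    """Absolute musical distance between two frequencies, in cents."""
    return abs(1200.0 * math.log2(f1_hz / f2_hz))


def in_non_peak_region(freq_hz, peak_freqs_hz, threshold_cents=QUARTER_TONE_CENTS):
    """True if freq_hz is further than the threshold from every observed peak."""
    return all(cents_between(freq_hz, p) > threshold_cents for p in peak_freqs_hz)


def harmonic_frequencies(f0_hz, max_freq_hz):
    """Ideal harmonic positions k * f0 up to max_freq_hz (hypothetical helper)."""
    count = int(max_freq_hz // f0_hz)
    return [f0_hz * k for k in range(1, count + 1)]


if __name__ == "__main__":
    peaks = [220.0, 440.0, 660.0]  # illustrative peak frequencies, not real data
    for candidate_f0 in (220.0, 330.0):
        harmonics = harmonic_frequencies(candidate_f0, 700.0)
        stray = [h for h in harmonics if in_non_peak_region(h, peaks)]
        print(candidate_f0, "harmonics in non-peak regions:", stray)
```

In this toy setup, a candidate F0 whose harmonics land in non-peak regions would be penalized by a non-peak region likelihood, which is the complementary role the abstract describes; how that penalty is actually computed is specified in the paper, not here.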