DocumentCode
53685
Title
Combining Rhythm-Based and Pitch-Based Methods for Background and Melody Separation
Author
Rafii, Zafar; Duan, Zhiyao; Pardo, Bryan
Author_Institution
Dept. of Electr. Eng. & Comput. Sci., Northwestern Univ., Evanston, IL, USA
Volume
22
Issue
12
fYear
2014
fDate
Dec. 2014
Firstpage
1884
Lastpage
1893
Abstract
Musical works are often composed of two characteristic components: the background (typically the musical accompaniment), which generally exhibits a strong rhythmic structure with distinctive repeating time elements, and the melody (typically the singing voice or a solo instrument), which generally exhibits a strong harmonic structure with a distinctive predominant pitch contour. Drawing from findings in cognitive psychology, we propose to investigate the simple combination of two dedicated approaches for separating those two components: a rhythm-based method that focuses on extracting the background via a rhythmic mask derived from identifying the repeating time elements in the mixture, and a pitch-based method that focuses on extracting the melody via a harmonic mask derived from identifying the predominant pitch contour in the mixture. Evaluation on a data set of song clips showed that combining two such contrasting yet complementary methods can help to improve separation performance, from the point of view of both components, compared with using only one of those methods, and also compared with two other state-of-the-art approaches.
Keywords
acoustic signal processing; audio signal processing; cognitive systems; music; background separation; cognitive psychology; distinctive predominant pitch contour; distinctive repeating time elements; harmonic mask; melody separation; musical accompaniment; pitch-based methods; rhythm-based methods; rhythmic mask; strong harmonic structure; strong rhythmic structure; Harmonic analysis; Psychology; Rhythm; Source separation; Spectrogram; Speech; Speech processing; Background; melody; pitch; rhythm; separation
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2014.2354242
Filename
6891207