Combining Rhythm-Based and Pitch-Based Methods for Background and Melody Separation

Author

Rafii, Zafar ; Duan, Zhiyao ; Pardo, Bryan

Author_Institution

Dept. of Electr. Eng. & Comput. Sci., Northwestern Univ., Evanston, IL, USA

Volume

22

Issue

12

fYear

2014

fDate

Dec. 2014

Firstpage

1884

Lastpage

1893

Abstract

Musical works are often composed of two characteristic components: the background (typically the musical accompaniment), which generally exhibits a strong rhythmic structure with distinctive repeating time elements, and the melody (typically the singing voice or a solo instrument), which generally exhibits a strong harmonic structure with a distinctive predominant pitch contour. Drawing from findings in cognitive psychology, we investigate a simple combination of two dedicated approaches for separating those two components: a rhythm-based method that extracts the background via a rhythmic mask derived from the repeating time elements identified in the mixture, and a pitch-based method that extracts the melody via a harmonic mask derived from the predominant pitch contour identified in the mixture. Evaluation on a data set of song clips showed that combining two such contrasting yet complementary methods can improve separation performance, from the point of view of both components, compared with using only one of those methods and compared with two other state-of-the-art approaches.
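
The two masks described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the repeating period and the pitch contour are taken as already known, the rhythmic mask is approximated by a median over period-spaced frames, the harmonic mask is a binary mask around integer multiples of the pitch, and the rule in `combine_masks` is a hypothetical choice, since the abstract does not state how the masks are combined. All function and parameter names are illustrative.

```python
import numpy as np

def rhythmic_mask(V, period):
    """Soft background mask from repeating time elements (assumed approach).

    V: magnitude spectrogram, shape (n_freq, n_frames), n_frames >= period.
    period: repeating period in frames (assumed known here).
    """
    n_freq, n_frames = V.shape
    n_seg = int(np.ceil(n_frames / period))
    # Pad with NaN so the spectrogram splits into whole period-long segments.
    padded = np.full((n_freq, n_seg * period), np.nan)
    padded[:, :n_frames] = V
    segments = padded.reshape(n_freq, n_seg, period)
    # Element-wise median across segments approximates the repeating pattern.
    model = np.nanmedian(segments, axis=1)
    W = np.tile(model, (1, n_seg))[:, :n_frames]
    # The repeating part cannot carry more energy than the mixture.
    W = np.minimum(W, V)
    return W / (V + 1e-12)

def harmonic_mask(f0_hz, n_freq, sr, n_fft, n_harmonics=20, width_bins=1):
    """Binary melody mask around harmonics of a pitch contour (assumed approach).

    f0_hz: one pitch value per frame in Hz, 0 for unvoiced frames.
    """
    M = np.zeros((n_freq, len(f0_hz)))
    for t, f0 in enumerate(f0_hz):
        if f0 <= 0:
            continue  # unvoiced frame: no melody energy assigned
        for h in range(1, n_harmonics + 1):
            k = int(round(h * f0 * n_fft / sr))  # nearest bin of harmonic h
            if k >= n_freq:
                break
            M[max(k - width_bins, 0):k + width_bins + 1, t] = 1.0
    return M

def combine_masks(M_background, M_melody):
    """Hypothetical combination rule: keep as melody only the harmonic
    bins that the rhythmic mask does not already explain as background."""
    melody = M_melody * (1.0 - M_background)
    background = 1.0 - melody
    return background, melody
```

Each final mask would then be multiplied element-wise with the complex spectrogram of the mixture and inverted to obtain the corresponding time-domain estimate.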

Keywords

acoustic signal processing; audio signal processing; cognitive systems; music; background separation; cognitive psychology; distinctive predominant pitch contour; distinctive repeating time elements; harmonic mask; melody separation; musical accompaniment; pitch-based methods; rhythm-based methods; rhythmic mask; strong harmonic structure; strong rhythmic structure; Harmonic analysis; Psychology; Rhythm; Source separation; Spectrogram; Speech; Speech processing; Background; melody; pitch; rhythm; separation;

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher

IEEE

ISSN

2329-9290

Type

jour

DOI

10.1109/TASLP.2014.2354242

Filename

6891207