SiPTH: Singing Transcription Based on Hysteresis Defined on the Pitch-Time Curve

Author

Molina, Emilio ; Tardon, Lorenzo J. ; Barbancho, Ana M. ; Barbancho, Isabel

Author_Institution

ATIC Res. Group, Univ. de Malaga, Malaga, Spain

Volume

23

Issue

2

fYear

2015

fDate

Feb. 2015

Firstpage

252

Lastpage

263

Abstract

In this paper, we present a method for monophonic singing transcription based on hysteresis defined on the pitch-time curve. The method is designed to perform note segmentation even when the pitch evolution within a single note is unstable, as is typical of untrained singers. The approach first estimates the regions in which the chroma is stable; these regions are then classified as voiced or unvoiced by a decision tree classifier that uses two descriptors based on aperiodicity and power. Next, a note segmentation stage based on the pitch intervals of the sung signal is carried out. To this end, once the beginning of a note is detected, a dynamic average of the pitch curve is computed to roughly estimate the note's pitch. Deviations of the actual pitch curve from this average are measured to determine the next note change according to a hysteresis process defined on the pitch-time curve. Finally, each note is labeled with three values: pitch (rounded to semitones), duration, and volume. A complete evaluation methodology is also presented, including the definition of relevant error types, the evaluation measures, and a method for computing them. The proposed system significantly improves on the performance of the baseline approach and attains results similar to those of previous approaches.
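The segmentation and labeling steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SiPTH implementation. It assumes that the pitch curve of one voiced region is already available in fractional MIDI semitones, with one power value per frame. The thresholds `upper` and `lower`, the persistence length `min_frames`, the hop size, and the use of the median pitch for labeling are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import List, Sequence


@dataclass
class Note:
    pitch: int       # median pitch rounded to the nearest semitone (MIDI number)
    onset: float     # seconds from the start of the voiced region
    duration: float  # seconds
    volume: float    # mean frame power over the note


def segment_notes(pitch: Sequence[float], power: Sequence[float],
                  hop: float = 0.01, upper: float = 0.8, lower: float = 0.4,
                  min_frames: int = 5) -> List[Note]:
    """Split one voiced region into notes (illustrative sketch only).

    Hypothetical hysteresis rule: a pending note change starts when the
    deviation from the dynamic average exceeds `upper`, is cancelled if the
    deviation falls below `lower`, and is committed once it has lasted
    `min_frames` frames. Deviations between the two thresholds keep a
    pending change alive without committing or cancelling it.
    """
    if not pitch:
        return []
    boundaries = [0]
    start = 0         # first frame of the current note
    candidate = None  # first frame of a pending note change, if any
    for i in range(1, len(pitch)):
        # Dynamic average: mean pitch from the note onset up to the pending
        # change (if any), otherwise up to the previous frame.
        end = candidate if candidate is not None else i
        dev = abs(pitch[i] - mean(pitch[start:end]))
        if candidate is None:
            if dev > upper:
                candidate = i                # possible new note starts here
        elif dev < lower:
            candidate = None                 # excursion receded: same note
        elif i - candidate + 1 >= min_frames:
            boundaries.append(candidate)     # commit change at its first frame
            start, candidate = candidate, None
    boundaries.append(len(pitch))

    # Label each segment with rounded pitch, duration, and volume.
    return [Note(pitch=round(median(pitch[a:b])),
                 onset=a * hop,
                 duration=(b - a) * hop,
                 volume=mean(power[a:b]))
            for a, b in zip(boundaries, boundaries[1:])]
```

A short excursion above `upper` that returns below `lower` within `min_frames` frames (for example, a brief scoop) is absorbed into the current note. A sustained deviation, by contrast, opens a new note at the frame where the deviation began.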

Keywords

acoustic signal processing; decision trees; hysteresis; SiPTH; decision tree classifier; hysteresis process; monophonic singing transcription; note segmentation; pitch curve; pitch evolution; pitch intervals; pitch-time curve; Decision trees; Feature extraction; Hysteresis; Indexes; Labeling; Speech; Speech processing; Acoustic signal processing; fundamental frequency; pitch; singing transcription; singing voice analysis;

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher

IEEE

ISSN

2329-9290

Type

jour

DOI

10.1109/TASLP.2014.2331102

Filename

6837431