Title :
SiPTH: Singing Transcription Based on Hysteresis Defined on the Pitch-Time Curve
Author :
Molina, Emilio ; Tardon, Lorenzo J. ; Barbancho, Ana M. ; Barbancho, Isabel
Author_Institution :
ATIC Res. Group, Univ. de Malaga, Malaga, Spain
Abstract :
In this paper, we present a method for monophonic singing transcription based on hysteresis defined on the pitch-time curve. The method is designed to perform note segmentation even when the pitch within a single note is unstable, as is often the case with untrained singers. The approach first estimates the regions in which the chroma is stable; these regions are then classified as voiced or unvoiced by a decision tree classifier using two descriptors based on aperiodicity and power. Next, a note segmentation stage based on the pitch intervals of the sung signal is carried out. To this end, once the beginning of a note is detected, a dynamic average of the pitch curve is computed to obtain a rough estimate of the note's pitch. Deviations of the actual pitch curve from this average are measured to determine the next note change according to a hysteresis process defined on the pitch-time curve. Finally, each note is labeled with three values: pitch (rounded to semitones), duration, and volume. A complete evaluation methodology is also presented, including definitions of the relevant error types, the evaluation measures, and a method for computing them. The proposed system significantly improves on the performance of the baseline approach and attains results similar to those of previous approaches.
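The segmentation idea described in the abstract (a running pitch average per note, with a note change declared only when the curve departs from that average) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define the hysteresis process in detail, so the sketch substitutes a simple persistence rule, and the function name `segment_notes`, the parameters `threshold` and `min_frames`, and their default values are all assumptions chosen for illustration. The input is assumed to be an already voiced pitch curve in fractional MIDI semitones, one value per frame.

```python
def segment_notes(pitch, threshold=0.5, min_frames=3):
    """Split a voiced pitch curve into notes (illustrative sketch only).

    pitch      : list of per-frame pitch values in fractional MIDI semitones
    threshold  : assumed deviation (semitones) from the running average
                 that counts as "outside" the current note
    min_frames : assumed number of consecutive outside frames required
                 before a note change is accepted

    Returns a list of (start_frame, end_frame, rounded_pitch) tuples,
    where end_frame is exclusive.
    """
    notes = []
    if not pitch:
        return notes

    start = 0
    current = [pitch[0]]  # frames accepted into the current note
    pending = []          # recent frames lying outside the band

    for i in range(1, len(pitch)):
        mean = sum(current) / len(current)  # dynamic average of the note
        if abs(pitch[i] - mean) > threshold:
            pending.append(pitch[i])
            if len(pending) >= min_frames:
                # The deviation persisted: close the note where it began.
                change = i - len(pending) + 1
                notes.append((start, change, round(mean)))
                start = change
                current = pending
                pending = []
        else:
            # Short excursion ended: fold it back into the current note.
            current.extend(pending)
            current.append(pitch[i])
            pending = []

    current.extend(pending)
    notes.append((start, len(pitch), round(sum(current) / len(current))))
    return notes


curve = [60.1, 59.9, 60.2, 60.8, 60.0, 62.1, 61.9, 62.0, 62.2]
print(segment_notes(curve))  # [(0, 5, 60), (5, 9, 62)]
```

In this example the single excursion to 60.8 does not split the first note because it does not persist, while the sustained jump to about 62 does. Volume labeling and the voiced/unvoiced classification mentioned in the abstract are outside the scope of the sketch.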
Keywords :
acoustic signal processing; decision trees; hysteresis; SiPTH; decision tree classifier; hysteresis process; monophonic singing transcription; note segmentation; pitch curve; pitch evolution; pitch intervals; pitch-time curve; Feature extraction; Indexes; Labeling; Speech; Speech processing; fundamental frequency; pitch; singing transcription; singing voice analysis
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
DOI :
10.1109/TASLP.2014.2331102