Title :
Efficient audio stream segmentation via the combined T² statistic and Bayesian information criterion
Author :
Zhou, Bowen ; Hansen, John H L
Date :
7/1/2005
Abstract :
In many speech and audio applications, acoustic events must first be partitioned and classified before voice coding for communication or speech recognition for spoken document retrieval. In this paper, we propose an efficient approach to unsupervised audio stream segmentation and clustering based on the Bayesian information criterion (BIC). The proposed method extends an earlier formulation by Chen and Gopalakrishnan. In our formulation, Hotelling's T² statistic is used to preselect candidate segmentation boundaries, and BIC is then applied to make the segmentation decision. The proposed algorithm also incorporates a variable-size increasing window scheme and a skip-frame test. Experiments on the DARPA Hub4 1997 evaluation data show that the final algorithm is faster than Chen and Gopalakrishnan's method by a factor of 100 while achieving a 6.7% reduction in the acoustic boundary miss rate, at the expense of a 5.7% increase in the false alarm rate. The approach is particularly successful for short segment turns of less than 2 s in duration. The results suggest that the proposed algorithm is sufficiently effective and efficient for audio stream segmentation applications.
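The abstract names the two statistics but gives no formulas. The Python sketch below illustrates the combined idea on a single analysis window under stated assumptions; it is not the authors' implementation. Hotelling's T² is computed in its textbook two-sample form with a pooled covariance, which may differ from the estimator used in the paper. ΔBIC uses the common full-covariance Gaussian form ΔBIC(i) = (N/2)log|Σ| - (i/2)log|Σ1| - ((N-i)/2)log|Σ2| - λP, with P = ½(d + d(d+1)/2)log N, where a positive value indicates a change. The function names (hotelling_t2, log_det_cov, delta_bic, detect_change), the penalty weight lam and the edge margin are hypothetical choices; the variable-size increasing window and the skip-frame test are omitted. NumPy is assumed to be available.

```python
import numpy as np


def hotelling_t2(x1: np.ndarray, x2: np.ndarray) -> float:
    """Two-sample Hotelling T^2 between feature matrices x1 (n1 x d) and x2 (n2 x d).

    Assumption: textbook pooled-covariance form, not necessarily the paper's estimator.
    """
    n1, n2 = len(x1), len(x2)
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))  # unbiased covariance, d x d
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(pooled, diff))


def log_det_cov(x: np.ndarray) -> float:
    """Log-determinant of the maximum-likelihood (biased) covariance of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return float(logdet)


def delta_bic(x: np.ndarray, i: int, lam: float = 1.0) -> float:
    """Delta-BIC for a change at frame i; positive favors two Gaussians over one."""
    n, d = x.shape
    # Penalty for the extra mean (d) and full covariance (d(d+1)/2) parameters.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return float(0.5 * n * log_det_cov(x)
                 - 0.5 * i * log_det_cov(x[:i])
                 - 0.5 * (n - i) * log_det_cov(x[i:])
                 - lam * penalty)


def detect_change(x: np.ndarray, margin: int = 20, lam: float = 1.0):
    """T^2 preselects the most likely boundary; Delta-BIC then accepts or rejects it.

    Returns (frame index or None, Delta-BIC score at the preselected frame).
    """
    n = len(x)
    if n <= 2 * margin:
        raise ValueError("window too short for the chosen margin")
    # Cheap statistic evaluated at every candidate frame.
    t2 = [hotelling_t2(x[:i], x[i:]) for i in range(margin, n - margin)]
    best = margin + int(np.argmax(t2))
    # Costlier BIC test evaluated only once, at the T^2 maximum.
    score = delta_bic(x, best, lam)
    return (best if score > 0 else None), score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 2-D "features" whose mean shifts at frame 200.
    x = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(3.0, 1.0, (200, 2))])
    print(detect_change(x))
```

In this sketch, T² needs only segment means and one covariance inverse, so it is cheap to scan across every frame, whereas ΔBIC needs three log-determinants; evaluating ΔBIC only at the T² maximum is where the saving comes from.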
Keywords :
Bayes methods; audio signal processing; statistical analysis; Bayesian information criterion; acoustic events classification; speech recognition; spoken document retrieval; unsupervised audio stream segmentation; voice coding; Acoustic applications; Bayesian methods; Clustering algorithms; Information retrieval; Loudspeakers; Robustness; Speech coding; Statistics; Streaming media; Audio segmentation; Hotelling's
Journal_Title :
IEEE Transactions on Speech and Audio Processing
DOI :
10.1109/TSA.2005.845790