DocumentCode
2646466
Title
A Simple Approach to Unsupervised Speaker Indexing
Author
Ofoegbu, Uchechukwu O. ; Iyer, Ananth N. ; Yantorno, Robert E. ; Smolenski, Brett Y.
Author_Institution
Lab. of Signal Process., Temple Univ., Philadelphia, PA
fYear
2006
fDate
12-15 Dec. 2006
Firstpage
339
Lastpage
342
Abstract
Unsupervised speaker indexing is a rapidly developing field in speech processing that involves determining who is speaking when, without prior knowledge about the speakers being observed. In this research, a distance-based technique for indexing telephone conversations is presented. Sub-models are formed from the conversations (using data of approximately equal sizes), and two reference models are judiciously chosen from them such that they represent the two different speakers in the conversation. The remaining models are then matched to the reference speakers using a technique referred to as the restrained-relative minimum distance (RRMD) approach. Models that fail to meet the RRMD criteria are considered "undecided" and are left unmatched with either reference speaker. An analysis is performed to determine the appropriate size (the length of data to be used) for these models, which are formed using cepstral coefficients of the speech data. The T-square statistic is used for speaker differentiation. Evaluation is based on the indexing accuracy as well as the amount of undecided speech obtained. The proposed system yielded a minimum indexing error of about 9% with a maximum undecided error of 18.5%, and an equal error rate of 11%, on 245 files (with an average length of about 400 seconds each) from the SWITCHBOARD database.
Keywords
speaker recognition; speech processing; T-square statistics; distance-based technique; restrained-relative minimum distance approach; speaker differentiation; speech data cepstral coefficients; telephone conversation indexing; unsupervised speaker indexing; Cepstral analysis; Indexing; Laboratories; Predictive models; Signal processing; Speech analysis; Speech processing; Statistics; Telephony; Testing
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Signal Processing and Communications, 2006. ISPACS '06. International Symposium on
Conference_Location
Yonago
Print_ISBN
0-7803-9732-0
Electronic_ISBN
0-7803-9733-9
Type
conf
DOI
10.1109/ISPACS.2006.364901
Filename
4212288