Title :
Forensically inspired approaches to automatic speaker recognition
Author :
Han, K.J. ; Omar, M.K. ; Pelecanos, J. ; Pendus, C. ; Yaman, S. ; Zhu, W.
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
This paper presents ongoing research that leverages forensic methods for automatic speaker recognition. Methods employed by forensic scientists include identifying speaker-distinctive audio segments and comparing these segments using features such as pitch, formants, and other information. Other approaches involve a phonetic analysis to recognize idiolectal attributes and an implicit analysis of speaker demographics. Inspired by these forensic phonetic approaches, we target three threads of work: hot-spot analysis, speaker style and pronunciation modelling, and demographics analysis. As a result of this work, we show that a phonetic analysis conditioned on select speech events (or hot-spots) can outperform a phonetic analysis performed over all speech without conditioning. In the area of pronunciation modelling, one set of results demonstrates significantly improved robustness by exploiting phonetic structure in an automatic speech recognition system. For demographics analysis, we present state-of-the-art results for systems capable of detecting dialect, non-nativeness, and native language.
Keywords :
speech recognition; automatic speaker recognition; demographics analysis; forensic methods; forensic phonetic approach; hot-spot analysis; pronunciation modelling; speaker distinctive audio segment identification; Forensics; Hidden Markov models; NIST; Speaker recognition; Speech; Training; demographics; hot-spot; speaker verification
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
Print_ISBN :
978-1-4577-0538-0
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2011.5947519