Title :
Exploiting temporal coherence in speech for data-driven feature extraction
Author :
Carlin, Michael A. ; Elhilali, Mounya
Author_Institution :
Center for Language & Speech Process., Johns Hopkins Univ., Baltimore, MD, USA
Abstract :
It is well known that speech sounds evolve at multiple timescales over the course of tens to hundreds of milliseconds. Such temporal modulations are crucial for speech perception and are believed to directly influence the underlying code for representing acoustic stimuli. The present work seeks to explicitly quantify this relationship using the principle of temporal coherence. Here we show that by constraining the outputs of model linear neurons to be highly correlated over timescales relevant to speech, we observe the emergence of neural response fields that are bandpass, localized, and reflective of the rich spectro-temporal structure present in speech. The emergent response fields also appear to share qualitative similarities with those observed in auditory neurophysiology. Importantly, learning is accomplished using unlabeled speech data, and the emergent neural properties characterize the spectro-temporal statistics of the input well. We analyze the characteristics and coverage of ensembles of learned response fields for a variety of timescales, and suggest uses of such a coherence learning framework for common speech tasks.
Keywords :
coherence; data handling; feature extraction; hearing; neurophysiology; speech processing; statistical analysis; acoustic stimuli; auditory neurophysiology; coherence learning; data driven feature extraction; emergent response field; linear neuron; neural response field; spectro-temporal statistics; spectro-temporal structure; speech perception; speech sound; temporal coherence; Electronic mail; Writing;
Conference_Title :
Information Sciences and Systems (CISS), 2011 45th Annual Conference on
Conference_Location :
Baltimore, MD
Print_ISBN :
978-1-4244-9846-8
Electronic_ISBN :
978-1-4244-9847-5
DOI :
10.1109/CISS.2011.5766159