Multi-view features in a DNN-CRF model for improved sentence unit detection on English broadcast news

Author

Guangpu Huang ; Chenglin Xu ; Xiong Xiao ; Lei Xie ; Eng Siong Chng ; Haizhou Li

Author_Institution

Temasek Labs. @NTU, Singapore, Singapore

fYear

2014

fDate

9-12 Dec. 2014

Firstpage

1

Lastpage

9

Abstract

This paper presents a deep neural network-conditional random field (DNN-CRF) system with multi-view features for sentence unit detection on English broadcast news. We propose a set of multi-view features extracted from the acoustic, articulatory, and linguistic domains, and use them jointly in the DNN-CRF model to predict sentence boundaries. We evaluate the multi-view features on the standard NIST RT-04 English broadcast news speech data. Experiments show that, on the reference transcription, the best system significantly outperforms the state-of-the-art sentence unit detection system, with a 13.2% absolute reduction in NIST sentence error rate. On the recognized transcription, however, the performance gain is limited, partly due to the high word error rate.

Keywords

feature extraction; natural language processing; neural nets; speech recognition; DNN-CRF model; NIST RT-04 English broadcast news speech data; NIST sentence error rate reduction; acoustic domain; articulatory domain; deep neural network conditional random field system; linguistic domain; multiview feature extraction; sentence unit detection improvement; word error rate; Acoustics; Feature extraction; Hidden Markov models; Pragmatics; Speech; Tongue; Training

fLanguage

English

Publisher

IEEE

Conference_Titel

Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)

Conference_Location

Siem Reap

Type

conf

DOI

10.1109/APSIPA.2014.7041543

Filename

7041543