Title :
Automatic key term extraction from spoken course lectures using branching entropy and prosodic/semantic features
Author :
Chen, Yun-nung ; Huang, Yu ; Kong, Sheng-Yi ; Lee, Lin-shan
Author_Institution :
Grad. Inst. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan
Abstract :
This paper proposes a set of approaches for automatically extracting key terms from spoken course lectures, using the audio signals, ASR transcriptions and slides. We divide key terms into two types, key phrases and keywords, and develop a different approach for each, applied in that order. Key phrases are extracted using right/left branching entropy. Keywords are extracted by learning from three sets of features: prosodic features, lexical features, and semantic features derived from Probabilistic Latent Semantic Analysis (PLSA). The learning approaches include one unsupervised method (K-means exemplar) and two supervised ones (AdaBoost and neural network). Encouraging preliminary results were obtained on a corpus of course lectures, and all of the proposed approaches and feature sets were found to be useful.
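The abstract names right/left branching entropy but does not give its formula. As a rough illustration only, the sketch below uses the common definition, in which the branching entropy of an n-gram is the Shannon entropy of the tokens that follow it (right) or precede it (left). The function name `branching_entropy`, its parameters, and the toy token list are hypothetical, and this is not claimed to be the authors' implementation (the keywords mention a PAT tree, which this sketch replaces with plain counting).

```python
import math
from collections import Counter, defaultdict


def branching_entropy(tokens, n, direction="right"):
    """Map each n-gram to the entropy (in bits) of its neighbouring tokens.

    Assumption: probabilities are normalised over occurrences that actually
    have a neighbour, so an n-gram seen only at the sequence edge is omitted.
    """
    neighbours = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        # Index of the token just after (right) or just before (left) the n-gram.
        j = i + n if direction == "right" else i - 1
        if 0 <= j < len(tokens):
            neighbours[gram][tokens[j]] += 1

    entropy = {}
    for gram, counts in neighbours.items():
        total = sum(counts.values())
        entropy[gram] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


# Toy example: "a b" is followed by two different tokens, "b c" by only one.
tokens = ["a", "b", "c", "a", "b", "d"]
right = branching_entropy(tokens, n=2, direction="right")
left = branching_entropy(tokens, n=2, direction="left")
```

Under this definition, a high entropy on both sides of a candidate suggests that many different words can surround it, which is the usual cue for a phrase boundary. How the paper thresholds or combines the two directions is not stated in the abstract.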
Keywords :
audio signal processing; entropy; learning (artificial intelligence); neural nets; probability; speech recognition; ASR transcriptions; AdaBoost; audio signals; automatic key term extraction; branching entropy; key phrase extraction; keyword extraction; lexical features; neural network; Probabilistic Latent Semantic Analysis (PLSA); prosodic features; semantic features; spoken course lectures; unsupervised method; K-means; PAT tree; course lectures; machine learning; prosody
Conference_Title :
Spoken Language Technology Workshop (SLT), 2010 IEEE
Conference_Location :
Berkeley, CA
Print_ISBN :
978-1-4244-7904-7
Electronic_ISBN :
978-1-4244-7902-3
DOI :
10.1109/SLT.2010.5700862