Title :
Keyword extraction for web news documents based on LM-BP neural network
Author :
Xiaohui Liu ; Xin Yan ; Zhengtao Yu ; Guangshun Qin ; Yuanyuan Mo
Author_Institution :
Sch. of Inf. Eng. & Autom., Kunming Univ. of Sci. & Technol., Kunming, China
Abstract :
Motivated by practical demand, this paper proposes a keyword extraction approach for web news documents that trains a BP (backpropagation) artificial neural network with an improved Levenberg-Marquardt (LM) algorithm. First, web news documents that share a consistent HTML format are preprocessed; preprocessing includes steps such as noise filtering, web content extraction, word segmentation, POS tagging, and stop word removal. Next, effective features such as term frequency (TF) and word location are selected based on the characteristics of news documents, and these features are used to construct and train the BP neural network. Finally, keywords are extracted using the LM algorithm, which adjusts its parameters during training and addresses the long training time and the tendency to get stuck in local minima of standard BP, thereby improving network convergence speed and keyword classification performance. The results show that, for keyword extraction, the LM algorithm achieves better extraction results and convergence performance than standard BP.
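The abstract does not give the details of the improved LM algorithm, so the following is only a minimal sketch of the standard Levenberg-Marquardt update, delta = -(J^T J + mu I)^-1 J^T e, with the usual damping adjustment. Everything specific here is an assumption made for illustration: a single sigmoid unit stands in for the BP network, each candidate word is described by a hypothetical normalized TF value plus a constant bias input, and the four training samples are invented.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(w, x):
    # Single sigmoid unit with two weights (a stand-in for the BP network).
    return sigmoid(w[0] * x[0] + w[1] * x[1])

def loss(w, X, y):
    # Sum of squared errors over all samples.
    return sum((t - predict(w, x)) ** 2 for x, t in zip(X, y))

def lm_step(w, X, y, mu):
    # Accumulate J^T J (entries a, b, d) and J^T e (entries g0, g1),
    # where e_i = y_i - p_i and J is the Jacobian of e w.r.t. the weights.
    a = b = d = g0 = g1 = 0.0
    for x, t in zip(X, y):
        p = predict(w, x)
        e = t - p
        s = p * (1.0 - p)              # derivative of the sigmoid
        j0, j1 = -s * x[0], -s * x[1]  # row of the Jacobian
        a += j0 * j0
        b += j0 * j1
        d += j1 * j1
        g0 += j0 * e
        g1 += j1 * e
    # Damping: add mu to the diagonal, then solve the 2x2 system in closed form.
    a += mu
    d += mu
    det = a * d - b * b
    d0 = -(d * g0 - b * g1) / det
    d1 = -(a * g1 - b * g0) / det
    return [w[0] + d0, w[1] + d1]

def train(X, y, iterations=10, mu=0.01):
    w = [0.0, 0.0]
    for _ in range(iterations):
        candidate = lm_step(w, X, y, mu)
        if loss(candidate, X, y) < loss(w, X, y):
            # Step helped: accept it and move toward Gauss-Newton behavior.
            w, mu = candidate, max(mu / 10.0, 1e-6)
        else:
            # Step hurt: reject it and move toward gradient-descent behavior.
            mu *= 10.0
    return w

# Invented toy data: (normalized TF, bias input) -> 1.0 keyword, 0.0 non-keyword.
X = [(0.9, 1.0), (0.8, 1.0), (0.1, 1.0), (0.2, 1.0)]
y = [1.0, 1.0, 0.0, 0.0]
w = train(X, y)
print(loss([0.0, 0.0], X, y), loss(w, X, y))
```

Because a step is accepted only when it lowers the error, the damping factor mu moves the update between Gauss-Newton and gradient-descent behavior, which is the usual explanation for LM converging faster than plain gradient-based BP on small networks.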
Keywords :
Internet; backpropagation; feature extraction; feature selection; neural nets; statistical analysis; text analysis; word processing; HTML format; LM-BP neural network; Levenberg-Marquardt algorithm; POS tagging; Web content extraction; Web news document; keyword extraction; machine learning; noise filter; statistics-based method; stop words removal; word segmentation; Approximation algorithms; Biological neural networks; Classification algorithms; Convergence; Training; BP neural network
Conference_Title :
Control and Decision Conference (CCDC), 2015 27th Chinese
Conference_Location :
Qingdao
Print_ISBN :
978-1-4799-7016-2
DOI :
10.1109/CCDC.2015.7162346