DocumentCode
3309450
Title
An improved Naive Bayesian algorithm for Web page text classification
Author
He Youquan ; Xie Jianfang ; Xu Cheng
Author_Institution
Inf. Sci. & Eng. Dept., Chongqing Jiaotong Univ., Chongqing, China
Volume
3
fYear
2011
fDate
26-28 July 2011
Firstpage
1765
Lastpage
1768
Abstract
This paper studies the process and methods of text classification. Building on the Naive Bayesian algorithm and the semi-structured nature of Web page information, it proposes an improved algorithm for Web page text classification that uses HTML tag information. Experiments show that the algorithm is feasible and effective and can be applied to information extraction in topic search engines, where it improves the topical relevance of search results and the search efficiency.
Keywords
Bayes methods; Web sites; information retrieval; pattern classification; search engines; text analysis; HTML tag information; Naive Bayesian algorithm; Web page text information classification; information extraction; search engine; semistructured feature; Accuracy; Algorithm design and analysis; Bayesian methods; Classification algorithms; Text categorization; Web pages; Naive Bayesian; Text classification; Web page
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
Conference_Location
Shanghai
Print_ISBN
978-1-61284-180-9
Type
conf
DOI
10.1109/FSKD.2011.6019801
Filename
6019801