FAQ Extracting and Domain Filtering Based on Improved Bayes

Author

Yu, Zhengtao ; Zong, Huanyun ; Xu, Yangbo ; Guo, Jianyi ; Mao, Yu ; Meng, Xiangyan

Author_Institution

Sch. of Inf. Eng. & Autom., Kunming Univ. of Sci. & Technol., Kunming, China

fYear

2009

fDate

7-8 Nov. 2009

Firstpage

108

Lastpage

112

Abstract

A frequently asked questions (FAQ) collection is the basis of a question answering (QA) system built on an FAQ database. Because FAQs are difficult to collect and organize, this paper proposes a method for automatically acquiring domain FAQs based on an improved Bayes classifier. HTML pages are parsed into DOM trees and, in combination with a restricted-domain knowledge base, the node information and structural characteristics of the DOM tree are extracted as classification features. An improved Bayesian classification learning algorithm is then used to build a classification model, which automatically acquires FAQs from the HTML pages and filters them to keep the domain FAQs. Experimental results show that the method is effective.
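The pipeline the abstract describes (DOM parsing, node and structure features, Bayesian classification, domain filtering) can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the improvement to Bayes, so a plain naive Bayes classifier with Laplace smoothing is swapped in. The feature set (parent tag, tree depth, trailing question mark), the names `NodeFeatureExtractor`, `NaiveBayes`, `extract_nodes` and `in_domain`, and the keyword-set stand-in for the domain knowledge base are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag; they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class NodeFeatureExtractor(HTMLParser):
    """Collects (features, text) for every non-empty text node (hypothetical features)."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open tags from the root down to the current node
        self.nodes = []   # list of (feature list, text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        parent = self.stack[-1] if self.stack else "root"
        features = [
            "tag=" + parent,                       # node information
            "depth=" + str(len(self.stack)),       # structural characteristic
            "qmark=" + str(text.endswith("?")),    # surface cue for questions
        ]
        self.nodes.append((features, text))


def extract_nodes(html):
    """Parse an HTML string and return its text nodes with features."""
    parser = NodeFeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.nodes


class NaiveBayes:
    """Plain naive Bayes with Laplace smoothing (assumption: NOT the paper's improved variant)."""

    def fit(self, samples, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for features, label in zip(samples, labels):
            self.feature_counts[label].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in features:
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def in_domain(text, domain_terms):
    """Keyword overlap as a stand-in for a domain knowledge base lookup."""
    words = {w.strip("?.,!").lower() for w in text.split()}
    return bool(words & domain_terms)


if __name__ == "__main__":
    train_html = ("<html><body><dl><dt>How do I book a hotel?</dt>"
                  "<dd>Use the booking form.</dd></dl></body></html>")
    train = extract_nodes(train_html)
    model = NaiveBayes().fit([f for f, _ in train], ["question", "answer"])
    for feats, text in extract_nodes("<dl><dt>Where is the airport?</dt></dl>"):
        print(text, model.predict(feats), in_domain(text, {"airport", "hotel"}))
```

In this sketch, classification and domain filtering are kept as two separate steps so that either could be replaced independently, for example by weighting features differently or by consulting a real knowledge base instead of a keyword set.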

Keywords

Bayes methods; database management systems; information filtering; learning (artificial intelligence); automatic acquisition method; domain knowledge base; frequently asked questions database; improved Bayesian classified learning algorithm; node information; question answering system; structural characteristics; Classification tree analysis; Data engineering; Data mining; Databases; HTML; Information filtering; Information filters; Information systems; Internet; Space technology; FAQ Domain Filtering; FAQ Extracting; Improved Bayes; Question Answering System; Restricted domain;

fLanguage

English

Publisher

IEEE

Conference_Titel

Web Information Systems and Mining, 2009. WISM 2009. International Conference on

Conference_Location

Shanghai

Print_ISBN

978-0-7695-3817-4

Type

conf

DOI

10.1109/WISM.2009.30

Filename

5368164