DocumentCode :
2260571
Title :
Searching semantically similar questions from a large community-based question archive
Author :
Liu, Mingrong ; Liu, Yicen ; Yang, Qing
Author_Institution :
Nat. Lab. of Pattern Recognition, Chinese Acad. of Sci., Beijing, China
fYear :
2009
fDate :
24-27 Sept. 2009
Firstpage :
1
Lastpage :
8
Abstract :
This paper presents a novel, purely statistical method for retrieving questions similar to a given query question from a large question archive. First, a word relevance model is trained on the whole archive, which consists of millions of natural language questions posted by users on the Web. The model is used to find the words most semantically related to a given word. Second, to find semantically similar questions for a query question, each non-stop word in a question is expanded with the help of the word relevance model and represented as a word vector whose elements are the word itself and some of its semantically related words. The elements are weighted by combining a classical IR term weighting method with the word transformation probability learned from the relevance model. The question is then mapped to a question vector, defined as the normalized center of the word vectors of the words it contains. Question retrieval thus reduces to comparing the similarity between question vectors. The method is, in effect, a simple kernel approach based on question expansion. Experimental results indicate that the proposed method outperforms baseline methods such as the Vector Space Model (VSM) and the Language Model for Information Retrieval (LMIR).
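The pipeline outlined in the abstract (expand each non-stop word, weight the expanded terms, take the normalized center, compare question vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: the relevance table `RELEVANCE`, the stop-word list, the `idf` weights, and the use of a dot product as the similarity measure are all assumptions made for illustration, and the probabilities are invented toy values.

```python
import math
from collections import defaultdict

# Hypothetical word relevance table: word -> {related word: transformation probability}.
# In the paper this model is learned from the archive; these values are made up.
RELEVANCE = {
    "laptop": {"notebook": 0.6, "computer": 0.4},
    "notebook": {"laptop": 0.7},
    "buy": {"purchase": 0.8},
    "purchase": {"buy": 0.8},
}

# Hypothetical stop-word list.
STOP_WORDS = {"a", "the", "to", "how", "where", "can", "i", "do"}


def word_vector(word, idf):
    """Expand a word into a sparse vector: the word itself plus related words.

    Each weight combines a term weight (here an assumed IDF value, default 1.0)
    with the transformation probability from the relevance table.
    """
    vec = {word: idf.get(word, 1.0)}
    for related, prob in RELEVANCE.get(word, {}).items():
        vec[related] = vec.get(related, 0.0) + prob * idf.get(related, 1.0)
    return vec


def question_vector(question, idf):
    """Map a question to the normalized center of its word vectors.

    Summing and then normalizing to unit length gives the same direction as
    averaging first, so the explicit division by the word count is skipped.
    """
    center = defaultdict(float)
    for w in question.lower().split():
        if w in STOP_WORDS:
            continue
        for term, weight in word_vector(w, idf).items():
            center[term] += weight
    norm = math.sqrt(sum(v * v for v in center.values()))
    return {t: v / norm for t, v in center.items()} if norm else {}


def similarity(q1, q2, idf):
    """Dot product of two unit-length question vectors (assumed similarity measure)."""
    v1, v2 = question_vector(q1, idf), question_vector(q2, idf)
    return sum(w * v2.get(t, 0.0) for t, w in v1.items())
```

With this toy table, "where can i buy a laptop" and "how to purchase a notebook" share no surface words after stop-word removal, yet their expanded vectors overlap, which is the effect the expansion step is meant to achieve.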
Keywords :
information needs; query formulation; statistical analysis; classical IR term weighting method; community based question archive; language model; large question archive; natural language question; question expansion based Kernel approach; semantically related word; similar question searching; statistical method; vector space model; word relevance model; word transformation probability; word vector; Automation; Databases; Information retrieval; Kernel; Laboratories; Natural languages; Pattern recognition; Search engines; Statistical analysis; Web search;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-4538-7
Electronic_ISBN :
978-1-4244-4540-0
Type :
conf
DOI :
10.1109/NLPKE.2009.5313808
Filename :
5313808