Title :
Pruning dependency trees in a syntactic information retrieval model
Author :
Liu, Chang ; Wang, Hui ; McClean, Sally ; Liu, Jun ; Wu, Shengli
Author_Institution :
Sch. of Comput. & Math, Univ. of Ulster, Newtownabbey
Abstract :
Natural language processing (NLP) techniques are believed to have the potential to improve retrieval accuracy in information retrieval (IR). In previous work, we reported a proof-of-concept study of a new approach to NLP-based IR, proposed as a syntactic IR model (SIR). In SIR, documents and queries are represented by syntactic parse trees generated by a natural language parser. A document is matched against a query on these tree representations, with tree comparison as the key operation. In this paper, we extend the dataset used to test SIR to full documents. Additionally, the raw parse trees output by the parser are pruned before being fed into the indexing process of SIR. This operation is necessary and plays a role similar to that of a stop-word list in term-based IR index construction. Experimental results show that retrieval accuracy improves when the pruning operation is applied.
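The pruning step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which nodes are pruned or how the tree is repaired afterwards, so everything below is an assumption made for illustration: the `Node` class, the `STOP_WORDS` set, and the rule that a pruned node's children are reattached to its nearest kept ancestor are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """One word in a dependency tree, with its dependents as children."""
    word: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical stop list; the abstract does not say which nodes are pruned.
STOP_WORDS: Set[str] = {"the", "a", "an", "of", "in", "is"}


def prune(node: Node, stop_words: Set[str] = STOP_WORDS) -> List[Node]:
    """Return what replaces `node` after pruning.

    A kept node yields a one-element list holding a pruned copy of itself.
    A pruned node yields its own (already pruned) children, so they are
    reattached to the nearest kept ancestor (an assumed repair rule).
    """
    kept: List[Node] = []
    for child in node.children:
        kept.extend(prune(child, stop_words))
    if node.word.lower() in stop_words:
        return kept
    return [Node(node.word, kept)]


def to_tuple(node: Node):
    """Convert a tree to nested tuples/lists for easy comparison."""
    return (node.word, [to_tuple(c) for c in node.children])


# Toy tree: prunes -> parser -> the ; prunes -> trees -> of -> documents
tree = Node("prunes", [
    Node("parser", [Node("the")]),
    Node("trees", [Node("of", [Node("documents")])]),
])
pruned = prune(tree)
```

In this toy tree, "the" is dropped as a leaf and "of" is dropped as an internal node, with "documents" reattached under "trees". The original tree is left unmodified, so the raw and pruned versions could both be indexed and compared.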
Keywords :
computational linguistics; grammars; natural language processing; query processing; text analysis; tree data structures; trees (mathematics); document representation; indexing process; natural language parser; natural language text processing; pruning dependency tree; query processing; syntactic information retrieval model; Indexing; Information retrieval; Mathematics; Natural language processing; Natural languages; Optical computing; Speech; Testing; Tree data structures; Tree graphs;
Conference_Title :
Audio, Language and Image Processing, 2008. ICALIP 2008. International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-1723-0
Electronic_ISBN :
978-1-4244-1724-7
DOI :
10.1109/ICALIP.2008.4590035