DocumentCode :
2067681
Title :
Research on the application of page segmentation in information retrieval
Author :
Rui, Men ; Yueheng, Sun ; Zheng, Deng ; Weijie, Ni
Author_Institution :
Sch. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China
fYear :
2011
fDate :
16-18 Dec. 2011
Firstpage :
295
Lastpage :
298
Abstract :
Existing search engines index web pages as whole units for information retrieval, which leads to irrelevant documents being returned to users. This paper proposes a new indexing approach to this problem that 1) uses the VIPS algorithm for page segmentation, 2) filters out function blocks with several heuristic rules, and 3) clusters feature blocks into different sub-documents and indexes each sub-document separately. For three user queries, the initial results retrieved from Google are compared with the search results of the improved indexing system; the comparison shows that our approach achieves higher precision and F-measure.
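The indexing pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It assumes the page has already been split into visual blocks by a VIPS-style segmenter (VIPS itself is not reproduced here). The `Block` structure, the link-density and minimum-length thresholds used to flag function blocks, and the greedy cosine-similarity clustering are hypothetical stand-ins, not the paper's actual heuristic rules or clustering method. Precision, recall, and F-measure use the standard definitions, with F taken as the balanced harmonic mean of precision and recall (an assumption).

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
import math
import re


@dataclass
class Block:
    """A visual block as a VIPS-style segmenter might emit it (hypothetical structure)."""
    text: str
    link_text_chars: int  # number of characters of the block's text that sit inside links


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def is_function_block(block, max_link_density=0.5, min_tokens=5):
    # Hypothetical heuristics: navigation/ad blocks tend to be link-heavy or very short.
    link_density = block.link_text_chars / max(len(block.text), 1)
    return link_density > max_link_density or len(tokenize(block.text)) < min_tokens


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_blocks(blocks, threshold=0.2):
    # Greedy single-pass clustering (a swapped-in technique): each block joins the most
    # similar existing cluster if similarity >= threshold, otherwise starts a new one.
    clusters = []  # list of (summed term counts, member blocks)
    for b in blocks:
        vec = Counter(tokenize(b.text))
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c[0])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append((vec, [b]))
        else:
            best[0].update(vec)
            best[1].append(b)
    # Each cluster becomes one sub-document.
    return [" ".join(b.text for b in members) for _, members in clusters]


def build_index(pages):
    # Inverted index whose postings are (page_id, sub_doc_id) pairs, not whole pages.
    index = defaultdict(set)
    for page_id, blocks in pages.items():
        feature_blocks = [b for b in blocks if not is_function_block(b)]
        for i, sub_doc in enumerate(cluster_blocks(feature_blocks)):
            for term in tokenize(sub_doc):
                index[term].add((page_id, i))
    return index


def search(index, query):
    # Conjunctive query: every term must occur in the same sub-document.
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


def precision_recall_f(retrieved, relevant):
    tp = len(retrieved & relevant)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Because postings point at sub-documents, a query whose terms appear only in unrelated parts of the same page (for example, one term in a sports block and another in a research block) no longer matches that page, which is the kind of irrelevant hit the abstract attributes to whole-page indexing.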
Keywords :
Web sites; document handling; feature extraction; indexing; information retrieval; pattern clustering; search engines; F-measure; Google; VIPS algorithm; feature blocks clustering; function blocks filtering; heuristic rules; indexing system; page segmentation; search engines index Web pages; subdocuments; user queries; Indexing; Particle separators; Search engines; Visualization; Web pages; indexing approach;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Transportation, Mechanical, and Electrical Engineering (TMEE), 2011 International Conference on
Conference_Location :
Changchun
Print_ISBN :
978-1-4577-1700-0
Type :
conf
DOI :
10.1109/TMEE.2011.6199201
Filename :
6199201