DocumentCode
2067681
Title
Research on the application of page segmentation in information retrieval
Author
Rui, Men ; Yueheng, Sun ; Zheng, Deng ; Weijie, Ni
Author_Institution
Sch. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China
fYear
2011
fDate
16-18 Dec. 2011
Firstpage
295
Lastpage
298
Abstract
Existing search engines index each web page as a whole, which leads to irrelevant documents being returned to users. This paper proposes a new indexing approach that addresses this problem by 1) segmenting pages with the VIPS algorithm, 2) filtering out function blocks using several heuristic rules, and 3) clustering feature blocks into separate sub-documents and indexing each of them. For three user queries, the initial results retrieved from Google are compared with the search results of the improved indexing system; the comparison shows that our approach achieves higher precision and F-measure.
Keywords
Web sites; document handling; feature extraction; indexing; information retrieval; pattern clustering; search engines; F-measure; Google; VIPS algorithm; feature blocks clustering; function blocks filtering; heuristic rules; indexing system; page segmentation; search engines index Web pages; subdocuments; user queries; Particle separators; Visualization; Web pages; indexing approach
fLanguage
English
Publisher
IEEE
Conference_Titel
Transportation, Mechanical, and Electrical Engineering (TMEE), 2011 International Conference on
Conference_Location
Changchun
Print_ISBN
978-1-4577-1700-0
Type
conf
DOI
10.1109/TMEE.2011.6199201
Filename
6199201