  • DocumentCode
    2067681
  • Title
    Research on the application of page segmentation in information retrieval
  • Author
    Rui, Men ; Yueheng, Sun ; Zheng, Deng ; Weijie, Ni
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China
  • fYear
    2011
  • fDate
    16-18 Dec. 2011
  • Firstpage
    295
  • Lastpage
    298
  • Abstract
    Existing search engines index and retrieve web pages as whole units, which causes irrelevant documents to be returned to users. This paper proposes a new indexing approach that addresses this problem by 1) segmenting pages with the VIPS (VIsion-based Page Segmentation) algorithm, 2) filtering out function blocks with several heuristic rules, and 3) clustering the feature blocks into sub-documents and indexing each sub-document separately. For three user queries, the initial results retrieved from Google are compared with the results of the improved indexing system; the comparison shows that our approach achieves higher precision and F-measure.
  • Keywords
    Web sites; document handling; feature extraction; indexing; information retrieval; pattern clustering; search engines; F-measure; Google; VIPS algorithm; feature blocks clustering; function blocks filtering; heuristic rules; indexing system; page segmentation; search engines index Web pages; subdocuments; user queries; Indexing; Particle separators; Search engines; Visualization; Web pages; indexing approach
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Transportation, Mechanical, and Electrical Engineering (TMEE), 2011 International Conference on
  • Conference_Location
    Changchun
  • Print_ISBN
    978-1-4577-1700-0
  • Type
    conf
  • DOI
    10.1109/TMEE.2011.6199201
  • Filename
    6199201
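The three-step indexing idea in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes VIPS has already split each page into text blocks. The `Block` type, the link-density rule in `is_function_block`, the greedy Jaccard clustering in `cluster_blocks`, and all thresholds are hypothetical stand-ins for the paper's unspecified heuristic rules and clustering method. Precision, recall, and F-measure use the standard balanced (F1) definitions, which may differ from the paper's exact evaluation setup.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    """A text block, assumed to come from a VIPS-style segmenter (not implemented here)."""
    text: str
    link_chars: int  # hypothetical feature: number of characters of anchor text in the block


def tokens(text: str) -> set[str]:
    # Lowercased alphanumeric terms.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_function_block(block: Block, min_words: int = 5, max_link_density: float = 0.5) -> bool:
    # Hypothetical heuristic: very short or link-heavy blocks (menus, ads) are function blocks.
    density = block.link_chars / max(len(block.text), 1)
    return len(block.text.split()) < min_words or density > max_link_density


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_blocks(blocks: list[Block], threshold: float = 0.2) -> list[list[Block]]:
    # Greedy single-pass clustering (a swapped-in technique): a block joins the first
    # cluster whose accumulated vocabulary is similar enough, otherwise starts a new one.
    clusters: list[tuple[set[str], list[Block]]] = []
    for block in blocks:
        terms = tokens(block.text)
        for vocab, members in clusters:
            if jaccard(terms, vocab) >= threshold:
                vocab |= terms  # grow the cluster's vocabulary in place
                members.append(block)
                break
        else:
            clusters.append((set(terms), [block]))
    return [members for _, members in clusters]


def build_index(pages: dict[str, list[Block]]) -> dict[str, set[tuple[str, int]]]:
    # Inverted index over sub-documents: term -> {(page_id, sub_document_id)}.
    index: dict[str, set[tuple[str, int]]] = {}
    for page_id, blocks in pages.items():
        content = [b for b in blocks if not is_function_block(b)]
        for sub_id, members in enumerate(cluster_blocks(content)):
            for block in members:
                for term in tokens(block.text):
                    index.setdefault(term, set()).add((page_id, sub_id))
    return index


def search(index: dict[str, set[tuple[str, int]]], query: str) -> set[tuple[str, int]]:
    # Conjunctive (AND) query: every query term must occur in the same sub-document.
    postings = [index.get(term, set()) for term in tokens(query)]
    return set.intersection(*postings) if postings else set()


def precision_recall_f1(retrieved: set, relevant: set) -> tuple[float, float, float]:
    # Balanced F-measure: harmonic mean of precision and recall.
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Toy page with one navigation block and two unrelated content blocks.
pages = {
    "a": [
        Block("Home News Sports Contact Login", link_chars=30),
        Block("Python tutorial explains list comprehensions and generators in depth", link_chars=0),
        Block("Weather forecast for Tianjin shows rain and cold wind this week", link_chars=0),
    ]
}
index = build_index(pages)
print(search(index, "python generators"))  # {('a', 0)}
print(search(index, "python weather"))     # set(): the terms fall in different sub-documents
print(search(index, "login"))              # set(): the navigation block was filtered out
print(precision_recall_f1({1, 2, 3, 4}, {2, 4, 5}))
```

Indexing sub-documents instead of whole pages means a query matches only when its terms co-occur in one coherent region of a page. In the toy page, "python weather" matches nothing even though both words appear on the page, which is the kind of irrelevant hit the abstract attributes to whole-page indexing.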