• DocumentCode
    1845010
  • Title

    Detecting the content related parts of Web pages

  • Author

    Li, Yong ; Gong, Zhiguo ; Qi, Ke

  • Author_Institution
    Fac. of Sci. & Technol., Macau Univ., Macau
  • Volume
    2
  • fYear
    2005
  • fDate
    13-15 June 2005
  • Firstpage
    1071
  • Abstract
    Many Web pages are semantically diverse: the content of a single page does not consistently address one topic. Current search engines, however, are page-oriented rather than topic-oriented, whereas most Web users retrieve their target information by topic. How to partition Web pages by semantics is therefore an interesting research topic. In this paper, we first build a tree (called a semantic tree, ST) that partitions a Web page into content parts (called semantic parts, SPs) based on the page's tags. We then analyze the characteristics of the words (or terms) appearing on the page in order to build a term weighting formula. Based on these term weights, we employ a similarity formula to calculate the degree of semantic similarity between each pair of SPs. Finally, we take the balance point of precision and recall as the reference value for the similarity threshold. With this approach we can find the content-related parts (or segments) of a Web page, and we achieved satisfactory results.
  • Keywords
    Web sites; content management; data mining; information analysis; information retrieval; semantic Web; semantic networks; Web mining; Web page partitioning; Web page tags; Web pages; content related parts; semantic tree; Feature extraction; HTML; Java; Packaging; Search engines; Systems engineering and theory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Services Systems and Services Management, 2005. Proceedings of ICSSSM '05. 2005 International Conference on
  • Print_ISBN
    0-7803-8971-9
  • Type

    conf

  • DOI
    10.1109/ICSSSM.2005.1500159
  • Filename
    1500159
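
The abstract describes weighting the terms of each semantic part (SP), computing a similarity between every pair of SPs, and keeping the pairs that reach a threshold. The record does not give the paper's term weighting formula or its similarity formula, so the following is only a minimal sketch under stated assumptions: raw term frequency stands in for the weighting, cosine similarity stands in for the similarity measure, and the names `term_weights`, `cosine`, and `related_pairs` are hypothetical. The threshold is passed in by the caller; the paper derives its reference value from the balance point of precision and recall, which is not reproduced here.

```python
import math
import re
from collections import Counter


def term_weights(text):
    """Assumed weighting: raw term frequency of lowercase alphanumeric tokens.

    This is a stand-in; the paper's own weighting formula is not in the abstract.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-weight mappings (an assumed measure)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_pairs(parts, threshold):
    """Return index pairs (i, j), i < j, whose similarity is at least threshold."""
    weights = [term_weights(p) for p in parts]
    pairs = []
    for i in range(len(parts)):
        for j in range(i + 1, len(parts)):
            if cosine(weights[i], weights[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Illustrative input only: the text of three SPs already extracted from a page.
parts = [
    "macau hotel booking",
    "hotel booking macau deals",
    "football scores today",
]
print(related_pairs(parts, 0.5))
```

In this sketch the SPs are plain strings; building the semantic tree from the page's tags is a separate step that is not shown.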