• DocumentCode
    3309450
  • Title
    An improved Naive Bayesian algorithm for Web page text classification
  • Author
    He Youquan ; Xie Jianfang ; Xu Cheng
  • Author_Institution
    Inf. Sci. & Eng. Dept., Chongqing Jiaotong Univ., Chongqing, China
  • Volume
    3
  • fYear
    2011
  • fDate
    26-28 July 2011
  • Firstpage
    1765
  • Lastpage
    1768
  • Abstract
    This paper studies the process and methods of text classification. Based on the Naive Bayesian algorithm and the semi-structured nature of Web page information, it proposes an improved algorithm for Web page text classification that uses HTML tag information. Experiments show that the algorithm is feasible and effective and can be applied to information extraction in topic search engines, where it can improve the topical relevance of search results and further improve search efficiency.
  • Keywords
    Bayes methods; Web sites; information retrieval; pattern classification; search engines; text analysis; HTML tag information; Naive Bayesian algorithm; Web page text Information classification; information extraction; search engine; semistructured feature; Accuracy; Algorithm design and analysis; Bayesian methods; Classification algorithms; Text categorization; Web pages; Naive Bayesian; Text classification; Web page;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-61284-180-9
  • Type
    conf
  • DOI
    10.1109/FSKD.2011.6019801
  • Filename
    6019801