• DocumentCode
    2889135
  • Title

    Design and Implementation of Web Hot-Topic Talk Mining Based on Scale-Free Network

  • Author

    Qin, Sen ; Dai, Guan-Zhong ; Li, Yan-ling

  • Author_Institution
    Coll. of Autom., Northwestern Polytech. Univ., Xi'an
  • fYear
    2006
  • fDate
    13-16 Aug. 2006
  • Firstpage
    1184
  • Lastpage
    1189
  • Abstract
    Mining Web hot-topic talks is an important branch of Web text mining. Traditional systems for mining Web hot-topic talks assume that every Web page is equally important. However, the complex network formed by Web hot-topic talks is not homogeneous, because it has the scale-free characteristic found on the Internet, so this assumption is not reasonable for this network. In this paper, the topology of the complex network is first analyzed and shown to have the scale-free characteristic. Then a mining system based on the scale-free topology is designed, and its main modules are introduced. The workflow of the system is presented, and the implementations of two core modules, the analysis of the sites' topology and the distribution of proportions to the Web pages, are described in detail. Finally, the merits and shortcomings of the system are discussed, and the paper is summarized.
  • Keywords
    Internet; data mining; Web hot-topic talk mining; Web page; Web text mining; data mining system; scale-free network; Automation; Complex networks; Cybernetics; Design automation; Educational institutions; Electronic mail; IP networks; Machine learning; Network topology; Text mining; Web pages; World Wide Web; Web hot-topic talk; power-law distribution
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2006 International Conference on
  • Conference_Location
    Dalian, China
  • Print_ISBN
    1-4244-0061-9
  • Type

    conf

  • DOI
    10.1109/ICMLC.2006.258602
  • Filename
    4028243