• DocumentCode
    2582998
  • Title
    Development of a framework for sub-topic discovery from the Web
  • Author
    Uluhan, Eray; Badur, Bertan
  • Author_Institution
    Manage. Inf. Syst. Dept., Bogazici Univ., Istanbul
  • fYear
    2008
  • fDate
    27-31 July 2008
  • Firstpage
    878
  • Lastpage
    888
  • Abstract
    The motivation behind discovering sub-topics, or topic-specific keywords, from Web pages is to help a user who lacks knowledge and experience about a topic find its important concepts without much effort. Typically, a Web user would start by querying search engines and visiting some pages, and would spend a lot of time deciding what is important about the topic and what is not. In this study, we try to mine the important sub-topics or key concepts of a given topic automatically from HTML-based Web pages. Starting with a search query, the system gathers the top-ranking pages returned by a search engine and selects the informative pages among them. These pages are processed further to extract important phrases, and data mining techniques are then applied to these phrases to find candidate sub-topics. Each candidate phrase is scored based on its relevance to the search query over the Web space. Using the proposed technique, the user should be able to quickly learn sub-topics or key concepts about a topic without going through the ordeal of browsing a large number of non-informative pages returned by the search engine.
  • Keywords
    Internet; data mining; search engines; HTML based Web pages; World Wide Web; querying search engine; topic specific keyword discovery; Africa; Cities and towns; HTML; Indexing; Information retrieval; Web mining; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Management of Engineering & Technology, 2008. PICMET 2008. Portland International Conference on
  • Conference_Location
    Cape Town
  • Print_ISBN
    978-1-890843-17-5
  • Electronic_ISBN
    978-1-890843-18-2
  • Type
    conf
  • DOI
    10.1109/PICMET.2008.4599696
  • Filename
    4599696