DocumentCode
2582998
Title
Development of a framework for sub-topic discovery from the Web
Author
Uluhan, Eray ; Badur, Bertan
Author_Institution
Manage. Inf. Syst. Dept., Bogazici Univ., Istanbul
fYear
2008
fDate
27-31 July 2008
Firstpage
878
Lastpage
888
Abstract
The motivation behind discovering sub-topics, or topic-specific keywords, from Web pages is to help a user who lacks knowledge and experience about a topic find its important concepts without much effort. Typically, a Web user would start by querying search engines, visit some of the returned pages, and spend a lot of time deciding what is important about the topic and what is not. In this study, we try to mine the important sub-topics or key concepts of a given topic automatically from HTML-based Web pages. Starting with a search query, the system gathers the top-ranking pages returned by a search engine and selects the informative pages among them. These pages are processed further to extract important phrases, and data mining techniques are then applied to these phrases to find candidate sub-topics. Each candidate phrase is scored based on its relevance to the search query over the Web space. Using the proposed technique, the user should be able to quickly learn the sub-topics or key concepts of a topic without going through the ordeal of browsing a large number of non-informative pages returned by the search engine.
Keywords
Internet; data mining; search engines; HTML based Web pages; World Wide Web; querying search engine; topic specific keyword discovery; Africa; Cities and towns; HTML; Indexing; Information retrieval; Web mining; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Management of Engineering & Technology, 2008. PICMET 2008. Portland International Conference on
Conference_Location
Cape Town
Print_ISBN
978-1-890843-17-5
Electronic_ISBN
978-1-890843-18-2
Type
conf
DOI
10.1109/PICMET.2008.4599696
Filename
4599696