Title :
Design and implementation of a web structure mining algorithm using breadth first search strategy for academic search application
Author :
Jeyalatha, S. ; Vijayakumar, B.
Author_Institution :
Dept. of Comput. Sci., BITS Pilani, Dubai, United Arab Emirates
Abstract :
This paper deals with Web Structure Mining using the Breadth First Search strategy. While browsing the web, a user has to go through many pages, filter data, and download the required information. This task of searching and downloading is time-consuming. Sometimes search queries call for a specific option, say, limiting the search to a few links. To reduce the time spent by users, a web link extraction tool has been designed and implemented in Java; it analyzes ways of extracting web link information using a standard interface. The test scenario is presented with various keywords such as Higher Education, Conference Alerts and Special Interest Group. The present work can be a useful input to Web Users, Faculty, Students and Web Administrators in a University Environment. The web extraction tool helps to save time in searching and downloading files from the web. Another strong requirement is to verify whether the search keywords entered by the user give accurate and relevant results. This is made possible by performing a quick check on the search links. The user can also view the internal links present in the selected HTML files and the adjacency list of the crawled files.
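The breadth-first link extraction and adjacency list described in the abstract can be illustrated with a minimal Java sketch. This is not the authors' tool: the class name `BfsLinkCrawler`, the methods `extractLinks` and `crawl`, the `maxPages` limit, and the in-memory map of page names to HTML are all assumptions made for illustration, and the regular expression is a simplification that only matches double-quoted `href` attributes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: breadth-first traversal of links among HTML pages,
// recording an adjacency list (page -> links found on that page).
public class BfsLinkCrawler {
    // Simplified pattern for href="..." attributes; a real tool would use an HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the link targets found in one HTML document, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Visits pages breadth-first from the seed and stops once maxPages pages
    // have been processed. 'pages' stands in for the web (an assumption that
    // keeps the example offline): it maps a page name to its HTML.
    static Map<String, List<String>> crawl(Map<String, String> pages, String seed, int maxPages) {
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        Set<String> seen = new LinkedHashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        seen.add(seed);
        frontier.add(seed);
        while (!frontier.isEmpty() && adjacency.size() < maxPages) {
            String url = frontier.remove();
            String html = pages.get(url);
            if (html == null) {
                continue; // page not available; skip it
            }
            List<String> links = extractLinks(html);
            adjacency.put(url, links);
            for (String link : links) {
                if (seen.add(link)) { // enqueue each page only once
                    frontier.add(link);
                }
            }
        }
        return adjacency;
    }

    public static void main(String[] args) {
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("a.html", "<a href=\"b.html\">B</a> <a href=\"c.html\">C</a>");
        pages.put("b.html", "<a href=\"d.html\">D</a>");
        pages.put("c.html", "<a href=\"a.html\">A</a>");
        pages.put("d.html", "no links");
        // Limit the crawl to three pages, mirroring the "few links" option.
        crawl(pages, "a.html", 3)
                .forEach((page, links) -> System.out.println(page + " -> " + links));
    }
}
```

The FIFO queue is what makes the traversal breadth-first: all pages linked from the seed are processed before any page two links away. The `maxPages` parameter is one possible way to model the "limiting the search to a few links" option mentioned above.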
Keywords :
Internet; data mining; tree searching; Java; Web extraction tool; Web link extraction tool; Web link information; Web structure mining; academic search application; breadth first search; search queries; standard interface; Google; HTML; User interfaces; Web pages; XML; Adjacency List; Downloading; Web Extraction
Conference_Title :
Internet Technology and Secured Transactions (ICITST), 2011 International Conference for
Conference_Location :
Abu Dhabi
Print_ISBN :
978-1-4577-0884-8