Title :
Web indexing using HTML priority system
Author_Institution :
Dept. of Inf. Technol., SRM Univ., Kattankulathur, India
Abstract :
The unstructured nature and sheer size of the World Wide Web make it challenging to index. This paper discusses how the web can be indexed incrementally using inverted indices and a distributed hash table to organize the data efficiently, with the index built up through the search mechanism itself, and how an HTML Priority System can rank pages to improve precision and recall. It also discusses certain challenges that a content-based ranking system must face to counter spam.
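As a rough illustration of the ideas named in the abstract, the sketch below builds a small in-memory inverted index in which each term's score depends on the HTML tag that encloses it. It is not the paper's method: the tag weights in TAG_WEIGHTS, the names WeightedTermParser, build_index and search, and the additive scoring rule are all assumptions made for this example, and the distributed hash table and incremental crawling are not modelled.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical tag priorities (an assumption for illustration, not the paper's values).
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "h2": 2.0, "b": 1.5}
DEFAULT_WEIGHT = 1.0


class WeightedTermParser(HTMLParser):
    """Accumulates a score per term, weighted by the highest-priority enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []                   # currently open tags
        self.scores = defaultdict(float)  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags in between.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.stack),
            default=DEFAULT_WEIGHT,
        )
        for raw in data.lower().split():
            term = raw.strip(".,;:!?")
            if term:
                self.scores[term] += weight


def build_index(pages):
    """Build an inverted index: term -> {url: tag-weighted score}."""
    index = defaultdict(dict)
    for url, html in pages.items():
        parser = WeightedTermParser()
        parser.feed(html)
        parser.close()
        for term, score in parser.scores.items():
            index[term][url] = score
    return index


def search(index, query):
    """Return URLs ranked by the summed scores of the query terms."""
    totals = defaultdict(float)
    for term in query.lower().split():
        for url, score in index.get(term, {}).items():
            totals[url] += score
    # Highest score first; ties broken by URL for a stable order.
    return sorted(totals, key=lambda u: (-totals[u], u))
```

Under these assumed weights, a page that carries the query terms in its title ranks above one that only mentions them in body text. A purely content-based score of this kind is also easy to inflate by stuffing keywords into high-priority tags, which is the sort of spam challenge the abstract alludes to.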
Keywords :
Internet; hypermedia markup languages; indexing; HTML priority system; Web indexing; World Wide Web; content-based ranking system; distributed hash table; inverted indices; spam; Crawlers; HTML; Indexing; Search engines; Uniform resource locators; Unsolicited electronic mail; Distributed Hash Tables; HTML Priority System; Inverted Index; Search Engine; Web Indexing;
Conference_Title :
Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE), 2015 International Conference on
Conference_Location :
Noida
Print_ISBN :
978-1-4799-8432-9
DOI :
10.1109/ABLAZE.2015.7154929