Implementation of two-tier link extractor in optimized search engine filtering system

Author

Kumar, S. Mohan ; Revathy, P. ; Vijayalakshmi, K.

Author_Institution

IT Dept., ACE Eng. Coll., Hyderabad, India

fYear

2009

fDate

9-11 Dec. 2009

Firstpage

1

Lastpage

4

Abstract

The Internet is now familiar to nearly everyone, and search engines are an efficient tool for retrieving documents related to user queries. However, the retrieved documents are often large in number, and most of them are unrelated to the query. The problem addressed here is minimizing these unrelated documents. This paper proposes a new filtering system that reduces the number of unrelated documents returned by the search engine. The optimization is performed in several steps, each comprising several modules. One of these modules is the Link Extractor, and this research concerns its architectural design. After the results are retrieved from the Web, the filtering system displays them to the user through the Re-ranker, which assigns a value to each result link retrieved by the search engine. After re-ranking, the most challenging task is finding duplicate URLs. The Tier I Link Extractor scans the content of every URL using an extraction technique. Once the links are extracted, duplicate URLs can be eliminated in two cases: when the URLs are the same, and when the anchor-text information is the same. Elimination in the first case is easy, but the second case, which requires checking the content of every link, is much more complicated. The Tier II Link Extractor implements both cases and also takes part in the elimination process, using document comparison methods with the help of some filters. By performing all these steps, the filtering system can reduce users' access time.
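
The two duplicate-elimination cases named in the abstract (same URL, same anchor-text information) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the normalization rules, and the (url, anchor_text) input shape are all assumptions made for illustration, and the document comparison methods and filters of the Tier II Link Extractor are not modeled here.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Hypothetical rule: lowercase scheme and host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def normalize_anchor(text):
    """Hypothetical rule: collapse whitespace and ignore case in anchor text."""
    return " ".join(text.split()).lower()


def deduplicate(links):
    """Keep the first link seen for each distinct URL and each distinct anchor text.

    `links` is a list of (url, anchor_text) pairs, assumed to be in re-ranked order.
    """
    seen_urls, seen_anchors, kept = set(), set(), []
    for url, anchor in links:
        u, a = normalize_url(url), normalize_anchor(anchor)
        # Case 1: same URL. Case 2: same (non-empty) anchor text.
        if u in seen_urls or (a and a in seen_anchors):
            continue
        seen_urls.add(u)
        if a:
            seen_anchors.add(a)
        kept.append((url, anchor))
    return kept
```

Treating equal anchor text as sufficient evidence of duplication is a simplification; the abstract indicates that the second case involves checking each link's content, which would require an additional comparison step.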

Keywords

Internet; data mining; filtering theory; optimisation; search engines; anchor text information; document comparison methods; extraction technique; link extractor; optimized search engine filtering system; retrieve documents; two tier link extractor implementation; unrelated documents; Clustering algorithms; Educational institutions; Information filtering; Information filters; Space technology; Uniform resource locators; Voting; Keyword; Re-ranker; Vector Formulator

fLanguage

English

Publisher

ieee

Conference_Titel

Internet Multimedia Services Architecture and Applications (IMSAA), 2009 IEEE International Conference on

Conference_Location

Bangalore

Print_ISBN

978-1-4244-4792-3

Electronic_ISBN

978-1-4244-4793-0

Type

conf

DOI

10.1109/IMSAA.2009.5439452

Filename

5439452