• DocumentCode
    1970577
  • Title

    Implementation of two-tier link extractor in optimized search engine filtering system

  • Author

    Kumar, S. Mohan ; Revathy, P. ; Vijayalakshmi, K.

  • Author_Institution
    IT Dept., ACE Eng. Coll., Hyderabad, India
  • fYear
    2009
  • fDate
    9-11 Dec. 2009
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    The Internet is now familiar to almost everyone, and search engines are an efficient tool for retrieving documents related to user queries. However, the retrieved documents are often numerous, and most of them are unrelated to the query. The present-day problem is to minimize the number of unrelated documents. This paper proposes a new filtering system that reduces the number of unrelated documents returned by the search engine. The optimization is performed in several steps, each comprising several modules. One of these modules is the Link Extractor, and this research focuses on its architectural design. After the results are retrieved from the Web, the filtering system displays them to the user through the Re-ranker, which assigns a value to each result link retrieved by the search engine. After re-ranking, the most challenging task is finding duplicate URLs. The Tier I Link Extractor scans the content of every URL using an extraction technique. Once the links are extracted, duplicate URLs can be eliminated in two cases: when the URLs are the same, and when the anchor-text information is the same. Elimination in the first case is easy, but the second case, which requires checking the content of every link, is far more complicated. The Tier II Link Extractor implements both checks and also takes part in the elimination process, using document comparison methods with the help of some filters. By performing all these steps, the filtering system can reduce users' access time.
  • Keywords
    Internet; data mining; filtering theory; optimisation; search engines; Internet; anchor text information; document comparison methods; extraction technique; link extractor; optimized search engine filtering system; retrieve documents; two tier link extractor implementation; unrelated documents; Clustering algorithms; Data mining; Educational institutions; Information filtering; Information filters; Internet; Search engines; Space technology; Uniform resource locators; Voting; Anchor-text Information; Keyword; Link Extractor; Re-ranker; Search Engine; Vector Formulator;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Internet Multimedia Services Architecture and Applications (IMSAA), 2009 IEEE International Conference on
  • Conference_Location
    Bangalore
  • Print_ISBN
    978-1-4244-4792-3
  • Electronic_ISBN
    978-1-4244-4793-0
  • Type

    conf

  • DOI
    10.1109/IMSAA.2009.5439452
  • Filename
    5439452
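
The two duplicate checks named in the abstract (same URL, same anchor-text information) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `AnchorCollector`, `normalize_url`, `extract_links`, and `drop_duplicates` are hypothetical, the URL normalization rules are an assumption, and a plain case-insensitive comparison of anchor text stands in for the document comparison methods and filters that the paper mentions but this record does not describe.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def normalize_url(url):
    """Assumed normalization: lowercase scheme and host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def extract_links(html):
    """Extraction step: return every (href, anchor text) pair found in the page."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def drop_duplicates(links):
    """Keep the first link seen for each normalized URL and each anchor text."""
    seen_urls, seen_texts, kept = set(), set(), []
    for href, text in links:
        url_key = normalize_url(href)
        text_key = text.casefold()
        # Case 1: same URL. Case 2: same (non-empty) anchor text.
        if url_key in seen_urls or (text_key and text_key in seen_texts):
            continue
        seen_urls.add(url_key)
        if text_key:
            seen_texts.add(text_key)
        kept.append((href, text))
    return kept


if __name__ == "__main__":
    page = (
        '<a href="http://Example.com/a/">First</a>'
        '<a href="http://example.com/a">Other</a>'
        '<a href="http://example.com/b">first</a>'
        '<a href="http://example.com/c">Third</a>'
    )
    for href, text in drop_duplicates(extract_links(page)):
        print(href, text)
```

In this sketch the second link is dropped because its normalized URL matches the first, and the third is dropped because its anchor text matches the first. Comparing anchor text alone is much cheaper than comparing the linked documents, which the abstract identifies as the hard case.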