Title of article :
Optimization of web search techniques using frequency analysis
Author/Authors :
Aishwarya, M Technology - Coimbatore, India , Ilayaraja, N Technology - Coimbatore, India , Periakaruppan, R.M Applied Mathematics and Computational Sciences - PSG College of Technology - Coimbatore, India
Abstract :
The raw data returned as search results for a given problem may be large, but only a relatively
small subset of it is usually relevant, and a search engine alone does not help the user discover
that relevant subset within a large text collection. In this paper,
we propose a solution to a problem called conformity to truth, which studies how to find the websites
containing the most true facts among a large amount of conflicting information on a user-defined
topic. Two algorithms, ParaSearch and FactFinder, are introduced to help identify the
best web links for searching general information and for finding individual facts, respectively.
In ParaSearch, latent Dirichlet allocation (LDA) is used to identify the 10 most frequent terms, from
which a similarity matrix is constructed to identify the best web pages. In FactFinder,
semantic processing is used to identify the best web pages, building on the existing
PageRank algorithm to further optimize the search results. The results show that ParaSearch can
identify web pages with the maximum number of facts conforming to the truth much better than
popular search engines. The ambiguity of individual facts is greatly reduced by
the FactFinder algorithm. Together, these algorithms identify relevant
web links for a given search term more accurately than most popular search engines.
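
The ParaSearch pipeline described above (frequent terms, then a similarity matrix, then a ranking of pages) can be illustrated with a minimal sketch. It is not the authors' implementation: plain term counting stands in for LDA, the stop-word list is a toy one, and the function names (top_terms, similarity_matrix, rank_pages) and the scoring rule (mean cosine similarity to the other pages) are assumptions made only for illustration.

```python
import math
import re
from collections import Counter

# Toy stop-word list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "on", "are", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_terms(pages, k=10):
    """Return the k most frequent terms across all pages.

    Plain counting is used here as a stand-in for the LDA step in the abstract.
    """
    counts = Counter()
    for text in pages:
        counts.update(tokenize(text))
    return [term for term, _ in counts.most_common(k)]


def term_vector(text, terms):
    """Count how often each selected term occurs in one page."""
    counts = Counter(tokenize(text))
    return [counts[t] for t in terms]


def cosine(u, v):
    """Cosine similarity of two count vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(pages, terms):
    """Pairwise cosine similarity between pages over the selected terms."""
    vectors = [term_vector(p, terms) for p in pages]
    return [[cosine(u, v) for v in vectors] for u in vectors]


def rank_pages(pages, k=10):
    """Rank page indices by mean similarity to the other pages (hypothetical rule)."""
    terms = top_terms(pages, k)
    sim = similarity_matrix(pages, terms)
    n = len(pages)
    scores = [
        sum(row[j] for j in range(n) if j != i) / (n - 1) if n > 1 else 0.0
        for i, row in enumerate(sim)
    ]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    pages = [
        "solar panels convert sunlight to electricity",
        "solar panels convert sunlight into electricity efficiently",
        "cats sleep most of the day",
    ]
    print(top_terms(pages))
    print(rank_pages(pages))
```

In this sketch, a page that shares the dominant vocabulary of the result set scores high, while an off-topic page scores near zero and is ranked last. Replacing top_terms with a topic model such as LDA would bring the sketch closer to the approach the abstract describes.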
Keywords :
frequency analysis, text mining, latent Dirichlet allocation
Journal title :
International Journal of Nonlinear Analysis and Applications