DocumentCode :
2664583
Title :
A Novel Efficient Classification Algorithm for Search Engines
Author :
Alla, H.A.H.M.A. ; Al-Ghreimil, N.
Author_Institution :
Inf. Technol. Dept., King Saud Univ., Riyadh, Saudi Arabia
fYear :
2008
fDate :
10-12 Dec. 2008
Firstpage :
773
Lastpage :
778
Abstract :
This paper proposes a new algorithm for classifying Web documents into a set of categories. The technique analyzes the relationships between documents and the terms they contain, producing a set of rules that relate a document's category to its terms and their frequencies. Each document is represented by a graph that correlates its most frequent combined words with its category, and the relationships among these graphs and the documents' categories are captured. The technique has three phases. The first is a training phase in which human experts determine the categories of different Web pages and articles and combine these categories with an appropriate weighted index. The second is the blind categorization phase, which builds a database that is categorized according to the result of the first phase. The third applies the proposed graph representation technique to the whole set of documents in each category to determine that category's final graph representation. The third phase will produce better classification rules because the sample size is larger, with no additional cost of supervised categorization. Experiments are conducted using data sets collected from different Web portals.
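The three phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define "combined words", the weighted index, or the classification rules, so the sketch assumes that combined words are adjacent word pairs, that a graph is a set of weighted edges between words, and that a document is assigned to the category whose graph shares the most edge weight with it. The weighted index from the first phase is not modeled. The names `doc_graph`, `build_category_graphs`, and `classify`, and the sample texts, are hypothetical.

```python
from collections import Counter, defaultdict
import re


def doc_graph(text, top_k=20):
    """Represent a document as weighted edges between adjacent words.

    Assumption: "most frequent combined words" is read as the top_k
    most frequent adjacent word pairs.
    """
    words = re.findall(r"[a-z]+", text.lower())
    pairs = Counter(zip(words, words[1:]))
    return Counter(dict(pairs.most_common(top_k)))


def build_category_graphs(labelled_docs):
    """Merge the graphs of all documents in a category by summing edge weights."""
    graphs = defaultdict(Counter)
    for text, category in labelled_docs:
        graphs[category].update(doc_graph(text))
    return dict(graphs)


def classify(text, category_graphs):
    """Assign the category whose graph overlaps most with the document graph.

    Assumption: overlap is the sum of min(edge weight in document,
    edge weight in category graph); ties go to the alphabetically
    first category.
    """
    graph = doc_graph(text)

    def score(category):
        cat_graph = category_graphs[category]
        return sum(min(w, cat_graph[edge]) for edge, w in graph.items())

    return max(sorted(category_graphs), key=score)


# Phase 1 (training): documents labelled by human experts.
labelled = [
    ("the football match ended and the football team won the match", "sport"),
    ("the stock market fell as the stock market index dropped", "finance"),
]
phase1_graphs = build_category_graphs(labelled)

# Phase 2 (blind categorization): label new documents using phase 1 graphs.
unlabelled = [
    "the football team played another football match",
    "investors watched the stock market index",
]
blind = [(text, classify(text, phase1_graphs)) for text in unlabelled]

# Phase 3: rebuild each category graph from the whole, larger document set.
final_graphs = build_category_graphs(labelled + blind)
```

In this sketch the third phase reuses the graph construction on the enlarged set, which mirrors the abstract's point that the larger sample comes without further expert labelling.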
Keywords :
document handling; graph theory; search engines; Web documents; Web portals; blind categorization phase; classification algorithm; documents categories; graph representation technique; search engines; training phase; weighted index; Classification algorithms; Database systems; Educational institutions; Humans; Information technology; Portals; Search engines; Web mining; Web pages; World Wide Web; Document Classification; Information Processing; Supervised Classification
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence for Modelling Control & Automation, 2008 International Conference on
Conference_Location :
Vienna
Print_ISBN :
978-0-7695-3514-2
Type :
conf
DOI :
10.1109/CIMCA.2008.68
Filename :
5172723