DocumentCode :
2526666
Title :
Classifying Web Pages Using Information Extraction Patterns: Preliminary Results and Findings
Author :
Soon, Lay-Ki ; Lee, Sang Ho
Author_Institution :
Fac. of Inf. Technol., Multimedia Univ., Selangor, Malaysia
fYear :
2010
fDate :
15-18 Dec. 2010
Firstpage :
195
Lastpage :
202
Abstract :
Web page classification plays an essential role in facilitating more efficient information retrieval and information processing. Conventionally, web text documents are represented by a term frequency matrix for classification purposes. However, considering the limitations of representing documents using terms or keywords, we propose to represent each web page using the information extraction patterns identified within it. In this paper, we present the results and findings obtained from our preliminary experiments. Our experimental results indicate that the same word occurring in different contexts has a different impact on the classification task. Thus, the extraction patterns used to represent each document are more semantically meaningful and give better insight into web classification than keywords do.
Keywords :
Internet; classification; data mining; information retrieval; matrix algebra; text analysis; Web mining; Web page classification; Web text documents; information extraction patterns; information processing; information retrieval; term frequency matrix; Bayesian methods; Classification algorithms; Computer science; Indexing; Text categorization; Web pages; decision tree; information extraction; information gain; web classification; web mining;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Signal-Image Technology and Internet-Based Systems (SITIS), 2010 Sixth International Conference on
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-1-4244-9527-6
Electronic_ISBN :
978-0-7695-4319-2
Type :
conf
DOI :
10.1109/SITIS.2010.42
Filename :
5714552