DocumentCode
3229742
Title
Web Directory Integration Using Conditional Random Fields
Author
Wu, Terry Chia-Wei ; Hsu, Wen-Lian
Author_Institution
Inst. of Inf. Sci., Acad. Sinica, Taipei
fYear
2006
fDate
18-22 Dec. 2006
Firstpage
540
Lastpage
543
Abstract
The purpose of integrating web directories is to transfer instances from a source directory to a target directory. Unlike conventional text categorization, directory integration has extra information about the source directory that can be used to improve classification accuracy. Many approaches exploit the measured similarity between two corresponding classes to enhance traditional text classifiers. These methods perform well if the topics of the two classes are very similar, but they can lead to misclassification if the topics are dissimilar. We propose a directory integration approach based on the conditional random fields (CRFs) model, and model the integration process using a finite-state model. The advantage of using CRFs is that the transition features naturally include information about the relations between classes. Our results show that CRFs outperform conventional text classifiers. In addition, CRFs allow us to apply complex features that integrate information about the contents of classes and their labels. The performance of our approach can be improved by applying these features, especially for instances whose source and target classes are moderately similar.
Keywords
Internet; Markov processes; classification; probability; text analysis; Markov process; Web directory integration; classification; conditional random fields; finite-state model; probability; Computer science; Crawlers; Information science; Internet; Libraries; Search engines; Text categorization; Web pages;
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
Conference_Location
Hong Kong
Print_ISBN
0-7695-2747-7
Type
conf
DOI
10.1109/WI.2006.190
Filename
4061428