Regional Information Center for Science and Technology - Learning to Group Web Text Incorporating Prior Information

DocumentCode :

3127614

Title :

Learning to Group Web Text Incorporating Prior Information

Author :

Cheng, Yu ; Zhang, Kunpeng ; Xie, Yusheng ; Agrawal, Ankit ; Liao, Wei-keng ; Choudhary, Alok

Author_Institution :

Dept. of Electr. Eng. & Comput. Sci., Northwestern Univ., Evanston, IL, USA

fYear :

2011

fDate :

11-11 Dec. 2011

Firstpage :

212

Lastpage :

219

Abstract :

Clustering similar items in web text has become increasingly important in many Web and Information Retrieval applications. For several kinds of web text data, it is fairly easy to obtain external information, other than textual features, that can be used to improve the performance of clustering analysis. This external information, called prior information, indicates label sign and pairwise constraints on sample points. We propose a unifying framework that incorporates prior information about cluster membership into web text cluster analysis, and we develop a novel semi-supervised clustering model. The proposed framework offers several advantages over existing semi-supervised approaches. First, most previous work handles labeled data by converting it to pairwise constraints, which leads to much more computation. The proposed approach handles pairwise constraints and labeled data simultaneously, so the computation is greatly reduced. Second, the framework allows this prior information to be obtained automatically or with little human effort, making it possible to boost clustering performance relatively easily. We evaluated the proposed method on the real-world problems of automatically grouping online news feeds and web blog messages. Experimental results indicate that the proposed framework incorporating prior information can indeed lead to statistically significant clustering improvements over approaches that have access only to textual features.
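
The abstract's first claimed advantage rests on the cost of converting labeled data to pairwise constraints. The sketch below is an illustration of that cost only, not the authors' method: it expands a small set of labeled items into pairwise constraints and shows that n labeled items yield n(n-1)/2 pairs. The function name labels_to_constraints, the "must-link"/"cannot-link" terminology, and the example category names are assumptions introduced here, not taken from the record.

```python
from itertools import combinations


def labels_to_constraints(labels):
    """Expand {item_id: label} into pairwise constraints.

    Hypothetical helper: pairs sharing a label become "must-link"
    constraints, pairs with different labels become "cannot-link".
    """
    must_link, cannot_link = [], []
    # Every unordered pair of labeled items produces exactly one constraint.
    for (a, label_a), (b, label_b) in combinations(sorted(labels.items()), 2):
        if label_a == label_b:
            must_link.append((a, b))
        else:
            cannot_link.append((a, b))
    return must_link, cannot_link


# Five labeled items (made-up labels, for illustration only).
labels = {0: "sports", 1: "sports", 2: "politics", 3: "politics", 4: "tech"}
must_link, cannot_link = labels_to_constraints(labels)

n = len(labels)
total = len(must_link) + len(cannot_link)
print(f"{n} labels -> {total} pairwise constraints")
```

Because the number of pairs grows quadratically with the number of labeled items, a model that consumes labels directly has far fewer terms to process, which is consistent with the computational saving the abstract describes.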

Keywords :

Internet; Web sites; learning (artificial intelligence); pattern clustering; text analysis; Web blog messages; Web text cluster analysis; Web text grouping learning; information retrieval applications; items clustering; label sign; online news feed grouping; pair wise constraints; prior information; semisupervised clustering model; textual features; Accuracy; Approximation methods; Data mining; Data models; Entropy; Google; Training; pairwise constraints; prior information; semi-supervised clustering; web text;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on

Conference_Location :

Vancouver, BC

Print_ISBN :

978-1-4673-0005-6

Type :

conf

DOI :

10.1109/ICDMW.2011.111

Filename :

6137382

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3127614