DocumentCode
2772665
Title
Promoting Total Efficiency in Text Clustering via Iterative and Interactive Metric Learning
Author
Momma, Michinari ; Morinaga, Satoshi ; Komura, Daisuke
Author_Institution
Common Platform Software Res. Labs., NEC Corp., Kawasaki, Japan
fYear
2009
fDate
6-9 Dec. 2009
Firstpage
878
Lastpage
883
Abstract
In this paper, we propose a framework to make the text clustering process, as a whole, efficient. In a real text clustering task, an analyst usually has some expectation of the results in mind. However, a single run of a clustering algorithm on the preprocessed data would not satisfy that expectation. The analyst then faces labor-intensive trials to improve the results, involving repetitive feature refinement and parameter tuning. We develop the Iterative and Interactive Metric Learning System (IIMLS) to address this challenge. Specifically, IIMLS allows analysts to input feedback on the current clustering result. Given the feedback, IIMLS optimizes the metric in the feature space so that the clustering algorithm applied with the refined metric reflects the feedback. As a byproduct, the learned metric may be reused for a similar dataset. Illustrative examples on a real-world dataset show that IIMLS can dramatically improve the efficiency of a text clustering task. The learned "knowledge", or the metric, is visualized to give insight into the optimized feature metric.
Keywords
algorithm theory; iterative methods; optimisation; pattern classification; text analysis; IIMLS optimizes metric; current clustering result; interactive metric learning; interactive metric learning system; labor intensive trials; optimized feature metric; parameter tuning; promoting total efficiency; real text clustering task; real world dataset show; repetitive feature refinement; single run clustering algorithm; text clustering process; via iterative; Clustering algorithms; Data mining; Engines; Feedback; Iterative algorithms; Laboratories; Learning systems; Man machine systems; National electric code; Visualization; data preprocessing; interactive system; metric learning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
Conference_Location
Miami, FL
ISSN
1550-4786
Print_ISBN
978-1-4244-5242-2
Electronic_ISBN
1550-4786
Type
conf
DOI
10.1109/ICDM.2009.124
Filename
5360327