DocumentCode :
3165904
Title :
External Evaluation of Topic Models: A Graph Mining Approach
Author :
Chan, Hau ; Akoglu, Leman
Author_Institution :
Dept. of Comput. Sci., Stony Brook Univ., Stony Brook, NY, USA
fYear :
2013
fDate :
7-10 Dec. 2013
Firstpage :
973
Lastpage :
978
Abstract :
Given a topic and its top-k most relevant words generated by a topic model, how can we tell whether it is a low-quality or a high-quality topic? Topic models provide a low-dimensional representation of large document corpora, and drive many important applications such as summarization, document segmentation, and word-sense disambiguation. Evaluation of topic models is an important issue, since low-quality topics potentially degrade the performance of these applications. In this paper, we develop a graph mining and machine learning approach for the external evaluation of topic models. Based on the graph-centric features we extract from the projection of topic words on the Wikipedia page-links graph, we learn models that can predict the human-perceived quality of topics (based on human judgments), and classify them as high or low quality. Experiments on four real-world corpora show that our approach boosts the prediction performance by up to 30% over three baselines of various complexities, and demonstrate the generality of our method across diverse domains. In addition, we provide an interpretation of our models and outline the discriminating characteristics of topic quality.
Keywords :
Web sites; data mining; feature extraction; graph theory; learning (artificial intelligence); natural language processing; pattern classification; text analysis; Wikipedia page-links graph; document segmentation; graph mining; graph-centric feature extraction; high-quality topic; human-perceived quality; large document corpora; low-dimensional representation; low-quality topic; machine learning approach; summarization; top-k most relevant words; topic model; topic quality; topic word projection; word sense disambiguation; Digital signal processing; Electronic publishing; Encyclopedias; Feature extraction; Internet; Predictive models; graph mining; human evaluation; topic models;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2013 IEEE 13th International Conference on Data Mining (ICDM)
Conference_Location :
Dallas, TX
ISSN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2013.112
Filename :
6729584