DocumentCode
3527864
Title
Phrase Ranking and Wikipedia Based Cluster Labeling
Author
Chinthala, Pradyumna Reddy
Author_Institution
Goa Campus, Dept. of Comput. Sci., BITS Pilani, Zuarinagar, India
fYear
2013
fDate
21-23 Dec. 2013
Firstpage
199
Lastpage
202
Abstract
Automatically labeling document clusters with words that indicate their topics is a relatively new and active research field. The most common approach, labeling with the most frequent words in the clusters, ends up using several words that have virtually no descriptive power, even after traditional stop words are eliminated. Another approach, labeling with the most anticipated words, often yields rather obscure results. We present Phrase Rank, a variation of the PageRank algorithm based on a relational graph representation of the content of web document collections. Phrase Rank separates phrases and ranks discriminative phrases highest, followed by ambiguous phrases and then common phrases. A set of important text features is thus first extracted from the cluster documents. We then use these features to extract cluster labels from external knowledge sources such as the pre-categorized knowledge of Wikipedia. We experiment with a test dataset to demonstrate the efficacy of the Phrase Rank algorithm.
Keywords
Web sites; graph theory; pattern clustering; text analysis; Web document collections; Wikipedia based cluster labeling; active research field; cluster documents; document clusters; most anticipated words; most frequent words; page rank algorithm; phrase ranking; relational graph representation; text feature extraction; topics; Clustering algorithms; Electronic publishing; Encyclopedias; Games; Internet; Labeling; Cluster labeling; PageRank; Phrase ranking; Wikipedia;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Intelligence and Research Advancement (ICMIRA), 2013 International Conference on
Conference_Location
Katra
Type
conf
DOI
10.1109/ICMIRA.2013.44
Filename
6918821