• DocumentCode
    3527864
  • Title
    Phrase Ranking and Wikipedia Based Cluster Labeling
  • Author
    Chinthala, Pradyumna Reddy
  • Author_Institution
    Goa Campus, Dept. of Comput. Sci., BITS Pilani, Zuarinagar, India
  • fYear
    2013
  • fDate
    21-23 Dec. 2013
  • Firstpage
    199
  • Lastpage
    202
  • Abstract
    Automatically labeling document clusters with words that indicate their topics is a relatively new and active research field. The most frequently used process, labeling with the most frequent words in the clusters, ends up using several words that are virtually void of descriptive power even after traditional stop words are eliminated. Another procedure, labeling with the most anticipated words, often includes rather obscure results. We present Phrase Rank, a variation of the PageRank algorithm based on a relational graph representation of the content of web document collections. Phrase Rank separates phrases into classes and ranks discriminative phrases highest, followed by ambiguous phrases and then common phrases. Thus, a set of important text features is first extracted from the cluster documents. Further, we use these features to extract cluster labels from external knowledge sources such as the pre-categorized knowledge of Wikipedia. We experiment with a test dataset to demonstrate the efficacy of the Phrase Rank algorithm.
  • Keywords
    Web sites; graph theory; pattern clustering; text analysis; Web document collections; Wikipedia based cluster labeling; active research field; cluster documents; document clusters; most anticipated words; most frequent words; page rank algorithm; phrase ranking; relational graph representation; text feature extraction; topics; Clustering algorithms; Electronic publishing; Encyclopedias; Games; Internet; Labeling; Cluster labeling; PageRank; Phrase ranking; Wikipedia;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Machine Intelligence and Research Advancement (ICMIRA), 2013 International Conference on
  • Conference_Location
    Katra
  • Type
    conf
  • DOI
    10.1109/ICMIRA.2013.44
  • Filename
    6918821