• DocumentCode
    2621091
  • Title
    Measuring article quality in Wikipedia: Lexical clue model
  • Author
    Xu, Yanxiang; Luo, Tiejian
  • Author_Institution
    Grad. Univ. of Chinese Acad. of Sci., Beijing, China
  • fYear
    2011
  • fDate
    26-28 Oct. 2011
  • Firstpage
    141
  • Lastpage
    146
  • Abstract
    Wikipedia is the online encyclopedia with the largest number of entries. Studies published in Nature have found that the scientific entries in Wikipedia are of a quality comparable to those in the Encyclopedia Britannica, which is maintained mainly by experts. However, the manual grading of Wikipedia articles by WikiProjects implies that high-quality articles usually reach that status grade by grade, through repeated revision. Consequently, much work has addressed automatically measuring article quality in Wikipedia, based on assumed relationships between article quality and factors such as contributors' reputations, viewing behavior, article status, and inter-article links. In this paper, a method based on lexical clues is proposed to assess article quality in Wikipedia. The method is inspired by the idea that good articles, having been revised more often by more people, show more regular statistical features in lexical usage than lower-grade ones. We select eight lexical features, derived from statistics on word usage in articles, as factors that reflect article quality in Wikipedia. A decision tree is trained on this lexical clue model. In experiments on a well-labeled collection of 200 Wikipedia articles, the method achieves precision and recall above 83%.
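    The abstract outlines a pipeline: compute word-usage statistics for each article, train a decision tree on them, and report precision and recall. The following is a minimal pure-Python sketch of that kind of pipeline under explicit assumptions. This record does not name the paper's eight lexical features, so the four features in the hypothetical `FEATURE_NAMES` / `lexical_features` are illustrative stand-ins. The tree is a generic CART-style learner using Gini impurity, which is not necessarily the variant the authors used. Labels are assumed to be binary (1 = high quality).

```python
import re
from collections import Counter

# HYPOTHETICAL features: illustrative stand-ins, not the paper's eight features.
FEATURE_NAMES = [
    "num_tokens",        # article length in words
    "type_token_ratio",  # distinct words / total words
    "mean_word_length",  # average characters per word
    "hapax_ratio",       # share of distinct words that occur exactly once
]


def lexical_features(text):
    """Return word-usage statistics for one article's text."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    if n == 0:
        return [0.0] * len(FEATURE_NAMES)
    counts = Counter(words)
    hapax = sum(1 for c in counts.values() if c == 1)
    return [float(n), len(counts) / n,
            sum(len(w) for w in words) / n, hapax / len(counts)]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most frequent label (ties go to 1)."""
    return 1 if 2 * sum(labels) >= len(labels) else 0


def build_tree(X, y, depth=0, max_depth=3, min_size=2):
    """Grow a small CART-style tree by greedy Gini-minimizing threshold splits."""
    if depth >= max_depth or len(y) < min_size or gini(y) == 0.0:
        return {"leaf": majority(y)}
    best = None  # (weighted Gini, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return {"leaf": majority(y)}
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in li], [y[i] for i in li],
                           depth + 1, max_depth, min_size),
        "right": build_tree([X[i] for i in ri], [y[i] for i in ri],
                            depth + 1, max_depth, min_size),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while "leaf" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (high-quality) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

    As an illustration on toy one-feature data (not data from the paper), `build_tree([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])` splits at threshold 1.0, giving two pure leaves. In practice, each row of `X` would be `lexical_features(article_text)` for one labeled article.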
  • Keywords
    Web sites; decision trees; text analysis; WikiProject; Wikipedia; article quality; decision tree; lexical clue model; on-line encyclopedia; regular statistic features; Argon; Electronic publishing; Encyclopedias; Internet; Manuals; lexical clue
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Society (SWS), 2011 3rd Symposium on
  • Conference_Location
    Port Elizabeth
  • ISSN
    2158-6985
  • Print_ISBN
    978-1-4577-0212-9
  • Type
    conf
  • DOI
    10.1109/SWS.2011.6101286
  • Filename
    6101286