  • DocumentCode
    610252
  • Title
    News auto-tagging using Wikipedia
  • Author
    Shams Eldin, Shaimaa; El-Beltagy, S.R.

  • Author_Institution
    Center for Inf. Sci., Nile Univ., Giza, Egypt
  • fYear
    2013
  • fDate
    17-19 March 2013
  • Firstpage
    158
  • Lastpage
    163
  • Abstract
    This paper presents an efficient method for automatically annotating Arabic news stories with tags using Wikipedia. The system uses Wikipedia article names, properties, and redirects to build a pool of meaningful tags. Sophisticated and efficient matching methods are then used to detect text fragments in input news stories that correspond to entries in the constructed tag pool. Generated tags represent real-life entities or concepts such as the names of popular places, known organizations, and celebrities. These tags can be used indirectly by a news site for indexing, clustering, classification, and statistics generation, or directly to give a news reader an overview of a news story's contents. Evaluation of the system has shown that the tags it generates are better than those generated by MSN Arabic news.
  • Keywords
    Web sites; indexing; natural language processing; pattern clustering; pattern matching; text analysis; MSN Arabic news; Wikipedia article names; Wikipedia article properties; automatic Arabic news story annotation; input news story text fragment detection; matching methods; news auto-tagging; news site classification; news site clustering; news site indexing; news story contents; real life entities; statistics generation; Dictionaries; Electronic publishing; Encyclopedias; Indexing; Internet; Tagging; Arabic text; Disambiguation; Wikipedia
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Innovations in Information Technology (IIT), 2013 9th International Conference on
  • Conference_Location
    Abu Dhabi
  • Type
    conf
  • DOI
    10.1109/Innovations.2013.6544411
  • Filename
    6544411
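
The abstract describes building a tag pool from Wikipedia article names and redirects, then matching text fragments against it. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: the function names (`build_tag_pool`, `tag_text`), the sample titles and redirects, and the greedy longest-match strategy are all assumptions made here for illustration. It uses English sample data and omits the Arabic-specific normalization and disambiguation that a real system would need.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def build_tag_pool(titles, redirects):
    """Map each surface form (article title or redirect) to a canonical title.

    `titles` is an iterable of article names; `redirects` maps an alias
    to the title it redirects to. Both are hypothetical inputs.
    """
    pool = {}
    for title in titles:
        pool[tuple(tokenize(title))] = title
    for alias, target in redirects.items():
        pool[tuple(tokenize(alias))] = target
    return pool


def tag_text(text, pool):
    """Return canonical tags found in `text`, preferring the longest match."""
    tokens = tokenize(text)
    max_len = max((len(key) for key in pool), default=0)
    tags = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate fragment first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in pool:
                if pool[key] not in tags:
                    tags.append(pool[key])
                i += n  # skip past the matched fragment
                break
        else:
            i += 1  # no entry starts at this token
    return tags


# Hypothetical sample data, for illustration only.
pool = build_tag_pool(
    titles=["Nile University", "Abu Dhabi", "Egypt"],
    redirects={"Nile Univ": "Nile University"},
)
story = "Researchers from Nile Univ in Egypt presented their work in Abu Dhabi."
tags = tag_text(story, pool)
```

Mapping redirects to their target titles means that different surface forms of the same entity collapse into a single tag, which is one plausible reason to include redirects in the pool.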