• DocumentCode
    653893
  • Title
    Distributed classification of Persian News (Case study: Hamshahri News dataset)
  • Author
    Esmaeili, Leila ; Akbari, Mohammad Kazem ; Amiry, Vahid ; Sharifian, Saeed
  • Author_Institution
    Comput. Eng. & Inf. Technol. Dept., Amirkabir Univ. of Technol., Tehran, Iran
  • fYear
    2013
  • fDate
    Oct. 31 2013-Nov. 1 2013
  • Firstpage
    46
  • Lastpage
    51
  • Abstract
    News classification identifies the topic that a news article most likely refers to. In this paper, we classify news articles by measuring distances in the vector space model. In this method, distances are calculated between the news vector and the weighted frequency vector of each category, and the article is assigned to the category at minimum distance. Given the volume of news articles on each topic, extracting keywords, building weighted frequency vectors, and determining vector distances are very time-consuming operations. Therefore, to increase performance and calculation accuracy and to decrease execution time, we use MapReduce, a distributed programming model, to implement and execute distributed classification of news articles. This research is the first attempt to classify Persian data in a distributed manner, and its results can be used for other text mining areas in any language. It is worth mentioning that we have successfully implemented our method on the supercomputer of Amirkabir University of Technology.
  • Keywords
    data mining; information resources; pattern classification; Amirkabir University of Technology; Hamshahri News dataset; MapReduce; Persian News; Persian data classification; distance detection; distributed classification; distributed programming model; keywords extraction; news article classification; news vector; text mining; vector distances; vector space model; weighted frequency vectors; Classification algorithms; Computational modeling; Hardware; Manuals; Open source software; Sorting; Text categorization; Apache Hadoop; Distributed Computing; MapReduce; Text Classification; Vector Space Model;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Computer and Knowledge Engineering (ICCKE), 2013 3th International eConference on
  • Conference_Location
    Mashhad
  • Print_ISBN
    978-1-4799-2092-1
  • Type
    conf
  • DOI
    10.1109/ICCKE.2013.6682829
  • Filename
    6682829
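
The classification scheme summarized in the abstract (one weighted frequency vector per category, assignment by minimum distance) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the whitespace tokenizer, the relative-frequency weighting, the cosine distance, and all function names (`map_doc`, `reduce_counts`, `train`, `classify`) are assumptions made for illustration, and the map and reduce steps are ordinary Python functions rather than Hadoop jobs.

```python
import math
from collections import Counter, defaultdict


def map_doc(category, text):
    """Map step (assumed): emit (category, term counts) for one article."""
    return category, Counter(text.lower().split())


def reduce_counts(pairs):
    """Reduce step (assumed): sum term counts per category."""
    totals = defaultdict(Counter)
    for category, counts in pairs:
        totals[category].update(counts)
    return totals


def weight(counts):
    """Turn raw counts into relative frequencies (an assumed weighting)."""
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors (assumed distance)."""
    dot = sum(v * b.get(term, 0.0) for term, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def train(labeled_docs):
    """Build one weighted frequency vector per category."""
    totals = reduce_counts(map_doc(c, t) for c, t in labeled_docs)
    return {c: weight(counts) for c, counts in totals.items()}


def classify(text, category_vectors):
    """Assign the article to the category at minimum distance."""
    _, counts = map_doc(None, text)
    vec = weight(counts)
    return min(category_vectors,
               key=lambda c: cosine_distance(vec, category_vectors[c]))


# Toy English data, for illustration only (not the Hamshahri dataset).
docs = [
    ("sport", "football match goal team"),
    ("sport", "team wins match"),
    ("economy", "market stocks bank inflation"),
    ("economy", "bank raises rates market"),
]
vectors = train(docs)
label = classify("the team scored a goal", vectors)
```

In a distributed setting, the map and reduce functions would run as separate tasks over partitions of the corpus; here they are chained in one process only to show the data flow.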