• DocumentCode
    1718154
  • Title
    Automatic text summarization of Wikipedia articles
  • Author
    Hingu, Dharmendra ; Shah, Deep ; Udmale, Sandeep S.
  • Author_Institution
    Dept. of Comput. Eng. & Inf. Technol., Veermata Jijabai Technol. Inst., Mumbai, India
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    The main objective of a text summarization system is to identify the most important information in a given text and present it to the end user. In this paper, Wikipedia articles are given as input to the system, and extractive text summarization is performed by identifying text features and scoring the sentences accordingly. The text is first pre-processed to tokenize the sentences and perform stemming operations. The sentences are then scored using the different text features. Two novel approaches are implemented: using the citations present in the text and identifying synonyms. These features, along with the traditional methods, are used to score the sentences. A neural network then uses the scores to classify each sentence as belonging in the summary or not. The user can specify what percentage of the original text should be in the summary. It is found that scoring the sentences based on citations gives the best results.
  • Keywords
    Web sites; neural nets; text analysis; Wikipedia articles; automatic text summarization; neural network; sentence classification; sentence scoring; sentence tokenization; stemming operations; text feature identification; text preprocessing; Computers; Electronic publishing; Encyclopedias; Feature extraction; Internet; Neural networks; Frequency; Natural Language; Python; Text summarization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communication, Information & Computing Technology (ICCICT), 2015 International Conference on
  • Conference_Location
    Mumbai
  • Print_ISBN
    978-1-4799-5521-3
  • Type
    conf
  • DOI
    10.1109/ICCICT.2015.7045732
  • Filename
    7045732
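
The pipeline outlined in the abstract (sentence tokenization, stemming, feature-based sentence scoring with a citation feature, and a user-chosen summary percentage) might look roughly like the Python sketch below. This is not the authors' implementation. The function names, the stopword list, the crude suffix-stripping stemmer, and the `citation_weight` parameter are all hypothetical. The sketch also assumes that citations appear as bracketed numbers such as `[3]`. The synonym feature is omitted, and a simple top-k ranking stands in for the neural network classifier described in the paper.

```python
import math
import re
from collections import Counter

# Assumption: citations appear as bracketed numbers, e.g. "[3]".
CITATION = re.compile(r"\[\d+\]")
# Assumption: a tiny illustrative stopword list, not the paper's.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
             "was", "are", "for", "on"}


def split_sentences(text):
    # Naive splitter: a sentence runs up to ., ! or ? and keeps any
    # citation markers that directly follow the punctuation.
    pattern = r"[^.!?]+(?:[.!?]+(?:\[\d+\])*|$)"
    return [s.strip() for s in re.findall(pattern, text) if s.strip()]


def stem(word):
    # Crude suffix stripping; a stand-in for a real stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(sentence):
    # Lowercase, drop citation markers, keep alphabetic words,
    # remove stopwords, then stem.
    words = re.findall(r"[a-z]+", CITATION.sub(" ", sentence.lower()))
    return [stem(w) for w in words if w not in STOPWORDS]


def summarize(text, percent=30, citation_weight=1.0):
    # Score = mean document frequency of the sentence's stems
    #         + citation_weight * number of citation markers.
    sentences = split_sentences(text)
    if not sentences:
        return []
    freq = Counter(t for s in sentences for t in tokens(s))
    scores = []
    for s in sentences:
        toks = tokens(s)
        tf = sum(freq[t] for t in toks) / len(toks) if toks else 0.0
        scores.append(tf + citation_weight * len(CITATION.findall(s)))
    # Keep the requested percentage of sentences (at least one),
    # returned in their original order.
    keep = max(1, math.ceil(len(sentences) * percent / 100))
    top = sorted(range(len(sentences)),
                 key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(article_text, percent=25)` would return roughly a quarter of the sentences, favouring those with frequent terms and more citation markers. Raising `citation_weight` gives the citation feature more influence over the ranking.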