Title of article :
Farsi Conceptual Text Summarizer: A New Model in Continuous Vector Space
Author/Authors :
Khademi, Mohammad Ebrahim Faculty of Electrical and Computer Engineering - Malek Ashtar University of Technology, Iran , Fakhredanesh, Mohammad Faculty of Electrical and Computer Engineering - Malek Ashtar University of Technology, Iran , Hoseini, Mojtaba Faculty of Electrical and Computer Engineering - Malek Ashtar University of Technology, Iran
Abstract :
Traditional methods of summarization are costly and time-consuming, which led to the emergence of automatic methods for text summarization. Extractive summarization is an automatic method that generates a summary by identifying the most important sentences of a text. In this paper, two new approaches are presented for summarizing Farsi texts. Both combine deep learning with a statistical method (TF-IDF): we cluster the concepts of the text and, based on the importance of the concepts in each sentence, select the sentences that carry the greatest conceptual load. These methods aim to address the representational weaknesses of repetition-based statistical methods by using deep learning to extract associations between words without supervision. The first method is unsupervised and uses no hand-crafted features, yet it achieves state-of-the-art results on the Pasokh single-document corpus compared to the best supervised Farsi methods. To put the results in context, we also evaluated the human summaries written by the contributing authors of the Pasokh corpus and used them as a measure of the success rate of the proposed methods; in terms of recall, the proposed methods achieve favorable results. The second method adds a coefficient for the effect of the title; increasing this coefficient raised the average ROUGE-2 score by 0.4% on the Pasokh single-document corpus compared to the first method, and the average ROUGE-1 score by 3% on the Khabir news corpus.
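The pipeline described in the abstract (cluster words into concepts, weight concepts with TF-IDF, pick the sentences with the highest concept weight) can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy cosine-threshold clustering, the smoothed IDF formula, the per-sentence TF-IDF, and all function names (`cluster_words`, `tfidf_weights`, `summarize`) are assumptions chosen for illustration, and the tiny hand-made vectors stand in for learned word embeddings.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_words(vectors, threshold=0.8):
    """Greedy clustering (an assumption, not the paper's algorithm):
    a word joins the first cluster whose seed vector is similar enough,
    otherwise it starts a new cluster. Each cluster plays the role of a concept."""
    clusters = []  # list of (seed_vector, member_words)
    for word in sorted(vectors):
        for seed, members in clusters:
            if cosine(vectors[word], seed) >= threshold:
                members.append(word)
                break
        else:
            clusters.append((vectors[word], [word]))
    return [members for _, members in clusters]


def tfidf_weights(sentences):
    """Summed TF-IDF per word, treating each tokenized sentence as a document.
    The smoothed IDF log(1 + N/df) is an assumed variant."""
    n = len(sentences)
    df = Counter(w for s in sentences for w in set(s))
    weights = Counter()
    for s in sentences:
        for w, count in Counter(s).items():
            weights[w] += (count / len(s)) * math.log(1 + n / df[w])
    return weights


def summarize(sentences, vectors, k=1, threshold=0.8):
    """Return the indices (in document order) of the k sentences whose
    concepts carry the largest total TF-IDF weight."""
    weights = tfidf_weights(sentences)
    known = {w: vectors[w] for w in weights if w in vectors}
    clusters = cluster_words(known, threshold)
    concept_of = {w: i for i, members in enumerate(clusters) for w in members}
    concept_weight = Counter()
    for w, i in concept_of.items():
        concept_weight[i] += weights[w]
    scores = [
        sum(concept_weight[c] for c in {concept_of[w] for w in s if w in concept_of})
        for s in sentences
    ]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return sorted(top)


# Toy 2-dimensional "embeddings" (hypothetical values, for illustration only).
toy_vectors = {
    "cat": (1.0, 0.0), "kitten": (0.9, 0.1),
    "car": (0.0, 1.0), "truck": (0.1, 0.9),
    "sky": (0.7, 0.7),
}
toy_sentences = [["cat", "kitten"], ["car", "sky"], ["cat", "truck", "sky"]]
print(cluster_words(toy_vectors))
print(summarize(toy_sentences, toy_vectors, k=1))
```

In this toy input the third sentence touches every concept cluster, so it is the one selected. A title coefficient like the one in the second method could be modeled by multiplying the weight of concepts that occur in the title, but the abstract does not specify how that coefficient is applied.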
Keywords :
Word Embedding , Language Independent Summarization , Continuous Vector Space , Unsupervised Learning , Extractive Text Summarization
Journal title :
Astroparticle Physics