  • DocumentCode
    238196
  • Title
    Sentence generation from a bag of words using N-gram model
  • Author
    Yadav, Arun Kumar; Borgohain, Samir Kumar
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Nat. Inst. of Technol. Silchar, Silchar, India
  • fYear
    2014
  • fDate
    8-10 May 2014
  • Firstpage
    1771
  • Lastpage
    1776
  • Abstract
    In this paper, we present a method for generating sentences from a given bag of words. Sentence generation is useful in text summarization, question answering systems, and similar applications. Our aim is to generate all possible correct sentences from a given bag of words. The technique we apply is an N-gram language model, which is trained on a text corpus and used to generate candidate sequences from a given bag of words. For N input words, instead of treating all N! permutations as candidate sequences, we generate fewer than N! candidates by applying a depth-first search (DFS) filtering technique at run time. We use two corpora: a text corpus and a corpus annotated with POS tags. From the annotated corpus we extract all valid POS tag trigrams. Each generated candidate sequence receives a probability score. The candidate sequences are ranked by matching them against the valid trigram POS tag signatures and by their probability scores. Preliminary experiments with this model show promising results.
  • Keywords
    computational linguistics; natural language processing; probability; speech processing; text analysis; tree searching; DFS filtering technique; POS trigram tag extraction; annotated corpus; bag-of-words; correct-sentence generation method; depth-first search filtering technique; n-gram language model; n-input words; probability score; run time analysis; sequence generation; sequence matching; sequence ranking; text corpus; trigram POS tag signature; Depth First Search; N-gram Language Model; Part of Speech Tagging; Sentence Generation; Syntax;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Advanced Communication Control and Computing Technologies (ICACCCT), 2014 International Conference on
  • Conference_Location
    Ramanathapuram
  • Print_ISBN
    978-1-4799-3913-8
  • Type
    conf
  • DOI
    10.1109/ICACCCT.2014.7019414
  • Filename
    7019414
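
The pipeline summarized in the abstract (N-gram training, DFS filtering of word orderings, POS trigram signature matching, ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions: N is fixed at 2 (bigrams), probabilities are unsmoothed maximum-likelihood estimates, a branch is pruned as soon as its newest word pair never occurred in the text corpus, and a hypothetical word-to-tag lexicon (`LEXICON`) stands in for a real POS tagger. The toy corpora and every function name (`train_bigrams`, `dfs_candidates`, `rank`, and so on) are hypothetical. Candidates that match the POS signature are ranked first, then ordered by log-probability; how the paper combines the two criteria is not specified here.

```python
from collections import Counter
from math import log

# Hypothetical toy data; the paper's actual corpora are not reproduced here.
TEXT_CORPUS = ["the cat sat on the mat", "the cat on the mat sat"]
TAGGED_CORPUS = [["DT", "NN", "VBD", "IN", "DT", "NN"], ["DT", "NN", "VBD"]]
# Hypothetical stand-in for a POS tagger.
LEXICON = {"the": "DT", "cat": "NN", "mat": "NN", "sat": "VBD", "on": "IN"}


def train_bigrams(sentences):
    """Count history words and word pairs, padding each sentence with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        histories.update(tokens[:-1])  # every token except </s> precedes another token
        bigrams.update(zip(tokens, tokens[1:]))
    return histories, bigrams


def dfs_candidates(bag, bigrams):
    """Depth-first search over orderings of the bag. A branch is pruned as soon
    as its newest word pair never occurred in the training corpus (assumption)."""
    remaining = Counter(word.lower() for word in bag)
    found = []

    def extend(seq):
        if not any(remaining.values()):  # every word has been placed
            if bigrams[(seq[-1], "</s>")]:
                found.append(seq[1:])  # strip the <s> marker
            return
        # Iterate over distinct words so duplicate words do not yield duplicate orderings.
        for word in sorted(w for w, c in remaining.items() if c > 0):
            if not bigrams[(seq[-1], word)]:
                continue  # unseen transition: skip this whole subtree
            remaining[word] -= 1
            extend(seq + [word])
            remaining[word] += 1

    extend(["<s>"])
    return found


def log_prob(seq, histories, bigrams):
    """Unsmoothed maximum-likelihood bigram log-probability of a full sentence."""
    tokens = ["<s>"] + seq + ["</s>"]
    return sum(log(bigrams[(a, b)] / histories[a]) for a, b in zip(tokens, tokens[1:]))


def pos_trigrams(tag_sequences):
    """Collect every POS trigram seen in the annotated corpus."""
    return {tuple(tags[i:i + 3]) for tags in tag_sequences for i in range(len(tags) - 2)}


def matches_signature(seq, valid_trigrams, lexicon):
    """True if every POS trigram of the candidate occurs in the annotated corpus."""
    tags = [lexicon[word] for word in seq]
    return all(tuple(tags[i:i + 3]) in valid_trigrams for i in range(len(tags) - 2))


def rank(candidates, histories, bigrams, valid_trigrams, lexicon):
    """Order candidates: signature matches first, then by log-probability."""
    return sorted(
        candidates,
        key=lambda s: (matches_signature(s, valid_trigrams, lexicon),
                       log_prob(s, histories, bigrams)),
        reverse=True,
    )


if __name__ == "__main__":
    histories, bigrams = train_bigrams(TEXT_CORPUS)
    valid = pos_trigrams(TAGGED_CORPUS)
    bag = ["mat", "the", "on", "cat", "sat", "the"]
    for sentence in rank(dfs_candidates(bag, bigrams), histories, bigrams, valid, LEXICON):
        print(" ".join(sentence))
```

The six-word bag contains one repeated word, so it has 6!/2! = 360 distinct orderings. On the toy corpus, the search returns only two: "the cat on the mat sat" and "the cat sat on the mat". Both have the same bigram probability (1/32). However, only "the cat sat on the mat" has POS trigrams that all occur in the annotated corpus, so it is printed first.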