• DocumentCode
    493507
  • Title
    Subsequence-Based Text Segmentation and Labeling
  • Author
    Chen, Xi; Chen, Shihong
  • Author_Institution
    Comput. Sch., Wuhan Univ., Wuhan
  • Volume
    1
  • fYear
    2009
  • fDate
    7-8 March 2009
  • Firstpage
    582
  • Lastpage
    587
  • Abstract
    Text segmentation is important for many natural language processing tasks, such as passage retrieval and summarization. This paper uses a suffix tree model for text representation and introduces a new measure, subsequence-based coherence, which represents the coherence between sentences and exploits word order information. It also introduces a text segmentation algorithm, subsequence-based maximum cut, and a passage labeling approach based on subsequences. Segmentation results on educational text show that the method outperforms some existing methods, and the passage labeling results are encouraging.
  • Keywords
    natural language processing; text analysis; trees (mathematics); natural language processing task; passage labeling approach; subsequence-based coherence; subsequence-based maximum cut; subsequence-based text segmentation; suffix tree model; text labeling; text representation; word order information; Books; Coherence; Computer science; Computer science education; Educational technology; Information retrieval; Intelligent systems; Labeling; Natural language processing; Supervised learning; maximum cut; passage labeling; sentence coherence; subsequence; text segmentation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Education Technology and Computer Science, 2009. ETCS '09. First International Workshop on
  • Conference_Location
    Wuhan, Hubei
  • Print_ISBN
    978-1-4244-3581-4
  • Type
    conf
  • DOI
    10.1109/ETCS.2009.138
  • Filename
    4958841