• DocumentCode
    1076670
  • Title
    Generative and Discriminative Methods Using Morphological Information for Sentence Segmentation of Turkish
  • Author
    Guz, Umit ; Favre, Benoit ; Hakkani-Tür, Dilek ; Tur, Gokhan
  • Author_Institution
    Int. Comput. Sci. Inst. (ICSI), Berkeley, CA
  • Volume
    17
  • Issue
    5
  • fYear
    2009
  • fDate
    7/1/2009
  • Firstpage
    895
  • Lastpage
    903
  • Abstract
    This paper presents novel methods for generative, discriminative, and hybrid sequence classification for segmentation of Turkish word sequences into sentences. In the literature, this task is generally solved using statistical models that take advantage of lexical information, among other cues. However, Turkish has a productive morphology that generates a very large vocabulary, making the task much harder. In this paper, we introduce a new set of morphological features, extracted from words and their morphological analyses. We also extend the established method of hidden event language modeling (HELM) to factored hidden event language modeling (fHELM) to handle morphological information. In order to capture non-lexical information, we extract a set of prosodic features, which are mainly motivated by our previous work on other languages. We then employ discriminative classification techniques, boosting and conditional random fields (CRFs), combined with fHELM, for the task of Turkish sentence segmentation.
  • Keywords
    speech processing; word processing; Turkish word sequences; conditional random fields; discriminative classification techniques; discriminative methods; generative methods; hidden event language modeling; morphological information; sentence segmentation; Automatic speech recognition; Boosting; Computer science; Data mining; Feature extraction; Hidden Markov models; Hybrid power systems; Morphology; Natural languages; Vocabulary; Prosodic and lexical information; Turkish morphology
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2009.2016393
  • Filename
    5075771