• DocumentCode
    1849854
  • Title
    Identifying reduplicative words for Vietnamese word segmentation
  • Author
    Ngoc Anh Tran ; Phuong Thai Nguyen ; Thanh Tinh Dao ; Hong Quan Nguyen
  • Author_Institution
    Dept. Inf. Technol., Le Quy Don Tech. Univ., Hanoi, Vietnam
  • fYear
    2015
  • fDate
    25-28 Jan. 2015
  • Firstpage
    77
  • Lastpage
    82
  • Abstract
    This paper proposes a method, based on linguistic word-formation rules and dictionaries, for identifying reduplicative words in Vietnamese. The key idea is to determine whether adjacent syllables in a text can form a reduplicative word according to its formation rules. For 2-syllable reduplicative words, the paper uses rules that describe repetition and opposition between pairs of initial consonants, rhymes, and tones. The method is then extended from 2-syllable words to reduplicative words of 3 or 4 syllables for the Vietnamese word segmentation task. Experimental results show that the F1-score improved to 98.61% and that word segmentation errors were reduced significantly (1.26%).
  • Keywords
    dictionaries; natural language processing; word processing; 2-syllable reduplicative word; Vietnamese word segmentation; adjacent syllable; dictionary; linguistic word-formation rule; word segmentation error; Automata; Compounds; Dictionaries; Educational institutions; Information technology; Mutual information; Pragmatics; reduplicative rules; reduplicative word
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing & Communication Technologies - Research, Innovation, and Vision for the Future (RIVF), 2015 IEEE RIVF International Conference on
  • Conference_Location
    Can Tho
  • Print_ISBN
    978-1-4799-8043-7
  • Type
    conf
  • DOI
    10.1109/RIVF.2015.7049878
  • Filename
    7049878
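
The abstract describes 2-syllable rules built on repetition and opposition between initial consonants, rhymes, and tones. The record does not give the actual rule set, so the following is only a minimal sketch of that general idea, not the authors' method. The onset inventory, the tone labels, and the names `split_syllable` and `reduplication_type` are all assumptions made for illustration. The sketch only flags candidate pairs and would overgenerate without the dictionary checks the abstract mentions.

```python
import unicodedata

# Combining marks that encode Vietnamese tones (labels are illustrative).
TONE_MARKS = {
    "\u0300": "huyen",  # grave
    "\u0301": "sac",    # acute
    "\u0309": "hoi",    # hook above
    "\u0303": "nga",    # tilde
    "\u0323": "nang",   # dot below
}

# Simplified onset inventory (an assumption); longest onsets are tried first.
ONSETS = sorted(
    ["ngh", "ng", "nh", "ch", "gh", "gi", "kh", "ph", "th", "tr", "qu",
     "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "p", "r", "s",
     "t", "v", "x"],
    key=len,
    reverse=True,
)


def split_syllable(syllable):
    """Split a syllable into (initial consonant, rhyme, tone)."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    tone = "ngang"  # level tone: no tone mark present
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # keep vowel-quality marks (circumflex, breve, horn)
    base = unicodedata.normalize("NFC", "".join(kept))
    for onset in ONSETS:
        if base.startswith(onset):
            return onset, base[len(onset):], tone
    return "", base, tone


def reduplication_type(first, second):
    """Return a candidate reduplication pattern for two syllables, or None."""
    o1, r1, t1 = split_syllable(first)
    o2, r2, t2 = split_syllable(second)
    if o1 == o2 and r1 == r2:
        # Same onset and rhyme: full repetition, possibly with a tone change.
        return "full" if t1 == t2 else "full_tone_shift"
    if o1 == o2 and o1 != "":
        return "initial"  # onset repeats, rhymes are opposed
    if r1 == r2 and r1 != "":
        return "rhyme"    # rhyme repeats, onsets are opposed
    return None


for pair in [("xanh", "xanh"), ("đo", "đỏ"), ("lung", "linh"),
             ("lom", "khom"), ("bàn", "ghế")]:
    print(pair, reduplication_type(*pair))
```

In a segmenter, a pair flagged here would still need to be confirmed against a dictionary or further rules, since many adjacent syllables share an onset or rhyme by coincidence.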