DocumentCode :
163217
Title :
Text corpus for natural language story-telling sentence generation: A design and evaluation
Author :
Limpanadusadee, Worasa ; Punyabukkana, Proadpran ; Suchato, Atiwong ; Poobrasert, Onintra
Author_Institution :
Dept. of Comput. Eng., Chulalongkorn Univ., Bangkok, Thailand
fYear :
2014
fDate :
14-16 May 2014
Firstpage :
80
Lastpage :
85
Abstract :
Automatic generation of narrative sentences from unordered word sets is desirable in Augmentative and Alternative Communication (AAC) systems for children with certain learning disabilities (LD). However complex the Natural Language Processing used in a sentence generation procedure, the quality of its language model always affects the generation results. This work compared the sentence generation accuracy of a multi-tier N-gram-based procedure on the task of Thai simple sentence generation when trained on two corpora: BEST2010, a large, publicly available text corpus, and a smaller but more purpose-built corpus. The latter, a new corpus called TELL-S, was created from an analysis of the content of textbooks used in grade 1 and grade 2 Thai-language courses under the compulsory curriculum for Thai schools. The original procedure was also modified to incorporate additional constraints based on a story-telling guideline developed for LD children. Evaluated on test sets of 195 sentences, each composed of 3-6 words with a specific Part-Of-Speech combination, TELL-S was shown to generalize better and yielded higher accuracy than BEST2010 in all cases with unbiased word sets. The sentence generation accuracies were 100% and 70% for 3-word and 4-word sentences, respectively. The average accuracy was 58.8% when longer sentences were also included.
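The abstract's core idea, ordering an unordered word set by asking an N-gram model which arrangement is most probable, can be illustrated with a minimal sketch. This is not the authors' multi-tier procedure and omits their story-telling constraints: it assumes a single bigram model with add-one smoothing and an exhaustive search over permutations. The toy English corpus and the function names (`train_bigram`, `log_prob`, `order_words`) are hypothetical stand-ins for a real training corpus such as BEST2010 or TELL-S. The example also shows why the training corpus matters so much: the model can only prefer word orders whose adjacent pairs it has seen.

```python
import math
from collections import Counter
from itertools import permutations

BOS, EOS = "<s>", "</s>"  # sentence boundary markers

def train_bigram(sentences):
    """Count bigram histories and bigrams over boundary-padded token lists."""
    histories, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = [BOS] + tokens + [EOS]
        histories.update(padded[:-1])            # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))
    # Possible next tokens: all words plus the end marker (BOS is never predicted).
    vocab = {w for s in sentences for w in s} | {EOS}
    return histories, bigrams, vocab

def log_prob(tokens, model):
    """Add-one smoothed bigram log-probability of a token sequence."""
    histories, bigrams, vocab = model
    padded = [BOS] + list(tokens) + [EOS]
    v = len(vocab)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (histories[a] + v))
        for a, b in zip(padded, padded[1:])
    )

def order_words(words, model):
    """Return the permutation of `words` that the bigram model scores highest."""
    return list(max(permutations(words), key=lambda p: log_prob(p, model)))

# Hypothetical toy training corpus (already tokenized).
corpus = [
    ["the", "cat", "eats", "fish"],
    ["the", "dog", "eats", "meat"],
    ["a", "cat", "sleeps"],
]
model = train_bigram(corpus)
print(order_words(["eats", "cat", "the", "fish"], model))  # → ['the', 'cat', 'eats', 'fish']
```

Exhaustive search is affordable for the 3-6 word sets described above (at most 6! = 720 orderings); larger word sets would call for a pruned search such as beam search.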
Keywords :
computational linguistics; computer aided instruction; handicapped aids; natural language processing; AAC system; BEST2010; TELL-S; Thai simple sentence generation; augmentative and alternative communication; language model; learning disability; multitier N-gram; narrative sentences; natural language processing; part-of-speech combination; story-telling sentence generation; text corpus; Augmentative and Alternative Communication; Corpus Management; Learning Disabilities; N-Gram Model; Natural Language Generation; Statistical Natural Language Processing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Science and Software Engineering (JCSSE), 2014 11th International Joint Conference on
Conference_Location :
Chon Buri
Print_ISBN :
978-1-4799-5821-4
Type :
conf
DOI :
10.1109/JCSSE.2014.6841846
Filename :
6841846