DocumentCode :
2446059
Title :
Persian Text Normalization using Classification Tree and Support Vector Machine
Author :
Moattar, M.H. ; Homayounpour, M.M. ; Zabihzadeh, D.
Author_Institution :
Dept. of Comput. Eng. & Inf. Technol., Amirkabir Univ. of Technol., Tehran
Volume :
1
fYear :
0
fDate :
0-0 0
Firstpage :
1308
Lastpage :
1311
Abstract :
Text normalization is one of the most important tasks in text processing and text-to-speech conversion. In this paper, we propose a machine learning method that determines the type of Farsi non-standard words (NSWs) using only the structure of these words. Two methods, support vector machines (SVM) and classification and regression trees (CART), were evaluated on different training and test sets for NSW type classification in Farsi. The experimental results show that NSW type classification in Farsi can be done efficiently using only the structural form of Farsi non-standard words. In addition, the results are compared with previous work on normalization using a multi-layer perceptron (MLP) neural network, and show that SVM outperforms MLP in both number of efforts and total performance.
Keywords :
learning (artificial intelligence); multilayer perceptrons; natural languages; pattern classification; regression analysis; support vector machines; text analysis; trees (mathematics); Farsi language nonstandard words; Persian text normalization; classification and regression trees; classification tree; machine learning; multilayer perceptron neural network; support vector machine; text processing; text to speech conversion; Classification tree analysis; Learning systems; Multilayer perceptrons; Natural languages; Regression tree analysis; Speech synthesis; Support vector machine classification; Support vector machines; Testing; Text processing; Classification and Regression Tree; Multi-Layer Perceptron; Support Vector Machine; Text Normalization; Text to Speech;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information and Communication Technologies, 2006. ICTTA '06. 2nd
Conference_Location :
Damascus
Print_ISBN :
0-7803-9521-2
Type :
conf
DOI :
10.1109/ICTTA.2006.1684569
Filename :
1684569