Regional Information Center for Science and Technology (RICEST) - Text2arff: Automatic feature extraction software for Turkish texts

DocumentCode :

3337226

Title :

Text2arff: Automatic feature extraction software for Turkish texts

Author :

Amasyali, M. Fatih ; Davletov, Feruz ; Torayew, Arslan ; Çiftçi, Ümit

fYear :

2010

fDate :

22-24 April 2010

Firstpage :

629

Lastpage :

632

Abstract :

Which features are most important for text classification tasks? Several studies in the area of automatic text categorization seek to answer this question. This paper presents Text2arff, a feature extraction tool for Turkish texts. The toolbox automatically extracts several kinds of features, such as word and n-gram frequencies, word clusters, and latent semantic indexing features. The extracted features are saved in the ARFF (WEKA) file format, so the resulting files can be used directly with the WEKA machine learning library.
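To make the abstract's pipeline concrete, the following is a minimal Python sketch of one of the listed feature types, word frequencies, written out as ARFF text. It is not the Text2arff implementation: the function names `tokenize` and `to_arff`, the index-based attribute names (`f0`, `f1`, ...), the relation name, and the two toy documents are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption for this sketch); real Turkish
    # text would need proper case folding (dotted/dotless i) and punctuation handling.
    return text.split()


def to_arff(docs, labels, relation="texts"):
    """Return ARFF text with one word-frequency attribute per vocabulary word."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    classes = sorted(set(labels))

    lines = [f"@RELATION {relation}", ""]
    for i, word in enumerate(vocab):
        # Index-based attribute names sidestep ARFF quoting rules; the word
        # itself is recorded in a '%' comment line above the attribute.
        lines.append(f"% f{i} = {word}")
        lines.append(f"@ATTRIBUTE f{i} NUMERIC")
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    for c, label in zip(counts, labels):
        row = [str(c[w]) for w in vocab]  # Counter returns 0 for absent words
        lines.append(",".join(row + [label]))
    return "\n".join(lines) + "\n"


# Hypothetical toy corpus: two short documents with class labels.
docs = ["top gol saha", "borsa faiz borsa"]
labels = ["spor", "ekonomi"]
arff_text = to_arff(docs, labels)
print(arff_text)
```

The string returned by `to_arff` can be saved to a `.arff` file and opened in WEKA. A dense numeric encoding is used here for readability; for large vocabularies, ARFF's sparse data format would likely be a better fit.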

Keywords :

feature extraction; natural language processing; pattern classification; text analysis; word processing; Turkish; WEKA; arff file format; text categorization; text classification

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Signal Processing and Communications Applications Conference (SIU), 2010 IEEE 18th

Conference_Location :

Diyarbakir

Print_ISBN :

978-1-4244-9672-3

Type :

conf

DOI :

10.1109/SIU.2010.5651686

Filename :

5651686

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3337226