Conference record number:
144
Article title:
Evaluating preprocessing by Turing Machine in text categorization
Authors:
Khotanlou Hassan (author), Department of Computer Engineering; Esmaeilpour Mansour (author), Department of Computer Engineering, Hamedan Branch, Islamic Azad University, Hamedan, Iran; Abbasi Ghalehtaki Razieh (author)
Keywords:
Text Categorization , Support vector machines , Turing machine , preprocessing
Conference title:
Proceedings of the 12th Iranian Conference on Intelligent Systems
Abstract:
With the growth of the World Wide Web, text
categorization has become a key technology for handling and organizing
large numbers of documents. Automatic text categorization is a
method for coping with massive amounts of data. The basic phases of text
categorization are preprocessing, extracting relevant features
by matching them against the features in a database, and finally assigning a set of
documents to predefined categories. In this article, we propose
a new preprocessing method based on a Turing Machine. All four
preprocessing steps, namely sentence segmentation, tokenization, stop
word removal, and word stemming, are performed by the Turing Machine.
To assess the importance of Turing Machine preprocessing
for the text classification problem, we applied the support
vector machine paradigm to the Reuters and PAGOD datasets.
In searching for the best document representation, we evaluated and
analyzed several known feature reduction, feature subset selection,
and term weighting methods. Experiments show that the proposed method is
more accurate than the other methods.
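The four preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the Turing Machine method of the article, whose details are not given in this record; it uses ordinary regular expressions instead. The stop-word list, the suffix rules, and all function names are hypothetical and chosen only for illustration.

```python
import re

# Illustrative assumptions: a tiny stop-word list and a few suffix rules.
# Neither comes from the article.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "into"}
SUFFIXES = ("ing", "ed", "es", "s")


def segment_sentences(text):
    """Step 1: split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Step 2: lowercase the sentence and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def remove_stop_words(tokens):
    """Step 3: drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Step 4: naive stemming; strip the first matching suffix
    if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run all four steps; return one list of stems per sentence."""
    result = []
    for sentence in segment_sentences(text):
        tokens = remove_stop_words(tokenize(sentence))
        result.append([stem(t) for t in tokens])
    return result


print(preprocess("The documents are sorted. Categorizing is automatic!"))
# → [['document', 'sort'], ['categoriz', 'automatic']]
```

The per-sentence stem lists produced this way would typically be turned into weighted term vectors before being passed to a classifier such as a support vector machine.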
Conference document number:
3817034