Conference record number:
144
Article title:
Evaluating preprocessing by Turing Machine in text categorization
Authors:
Khotanlou Hassan (author), Department of Computer Engineering; Esmaeilpour Mansour (author), Department of Computer Engineering, Hamedan Branch, Islamic Azad University, Hamedan, Iran; Abbasi Ghalehtaki Razieh (author)
Number of pages:
6
Keywords:
Text categorization, Support vector machines, Turing machine, Preprocessing
Conference title:
Proceedings of the 12th Iranian Conference on Intelligent Systems
Document language:
Persian
Persian abstract:
With the growth of the World Wide Web, text categorization has become a key technology for handling and organizing large numbers of documents. Automatic text categorization is a method for coping with massive amounts of data. The basic phases of text categorization are preprocessing, extracting relevant features and comparing them against the features in a database, and finally assigning a set of documents to predefined categories. In this article, we propose a new preprocessing method based on a Turing machine. All four preprocessing steps (sentence segmentation, tokenization, stop word removal, and word stemming) are performed by the Turing machine. To assess the importance of Turing machine preprocessing for the text classification problem, we applied the support vector machine paradigm to the Reuters and PAGOD datasets. Searching for the best document representation, we evaluated and analyzed several known feature reduction, feature subset selection, and term weighting methods. Experiments show that the proposed method is more accurate than other methods.
Conference document number:
3817034
Year of publication:
2014
From page:
1
To page:
6