DocumentCode :
1795932
Title :
Automated document classification for news article in Bahasa Indonesia based on term frequency inverse document frequency (TF-IDF) approach
Author :
Hakim, Ari Aulia ; Erwin, Alva ; Eng, Kho I. ; Galinium, Maulahikmah ; Muliady, Wahyu
Author_Institution :
Fac. of Eng. & Inf. Technol., Swiss German Univ., Tangerang, Indonesia
fYear :
2014
fDate :
7-8 Oct. 2014
Firstpage :
1
Lastpage :
4
Abstract :
The exponential growth of data may lead to an era of information explosion, in which most data cannot be managed easily. Text mining is believed to help prevent that outcome, and one of its applications is text classification, which assigns articles to several predefined categories. In this research, the classifier implements the TF-IDF algorithm. TF-IDF computes the weight of a word from how frequently the word occurs (TF) and from how many files contain it (IDF). Because the IDF reflects the number of files in which a term appears, it controls the weight of each word: a word found in many files is considered unimportant. TF-IDF proved able to produce a classifier that classifies news articles in Bahasa Indonesia with high accuracy (98.3%).
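The weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant or classification step the authors used, so the code below assumes the common textbook form (relative term frequency multiplied by the logarithm of the inverse document frequency). The function name `tf_idf` and the three sample sentences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = log(number of documents / documents containing the term).
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus (illustrative only, not data from the paper).
docs = [
    "harga saham naik".split(),
    "harga minyak naik".split(),
    "tim sepak bola menang".split(),
]
weights = tf_idf(docs)
```

In this toy corpus, "saham" occurs in one document and "harga" in two, so "saham" receives the larger weight in the first document. A term that appears in every document would receive a weight of zero under this variant, which matches the abstract's point that widely shared words are treated as unimportant.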
Keywords :
data mining; electronic publishing; pattern classification; text analysis; Bahasa Indonesia; TF-IDF approach; automated document classification; news article classification; term frequency inverse document frequency approach; text mining; Accuracy; Classification algorithms; Computers; Dictionaries; Explosions; Text categorization; Text Classification
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Technology and Electrical Engineering (ICITEE), 2014 6th International Conference on
Conference_Location :
Yogyakarta
Print_ISBN :
978-1-4799-5302-8
Type :
conf
DOI :
10.1109/ICITEED.2014.7007894
Filename :
7007894