DocumentCode :
2260278
Title :
Automatic Evaluation of Document Classification Using N-Gram Statistics
Author :
Choi, Dongjin ; Ko, Byeongkyu ; Lee, Eunji ; Hwang, Myunggwon ; Kim, Pankoo
Author_Institution :
Dept. of Comput. Eng., Chosun Univ., Gwangju, South Korea
fYear :
2012
fDate :
26-28 Sept. 2012
Firstpage :
739
Lastpage :
742
Abstract :
With the development of World Wide Web technologies, people are flooded with trillions of web pages at every moment, and the size of the web keeps increasing dramatically. As a result, it is becoming more difficult to find web documents relevant to what users want to read. Classifying documents into predefined categories is one of the most important tasks in the field of Natural Language Processing. Over the years, many statistical and linguistic approaches have been applied to overcome the limitations of traditional classification machines. However, the problem remains unsolved: there is not yet a perfect solution for making machines understand human language, and every possibility for making machines think as humans do has to be considered. In this paper, we propose a method for classifying textual documents using n-gram co-occurrence statistics, which have great potential for finding similarities between given documents. We also compare the proposed method with the traditional method suggested by Keselj. This paper covers only simple approaches and still needs more sophisticated experiments. However, the performance of the proposed method is better than that of the Keselj approach.
Keywords :
Web sites; computational linguistics; natural language processing; pattern classification; statistical analysis; text analysis; Keselj approach; Web documents; Web pages; World Wide Web technologies; classification machine; document classification automatic evaluation; linguistical approach; n-gram co-occurrence statistics; natural language processing field; statistical approach; textural document classification; Bioinformatics; Computer vision; Computers; Data mining; Humans; Semantics; Training; N-gram; Natural Language Processing; document classification; formatting;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Network-Based Information Systems (NBiS), 2012 15th International Conference on
Conference_Location :
Melbourne, VIC
Print_ISBN :
978-1-4673-2331-4
Type :
conf
DOI :
10.1109/NBiS.2012.96
Filename :
6354916