Title :
The relationship of text categorization using Dewey Decimal Classification techniques
Author :
Watthananon, Julaluk
Author_Institution :
Dept. of Math. & Comput. Sci., Rajamangala Univ. of Technol., Thanyaburi, Thailand
Abstract :
Nowadays, the massive amount of data and information (recently termed “Big Data”) causes accessibility and retrieval problems if poorly managed. This is due to its relational structure, which is too complicated to explain or analyze with simple or traditional methods. Displaying these data and information uniformly is also difficult because of their diverse formats. Bag of Words (BOW), the most widely used data sorting method, is simple but overlooks the significance of synonymy. The objective of this study is to propose a method for categorizing massively scattered data in the form of electronic documents. The linking of related data is also supported by applying the Dewey Decimal Classification (DDC) technique. DDC was employed in processing, analyzing, and displaying the data, with the results presented in the form of a Mind Map. An accuracy test was performed on data from the “Wikipedia Selection for schools”, a subset of Wikipedia, to compare the performance of four models: Dewey Decimal Classification (DDC), Support Vector Machine (SVM), K-Means Clustering, and Hierarchical Clustering. The results indicated that DDC yielded the highest accuracy (75.02%), followed by the Hierarchical model (74.66%), while K-Means and SVM both yielded the same accuracy (72.66%). In terms of processing time, K-Means Clustering was the fastest of the four models (16.09 seconds).
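The abstract's remark that Bag of Words overlooks synonymy can be illustrated with a minimal sketch. This is not the paper's implementation: the whitespace tokenizer, the cosine similarity measure, the helper names `bag_of_words` and `cosine`, and the example sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Build a bag of words: lowercase, split on whitespace, count tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical example sentences (not from the paper's data set).
doc_a = bag_of_words("the car is fast")
doc_b = bag_of_words("the automobile is quick")  # same meaning, different words
doc_c = bag_of_words("the car is fast")          # identical wording

# Only the function words "the" and "is" overlap between doc_a and doc_b,
# so the synonym pairs car/automobile and fast/quick contribute nothing.
print(cosine(doc_a, doc_b))
print(cosine(doc_a, doc_c))
```

Under these assumptions, the two sentences with the same meaning score well below the identically worded pair, because BOW matches surface tokens only. This is the kind of gap a classification scheme that links related terms is meant to close.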
Keywords :
pattern classification; pattern clustering; support vector machines; text analysis; DDC technique; Dewey decimal classification technique; SVM; Wikipedia Selection for schools; data analyzing; data displaying; data linking; data processing; electronic documents; hierarchical clustering; hierarchical models; k-mean clustering; mind map; support vector machine; text categorization; Accuracy; Electronic publishing; Encyclopedias; Equations; Mathematical model; Support vector machines; Big Data; Dewey Decimal Classification; Knowledge Management; Mind Map;
Conference_Title :
ICT and Knowledge Engineering (ICT and Knowledge Engineering), 2014 12th International Conference on
Conference_Location :
Bangkok
Print_ISBN :
978-1-4799-8025-3
DOI :
10.1109/ICTKE.2014.7001538