• DocumentCode
    1787927
  • Title
    The relationship of text categorization using Dewey Decimal Classification techniques
  • Author
    Watthananon, Julaluk
  • Author_Institution
    Dept. of Math. & Comput. Sci., Rajamangala Univ. of Technol., Thanyaburi, Thailand
  • fYear
    2014
  • fDate
    18-21 Nov. 2014
  • Firstpage
    72
  • Lastpage
    77
  • Abstract
    Nowadays, the massive amount of data and information (recently termed “Big Data”) causes accessibility and retrieval problems if poorly managed. This is due to its relational structure, which is too complicated to explain or analyze with simple or traditional methods. Displaying these data and information uniformly is also difficult because of their diverse formats. Bag of Words (BOW), the most widely used data sorting method, is simple but overlooks the significance of synonymy. The objective of this research study is to propose a method for categorizing massively scattered data (in the form of electronic documents). The linking of related data is also supported by applying the Dewey Decimal Classification (DDC) technique. DDC was employed in processing, analyzing, and displaying the data in the form of a Mind Map. An accuracy test was performed on data from the “Wikipedia Selection for schools”, a subset of Wikipedia, to compare the efficiency of four models: Dewey Decimal Classification (DDC), Support Vector Machine (SVM), K-Mean Clustering, and Hierarchical Clustering. The results indicated that DDC yielded the highest accuracy (75.02%), followed by the Hierarchical model (74.66%), while K-Mean and SVM both yielded the same accuracy (72.66%). In terms of processing time, K-Mean Clustering was the fastest of the four models (16.09 seconds).
  • Keywords
    pattern classification; pattern clustering; support vector machines; text analysis; DDC technique; Dewey decimal classification technique; SVM; Wikipedia Selection for schools; data analyzing; data displaying; data linking; data processing; electronic documents; hierarchical clustering; hierarchical models; k-mean clustering; mind map; support vector machine; text categorization; Accuracy; Electronic publishing; Encyclopedias; Equations; Mathematical model; Support vector machines; Big Data; Dewey Decimal Classification; Knowledge Management; Mind Map;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    ICT and Knowledge Engineering (ICT and Knowledge Engineering), 2014 12th International Conference on
  • Conference_Location
    Bangkok
  • ISSN
    2157-0981
  • Print_ISBN
    978-1-4799-8025-3
  • Type
    conf
  • DOI
    10.1109/ICTKE.2014.7001538
  • Filename
    7001538