• DocumentCode
    1979968
  • Title
    Dynamic semantic textual document clustering using frequent terms and named entity
  • Author
    Yafooz, Wael M. S. ; Abidin, Siti Z. Z. ; Omar, Normaliza ; Halim, Rosenah A.
  • Author_Institution
    Fac. of Comput. & Math. Sci., Univ. Teknol. MARA, Shah Alam, Malaysia
  • fYear
    2013
  • fDate
    19-20 Aug. 2013
  • Firstpage
    336
  • Lastpage
    340
  • Abstract
    Data is mostly stored in digital format rather than hard copy because the former is safer, more secure, smaller in size, and faster to retrieve than the latter. With the increasing number of electronic documents to be organized so that users can obtain knowledge and integrate information, document clustering has been applied to group textual documents based on their similarities. Many attempts have been made to perform textual document clustering with highly accurate results (i.e., close to natural classes) and high processing performance. However, the proposed techniques work in batch (or static) mode, in which performance tends to be sacrificed because all the terms in the document are used, at times resulting in overlapping or scalability issues. The few studies that focus on dynamic clustering also report performance issues. This paper contributes to the investigation of textual document clustering approaches and highlights the importance of using dynamic clustering when mining frequent terms together with named entities. This method is used to achieve high efficiency and high-quality data clustering. The method can also benefit textual document clustering algorithms in many text domain applications.
  • Keywords
    data mining; pattern clustering; text analysis; data storage; document similarities; dynamic semantic textual document clustering; electronic documents; frequent term mining; named entity; text domain applications; textual document grouping; Algorithm design and analysis; Clustering algorithms; Conferences; Data mining; Partitioning algorithms; Semantics; Systems engineering and theory; document clustering; dynamic textual clustering; frequent term; named entity;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    System Engineering and Technology (ICSET), 2013 IEEE 3rd International Conference on
  • Conference_Location
    Shah Alam
  • Print_ISBN
    978-1-4799-1028-1
  • Type
    conf
  • DOI
    10.1109/ICSEngT.2013.6650195
  • Filename
    6650195