Title :
Ontology-based structured cosine similarity in document summarization: with applications to mobile audio-based knowledge management
Author :
Yuan, Soe-Tsyr ; Sun, Jerry
Author_Institution :
Dept. of Management Information Systems, National Chengchi University, Taipei, Taiwan
Abstract :
Developing algorithms for automated text categorization over massive document collections is an important research area in data mining and knowledge discovery. Most text-clustering methods are grounded in term-based measures of distance or similarity and ignore the structure of the documents. In this paper, we present a novel method, structured cosine similarity (SCS), which gives document clustering a new way of modeling document summarization that takes document structure into account, so as to improve clustering performance in terms of quality, stability, and efficiency. The study was motivated by the problem of clustering speech documents, which lack rich document features, obtained from oral experience sharing conducted over wireless devices by the mobile workforce of enterprises, in support of audio-based knowledge management. In other words, the goal is to facilitate knowledge acquisition and sharing by speech. The evaluations show fairly promising results for the proposed structured cosine similarity method.
Keywords :
classification; data mining; electronic commerce; knowledge management; mobile computing; ontologies (artificial intelligence); pattern clustering; speech recognition; text analysis; B2E M-Commerce; automated text categorization; automatic speech recognition; business-to-employee mobile commerce; document clustering; document summarization; knowledge acquisition; knowledge discovery; mobile audio-based knowledge management; ontology-based structured cosine similarity; spherical K-means clustering; Business; Customer service; Ontologies; Sun; Text categorization; structured cosine similarity (SCS); text summarization; Algorithms; Artificial Intelligence; Cluster Analysis; Databases, Factual; Documentation; Information Storage and Retrieval; Natural Language Processing; Online Systems; Pattern Recognition, Automated; Speech Recognition Software; Telecommunications; User-Computer Interface; Vocabulary, Controlled
Journal_Title :
Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
DOI :
10.1109/TSMCB.2005.850153