• DocumentCode
    2200226
  • Title

    Semi-supervised document clustering using Seeds affinity propagation and consensus algorithm in multi-domain settings

  • Author

    Radha, R. ; Mirnalinee, T.T. ; Trueman, T.E.

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Anna Univ. of Technol., Chennai, India
  • fYear
    2012
  • fDate
    19-21 April 2012
  • Firstpage
    85
  • Lastpage
    90
  • Abstract
    Domain adaptation is the process of transferring knowledge from a source domain to a different but related domain. In this paper, we first apply a 'Consensus Regularization' based algorithm to merge multiple source domains into a single source domain. We then propose multi-domain adaptation in document clustering using Seeds Affinity Propagation and the Consensus Regularization algorithm. A semi-supervised document clustering algorithm called Seeds Affinity Propagation (SAP) is applied; it is based on an effective clustering algorithm, Affinity Propagation (AP). The labeled and unlabeled documents are preprocessed through steps such as stop-word removal, word stemming, and word-frequency computation, and are then given as the input. After preprocessing, structured documents are obtained. Tri-set Computation, a feature extraction technique, is used to find the features through the Co-feature set, Unilateral feature set, and Significant Co-feature set methods. The similarity measure of the documents is then calculated, and labels are assigned to the documents that match. Finally, clustered documents are obtained through seeds affinity propagation via similarity measurement. The performance of the algorithm can further be evaluated and improved.
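The preprocessing and Tri-set steps named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the toy suffix stripper, and the helper names `preprocess` and `tri_sets` are all hypothetical, and the three sets are modeled as plain set operations (shared features, features found only in the second document, and shared features that are marked significant in the first), which is one plausible reading of the set names.

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption; the record does not give the paper's list).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def crude_stem(word):
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, stem, and count word frequencies."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def tri_sets(freq_i, freq_j, significant_i):
    """Return (co-feature, unilateral, significant co-feature) sets for a document pair.

    Assumed definitions: co-features occur in both documents, unilateral
    features occur only in document j, and significant co-features are
    co-features that are also in document i's significant-feature set.
    """
    features_i, features_j = set(freq_i), set(freq_j)
    co = features_i & features_j
    unilateral = features_j - features_i
    significant_co = co & set(significant_i)
    return co, unilateral, significant_co


doc_i = preprocess("Clustering the documents")
doc_j = preprocess("Document clusters and seeds")
co, unilateral, significant_co = tri_sets(doc_i, doc_j, {"cluster"})
```

In a fuller pipeline, these three sets would feed a pairwise similarity score that is then passed to an affinity propagation step; that scoring rule is not stated in this record, so it is left out here.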
  • Keywords
    document handling; learning (artificial intelligence); merging; pattern clustering; clustering algorithm affinity propagation; co-feature set method; consensus regularization based algorithm; domain adaptation process; domain merging; feature extraction technique; multidomain setting; multiple source domain; seeds affinity propagation; semi-supervised document clustering; significant co-feature set method; similarity measurement; single source domain; stop words removal process; tri-set computation technique; unilateral feature set method; word frequency finding process; word stemming process; Algorithm design and analysis; Clustering algorithms; Computer science; Convergence; Educational institutions; Entropy; Feature extraction; Consensus Regularization; Document Clustering; Multi-domain adaptation; Seeds Affinity Propagation (SAP); Tri set Computation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Recent Trends In Information Technology (ICRTIT), 2012 International Conference on
  • Conference_Location
    Chennai, Tamil Nadu
  • Print_ISBN
    978-1-4673-1599-9
  • Type

    conf

  • DOI
    10.1109/ICRTIT.2012.6206802
  • Filename
    6206802