• DocumentCode
    3455602
  • Title

    Term Clustering and Confidence Measurement in Document Clustering

  • Author

    Csorba, Kristóf ; Vajk, István

  • Author_Institution
    Dept. of Autom. & Appl. Inf., Budapest Univ. of Technol. & Econ., Budapest
  • fYear
    2006
  • fDate
    20-22 Aug. 2006
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    A novel topic-based document clustering technique is presented in this paper for situations where not all documents need to be assigned to clusters. Under such conditions, the clustering system can provide a much cleaner result by rejecting the classification of documents with an ambiguous topic. This is achieved by applying a confidence measurement to every classification result and discarding documents with a confidence value below a predefined lower limit. This means that our system returns the classification for a document only if it is sure about it. If not, the document is marked as "unsure". Besides this ability, the confidence measurement allows the use of much stronger term filtering, performed by a novel supervised term cluster creation and term filtering algorithm, which is also presented in this paper.
  • Keywords
    classification; document handling; information filtering; learning (artificial intelligence); pattern clustering; ambiguous topic; confidence measurement; document classification; document clustering system; supervised term cluster creation; supervised term filtering algorithm; Automation; Feature extraction; Filtering algorithms; Frequency; Informatics; Information filtering; Information filters; Paper technology; Performance evaluation; Supervised learning; confidence; document clustering; supervised learning; term cluster creation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Cybernetics, 2006. ICCC 2006. IEEE International Conference on
  • Conference_Location
    Budapest
  • Print_ISBN
    1-4244-0071-6
  • Electronic_ISBN
    1-4244-0072-4
  • Type

    conf

  • DOI
    10.1109/ICCCYB.2006.305694
  • Filename
    4097655