• DocumentCode
    1597880
  • Title

    E-VSM: Novel text representation model to capture context-based closeness between two text documents

  • Author

    Bhakkad, Ankit ; Dharmadhikari, S.C. ; Emmanuel, M. ; Kulkarni, Parag

  • Author_Institution
    Dept. of IT, Pune Institute of Computer Technology, India
  • fYear
    2013
  • Firstpage
    345
  • Lastpage
    348
  • Abstract
    In many applications of Information Retrieval and Text Mining, there is a need for an intelligent system to calculate the closeness between two text documents. Here, the representation of a text document as a mathematical object plays a vital role. The Vector Space Model is the most popular method for representing a text document in mathematical form, but it is lossy: it loses the ordering of terms in the document and, in turn, its context. Existing measures of closeness between two text documents, such as Cosine Similarity and Euclidean Distance, are efficient but do not consider the context of the document. In this paper we propose E-VSM, the Enhanced-Vector Space Model, to overcome the limitations of the original Vector Space Model, together with a new ‘Density-based Clustering’ approach to calculate context-based closeness between two text documents, which outperforms the state of the art in terms of accuracy. Experiments show good results, especially when the text document being compared is very close to a particular region of the target text document.
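    The loss of term ordering that the abstract attributes to the original Vector Space Model can be illustrated with a minimal sketch. It is not the E-VSM method or the paper's code; the function names and the two sample sentences are assumptions chosen for illustration only. Two sentences with opposite meanings but the same words produce identical term-frequency vectors, so both Cosine Similarity and Euclidean Distance treat them as the same document.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector; word order is discarded."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Euclidean distance over the union of terms (missing terms count as 0)."""
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in a.keys() | b.keys()))


# Hypothetical sample documents: same words, different order and meaning.
doc_a = "the dog bit the man"
doc_b = "the man bit the dog"

vec_a, vec_b = term_vector(doc_a), term_vector(doc_b)

print(vec_a == vec_b)                     # the two vectors are identical
print(cosine_similarity(vec_a, vec_b))    # approximately 1.0
print(euclidean_distance(vec_a, vec_b))   # 0.0
```

    Because the vectors are identical, no measure defined purely on them can separate the two sentences; this is the limitation that a context-aware representation aims to address.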
  • Keywords
    Integrated optics; Noise; Optical imaging; Optical noise; Context-Based Closeness; Density-Based Clustering; Intelligent System; Vector Space Model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems and Control (ISCO), 2013 7th International Conference on
  • Conference_Location
    Coimbatore, Tamil Nadu, India
  • Print_ISBN
    978-1-4673-4359-6
  • Type

    conf

  • DOI
    10.1109/ISCO.2013.6481176
  • Filename
    6481176