• DocumentCode
    3095713
  • Title
    A document comparison approach using hybrid keyword and structured full text vocabulary searches
  • Author
    Boonsuk, Kudachamai ; Sophatsathit, Peraphon
  • Author_Institution
    Technopreneurship & Innovation Management Program, Chulalongkorn University, Bangkok, Thailand
  • Volume
    1
  • fYear
    2011
  • fDate
    11-13 March 2011
  • Firstpage
    252
  • Lastpage
    257
  • Abstract
    This paper proposes a systematic full-text document search that combines keyword similarity with the structural similarity of the documents under consideration. The approach operates in two steps. The first step uses a set of designated keywords and an open source tool to collect candidate documents. The second step builds a suffix tree of frequently used vocabulary to retrieve the most similar documents from those candidates. This mitigates variations in the contextual matching of full-text search, and the resulting performance is reported as acceptable. The ultimate goal is a practical, platform-independent full-text search technique. The scheme has two benefits. First, documents can be retrieved that are as close to the desired document as possible. Second, suspected plagiarism can be identified to an extent that depends on the effectiveness of the proposed approach, which leaves plenty of room for future improvement. The proposed work is intended to be put to real use for database retrieval in a small business enterprise.
  • Keywords
    query formulation; relevance feedback; text analysis; word processing; contextual matching; document comparison approach; keyword similarity; open source tool; relevant document retrieval; structured full text vocabulary search; suffix tree; systematic full text search; Keyword search; Libraries; Plagiarism; Search engines; Vegetation; Vocabulary; Weight measurement; contextual matching; full text search; plagiarism; structural similarity; suffix tree
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Research and Development (ICCRD), 2011 3rd International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-61284-839-6
  • Type
    conf
  • DOI
    10.1109/ICCRD.2011.5764014
  • Filename
    5764014
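
The two-step approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not name the open source tool or give the scoring details, so step 1 is replaced here by a plain keyword filter, and step 2 uses a depth-bounded, word-level suffix trie (an uncompressed simplification of a suffix tree) built over the reference document's most frequent words. All function names, `top_n`, `max_depth`, and the scoring rule are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_filter(docs, keywords):
    """Step 1 (stand-in for the unnamed tool): keep documents containing every keyword."""
    wanted = {k.lower() for k in keywords}
    return {name: text for name, text in docs.items()
            if wanted <= set(tokenize(text))}


def build_suffix_trie(tokens, max_depth=4):
    """Insert every word-level suffix, truncated to max_depth words, into a nested dict."""
    root = {}
    for i in range(len(tokens)):
        node = root
        for word in tokens[i:i + max_depth]:
            node = node.setdefault(word, {})
    return root


def shared_phrase_score(trie, tokens, max_depth=4):
    """Average length of the longest trie match starting at each token, scaled by max_depth."""
    if not tokens:
        return 0.0
    total = 0
    for i in range(len(tokens)):
        node = trie
        for word in tokens[i:i + max_depth]:
            if word not in node:
                break
            node = node[word]
            total += 1
    return total / (len(tokens) * max_depth)


def rank_similar(reference, docs, keywords, top_n=50, max_depth=4):
    """Step 2: rank keyword-matched documents by phrases shared with the reference."""
    ref_tokens = tokenize(reference)
    # Assumed reading of "frequently used vocabulary": the top_n most common reference words.
    vocab = {w for w, _ in Counter(ref_tokens).most_common(top_n)}
    trie = build_suffix_trie([t for t in ref_tokens if t in vocab], max_depth)
    scores = {}
    for name, text in keyword_filter(docs, keywords).items():
        candidate = [t for t in tokenize(text) if t in vocab]
        scores[name] = shared_phrase_score(trie, candidate, max_depth)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_similar(reference_text, {"name": "text", ...}, ["suffix", "tree"])` returns `(name, score)` pairs, highest score first; documents missing any keyword are dropped before scoring. A production version would likely use a compressed suffix tree and a tuned similarity measure, neither of which is specified in this record.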