DocumentCode :
2916933
Title :
Efficiency of data structures for detecting overlaps in digital documents
Author :
Monostori, Krisztián ; Zaslavsky, Arkady ; Schmidt, Heinz
Author_Institution :
Sch. of Comput. Sci. & Software Eng., Monash Univ., Melbourne, Vic., Australia
fYear :
2001
fDate :
2001
Firstpage :
140
Lastpage :
147
Abstract :
This paper analyses the efficiency of different data structures for detecting overlap in digital documents. Most existing approaches use a hash function to reduce the space requirements of their chunk indices. Since a hash function can produce the same value for different chunks, false matches are possible. We propose an algorithm that eliminates those false matches. The algorithm uses a suffix tree, which is space-consuming. We define a modified suffix tree that considers only chunks starting at the beginning of words, and we show how the algorithm works on this structure. Alternatively, the space requirements of a suffix tree can be reduced by converting it to a directed acyclic graph. We show that suffix link information can be preserved in this new structure and that the matching statistics algorithm still works with the modifications we propose.
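The false-match problem described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the verification step here uses direct string comparison instead of the suffix tree and matching statistics algorithm the authors propose. The function names (`word_chunks`, `small_hash`, `candidate_and_verified`), the chunk length `k`, and the deliberately small `buckets` range are all assumptions chosen for illustration.

```python
import zlib


def word_chunks(text, k):
    """Return every run of k consecutive words, so each chunk starts at a word boundary."""
    words = text.split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def small_hash(chunk, buckets):
    # Hypothetical, deliberately small hash range so that collisions are likely.
    return zlib.crc32(chunk.encode("utf-8")) % buckets


def candidate_and_verified(doc_a, doc_b, k=3, buckets=16):
    """Return (candidates, verified) chunks of doc_b matched against doc_a.

    candidates: chunks whose hash value also occurs in doc_a (may be false matches).
    verified:   candidates whose text really occurs in doc_a.
    """
    # Index doc_a by hash value, keeping the chunk text for later verification.
    index = {}
    for chunk in word_chunks(doc_a, k):
        index.setdefault(small_hash(chunk, buckets), set()).add(chunk)

    candidates, verified = [], []
    for chunk in word_chunks(doc_b, k):
        bucket = index.get(small_hash(chunk, buckets))
        if bucket is not None:
            candidates.append(chunk)      # hash values agree
            if chunk in bucket:           # text agrees too: a true match
                verified.append(chunk)
    return candidates, verified


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "a quick brown fox sleeps near the lazy dog"
candidates, verified = candidate_and_verified(doc_a, doc_b)
print(len(candidates), verified)
```

Any chunk that appears in `candidates` but not in `verified` is a false match caused by a hash collision. Keeping the chunk text for verification costs the space the hash was meant to save, which is the trade-off the abstract's suffix-tree variants address.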
Keywords :
copy protection; directed graphs; text analysis; tree data structures; data structures; digital documents; directed acyclic graph; hash function; matching statistics algorithm; modified suffix tree; overlap detection; space requirements; suffix link information; suffix tree structure; Computer science; Data structures; Hardware; Plagiarism; Software engineering; Software libraries; Statistical distributions; Tree data structures; Tree graphs; Watermarking;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science Conference, 2001. ACSC 2001. Proceedings. 24th Australasian
Conference_Location :
Gold Coast, Qld.
ISSN :
1530-0900
Print_ISBN :
0-7695-0963-0
Type :
conf
DOI :
10.1109/ACSC.2001.906635
Filename :
906635