DocumentCode :
2345707
Title :
Similarity patterns in language
Author :
Helfman, Jonathan Isaac
Author_Institution :
AT&T Bell Labs., Murray Hill, NJ, USA
fYear :
1994
fDate :
4-7 Oct 1994
Firstpage :
173
Lastpage :
175
Abstract :
Dotplot is a technique for visualizing patterns of string matches in millions of lines of text and code. Patterns may be explored interactively or detected automatically. Applications include text analysis (author identification, plagiarism detection, translation alignment, etc.), software engineering (module and version identification, subroutine categorization, redundant code identification, etc.), and information retrieval (identification of similar records in results of queries). Patterns are interpreted through a visual language. Squares identify unordered matches (documents with lots of matching words or subroutines with lots of matching symbols), while diagonals identify ordered matches (copies, versions, and translations). Patterns of squares and diagonals have more complex interpretations that identify subtler relationships.
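The idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation, which is described as scaling to millions of lines; this version builds a full quadratic grid and is only suitable for small inputs. Whitespace tokenization, exact token equality, and the names `dotplot`, `diagonal_runs`, `render`, and `min_len` are all assumptions made for illustration.

```python
def dotplot(tokens):
    """Return an n x n grid; cell (i, j) is True when tokens[i] == tokens[j]."""
    n = len(tokens)
    return [[tokens[i] == tokens[j] for j in range(n)] for i in range(n)]


def diagonal_runs(grid, min_len=3):
    """Find runs of matches along diagonals above the main diagonal.

    Each run is reported as (start_row, start_col, length). A long run means
    the same token sequence occurs twice in order, i.e. an "ordered match".
    min_len is a hypothetical threshold, not a value from the paper.
    """
    n = len(grid)
    runs = []
    for offset in range(1, n):          # offset 0 is the trivial main diagonal
        i = 0
        while i + offset < n:
            length = 0
            while i + length + offset < n and grid[i + length][i + length + offset]:
                length += 1
            if length >= min_len:
                runs.append((i, i + offset, length))
            i += max(length, 1)         # skip past the run just measured
    return runs


def render(grid):
    """Draw the grid as text: '#' for a match, '.' otherwise."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)


tokens = "to be or not to be or what".split()
grid = dotplot(tokens)
print(render(grid))
print(diagonal_runs(grid))
```

In the rendered grid, the repeated phrase "to be or" shows up as a short diagonal away from the main diagonal. A region where many tokens match in no particular order would instead fill in as a dense square.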
Keywords :
information retrieval; linguistics; pattern recognition; visual languages; visual programming; word processing; dotplot technique; ordered matches; similarity patterns; software engineering; string matches; text analysis; unordered matches; visual language; Concatenated codes; Displays; Information retrieval; Pattern matching; Reconstruction algorithms; Text analysis; Visualization; Vocabulary
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Visual Languages, 1994. Proceedings., IEEE Symposium on
Conference_Location :
St. Louis, MO
Print_ISBN :
0-8186-6660-9
Type :
conf
DOI :
10.1109/VL.1994.363626
Filename :
363626