DocumentCode
2456961
Title
Clustering of Short Strings in Large Databases
Author
Kazimianec, Michail ; Mazeika, Arturas
Author_Institution
Fac. of Comput. Sci., Free Univ. of Bozen-Bolzano, Bolzano, Italy
fYear
2009
fDate
Aug. 31 2009-Sept. 4 2009
Firstpage
368
Lastpage
372
Abstract
A novel method, CLOSS, intended for textual databases is proposed. It successfully identifies clusters of misspelled strings, even if the cluster border is not prominent. The method uses a q-gram approach to represent the data and a string proximity graph to find the cluster. The contribution addresses short-string clustering in text mining for cases where the proximity graph has multiple horizontal lines or no horizontal line at all.
Keywords
data mining; pattern clustering; string matching; text analysis; very large databases; CLOSS; cluster border; clustering of short strings; large databases; q-gram approach; string proximity graph; text mining; textual databases; Application software; Clustering methods; Computer science; Databases; Detection algorithms; Expert systems; Robustness; Smoothing methods; Tagging; Text mining; clustering; q-grams; short strings
fLanguage
English
Publisher
IEEE
Conference_Titel
Database and Expert Systems Application, 2009. DEXA '09. 20th International Workshop on
Conference_Location
Linz
ISSN
1529-4188
Print_ISBN
978-0-7695-3763-4
Type
conf
DOI
10.1109/DEXA.2009.73
Filename
5337105