• DocumentCode
    2456961
  • Title
    Clustering of Short Strings in Large Databases
  • Author
    Kazimianec, Michail ; Mazeika, Arturas
  • Author_Institution
    Fac. of Comput. Sci., Free Univ. of Bozen-Bolzano, Bolzano, Italy
  • fYear
    2009
  • fDate
    Aug. 31 2009-Sept. 4 2009
  • Firstpage
    368
  • Lastpage
    372
  • Abstract
    A novel method, CLOSS, is proposed for textual databases. It successfully identifies clusters of misspelled strings, even if the cluster border is not prominent. The method uses a q-gram approach to represent the data and a string proximity graph to find the cluster. The contribution addresses short-string clustering in text mining for cases where the proximity graph has multiple horizontal lines or no horizontal line is present.
  • Keywords
    data mining; pattern clustering; string matching; text analysis; very large databases; CLOSS; cluster border; clustering of short strings; large databases; q-gram approach; string proximity graph; text mining; textual databases; Application software; Clustering methods; Computer science; Databases; Detection algorithms; Expert systems; Robustness; Smoothing methods; Tagging; Text mining; clustering; q-grams; short strings;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Application, 2009. DEXA '09. 20th International Workshop on
  • Conference_Location
    Linz
  • ISSN
    1529-4188
  • Print_ISBN
    978-0-7695-3763-4
  • Type
    conf
  • DOI
    10.1109/DEXA.2009.73
  • Filename
    5337105