• DocumentCode
    710263
  • Title

    Using the Web 1T 5-Gram Database for Attribute Selection in Formal Concept Analysis to Correct Overstemmed Clusters

  • Author

    Hall, Guymon R. ; Taghva, Kazem

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Nevada, Las Vegas, Las Vegas, NV, USA
  • fYear
    2015
  • fDate
    13-15 April 2015
  • Firstpage
    651
  • Lastpage
    654
  • Abstract
    As part of information retrieval processes, words are often stemmed to a common root. The Porter Stemming Algorithm operates as a rule-based suffix-removal process. Stemming can be viewed as a way to cluster related words together under one common stem. Sometimes the Porter stemmer places unrelated words in the same cluster (overstemming). This experiment attempts to correct such clusters using Formal Concept Analysis (FCA). FCA is the process of deriving formal concepts from a given formal context. A formal context consists of objects, attributes, and a binary relation that indicates which attributes each object possesses. A formal concept is formed by computing the closure of subsets of objects and attributes. Using the Cranfield document collection, this experiment crafted a comparison measure between the words in each stemmed cluster using the Google Web 1T 5-gram data set. When FCA was used to correct the clusters, the results showed a varying level of success, depending on the error threshold allowed.
  • Keywords
    Internet; formal concept analysis; information retrieval; search engines; FCA; Google Web 1T 5-gram data set; Porter stemming algorithm; Web 1T 5-gram database; attribute selection; binary relation; Cranfield document collection; error threshold; information retrieval processes; overstemmed cluster correction; rule-based suffix-removal process; Algorithm design and analysis; Clustering algorithms; Context; Standards; Testing; Training; stemming
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology - New Generations (ITNG), 2015 12th International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    978-1-4799-8827-3
  • Type

    conf

  • DOI
    10.1109/ITNG.2015.109
  • Filename
    7113548
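
The abstract's definitions of a formal context and of concept closure can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three words, their attributes, and the names `CONTEXT`, `intent`, `extent`, and `concepts` are hypothetical and are not taken from the paper or from the Web 1T data. The brute-force enumeration is simply the textbook closure definition, not necessarily the procedure the authors used.

```python
from itertools import combinations

# Toy formal context: object -> set of attributes.
# All words and attributes here are made up for illustration.
CONTEXT = {
    "generate": {"power", "code"},
    "generation": {"power", "code"},
    "general": {"army"},
}
ALL_ATTRIBUTES = set().union(*CONTEXT.values())


def intent(objects):
    """Return the attributes shared by every object in `objects`."""
    shared = set(ALL_ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return shared


def extent(attributes):
    """Return the objects that possess every attribute in `attributes`."""
    return {obj for obj, attrs in CONTEXT.items() if attributes <= attrs}


def concepts():
    """Enumerate all formal concepts by closing every subset of objects.

    For a subset A, the pair (extent(intent(A)), intent(A)) is a concept.
    This is exponential in the number of objects, so it only suits toy data.
    """
    found = set()
    names = sorted(CONTEXT)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            shared_attributes = intent(set(subset))
            closed_objects = extent(shared_attributes)
            found.add((frozenset(closed_objects), frozenset(shared_attributes)))
    return found


if __name__ == "__main__":
    for objs, attrs in sorted(concepts(), key=lambda c: (len(c[0]), sorted(c[0]))):
        print(sorted(objs), sorted(attrs))
```

In this toy context, "generate" and "generation" share a concept while "general" falls into a separate one, which is the kind of split a correction step for an overstemmed cluster would look for.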