• DocumentCode
    537583
  • Title
    Optimal Hash List for Word Frequency Analysis
  • Author
    Sheng-Lan Peng
  • Author_Institution
    Dept. of Inf. Eng., JDZ Ceramic Inst., Jingdezhen, China
  • Volume
    1
  • fYear
    2010
  • fDate
    23-24 Oct. 2010
  • Firstpage
    242
  • Lastpage
    245
  • Abstract
    Word frequency analysis plays an essential role in many data mining tasks on large-scale text corpora, and the hash list is a simple but efficient structure for frequent pattern discovery. In this paper, a Poisson approximation approach is used to analyze, in probabilistic terms, the space efficiency of the hash list under different parameters. Based on this theoretical model, an optimal parameter setting for the hash list is given. Experimental results on real data show that a hash list with the optimal parameter achieves minimum or near-minimum memory cost.
  • Keywords
    approximation theory; stochastic processes; text analysis; word processing; Poisson approximation approach; data mining tasks; frequent pattern discovery; hash list; text corpus; word frequency analysis; Poisson approximation; space efficiency; word frequency
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Information Systems and Mining (WISM), 2010 International Conference on
  • Conference_Location
    Sanya
  • Print_ISBN
    978-1-4244-8438-6
  • Type
    conf
  • DOI
    10.1109/WISM.2010.59
  • Filename
    5662319