DocumentCode
537583
Title
Optimal Hash List for Word Frequency Analysis
Author
Sheng-Lan Peng
Author_Institution
Dept. of Inf. Eng., JDZ Ceramic Inst., Jingdezhen, China
Volume
1
fYear
2010
fDate
23-24 Oct. 2010
Firstpage
242
Lastpage
245
Abstract
Word frequency analysis plays an essential role in many data mining tasks on large-scale data sets built from text corpora, and the hash list is a very simple but efficient structure for frequent pattern discovery. In this paper, a Poisson approximation approach is used to analyze probabilistically the space efficiency of the hash list under different parameters. Based on our theoretical model, an optimal parameter setting for the hash list is given. Experimental results on real data show that a hash list with the optimal parameter reaches minimum or nearly minimum memory cost.
Keywords
approximation theory; stochastic processes; text analysis; word processing; Poisson approximation approach; data mining tasks; frequent pattern discovery; hash list; text corpus; word frequency analysis; Poisson approximation; space efficiency; word frequency
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Information Systems and Mining (WISM), 2010 International Conference on
Conference_Location
Sanya
Print_ISBN
978-1-4244-8438-6
Type
conf
DOI
10.1109/WISM.2010.59
Filename
5662319