Title :
DH-TRIE frequent pattern mining on Hadoop using JPA
Author :
Yang, Lai ; Shi, Zhongzhi ; Xu, Li D. ; Liang, Fan ; Kirsh, Ilan
Author_Institution :
Key Lab. of Intell. Inf. Process., Inst. of Comput. Technol., Beijing, China
Abstract :
FPgrowth is a well-known frequent pattern mining algorithm for high-dimensional, large-scale data sets. It is also known for its high memory consumption, which stems from its recursive processing. In general, FPgrowth cannot handle a large-scale data set unless the whole data set is divided into small blocks. Based on Hadoop, the open cloud computing model, a distributed DH-TRIE frequent pattern algorithm using JPA is proposed, which solves three problems (globalization, random write, and duration). Comparisons with the Mahout project show that the algorithm has good flexibility and scalability. Applied to the virtualization platform Vega Cloud, the algorithm will be applicable to a wide range of situations.
Keywords :
Java; application program interfaces; cloud computing; data mining; pattern clustering; FPgrowth; Hadoop; JPA; Vega cloud; distributed DH-TRIE frequent pattern algorithm; duration problem; far-ranging situations; globalization problem; high dimensional large scale data sets; open cloud computing model; random write problem; recursive processing; scalability; virtualization platform; Data models; Indexing; Programming; ORM; virtual machine;
Conference_Title :
Granular Computing (GrC), 2011 IEEE International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-1-4577-0372-0
DOI :
10.1109/GRC.2011.6122552