DocumentCode :
2846585
Title :
Parallelization and Characterization of Probabilistic Latent Semantic Analysis
Author :
Hong, Chuntao ; Chen, Yurong ; Zheng, Weimin ; Shan, Jiulong ; Yurong Chen ; Zhang, Yimin
Author_Institution :
Tsinghua Univ., Tsinghua
fYear :
2008
fDate :
9-12 Sept. 2008
Firstpage :
628
Lastpage :
635
Abstract :
Probabilistic Latent Semantic Analysis (PLSA) is one of the most popular statistical techniques for the analysis of two-mode and co-occurrence data. It has applications in information retrieval and filtering, natural language processing, machine learning from text, and other related areas. However, PLSA is rarely applied to large datasets due to its high computational complexity. This paper presents an optimized and parallelized implementation of PLSA that can process datasets with 10000 documents in seconds. Compared to the baseline program, our parallelized program achieves a speedup of more than six on an eight-processor machine. The characterization of the parallel program is also presented. The performance analysis indicates that the program is memory intensive and that limited memory bandwidth is the bottleneck preventing further speedup.
Keywords :
parallel programming; statistical analysis; co-occurrence data; information filtering; information retrieval; limited memory bandwidth; machine learning; natural language processing; parallel program; parallelized program; probabilistic latent semantic analysis; statistical techniques; two-mode data; Bandwidth; Computational complexity; Computer science; Costs; Information retrieval; Machine learning; Parallel processing; Parallel programming; Performance analysis; Scheduling algorithm; PLSA; characterization; multi-core; parallelization; tempered EM
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing, 2008. ICPP '08. 37th International Conference on
Conference_Location :
Portland, OR
ISSN :
0190-3918
Print_ISBN :
978-0-7695-3374-2
Electronic_ISBN :
0190-3918
Type :
conf
DOI :
10.1109/ICPP.2008.8
Filename :
4625902