Title :
Segmentation of Chinese Web Text Based on Spark
Author_Institution :
Univ. of Electron. Sci. &
Abstract :
Analysing and processing the massive amounts of data generated by the network on a computer takes a great deal of time and cannot meet people's needs. To break through the speed bottleneck of word segmentation, this paper uses a Spark cluster and applies Spark programming ideas to Chinese word segmentation, so that the segmentation is implemented on a distributed platform. The approach preserves the accuracy of the original word segmentation while significantly improving the processing speed of Chinese word segmentation, and it is feasible and effective for handling large amounts of Chinese information.
Keywords :
"Sparks","Data processing","Dictionaries","Programming","Distributed databases","Internet","Computer architecture"
Conference_Title :
Computational Intelligence and Design (ISCID), 2015 8th International Symposium on
Print_ISBN :
978-1-4673-9586-1
DOI :
10.1109/ISCID.2015.250