Title :
Research of Massive Internet Text Data Real-Time Loading and Index System
Author :
Han, Weihong ; Jia, Yan ; Yang, Shuqiang
Author_Institution :
Comput. Sch., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
With the rapid development of the Internet and communication technology, massive amounts of text data have accumulated on the Internet, including text from web pages, emails, and instant messengers. Growing data volumes, together with the need to load data and build text indexes in real time, pose enormous challenges to data-loading techniques. This paper presents text-loader, a real-time data loading system used in ITSR (Internet text data real-time storage and retrieval system). Text-loader consists of an efficient algorithm for bulk data loading with an exchange partition mechanism, an incremental text index creation algorithm, optimized parallelism, and guidelines for system tuning. Performance studies show the positive effects of these techniques: the loading speed of each cluster increases from 220 million records per day to 1.2 billion records per day, and the top loading speed reaches 6 TB of data when 10 clusters run in parallel. This framework offers a promising approach for loading other large and complex text databases.
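To put the per-cluster figures quoted in the abstract into perspective, the short sketch below converts them to records per second and computes the speedup. Only the two daily record counts come from the abstract; the function and variable names are illustrative, and the script makes no assumption about record size or about how throughput scales across clusters.

```python
# Back-of-the-envelope conversion of the per-cluster loading rates
# quoted in the abstract. Names here are illustrative, not from the paper.

SECONDS_PER_DAY = 24 * 60 * 60


def per_second(records_per_day):
    """Convert a daily record count into an average per-second rate."""
    return records_per_day / SECONDS_PER_DAY


baseline_per_day = 220_000_000    # per cluster, before the techniques
tuned_per_day = 1_200_000_000     # per cluster, after the techniques

speedup = tuned_per_day / baseline_per_day

print(f"baseline: {per_second(baseline_per_day):,.0f} records/s")
print(f"tuned:    {per_second(tuned_per_day):,.0f} records/s")
print(f"speedup:  {speedup:.2f}x")
```

On these numbers, the reported improvement is a little over fivefold per cluster, averaged over a full day.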
Keywords :
information networks; information retrieval; Internet text data real-time storage; index system; real-time data-loading; retrieval system; Clustering algorithms; Computer networks; IP networks; Indexes; Information retrieval; Internet; Optimization methods; Partitioning algorithms; Real time systems; Relational databases; data loading; exchange partition; massive data; parallel schedule; text index;
Conference_Title :
INC, IMS and IDC, 2009. NCM '09. Fifth International Joint Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4244-5209-5
Electronic_ISBN :
978-0-7695-3769-6
DOI :
10.1109/NCM.2009.414