DocumentCode :
1940348
Title :
Towards automatic column-based data object clustering for multilingual databases
Author :
Yafooz, Wael M S ; Abidin, Siti Z Z ; Omar, Nasiroh
Author_Institution :
Fac. of Comput. & Math. Sci., Univ. Teknol. MARA, Shah Alam, Malaysia
fYear :
2011
fDate :
25-27 Nov. 2011
Firstpage :
415
Lastpage :
420
Abstract :
The amount of data in computer applications is growing tremendously, so organizing it is crucial. Many researchers now regard clustering as one of the important approaches to handling data across a wide range of research domains, including Topic Detection and Tracking (TDT), Multilingual Document Clustering, Multilingual News Clustering, Text Clustering and Web Record. Data clustering is normally time-consuming and challenging because it involves heavy programming or scripting. For online news, clustering analysis is especially needed because news across the globe changes every second and can arrive from any web source in multiple languages. This paper proposes a system architecture for automatic data object clustering in multilingual databases for online news, web records and text mining. The architecture gives an overview of a virtual scheme that handles data objects within the database tables as part of the database management system. The proposed architecture provides a platform for quick extraction, data arrangement and data grouping based on pattern similarities. It will thus improve query processing performance in multilingual databases without the need to code or script for interface programming. This is the first attempt to apply data clustering prior to data extraction in any database application that holds semi-structured and structured data (web records).
Keywords :
Internet; data analysis; data mining; database management systems; electronic publishing; natural language processing; pattern clustering; query processing; text analysis; Web record; Web sources; automatic column-based data object clustering; data arrangement; data clustering analysis; data extraction; data grouping; data handling; database management system; database tables; interface programming; multilingual database; multilingual news clustering; online news; pattern similarities; query processing performance; semistructured data; structured data; text mining; virtual scheme; Computer architecture; Data mining; Distributed databases; Object oriented modeling; Query processing; Web pages; Attributes Data; Clustering; Column-based; Database; Multilingual; Online news; Web Record;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Control System, Computing and Engineering (ICCSCE), 2011 IEEE International Conference on
Conference_Location :
Penang
Print_ISBN :
978-1-4577-1640-9
Type :
conf
DOI :
10.1109/ICCSCE.2011.6190562
Filename :
6190562