Architecture of efficient word processing using Hadoop MapReduce for big data applications

Author

Bichitra Mandal;Srinivas Sethi;Ramesh Kumar Sahoo

Author_Institution

Dept. of CSEA, IGIT Sarang India

fYear

2015

Firstpage

1

Lastpage

6

Abstract

Understanding the characteristics of MapReduce workloads in a Hadoop cluster is key to making optimal, efficient configuration decisions and improving system efficiency. MapReduce is a popular parallel processing framework for large-scale data analytics and has become an effective method for processing massive data on clusters of computers. Over the last decade, the numbers of customers and services and the amount of information have increased rapidly, giving rise to the big data analysis problem for service systems. Keeping up with growing dataset volumes requires efficient analytical capability. MapReduce processes and analyzes data in two phases: mapping and reducing. Between these phases, MapReduce performs a shuffle that globally exchanges the intermediate data generated by the mappers. This paper proposes a novel shuffling strategy that enables efficient data movement and reduction in MapReduce shuffling, based on the number of consecutive words and their counts in the word processor. To improve the scalability and efficiency of the word processor in a big data environment, counting of repeated consecutive words with shuffling is implemented on Hadoop. The approach can be deployed on a widely adopted distributed computing platform and can also be applied to large documents in a single word processor, using the MapReduce parallel processing paradigm.
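As an illustration only, the map, shuffle, and reduce phases described in the abstract can be simulated in a single process with plain Python. This sketch assumes that "consecutive words" means adjacent word pairs within one input line. That interpretation, and the helper names `map_phase`, `shuffle_phase`, `reduce_phase`, and `count_consecutive_pairs`, are hypothetical and are not taken from the paper or from Hadoop's Java API. It shows the baseline data flow a shuffling strategy would optimize, not the authors' strategy itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: a "consecutive word" key is an adjacent (word, next_word) pair.
Pair = Tuple[str, str]


def map_phase(line: str) -> Iterator[Tuple[Pair, int]]:
    """Emit ((w1, w2), 1) for every adjacent word pair in one input line.

    Pairs never span two lines, mirroring a mapper that sees one record at a time.
    """
    words = line.lower().split()
    for first, second in zip(words, words[1:]):
        yield (first, second), 1


def shuffle_phase(mapped: Iterable[Tuple[Pair, int]]) -> Dict[Pair, List[int]]:
    """Group intermediate values by key, as the shuffle does between map and reduce."""
    groups: Dict[Pair, List[int]] = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(key: Pair, values: List[int]) -> Tuple[Pair, int]:
    """Sum the partial counts collected for one key."""
    return key, sum(values)


def count_consecutive_pairs(lines: Iterable[str]) -> Dict[Pair, int]:
    """Run map -> shuffle -> reduce over all lines and return pair counts."""
    mapped = (kv for line in lines for kv in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    doc = ["big data needs big data tools", "big data tools scale"]
    for pair, count in sorted(count_consecutive_pairs(doc).items()):
        print(pair, count)
```

In a real Hadoop job, mappers and reducers run as separate tasks across the cluster and the framework performs the shuffle and sort over the network. Here the grouping is just a dictionary, which makes the intermediate-data exchange that the paper targets easy to see.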

Keywords

"Big data","File systems","Text processing","Monitoring","Videos","Databases","Electronic mail"

Publisher

IEEE

Conference_Title

Man and Machine Interfacing (MAMI), 2015 International Conference on

Type

conf

DOI

10.1109/MAMI.2015.7456612

Filename

7456612