Regional Information Center for Science and Technology - Perldoop: Efficient execution of Perl scripts on Hadoop clusters

DocumentCode :

272649

Title :

Perldoop: Efficient execution of Perl scripts on Hadoop clusters

Author :

Abuin, Jose M. ; Pichel, Juan C. ; Pena, Tomas F. ; Gamallo, Pablo ; García, Marcos

Author_Institution :

Centro de Investig. en Tecnoloxias da Informacion, Univ. de Santiago de Compostela, Santiago de Compostela, Spain

fYear :

2014

fDate :

27-30 Oct. 2014

Firstpage :

766

Lastpage :

771

Abstract :

Hadoop is one of the most important implementations of the MapReduce programming model. It is written in Java, and most of the programs that run on Hadoop are also written in this language. Hadoop also provides a utility, known as Hadoop Streaming, to execute applications written in other languages. However, the ease of use provided by Hadoop Streaming comes at the expense of a noticeable degradation in performance. In this work, we introduce Perldoop, a new tool that automatically translates Hadoop-ready Perl scripts into their Java counterparts, which can be directly executed on Hadoop while improving their performance significantly. We have tested our tool using several Natural Language Processing (NLP) modules, which consist of hundreds of regular expressions, but Perldoop could be used with any Perl code ready to be executed with Hadoop Streaming. Performance results show that the Java code generated using Perldoop executes up to 12x faster than the original Perl modules using Hadoop Streaming. In this way, the new NLP modules are able to process the whole of Wikipedia in less than 2 hours using a Hadoop cluster with 64 nodes.

Keywords :

Internet; Java; data handling; natural language processing; parallel processing; Hadoop Streaming; Hadoop clusters; Hadoop-ready Perl scripts; Java codes; MapReduce programming model; NLP modules; Perl code; Perl modules; Perldoop; Wikipedia; natural language processing; Arrays; Internet; Java; Natural language processing; Pragmatics; Programming; Reactive power;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Big Data (Big Data), 2014 IEEE International Conference on

Conference_Location :

Washington, DC

Type :

conf

DOI :

10.1109/BigData.2014.7004303

Filename :

7004303

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=272649