Title :
HETA: Hadoop environment for text analysis
Author :
Nicolas, Vincent ; Da Silva, Alzennyr ; Picard, Marie-Luce
Author_Institution :
Lumiere Univ. Lyon 2, Lyon, France
Abstract :
As a leading energy player, EDF (Électricité de France) actively works on new techniques to better understand the voice of its customers. To process massive volumes of unstructured and semi-structured text data, we have developed HETA, an application built on open-source solutions that offers several text processing steps (document engineering, text analysis, clustering and visualization) on top of Hadoop. It is based on Mahout and uses the sigma.js library to visualize the results as interactive graphs. HETA provides an ergonomic Web interface and can analyze any kind of massive unstructured (blog comments) or semi-structured (tweets, articles, etc.) text data. Because it is modular and extensible, HETA can easily be enhanced with advanced text mining methods.
Keywords :
data handling; data mining; ergonomics; graphical user interfaces; interactive systems; parallel processing; pattern clustering; public domain software; social networking (online); software libraries; text analysis; Électricité de France; EDF; HETA; Hadoop environment; Mahout; TEXT ANALYSIS; Web articles; blog comments; customer voice; document engineering; ergonomic Web interface; interactive graphs; open source solutions; semistructured-massive text data processing; sigma.js library; text clustering; text mining methods; text visualization; tweets; unstructured-massive text data processing; Clustering algorithms; Data visualization; Electronic publishing; Information services; Internet; Text analysis; Text mining; Hadoop; Mahout; Text Mining
Conference_Title :
Computational Intelligence for Multimedia Understanding (IWCIM), 2014 International Workshop on
Conference_Location :
Paris
DOI :
10.1109/IWCIM.2014.7008803