Regional Information Center for Science and Technology - Robust Language Identification of Noisy Texts: Proposal of Hybrid Approaches

DocumentCode :

174895

Title :

Robust Language Identification of Noisy Texts: Proposal of Hybrid Approaches

Author :

Abainia, K. ; Ouamour, S. ; Sayoud, H.

Author_Institution :

USTHB Univ., Algiers, Algeria

fYear :

2014

fDate :

1-5 Sept. 2014

Firstpage :

228

Lastpage :

232

Abstract :

This paper deals with the problem of automatic language identification of noisy texts, an important task in natural language processing. Several existing works in this field rely on statistical and machine learning approaches for different categories of texts. Unfortunately, most of the proposed methods work well on clean and/or long texts but often fail when the text is corrupted or too short. In this research work, we use a typical dataset of short texts collected from several discussion forums and containing several types of noise. The dataset covers 32 languages, some of which are quite different from one another while others are very closely related. In this investigation, we propose two types of methods to identify the language of a text: a term-based method and a character-based method. Moreover, we propose two hybrid methods to enhance the performance of these techniques. Experiments show that the proposed hybrid methods are promising and achieve good language identification performance on noisy texts.
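The abstract names a term-based method, a character-based method, and a hybrid of the two, but this record does not say which features, similarity measures, or fusion rules the authors used. The sketch below only illustrates the general idea under stated assumptions: word-frequency profiles for the term-based score, character-trigram profiles for the character-based score, cosine similarity for both, and a weighted sum as the hybrid. The names `HybridIdentifier`, `alpha`, and the toy training sentences are hypothetical and are not the authors' method or data.

```python
import math
import re
from collections import Counter


def words(text):
    # Term-based features: lowercase word tokens (simple tokenizer, an assumption)
    return re.findall(r"\w+", text.lower())


def char_ngrams(text, n=3):
    # Character-based features: n-grams over the text padded with spaces,
    # so word boundaries also contribute n-grams
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def cosine(a, b):
    # Cosine similarity between two frequency profiles (Counters)
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class HybridIdentifier:
    """Hypothetical hybrid identifier: weighted sum of term and char scores."""

    def __init__(self, samples, alpha=0.5):
        # samples: dict mapping a language code to its training text
        # alpha: weight of the term-based score in the fused score (assumed)
        self.alpha = alpha
        self.term_profiles = {lang: Counter(words(t)) for lang, t in samples.items()}
        self.char_profiles = {lang: Counter(char_ngrams(t)) for lang, t in samples.items()}

    def term_scores(self, text):
        q = Counter(words(text))
        return {lang: cosine(q, p) for lang, p in self.term_profiles.items()}

    def char_scores(self, text):
        q = Counter(char_ngrams(text))
        return {lang: cosine(q, p) for lang, p in self.char_profiles.items()}

    def identify(self, text):
        # Fuse both scores per language and return the best-scoring language
        t, c = self.term_scores(text), self.char_scores(text)
        fused = {lang: self.alpha * t[lang] + (1 - self.alpha) * c[lang] for lang in t}
        return max(fused, key=fused.get)


if __name__ == "__main__":
    # Toy, hypothetical training data for two languages
    samples = {
        "en": "the cat is on the table and the dog is in the garden",
        "fr": "le chat est sur la table et le chien est dans le jardin",
    }
    ident = HybridIdentifier(samples)
    print(ident.identify("the dog is on the table"))  # prints "en"
```

The motivation for fusing the two scores is that whole-word matches break down when words are misspelled or the text is very short, whereas character n-grams still overlap partially on corrupted words; a weighted sum is just one simple way to let each score compensate for the other.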

Keywords :

natural language processing; text analysis; automatic language identification; character-based method; noisy texts; term-based method; Conferences; Databases; Expert systems; Automatic Language Identification; Hybrid Approach; Natural Language Processing; Noisy Text; Text categorization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2014 25th International Workshop on Database and Expert Systems Applications (DEXA)

Conference_Location :

Munich

ISSN :

1529-4188

Print_ISBN :

978-1-4799-5721-7

Type :

conf

DOI :

10.1109/DEXA.2014.55

Filename :

6974854

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=174895