Title :
Topic identification of Arabic noisy texts based on KNN
Author :
Abainia, Kheireddine ; Ouamour, Siham ; Sayoud, Halim
Author_Institution :
USTHB Univ., Algiers, Algeria
Abstract :
This paper addresses topic identification of noisy Arabic texts, an important research problem given the growing amount of textual information shared worldwide. The dataset used in this study was built by collecting corrupted Arabic texts from discussion forums covering six different topics. The proposed algorithms use a k-nearest neighbor (KNN) classifier based on Tf-Idf to identify the topic of each text. Two training schemes are proposed for creating the reference profiles, and several distance measures are proposed and used to compute the similarity between texts and topics. Results indicate that the proposed distance measures are promising for topic identification.
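Illustrative sketch (not taken from the paper): the abstract describes a KNN classifier over Tf-Idf weights, and the general idea can be outlined as below. Everything specific here is an assumption made for illustration: the function names, the smoothed Idf formula, the use of cosine distance (the paper's own proposed distance measures and its two training schemes are not given in the abstract), and the toy English tokens standing in for tokenized Arabic forum texts.

```python
import math
from collections import Counter


def build_idf(docs):
    """Smoothed inverse document frequency over tokenized training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    """Sparse Tf-Idf vector (dict) for one tokenized document.

    Terms unseen in training are ignored.
    """
    counts = Counter(doc)
    total = len(doc) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine_distance(a, b):
    """1 - cosine similarity; an assumed stand-in for the paper's measures."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def knn_topic(query, train_docs, train_labels, k=3):
    """Majority topic among the k training documents nearest to the query."""
    idf = build_idf(train_docs)
    vectors = [tfidf(d, idf) for d in train_docs]
    q = tfidf(query, idf)
    ranked = sorted(zip(vectors, train_labels),
                    key=lambda pair: cosine_distance(q, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: hypothetical tokens, two topics instead of the paper's six.
TRAIN_DOCS = [
    ["match", "goal", "team"],
    ["team", "player", "goal"],
    ["market", "price", "bank"],
    ["bank", "loan", "price"],
]
TRAIN_LABELS = ["sport", "sport", "economy", "economy"]

if __name__ == "__main__":
    print(knn_topic(["goal", "team", "match"], TRAIN_DOCS, TRAIN_LABELS, k=3))
```

In this sketch each training text is its own reference point. Building one merged profile per topic would be a different training scheme, and the abstract does not say which schemes the authors used.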
Keywords :
learning (artificial intelligence); natural language processing; pattern classification; text analysis; Arabic noisy texts; KNN; Tf-Idf; corrupted Arabic texts; distance measures; k-nearest neighbor classifier; texts similarity; texts topic identification; textual information; topics similarity; Accuracy; Histograms; Neural networks; Noise measurement; Text categorization; Training; Arabic Text Categorization; K Nearest Neighbor; KNN; Natural Language Processing; TF-IDF; Text Categorization; Topic Identification;
Conference_Title :
Information and Communication Technology Research (ICTRC), 2015 International Conference on
Conference_Location :
Abu Dhabi
DOI :
10.1109/ICTRC.2015.7156429