Title :
A Simple Study of Webpage Text Classification Algorithms for Arabic and English Languages
Author :
Al-Ghuribi, Sumaia Mohammed ; Alshomrani, Saleh
Author_Institution :
Fac. of Comput. & Inf. Technol., King Abdulaziz Univ., Jeddah, Saudi Arabia
Abstract :
Webpage text classification is an important problem that has been studied through different approaches and algorithms. It aims to assign a predefined category to a Webpage based on its content and linguistic features. It has many applications, such as word sense disambiguation, document indexing, text filtering, hierarchical categorization of Webpages, and document organization. This study is part of a work in progress in which we aim to develop a bilingual algorithm that classifies Arabic and English Webpage text accurately and efficiently in both languages. It provides a simple overview of many of the approaches that have been constructed for classifying Arabic and English Webpage documents. In this survey, the widely used text classification algorithms are presented along with a comparison of recent research on classification for Arabic and English, in order to conclude which algorithm is best suited to both languages.
Keywords :
Internet; classification; indexing; natural language processing; text analysis; Arabic Webpage text; Arabic language; English Language; English Webpage text; Webpage text classification algorithm; Webpages hierarchical categorization; bilanguages algorithm; document indexing; document organization; text filtering; word sense disambiguation; Accuracy; Classification algorithms; Decision trees; Niobium; Support vector machines; Text categorization; Web pages;
Conference_Title :
IT Convergence and Security (ICITCS), 2013 International Conference on
Conference_Location :
Macao
DOI :
10.1109/ICITCS.2013.6717784