DocumentCode :
1721543
Title :
Enhancement tools for Arabic web search
Author :
Yahya, Adnan H. ; Salhi, Ali Y.
Author_Institution :
Dept. of Comput. Syst. Eng., Birzeit Univ., Birzeit, Palestinian Authority
fYear :
2011
Firstpage :
71
Lastpage :
76
Abstract :
Arabic web content is growing rapidly, and the need to manage it efficiently is gaining importance; the morphological complexity of Arabic raises many challenges in this regard. This paper reports on some of our work aimed at designing text mining and query pre-processing tools that can efficiently process and search large quantities of Arabic web data. In our research we address the challenges Arabic poses for natural language processing (NLP) and information retrieval: root extraction, language detection, and Arabic query correction, suggestion and expansion. While not reported in detail here, we are also developing tools for automatic Arabic document categorization. Throughout, we employ a statistical, corpus-based approach using data obtained from a variety of sources. From corpus statistics we constructed databases of words and their frequencies as single, double and triple expressions, and used them as the infrastructure for well-structured search aid tools that can handle the sophisticated nature of Arabic and can be integrated into existing web search engines and document processing systems. We also apply context analysis and spellchecking to user queries to enable a more complete and efficient search. While the results reported here are promising, they must be viewed as work in progress, still in need of testing, refinement, integration and deployment in real-life settings.
Keywords :
Internet; computational complexity; data mining; natural language processing; query processing; statistical analysis; text analysis; Arabic Web search; Arabic query correction; corpus based approach; enhancement tools; information retrieval; language detection; morphological complexity; query preprocessing tools; root extraction; statistical approach; text mining; Databases; Keyboards; Search engines; Shape; Testing; Web search
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Innovations in Information Technology (IIT), 2011 International Conference on
Conference_Location :
Abu Dhabi
Print_ISBN :
978-1-4577-0311-9
Type :
conf
DOI :
10.1109/INNOVATIONS.2011.5893871
Filename :
5893871