DocumentCode
2169960
Title
Identifying the Dominant Language of Web Page Using Supervised N-grams
Author
Choon-Ching Ng ; Siau-Chuin Liew ; Hussin, W.M.S.W. ; Herawan, Tutut
Author_Institution
Fac. of Comput. Syst. & Software Eng., Univ. Malaysia Pahang, Pekan, Malaysia
fYear
2012
fDate
26-28 Nov. 2012
Firstpage
344
Lastpage
348
Abstract
Natural language processing is an emerging technology in the linguistic industry and an aid to human-computer interaction in computer science. Language identification is a form of pattern recognition that determines which of a set of predefined languages a web page is written in and predicts the unknown language of a given text. Written texts are composed of common features such as characters, words, and n-grams, and the distribution of these features is distinctive for each language. Experimental results show that the supervised n-gram approach identifies the language accurately and outperforms distance measurement on Arabic-script web pages.
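The abstract contrasts distance measurement with supervised n-grams but does not specify either method. The sketch below illustrates both ideas under stated assumptions. It uses character n-grams of length 1 to 3 (an assumed range) and the Cavnar-Trenkle "out-of-place" rank distance as a typical distance measurement. For the supervised side it uses multinomial naive Bayes as a stand-in classifier; the keywords mention support vector machines, which this sketch does not implement. All function names, the toy training sentences, and the `top_k` cutoff of 300 are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n_max: int = 3) -> Counter:
    """Count character n-grams of length 1..n_max; each word is padded with spaces."""
    counts: Counter = Counter()
    for word in text.lower().split():
        padded = f" {word} "
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return counts


# --- Distance-based identification (out-of-place measure; an assumed baseline) ---

def rank_profile(text: str, top_k: int = 300) -> list:
    """Return the top_k most frequent n-grams, most frequent first."""
    return [gram for gram, _ in char_ngrams(text).most_common(top_k)]


def out_of_place(doc: list, lang: list) -> int:
    """Sum of rank differences; n-grams missing from the language profile get the maximum penalty."""
    rank = {gram: r for r, gram in enumerate(lang)}
    return sum(abs(r - rank[g]) if g in rank else len(lang) for r, g in enumerate(doc))


def identify_by_distance(text: str, profiles: dict) -> str:
    """Pick the language whose rank profile is closest to the text's profile."""
    doc = rank_profile(text)
    return min(profiles, key=lambda label: out_of_place(doc, profiles[label]))


# --- Supervised identification (multinomial naive Bayes; a stand-in, not the paper's classifier) ---

def train_nb(samples: list) -> dict:
    """samples: list of (text, label) pairs. Returns add-one-smoothed log-probabilities per label."""
    counts: dict = {}
    for text, label in samples:
        counts.setdefault(label, Counter()).update(char_ngrams(text))
    vocab = set().union(*counts.values())
    model = {}
    for label, c in counts.items():
        denom = sum(c.values()) + len(vocab)
        table = {g: math.log((c[g] + 1) / denom) for g in vocab}
        model[label] = (table, math.log(1 / denom))  # second item scores n-grams never seen in training
    return model


def identify_supervised(text: str, model: dict) -> str:
    """Pick the label with the highest log-likelihood for the text's n-gram counts."""
    feats = char_ngrams(text)

    def log_likelihood(label: str) -> float:  # equal class priors assumed
        table, unseen = model[label]
        return sum(k * table.get(g, unseen) for g, k in feats.items())

    return max(model, key=log_likelihood)


if __name__ == "__main__":
    # Hypothetical toy corpus: one English and one Malay sentence.
    train = [
        ("the quick brown fox jumps over the lazy dog and the cat", "en"),
        ("saya suka makan nasi dan minum air di rumah bersama keluarga", "ms"),
    ]
    profiles = {label: rank_profile(text) for text, label in train}
    model = train_nb(train)
    for query in ["the dog and the cat", "saya makan nasi di rumah"]:
        print(query, "->", identify_by_distance(query, profiles), identify_supervised(query, model))
```

Both methods consume only character sequences, so the same code runs unchanged on Arabic-script text; a realistic system would presumably also strip HTML markup and train on far larger corpora per language.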
Keywords
Web sites; natural language processing; support vector machines; text analysis; Arabic script Web page; Web page dominant language identification; computer science; distance measurement; human-computer interaction; linguistic industry; pattern recognition; supervised N-grams; support vector machine; text language; written text; Arabic script; language identification
fLanguage
English
Publisher
ieee
Conference_Titel
Advanced Computer Science Applications and Technologies (ACSAT), 2012 International Conference on
Conference_Location
Kuala Lumpur
Print_ISBN
978-1-4673-5832-3
Type
conf
DOI
10.1109/ACSAT.2012.74
Filename
6516378