DocumentCode :
671719
Title :
Domain adaptation for cross-lingual query classification using search query logs and document classification
Author :
Hady, Mohamed Farouk Abdel ; Ibrahim, Roliana ; Ashour, Ahmed
Author_Institution :
Microsoft Res. Adv. Technol. Lab., Cairo, Egypt
fYear :
2013
fDate :
4-9 Aug. 2013
Firstpage :
1
Lastpage :
8
Abstract :
A query intent classifier is used by a search engine to determine whether an online search query carries a certain type of intent, such as adult intent or commercial intent. Training such classifiers for each language is a supervised machine learning task that requires a large amount of labeled training queries. Manually annotating training queries for each newly emerging language using human judges is expensive, error-prone, and time-consuming. In this paper, we leverage the existing query classifiers in a source language and the abundant unlabeled queries in the query log of the underserved target language to reduce the cost and automate the training data annotation process. The most-clicked search results of a query, rather than human judges, are used to predict the intent of that query. A document classifier is trained on hidden topics extracted by latent semantic indexing from the translation of source-language documents into the target language. The experimental results, using English as the source language and Arabic as the target language, show that the proposed unsupervised method trains support vector machines as Arabic query classifiers that detect both commercial and health intent without the need for human-judged Arabic queries. The unsupervised classifiers outperform the classifiers based on direct query translation, and the decision fusion of both classifiers is superior.
Keywords :
classification; indexing; learning (artificial intelligence); natural languages; query processing; search engines; support vector machines; Arabic query classifier; English; cross-lingual query classification; decision fusion; direct query translation; document classification; latent semantic indexing; online search query; query intent classifier; search engine; search query logs; source language document; supervised machine learning task; support vector machine; training data annotation process; Business; Indexing; Large scale integration; Search engines; Semantics; Support vector machines; Training
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks (IJCNN), The 2013 International Joint Conference on
Conference_Location :
Dallas, TX
ISSN :
2161-4393
Print_ISBN :
978-1-4673-6128-6
Type :
conf
DOI :
10.1109/IJCNN.2013.6707061
Filename :
6707061