DocumentCode :
3110543
Title :
Extracting the multilingual terminology from a web-based encyclopedia
Author :
Sadat, Fatiha
Author_Institution :
Dept. of Comput. Sci., Univ. du Quebec a Montreal, Montreal, QC, Canada
fYear :
2011
fDate :
19-21 May 2011
Firstpage :
1
Lastpage :
5
Abstract :
Multilingual linguistic resources are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. This article explores and exploits the idea of using multilingual web-based encyclopedias such as Wikipedia as comparable corpora for bilingual terminology extraction. We propose an approach to extract terms and their translations from different types of Wikipedia link information and data. The next step uses linguistics-based information to re-rank and filter the extracted term candidates in the target language. Preliminary evaluations of the combined statistics-based and linguistics-based approaches were carried out on different language pairs, including Japanese, French and English. These evaluations showed a clear improvement and good quality of the extracted term candidates for building or enriching multilingual ontologies and dictionaries, or for feeding a cross-language information retrieval system with expansion terms related to the source query.
Keywords :
Internet; dictionaries; encyclopaedias; ontologies (artificial intelligence); query processing; statistical analysis; English language; French language; Japanese language; Web-based encyclopedia; Wikipedia link information; bilingual terminology extraction; cross-language information retrieval system; dictionaries; multilingual linguistic resources; multilingual ontologies; multilingual terminology extraction; parallel corpora; source query; statistics-based approaches; Dictionaries; Electronic publishing; Encyclopedias; Internet; Ontologies; Cross-Language Information Retrieval; comparable corpora; linguistics-based information; terminology; translation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Research Challenges in Information Science (RCIS), 2011 Fifth International Conference on
Conference_Location :
Gosier
ISSN :
2151-1349
Print_ISBN :
978-1-4244-8670-0
Electronic_ISBN :
2151-1349
Type :
conf
DOI :
10.1109/RCIS.2011.6006865
Filename :
6006865