MT on and for the Web

Author

Boitet, Christian ; Blanchon, Hervé ; Seligman, Mark ; Bellynck, Valérie

Author_Institution

GETALP, UPMF, Grenoble, France

fYear

2010

fDate

21-23 Aug. 2010

Firstpage

Lastpage

Abstract

A Systran MT server became available on the Minitel network in 1984, and on the Internet in 1994. Since then we have come to a better understanding of the nature of MT systems by separately analyzing their linguistic, computational, and operational architectures. Also, thanks to the CxAxQ metatheorem, the systems' inherent limits have been clarified, and design choices can now be made in an informed manner according to the translation situations. MT evaluation has also matured: tools based on reference translations are useful for measuring progress; those based on subjective judgments for estimating future usage quality; and task-related objective measures (such as post-editing distances) for measuring operational quality. Moreover, the same technological advances that have led to “Web 2.0” have brought several futuristic predictions to fruition. Free Web MT services have democratized assimilation MT beyond belief. Speech translation research has given rise to usable systems for restricted tasks running on PDAs or on mobile phones connected to servers. New man-machine interface techniques have made interactive disambiguation usable in large-coverage multimodal MT. Increases in computing power have made statistical methods workable, and have led to the possibility of building low-linguistic-quality but still useful MT systems by machine learning from aligned bilingual corpora (SMT, EBMT). In parallel, progress has been made in developing interlingua-based MT systems, using hybrid methods. Unfortunately, many misconceptions about MT have spread among the public, and even among MT researchers, because of ignorance of the past and present of MT R&D. A compensating factor is the willingness of end users to freely contribute to building essential parts of the linguistic knowledge needed to construct MT systems, whether corpus-related or lexical.
Finally, some developments we anticipated fifteen years ago have not yet materialized, such as online writing tools equipped with interactive disambiguation and, as a corollary, the possibility of transforming source documents into self-explaining documents (SEDs) and of producing corresponding SEDs fully automatically in several target languages. These visions should now be realized, thanks to the evolution of Web programming and multilingual NLP techniques, leading towards a true Semantic Web, “Web 3.0”, which will support ubilingual (ubiquitous multilingual) computing.

Keywords

language translation; learning (artificial intelligence); natural language processing; semantic Web; CxAxQ metatheorem; Systran MT server; Web 2.0; machine learning; machine translation; man-machine interface techniques; multilingual NLP techniques; self-explaining documents; speech translation research; Computer architecture; Dictionaries; Humans; Internet; Pragmatics; Speech; Speech recognition; MT; Semantic Web MT; computational architecture; interactive disambiguation; linguistic architecture; operational architecture; speech MT; task-related evaluation

fLanguage

English

Publisher

IEEE

Conference_Title

Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on

Conference_Location

Beijing

Print_ISBN

978-1-4244-6896-6

Type

conf

DOI

10.1109/NLPKE.2010.5587865

Filename

5587865

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2349287