Title :
Research and Implementation on Machine Translation System with Online Corpora Extraction Technology
Author_Institution :
Changsha Aeronaut. Vocational & Tech. Coll., Changsha, China
Abstract :
Bilingual parallel sentence pairs are an important resource for machine translation. Because the ways of obtaining them are limited, sentence-level parallel corpora are not only small in quantity but also concentrated in specific fields, so they are difficult to adapt to real application requirements. This paper introduces a Web-based system for automatically acquiring bilingual parallel sentence pairs. The system integrates the advantages of current systems and improves their key technologies. We propose a URL naming method for the automatic discovery of bilingual networks and improve the extraction technology for bilingual parallel sentence pairs. Experimental results show that the methods in this paper greatly improve the recall rate of candidate bilingual network discovery. For obtaining bilingual parallel sentence pairs, the recall rate is 93% and the accuracy rate is 96%, which demonstrates the effectiveness of the approach. In addition, this paper studies bilingual parallel sentence pairs inside the bilingual network and obtains some preliminary results. Multiple groups of statistical machine translation experiments show that our method can improve the performance of a machine translation system, so that it can play a part in practical applications of online corpora.
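The abstract does not spell out the URL naming rules, so the following is only a minimal sketch of the general idea: two pages are treated as a candidate bilingual pair when their URLs differ only by a language marker. The marker table `LANG_MARKERS` and the function `candidate_pairs` are hypothetical names chosen for illustration and are not taken from the paper.

```python
import re

# Hypothetical English -> Chinese URL markers; the paper's actual
# naming rules are not given in the abstract.
LANG_MARKERS = {"en": "zh", "eng": "chs", "english": "chinese"}


def candidate_pairs(urls):
    """Return sorted (source_url, target_url) pairs whose URLs differ
    only by one language marker from LANG_MARKERS."""
    known = set(urls)
    pairs = set()
    for url in urls:
        for src, tgt in LANG_MARKERS.items():
            # Match the marker only when it is not embedded in a longer
            # alphanumeric token (so "en" does not match inside "content").
            pattern = re.compile(
                rf"(?<![A-Za-z0-9]){re.escape(src)}(?![A-Za-z0-9])"
            )
            for match in pattern.finditer(url):
                counterpart = url[:match.start()] + tgt + url[match.end():]
                if counterpart in known:
                    pairs.add((url, counterpart))
    return sorted(pairs)


if __name__ == "__main__":
    crawled = [
        "http://example.com/en/news/1.html",
        "http://example.com/zh/news/1.html",
        "http://example.com/contact.html",
    ]
    for source, target in candidate_pairs(crawled):
        print(source, "<->", target)
```

A pair found this way is only a candidate: the pages would still need content-level checks and sentence alignment before any sentence pairs are extracted.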
Keywords :
Internet; language translation; statistical analysis; URL naming method; Web-based automatic acquisition system; automatic discovery bilingual network; bilingual parallel sentence pairs; candidate bilingual network discovery; extraction technology; online corpora extraction technology; recalling rate; statistical machine translation; Accuracy; Feature extraction; Training; Uniform resource locators; Web pages; MTS; bilingual parallel; corpora; extraction; webpages;
Conference_Title :
Intelligent Systems Design and Engineering Applications (ISDEA), 2014 Fifth International Conference on
Conference_Location :
Hunan
Print_ISBN :
978-1-4799-4262-6
DOI :
10.1109/ISDEA.2014.172