Regional Information Center for Science and Technology - An autoencoder with bilingual sparse features for improved statistical machine translation

DocumentCode :

180179

Title :

An autoencoder with bilingual sparse features for improved statistical machine translation

Author :

Bing Zhao ; Yik-Cheung Tam ; Jing Zheng

Author_Institution :

SRI Int., Menlo Park, CA, USA

Year :

2014

Date :

4-9 May 2014

Firstpage :

7103

Lastpage :

7107

Abstract :

Though sparse features have produced significant gains over traditional dense features in statistical machine translation, careful feature selection and feature engineering are necessary to avoid overfitting during optimization. Moreover, many sparse features overlap heavily with each other; that is, they cover the same or similar information about translational equivalence from slightly different points of view, and they overfit easily when only very few training samples are available for given bilingual stochastic context-free grammar (SCFG) rules. We propose a natural autoencoder that maps all the discrete, overlapping sparse features for each SCFG rule into a continuous vector, so that the information encoded in sparse feature vectors becomes a dense vector that may benefit from more samples during training and avoid overfitting. Our experiments showed that, for a statistical machine translation system with 33 million bilingual SCFG rules, the autoencoder generalizes much better than sparse features alone under the same optimization framework.
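
The core idea of the abstract, compressing overlapping sparse indicator features for each rule into a short dense vector, can be illustrated with a generic one-hidden-layer autoencoder. This is only a sketch under stated assumptions and not the authors' model: the abstract does not specify the architecture, loss, or training procedure, so the tanh encoder, sigmoid decoder, cross-entropy loss, plain SGD, the class name `SparseAutoencoder`, and the toy feature sets are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(v):
    v = max(-30.0, min(30.0, v))  # clamp so math.exp stays in range
    return 1.0 / (1.0 + math.exp(-v))


class SparseAutoencoder:
    """Generic autoencoder over 0/1 sparse features (illustrative, not the paper's model)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_features, self.n_hidden = n_features, n_hidden
        self.w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                      for _ in range(n_hidden)]
        self.b_enc = [0.0] * n_hidden
        self.w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                      for _ in range(n_features)]
        self.b_dec = [0.0] * n_features

    def encode(self, active):
        # Inputs are 0/1 indicators, so only the active indices contribute.
        return [math.tanh(self.b_enc[j] + sum(self.w_enc[j][i] for i in active))
                for j in range(self.n_hidden)]

    def decode(self, h):
        return [sigmoid(self.b_dec[i] + sum(w * hj for w, hj in zip(self.w_dec[i], h)))
                for i in range(self.n_features)]

    def loss(self, active):
        # Cross-entropy between the reconstruction and the 0/1 input.
        y = self.decode(self.encode(active))
        eps = 1e-12
        return -sum(math.log(max(y[i], eps)) if i in active
                    else math.log(max(1.0 - y[i], eps))
                    for i in range(self.n_features))

    def sgd_step(self, active, lr):
        h = self.encode(active)
        y = self.decode(h)
        # Gradient of the cross-entropy with respect to the decoder pre-activations.
        d_out = [y[i] - (1.0 if i in active else 0.0) for i in range(self.n_features)]
        # Backpropagate through the decoder weights and the tanh nonlinearity.
        d_hid = [sum(d_out[i] * self.w_dec[i][j] for i in range(self.n_features))
                 * (1.0 - h[j] ** 2) for j in range(self.n_hidden)]
        for i in range(self.n_features):
            for j in range(self.n_hidden):
                self.w_dec[i][j] -= lr * d_out[i] * h[j]
            self.b_dec[i] -= lr * d_out[i]
        for j in range(self.n_hidden):
            for i in active:
                self.w_enc[j][i] -= lr * d_hid[j]
            self.b_enc[j] -= lr * d_hid[j]


# Hypothetical toy data: each set lists the active sparse features of one rule.
rules = [{0, 1, 2}, {0, 1}, {1, 2}, {3, 4, 5}, {3, 4}, {4, 5}]

model = SparseAutoencoder(n_features=6, n_hidden=2)
initial_loss = sum(model.loss(r) for r in rules)
for _ in range(300):
    for r in rules:
        model.sgd_step(r, lr=0.1)
final_loss = sum(model.loss(r) for r in rules)

dense = model.encode(rules[0])  # dense vector standing in for the rule's sparse features
print(initial_loss, final_loss, dense)
```

In a setup like this, the dense vector returned by `encode` would replace, or be added alongside, the rule's sparse features as a small number of real-valued features, so that each weight is tuned on many rules instead of a handful.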

Keywords :

context-free grammars; encoding; feature selection; language translation; natural language processing; optimisation; statistical analysis; SCFG rules; autoencoder; bilingual sparse features; bilingual stochastic context-free grammar; feature engineering; feature selection; feature training sample; improved statistical machine translation; optimization; overfitting avoidance; sparse feature vector; translational equivalence; Computational linguistics; Neural networks; Optimization; Principal component analysis; Training; Tuning; Vectors; PRO; SCFG grammar induction; autoencoder; machine translation; optimization; sparse features;

Language :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6854978

Filename :

6854978

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=180179