Title :
Integrated phrase segmentation and alignment algorithm for statistical machine translation
Author :
Zhang, Ying ; Vogel, Stephan ; Waibel, Alex
Author_Institution :
Language Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
We present an integrated phrase segmentation/alignment algorithm (ISA) for statistical machine translation. Unlike other methods, it needs neither an initial word-to-word alignment nor a prior segmentation of the monolingual text into phrases: the algorithm segments the sentences into phrases and finds their alignments simultaneously. For each sentence pair, ISA builds a two-dimensional matrix in which the value of each cell is the point-wise mutual information (MI) between a source word and a target word. Based on the similarities of MI values among cells, we identify the aligned phrase pairs. Once all the phrase pairs are found, we know both how to segment each sentence into phrases and how the source and target sentences align. We use monolingual bigram language models to estimate the joint probabilities of the identified phrase pairs. The joint probabilities are then normalized to conditional probabilities, which are used by the decoder. Despite its simplicity, this approach yields phrase-to-phrase translations with significantly higher precision than our baseline system, in which phrase translations are extracted from the HMM word alignment. When we combine the phrase-to-phrase translations generated by this algorithm with the baseline system, the improvement in translation quality is even larger.
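The MI matrix and the joint-to-conditional normalization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `pmi_matrix` and `normalize_joint`, the toy corpus, the use of sentence-level co-occurrence counts to estimate MI, and the choice to condition on the source phrase are all assumptions made for illustration. The phrase-identification step itself (grouping cells with similar MI values) is not shown, because the abstract does not specify it.

```python
import math
from collections import Counter


def pmi_matrix(src, tgt, corpus):
    """Return matrix[i][j] = PMI(src[i], tgt[j]).

    Assumption: probabilities are estimated from sentence-level
    co-occurrence counts over `corpus`, a list of
    (source_tokens, target_tokens) pairs.
    """
    n = len(corpus)
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for s_sent, t_sent in corpus:
        s_set, t_set = set(s_sent), set(t_sent)
        src_count.update(s_set)
        tgt_count.update(t_set)
        pair_count.update((s, t) for s in s_set for t in t_set)

    def pmi(s, t):
        joint = pair_count[(s, t)]
        if joint == 0:
            return float("-inf")  # the pair never co-occurs
        # log( P(s, t) / (P(s) * P(t)) ) with counts divided by n
        return math.log(joint * n / (src_count[s] * tgt_count[t]))

    return [[pmi(s, t) for t in tgt] for s in src]


def normalize_joint(joint):
    """Convert joint scores P(src, tgt) into conditionals P(tgt | src).

    Assumption: the conditioning side is the source phrase; the
    abstract does not say which direction the decoder uses.
    """
    totals = Counter()
    for (s, _), p in joint.items():
        totals[s] += p
    return {(s, t): p / totals[s] for (s, t), p in joint.items()}


# Hypothetical toy corpus, for illustration only.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "haus"], ["a", "house"]),
]
matrix = pmi_matrix(["das", "haus"], ["the", "house"], corpus)
for row in matrix:
    print(row)
```

In this toy setting the cells for (das, the) and (haus, house) receive higher MI than the off-diagonal cells, which is the kind of pattern a cell-grouping step could exploit.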
Keywords :
computational linguistics; hidden Markov models; language translation; linguistics; natural languages; probability; decoder; integrated phrase segmentation; monolingual bigram language model; phrase alignment algorithm; phrase-to-phrase translation; point-wise mutual information; statistical machine translation; Context modeling; Data mining; Decoding; Greedy algorithms; Hidden Markov models; Instruction sets; Joining processes; Natural languages; Surface-mount technology; Viterbi algorithm
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
DOI :
10.1109/NLPKE.2003.1275970