Regional Information Center for Science and Technology - An English part-of-speech tagger for machine translation in business domain

DocumentCode :

3141907

Title :

An English part-of-speech tagger for machine translation in business domain

Author :

Ma, Jianjun ; Huang, Degen ; Liu, Haixia ; Sheng, Wenfeng

Author_Institution :

Sch. of Comput. Sci. & Technol., Dalian Univ. of Technol., Dalian, China

fYear :

2011

fDate :

27-29 Nov. 2011

Firstpage :

183

Lastpage :

189

Abstract :

Part-of-speech tagging is a crucial preprocessing step for machine translation. Current studies mainly focus on methods: linguistic, statistical, machine learning, or hybrid. So far, however, few serious attempts have been made to test the reported accuracy of taggers on different, perhaps domain-specific, corpora. This paper therefore presents an English POS tagger for English-Chinese machine translation in the business domain, demonstrating how an existing tagger can be adapted to learn from a small amount of data and handle unknown words for the purpose of machine translation. A small 998k annotated English corpus in the business domain is built semi-automatically based on a new tagset; a maximum entropy model is adopted, and a rule-based approach is used in post-processing. Experiments show that our tagger achieves an accuracy of 99.08% in the closed test and 98.14% in the open test, a quite satisfactory result compared with the best reported open-test result of 97.18% for the Stanford English tagger.

Keywords :

business data processing; knowledge based systems; language translation; maximum entropy methods; natural language processing; English part-of-speech tagger; English-Chinese machine translation; business domain; maximum entropy model; rule-based approach; Hidden Markov models; Indium phosphide; English POS tagging; machine translation; maximum entropy

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Natural Language Processing and Knowledge Engineering (NLP-KE), 2011 7th International Conference on

Conference_Location :

Tokushima

Print_ISBN :

978-1-61284-729-0

Type :

conf

DOI :

10.1109/NLPKE.2011.6138191

Filename :

6138191

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3141907