Title :
A method of mining bilingual resources from Web Based on Maximum Frequent Sequential Pattern
Author :
Zhang, Guiping ; Luo, Yang ; Ji, Duo
Author_Institution :
Knowledge Engineering Research Center, Shenyang Aerospace University, Shenyang, China
Abstract :
Bilingual resources are indispensable in NLP fields such as machine translation. The Internet holds a vast amount of electronic information that can serve as a source for a large-scale multilingual corpus, so mining large volumes of genuine bilingual resources from the Web is a promising and feasible approach. This paper proposes a method of mining bilingual resources from the Web based on the Maximum Frequent Sequential Pattern. The method uses a heuristic approach to search for and filter candidate bilingual web pages, mines patterns using maximum frequent sequential pattern mining, and uses a machine learning method to extend the pattern base and to verify bilingual resources according to the Japanese-to-Chinese word proportion. Experimental results indicate that the method extracts bilingual resources efficiently, with a precision above 90%.
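The abstract does not give the details of the mining step. As a hedged illustration only, the Python sketch below shows one standard way to mine maximal frequent sequential patterns: a PrefixSpan-style projection enumerates every frequent subsequence (gaps allowed), and a final filter keeps only patterns with no frequent proper supersequence. Modelling each candidate page as a sequence of HTML tags with the placeholder tokens `JA` and `ZH` standing for Japanese and Chinese text runs is an assumption made here, not the paper's stated representation, and the function names (`frequent_sequences`, `maximal_patterns`, `is_subsequence`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a page is a tuple of tokens (HTML tags plus
# placeholder tokens "JA" / "ZH" for Japanese and Chinese text runs).
Pattern = Tuple[str, ...]


def is_subsequence(small: Pattern, big: Pattern) -> bool:
    """True if `small` occurs in `big` in order, gaps allowed."""
    remaining = iter(big)
    # `tok in remaining` advances the iterator, so token order is enforced.
    return all(tok in remaining for tok in small)


def frequent_sequences(db: List[Pattern], min_support: int) -> Dict[Pattern, int]:
    """PrefixSpan-style enumeration of all frequent sequential patterns.

    Support = number of sequences in `db` that contain the pattern.
    """
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # `projected` holds (sequence index, start offset) for the suffix that
        # follows the first occurrence of `prefix` in each supporting sequence.
        counts: Dict[str, int] = {}
        for sid, start in projected:
            # Count each token at most once per sequence.
            for tok in set(db[sid][start:]):
                counts[tok] = counts.get(tok, 0) + 1
        for tok, support in counts.items():
            if support < min_support:
                continue
            grown = prefix + (tok,)
            results[grown] = support
            # Project again: keep the suffix after the first `tok` in each suffix.
            next_projected = []
            for sid, start in projected:
                seq = db[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == tok:
                        next_projected.append((sid, pos + 1))
                        break
            mine(grown, next_projected)

    mine((), [(i, 0) for i in range(len(db))])
    return results


def maximal_patterns(patterns: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep only patterns that have no frequent proper supersequence."""
    return {
        p: s
        for p, s in patterns.items()
        if not any(len(q) > len(p) and is_subsequence(p, q) for q in patterns)
    }


if __name__ == "__main__":
    pages = [
        ("<li>", "JA", "<br>", "ZH", "</li>"),
        ("<li>", "JA", "<br>", "ZH", "</li>"),
        ("<p>", "JA", "ZH", "</p>"),
    ]
    print(maximal_patterns(frequent_sequences(pages, 3)))  # {('JA', 'ZH'): 3}
    print(maximal_patterns(frequent_sequences(pages, 2)))
    # {('<li>', 'JA', '<br>', 'ZH', '</li>'): 2}
```

Keeping only maximal patterns retains the longest shared page template instead of its many fragments; raising the minimum support trades specificity for generality, as the two calls above show.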
Keywords :
Internet; data mining; language translation; natural language processing; Japanese to Chinese word proportion; NLP fields; bilingual Web pages; bilingual resources mining; machine translation; maximum frequent sequential pattern; multilanguage corpus; Aerospace engineering; Artificial neural networks; Information filters; Knowledge engineering; Bilingual corpus; Maximum Frequent Sequential Pattern; Pattern base; Web mining;
Conference_Title :
2010 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE)
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
DOI :
10.1109/NLPKE.2010.5587831