DocumentCode
1867084
Title
Development of a name translation system using CRAY T94
Author
Cai, Wentong ; Xu, Peng ; Wu, Paul ; Jyh, Horng
Author_Institution
Sch. of Appl. Sci., Nanyang Technol. Inst., Singapore
fYear
1997
fDate
28 Apr-2 May 1997
Firstpage
295
Lastpage
300
Abstract
Natural language processing (NLP) is an important research direction, since it addresses the needs of the approaching information age. In this paper, we report our study on the problem of translating people's English names into their corresponding Chinese Pinyin names. A name translation system (NTS) has been developed based on statistical approaches. There are two components in the NTS: dictionary creation and name translation. The dictionary is generated using a statistics-based dictionary generator (SBDG), and the name translation is done using a modified address normalization system (ANS). As in many other NLP applications, the SBDG and ANS suffer the drawback of requiring extremely large computational resources, in terms of both computation time and memory. Making the NTS fast and feasible therefore requires a high-performance computer. The CRAY T94 is a powerful, large-scale, general-purpose parallel-vector supercomputer. In this paper, we first describe the system design of the NTS, and then explain how the NTS is optimized to execute on the CRAY T94. The results we obtained are also discussed. Our experience shows that algorithms and data structures are very important in obtaining optimal performance. The performance monitoring/analysis tools provided by the CRAY T94 programming environment also proved very useful in making optimization decisions. In addition, our study demonstrates that, using the CRAY T94, performance improvements can be achieved not only in the traditional areas of scientific computation but also in NLP applications.
Keywords
Cray computers; glossaries; language translation; natural languages; parallel processing; statistics; vector processor systems; CRAY T94 computer; Chinese Pinyin names; English names; address normalization system; algorithms; data structures; dictionary creation; high-performance computer; large-scale general-purpose parallel-vector supercomputer; name translation system; natural language processing; optimal performance; performance analysis tools; performance monitoring tools; programming environment; statistical approaches; statistics-based dictionary generator; system design; system optimization; Application software; Data structures; Design optimization; Dictionaries; Large-scale systems; Monitoring; Natural language processing; Performance analysis; Supercomputers; System analysis and design;
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Computing on the Information Superhighway, 1997. HPC Asia '97
Conference_Location
Seoul
Print_ISBN
0-8186-7901-8
Type
conf
DOI
10.1109/HPC.1997.592163
Filename
592163