• DocumentCode
    1867084
  • Title
    Development of a name translation system using CRAY T94
  • Author
    Cai, Wentong ; Xu, Peng ; Wu, Paul ; Jyh, Horng
  • Author_Institution
    Sch. of Appl. Sci., Nanyang Technol. Inst., Singapore
  • fYear
    1997
  • fDate
    28 Apr-2 May 1997
  • Firstpage
    295
  • Lastpage
    300
  • Abstract
    Natural language processing (NLP) is an important research direction, since it addresses the needs of the approaching information age. In this paper, we report our study on the problem of translating people's English names into their corresponding Chinese Pinyin names. A name translation system (NTS) has been developed based on statistical approaches. The NTS has two components: dictionary creation and name translation. The dictionary is generated by a statistics-based dictionary generator (SBDG), and name translation is performed by a modified address normalization system (ANS). As in many other NLP applications, the SBDG and ANS suffer from the drawback of requiring extremely large computational resources, in terms of both computation time and memory. Therefore, to make the NTS fast and feasible, the use of a high-performance computer becomes necessary. The CRAY T94 is a powerful, large-scale, general-purpose parallel-vector supercomputer. We first describe the system design of the NTS, and then explain how the NTS is optimized to execute on the CRAY T94. The results we obtained are also discussed. Our experience shows that the choice of algorithms and data structures is very important in obtaining optimal performance. The performance monitoring/analysis tools provided by the CRAY T94 programming environment also proved to be very useful in making optimization decisions. In addition, our study demonstrates that, using the CRAY T94, performance improvements can be achieved not only in the traditional areas of scientific computation but also in NLP applications.
  • Keywords
    Cray computers; glossaries; language translation; natural languages; parallel processing; statistics; vector processor systems; CRAY T94 computer; Chinese Pinyin names; English names; address normalization system; algorithms; data structures; dictionary creation; high-performance computer; large-scale general-purpose parallel-vector supercomputer; name translation system; natural language processing; optimal performance; performance analysis tools; performance monitoring tools; programming environment; statistical approaches; statistics-based dictionary generator; system design; system optimization; Application software; Data structures; Design optimization; Dictionaries; Large-scale systems; Monitoring; Natural language processing; Performance analysis; Supercomputers; System analysis and design
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    High Performance Computing on the Information Superhighway, 1997. HPC Asia '97
  • Conference_Location
    Seoul
  • Print_ISBN
    0-8186-7901-8
  • Type
    conf
  • DOI
    10.1109/HPC.1997.592163
  • Filename
    592163