• DocumentCode
    2280268
  • Title
    Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval
  • Author
    Meng, Helen M. ; Lo, Wai-Kit ; Chen, Berlin ; Tang, Karen
  • Author_Institution
    Chinese Univ. of Hong Kong, Shatin, China
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    311
  • Lastpage
    314
  • Abstract
    We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR). Our retrieval system integrates machine translation, speech recognition and information retrieval technologies. An English news story forms a textual query that is automatically translated into Chinese words, which are mapped into Mandarin syllables by pronunciation dictionary lookup. Mandarin radio news broadcasts form spoken documents that are indexed by word and syllable recognition. The information retrieval engine performs matching at both the word and syllable scales. The English queries contain many named entities that tend to be out-of-vocabulary words for machine translation and speech recognition, and are therefore omitted from retrieval. Names are often transliterated across languages and are generally important for retrieval. We present a technique that takes a name spelling as input and automatically generates a phonetic cognate in terms of Chinese syllables to be used in retrieval. Experiments show a consistent improvement in retrieval performance when named entities are included in this way.
  • Keywords
    dictionaries; indexing; information retrieval; language translation; natural language interfaces; pattern matching; speech processing; speech recognition; text analysis; Chinese words; English news story; English-Chinese spoken document retrieval; Mandarin radio news broadcasts; Mandarin syllables; automatic transliteration; cross-language spoken document retrieval; indexing; information retrieval; machine translation; matching; name spelling; named entities; out-of-vocabulary words; performance improvement; phonetic cognate generation; pronunciation dictionary lookup; speech recognition; syllable recognition; textual query; word recognition; Audio recording; Broadcast technology; Dictionaries; Digital multimedia broadcasting; Engines; Indexing; Information retrieval; Natural languages; Radio broadcasting; Speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding, 2001. ASRU '01. IEEE Workshop on
  • Print_ISBN
    0-7803-7343-X
  • Type
    conf
  • DOI
    10.1109/ASRU.2001.1034649
  • Filename
    1034649