DocumentCode :
2876023
Title :
Named entity recognition from spoken documents using global evidences and external knowledge sources with applications on Mandarin Chinese
Author :
Pan, Yi-Cheng ; Liu, Yu-Ying ; Lee, Lin-shan
Author_Institution :
Graduate Inst. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei
fYear :
2005
fDate :
27-27 Nov. 2005
Firstpage :
296
Lastpage :
301
Abstract :
In this paper, we propose two efficient approaches for named entity recognition (NER) from spoken documents. The first approach uses a very efficient data structure, the PAT tree, to extract global evidence from the whole spoken document, to be used together with the well-known local (internal and external) evidence used by conventional approaches. The basic idea is that a named entity (NE) may not be easily recognized in certain contexts, but may become much easier to recognize when its repeated occurrences across the different sentences of the same spoken document are considered jointly. This approach is equally useful for NER from text and from spoken documents. The second approach tries to recover some named entities (NEs) that are out-of-vocabulary (OOV) words and thus cannot appear in the transcriptions. The basic idea is to use reliable and important words in the transcription to construct queries that retrieve relevant text documents from external knowledge sources (such as the Internet). Matching the NEs obtained from these retrieved text documents against selected sections of the phone lattice of the spoken document can recover some NEs that are OOV words. The experiments were performed on Mandarin Chinese by incorporating these two approaches into a conventional hybrid statistical/rule-based NER system for the Chinese language. Very significant performance improvements were obtained.
Keywords :
document handling; natural languages; speech processing; Mandarin Chinese; external knowledge sources; global evidences; named entity recognition; out-of-vocabulary words; spoken documents; Application software; Automatic speech recognition; Computer science; Data mining; Internet; Knowledge engineering; Lattices; Natural languages; Statistics; Tree data structures;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Automatic Speech Recognition and Understanding, 2005 IEEE Workshop on
Conference_Location :
San Juan
Print_ISBN :
0-7803-9478-X
Electronic_ISBN :
0-7803-9479-8
Type :
conf
DOI :
10.1109/ASRU.2005.1566535
Filename :
1566535