Title :
Merging information by discourse processing for information extraction
Author :
Kitani, Tsuyoshi
Author_Institution :
Center for Machine Translation, Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
In information extraction tasks, a finite-state pattern matcher is widely used to identify individual pieces of information in a sentence. Merging related pieces of information scattered throughout a text is usually difficult, however, since semantic relations across sentences cannot be captured by sentence-level processing. The purpose of the discourse processing described in this paper is to link individual pieces of information identified by sentence-level processing. In the Tipster information extraction domains, correct identification of company names is the key to achieving a high level of system performance. Therefore, the discourse processor in the Textract information extraction system keeps track of missing, abbreviated, and referenced company names in order to correlate individual pieces of information throughout the text. Furthermore, the discourse is segmented, so that data can be extracted from relevant portions of the text containing information of interest related to a particular tie-up relationship.
Keywords :
commerce; information analysis; information retrieval; merging; natural languages; Japanese GNU AWK; Japanese morphological analyzer; Majesty; Textract; Tipster; abbreviated names; company name identification; discourse processing; finance; finite-state pattern matcher; information correlation; information extraction; information merging; missing names; natural language processing; newspaper articles; referenced names; segmented discourse; semantic relations; sentence level processing; system performance; tie-up relationship; Artificial intelligence; Assembly; Data mining; Finance; International collaboration; Merging; Natural language processing; Natural languages; Scattering; System performance
Conference_Title :
Artificial Intelligence for Applications, 1994., Proceedings of the Tenth Conference on
Conference_Location :
San Antonio, TX
Print_ISBN :
0-8186-5550-X
DOI :
10.1109/CAIA.1994.323646