Title :
Extracting company names from text
Author_Institution :
GE Res. & Dev. Center, Schenectady, NY, USA
Abstract :
A detailed description is given of an implemented algorithm that extracts company names automatically from financial news. Extracting company names from text is one problem; recognizing subsequent references to a company is another. The author addresses both problems in a well-tested module that operates as a detachable process from a set of natural language processing tools. The algorithm combines heuristics, exception lists and extensive corpus analysis, and it generates the most likely variations under which each name may appear, for use in subsequent retrieval. Tested on over one million words of naturally occurring financial news, the system extracted thousands of company names with over 95% accuracy (precision) compared with a human, and it extracted 25% more companies than a human had indexed.
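The abstract names the ingredients (heuristics, exception lists, name-variant generation) but not their details. The sketch below shows one way those ingredients could fit together, and every specific in it is an assumption rather than the paper's method. A candidate name is taken to be a run of capitalized words ending in a corporate suffix (the hypothetical `_NAME_RE` and suffix set Inc/Corp/Co/Ltd). A small hypothetical `EXCEPTIONS` list strips sentence-initial function words such as "The". The function `name_variants` produces a few plausible later references (full form, base name, initial-letter acronym).

```python
import re

# Assumption: a company name is a run of capitalized words followed by one of
# these corporate suffixes. The paper's actual trigger list is not given here.
_NAME_RE = re.compile(r"\b((?:[A-Z][A-Za-z&'-]*\s+)+)(Inc|Corp|Co|Ltd)\b\.?")

# Hypothetical exception list: capitalized function words that can open a
# sentence but should not be treated as part of a company name.
EXCEPTIONS = {"The", "A", "An", "In", "On", "At", "By", "For"}


def extract_company_names(text):
    """Return (base_name, suffix) pairs for suffix-triggered candidates."""
    results = []
    for match in _NAME_RE.finditer(text):
        words = match.group(1).split()
        # Drop leading exception words such as a sentence-initial "The".
        while words and words[0] in EXCEPTIONS:
            words.pop(0)
        if words:
            results.append((" ".join(words), match.group(2)))
    return results


def name_variants(base, suffix):
    """Generate likely later references to a company (illustrative only)."""
    words = base.split()
    variants = [f"{base} {suffix}", base]
    if len(words) > 1:
        # Acronym from initial letters, e.g. "General Electric" -> "GE".
        variants.append("".join(w[0] for w in words))
    return variants


if __name__ == "__main__":
    text = "The Acme Widget Inc. said General Electric Co. will buy it."
    for base, suffix in extract_company_names(text):
        print(base, name_variants(base, suffix))
```

Running the script prints `Acme Widget ['Acme Widget Inc', 'Acme Widget', 'AW']` and `General Electric ['General Electric Co', 'General Electric', 'GE']`. A rule this simple still overgenerates, since not every acronym is a valid reference. That gap is where the corpus analysis described in the abstract would come in.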
Keywords :
computerised pattern recognition; financial data processing; information retrieval; natural languages; word processing; company names; corpus analysis; detachable process; exception lists; financial news; heuristics; natural language processing tools; naturally occurring financial news; retrieval; well-tested module; Artificial intelligence; Databases; Frequency; Humans; Laboratories; Natural language processing; Natural languages; Research and development; Testing; Text recognition;
Conference_Title :
Artificial Intelligence Applications, 1991. Proceedings., Seventh IEEE Conference on
Conference_Location :
Miami Beach, FL
Print_ISBN :
0-8186-2135-4
DOI :
10.1109/CAIA.1991.120841