Title :
The challenge of Virginia Banks: an evaluation of named entity analysis in a 19th-century newspaper collection
Author :
Crane, Gregory ; Jones, Alison
Author_Institution :
Perseus Project, Tufts Univ., Medford, MA
Abstract :
This paper evaluates automatic extraction of ten named entity classes from a 19th-century newspaper, the Civil War years of the Richmond Times Dispatch, digitized with IMLS support by the University of Richmond. It analyzes success with ten categories of entities prominent in these newspapers and the particular problems that these classes of named entities raise. Personal and place names are familiar, but other important categories (such as ship names and military units) illustrate some of the challenges that named entity identification confronts as it evolves into a fundamental tool not only for automatic metadata generation but also for searching and browsing. We conclude by suggesting the kinds of knowledge sources that digital libraries need to assemble as part of their machine-readable reference collections to support named entity identification as a core service.
Keywords :
digital libraries; history; information analysis; information retrieval; meta data; 19th-century newspaper collection; Civil War years; IMLS; Richmond Times Dispatch; Virginia Banks; automatic extraction; automatic metadata generation; digital library; machine readable reference collections; named entity analysis; Abstracts; Assembly; Cranes; Encyclopedias; Job listing service; Marine vehicles; Oceans; Permission; Software libraries; historical newspapers; named entity recognition
Conference_Title :
Digital Libraries, 2006. JCDL '06. Proceedings of the 6th ACM/IEEE-CS Joint Conference on
Conference_Location :
Chapel Hill, NC
Print_ISBN :
1-59593-354-9
DOI :
10.1145/1141753.1141759