DocumentCode
1909427
Title
High accuracy citation extraction and named entity recognition for a heterogeneous corpus of academic papers
Author
Powley, Brett ; Dale, Robert
Author_Institution
Centre for Language Technol., Macquarie Univ., Sydney, NSW
fYear
2007
fDate
Aug. 30 2007-Sept. 1 2007
Firstpage
119
Lastpage
124
Abstract
Citation indices are increasingly being used not only as navigational tools for researchers, but also as the basis for measurement of academic performance and research impact. This means that the reliability of tools used to extract citations and construct such indices is becoming more critical; however, existing approaches to citation extraction still fall short of the high accuracy required if critical assessments are to be based on them. In this paper, we present techniques for high accuracy extraction of citations from academic papers, designed for applicability across a broad range of disciplines and document styles. We integrate citation extraction, reference parsing, and author named entity recognition to significantly improve performance in citation extraction, and demonstrate this performance on a cross-disciplinary heterogeneous corpus. Applying our algorithm to previously unseen documents, we demonstrate high F-measure performance of 0.98 for author named entity recognition and 0.97 for citation extraction.
Keywords
citation analysis; information retrieval; text analysis; academic papers; document styles; heterogeneous corpus; high accuracy citation extraction; named entity recognition; reference parsing; textual citation indices; Australia; Automation; Citation analysis; Computer science; Data mining; Hidden Markov models; Intersymbol interference; Navigation; Paper technology; Prototypes
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-1610-3
Electronic_ISBN
978-1-4244-1611-0
Type
conf
DOI
10.1109/NLPKE.2007.4368021
Filename
4368021