DocumentCode :
1909427
Title :
High accuracy citation extraction and named entity recognition for a heterogeneous corpus of academic papers
Author :
Powley, Brett ; Dale, Robert
Author_Institution :
Centre for Language Technology, Macquarie University, Sydney, NSW
fYear :
2007
fDate :
Aug. 30 2007-Sept. 1 2007
Firstpage :
119
Lastpage :
124
Abstract :
Citation indices are increasingly being used not only as navigational tools for researchers, but also as the basis for measurement of academic performance and research impact. This means that the reliability of tools used to extract citations and construct such indices is becoming more critical; however, existing approaches to citation extraction still fall short of the high accuracy required if critical assessments are to be based on them. In this paper, we present techniques for high accuracy extraction of citations from academic papers, designed for applicability across a broad range of disciplines and document styles. We integrate citation extraction, reference parsing, and author named entity recognition to significantly improve performance in citation extraction, and demonstrate this performance on a cross-disciplinary heterogeneous corpus. Applying our algorithm to previously unseen documents, we demonstrate high F-measure performance of 0.98 for author named entity recognition and 0.97 for citation extraction.
Keywords :
citation analysis; information retrieval; text analysis; academic papers; document styles; heterogeneous corpus; high accuracy citation extraction; named entity recognition; reference parsing; textual citation indices; Australia; Automation; Citation analysis; Computer science; Data mining; Hidden Markov models; Intersymbol interference; Navigation; Paper technology; Prototypes
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-1610-3
Electronic_ISBN :
978-1-4244-1611-0
Type :
conf
DOI :
10.1109/NLPKE.2007.4368021
Filename :
4368021