Title :
Reference metadata extraction from scientific papers
Author :
Zhixin Guo ; Hai Jin
Author_Institution :
Cluster & Grid Comput. Lab., Huazhong Univ. of Sci. & Technol., Wuhan, China
Abstract :
Bibliographical information about scientific papers has been of great value since the Science Citation Index was introduced to measure research impact. Most scientific documents available on the web are unstructured or semi-structured, so automatic reference metadata extraction has become an important task. This paper describes a framework for automatic reference metadata extraction from scientific papers. Our system can extract title, author, journal, volume, year, and page metadata from scientific papers in PDF format. We use a document metadata knowledge base to guide the reference metadata extraction process. The experimental results show that our system achieves high accuracy.
Keywords :
Internet; citation analysis; document handling; information retrieval; knowledge based systems; meta data; natural sciences computing; Bibliographical Information; Web; automatic reference metadata extraction process; document metadata knowledge base; science citation index; scientific documents; scientific papers; semistructured metadata extraction process; unstructured metadata extraction process; Accuracy; Data mining; Hidden Markov models; Knowledge based systems; Libraries; Portable document format; Semantics; metadata extraction; reference; rule-based approach;
Conference_Title :
Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2011 12th International Conference on
Conference_Location :
Gwangju
Print_ISBN :
978-1-4577-1807-6
DOI :
10.1109/PDCAT.2011.72