DocumentCode :
3432151
Title :
GATE framework based metadata extraction from scientific papers
Author :
Huynh, Tin ; Hoang, Kiem
Author_Institution :
Dept. of Comput. Sci., Univ. of Inf. Technol., Ho Chi Minh City, Vietnam
fYear :
2010
fDate :
2-4 Nov. 2010
Firstpage :
188
Lastpage :
191
Abstract :
In this paper we propose a method to automatically extract metadata (title, authors, affiliation, email, references, etc.) from scientific papers by combining the layout information of papers with rules defined using the JAPE grammar of GATE. After the metadata are extracted automatically from digital documents, users can interact with and correct them before they are exported to XML files. Developing a tool to extract metadata from digital documents is a necessary and useful task for building collections and for organizing and searching documents in digital libraries. The extraction method is tested on computer science paper collections selected from international journals and proceedings downloaded from digital libraries such as ACM, IEEE, Springer and CiteSeer.
Keywords :
data mining; digital libraries; document handling; GATE framework; JAPE grammar rules; digital document; digital libraries; metadata extraction; scientific paper; Data mining; Electronic mail; Layout; Libraries; Logic gates; Machine learning; Ontologies; Information extraction; automation; metadata;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Education and Management Technology (ICEMT), 2010 International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-8616-8
Electronic_ISBN :
978-1-4244-8618-2
Type :
conf
DOI :
10.1109/ICEMT.2010.5657675
Filename :
5657675