DocumentCode :
2639245
Title :
Regular expression-based reference metadata extraction from the web
Author :
Tang, Xiaoyu ; Zeng, Qingtian ; Cui, Tingting ; Wu, Zeze
Author_Institution :
Coll. of Inf. Sci. & Eng., Shandong Univ. of Sci. & Technol., Qingdao, China
fYear :
2010
fDate :
16-17 Aug. 2010
Firstpage :
346
Lastpage :
350
Abstract :
Accurate reference metadata extraction has become an intriguing task for researchers who want to collect data on scientific publications. In this paper, we introduce an approach to extracting reference metadata based on regular expressions. A prototype system named “Goldrusher” was created, which automatically extracts data from the website of the Association for Computing Machinery (ACM). The experimental results show that, by using our regular expression-based method, we can effectively extract author names, article titles, journal titles, DOIs, etc.
Keywords :
Internet; Web sites; information retrieval; meta data; Association for Computing Machinery; Goldrusher; Web site; World Wide Web; accurate reference metadata extraction; regular expression; Books; Crawlers; Data mining; HTML; Libraries; Machinery; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Society (SWS), 2010 IEEE 2nd Symposium on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6356-5
Type :
conf
DOI :
10.1109/SWS.2010.5607427
Filename :
5607427