Regional Information Center for Science and Technology - Regular expression-based reference metadata extraction from the web

DocumentCode :

2639245

Title :

Regular expression-based reference metadata extraction from the web

Author :

Tang, Xiaoyu ; Zeng, Qingtian ; Cui, Tingting ; Wu, Zeze

Author_Institution :

Coll. of Inf. Sci. & Eng., Shandong Univ. of Sci. & Technol., Qingdao, China

fYear :

2010

fDate :

16-17 Aug. 2010

Firstpage :

346

Lastpage :

350

Abstract :

Accurate reference metadata extraction has become an intriguing task for researchers who want to collect data on scientific publications. In this paper, we introduce an approach to extracting reference metadata based on regular expressions. A prototype system named “Goldrusher” was created, which automatically extracts data from the website of the Association for Computing Machinery (ACM). The experimental results show that our regular expression-based method can effectively extract author names, article titles, journal titles, DOIs, etc.
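
The abstract does not give the patterns used by Goldrusher, so the following is only a minimal sketch of the general idea of regular expression-based reference extraction, not the authors' implementation. The citation layout ("Authors. Year. Title. Venue. DOI: ..."), the pattern, and the function name `extract_reference` are all assumptions made for illustration.

```python
import re

# Assumed (hypothetical) citation layout:
#   "Authors. Year. Title. Venue. DOI: 10.xxxx/suffix"
# Named groups make each metadata field addressable by name.
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?)\.\s+"          # author list, up to the first ". " before a year
    r"(?P<year>\d{4})\.\s+"            # four-digit publication year
    r"(?P<title>.+?)\.\s+"             # article title
    r"(?P<venue>.+?)\.\s+"             # journal or conference title
    r"DOI:\s*(?P<doi>10\.\d{4,9}/\S+)$"  # DOI: "10.", registrant code, "/", suffix
)


def extract_reference(line):
    """Return a dict of metadata fields, or None if the line does not match."""
    match = REFERENCE_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Split the author list on commas and the word "and"; drop empty pieces.
    parts = re.split(r"\s*(?:,|\band\b)\s*", fields["authors"])
    fields["authors"] = [p.strip() for p in parts if p.strip()]
    return fields


if __name__ == "__main__":
    sample = (
        "Xiaoyu Tang, Qingtian Zeng, Tingting Cui, and Zeze Wu. 2010. "
        "Regular expression-based reference metadata extraction from the web. "
        "IEEE 2nd Symposium on Web Society (SWS). "
        "DOI: 10.1109/SWS.2010.5607427"
    )
    print(extract_reference(sample))
```

A pattern like this is tied to one citation layout. Titles that contain a period followed by a space, or pages that format references differently, would need additional or adjusted patterns.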

Keywords :

Internet; Web sites; information retrieval; meta data; Association for Computing Machinery; Goldrusher; Web site; World Wide Web; accurate reference metadata extraction; regular expression; Books; Crawlers; Data mining; HTML; Libraries; Machinery; Web pages;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Web Society (SWS), 2010 IEEE 2nd Symposium on

Conference_Location :

Beijing

Print_ISBN :

978-1-4244-6356-5

Type :

conf

DOI :

10.1109/SWS.2010.5607427

Filename :

5607427

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2639245