Title :
A framework for integrating bibliographical data of computer science publications
Author :
Tien Do ; Dao Lam ; Tin Huynh
Author_Institution :
Univ. of Inf. Technol. - Vietnam, Ho Chi Minh City, Vietnam
Abstract :
In this paper, we propose a framework for integrating bibliographical data of computer science publications from heterogeneous digital libraries. The framework consists of three key components: a publication collector, a bibliographical parser, and a duplicated checker. To analyze how efficiently the framework integrates data from heterogeneous sources, we conduct experiments with three different digital libraries: Microsoft Academic Search, CiteSeerX, and DBLP. At present, our integrated dataset contains 5,320,539 publications and 1,723,148 authors, together with their metadata. Compared with other datasets, ours contains more records (rows) and more attributes (columns). It could therefore be published for use in other studies of bibliographical data, such as literature search, publication ranking, research trend identification, and mining links between articles.
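The abstract does not say how the duplicated checker decides that two records from different libraries describe the same publication. The following is a minimal Python sketch of one common approach, assuming records are matched on a normalized title plus publication year. The `Publication` record, `match_key`, and `merge_sources` are hypothetical names for illustration, not the authors' implementation.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    # Hypothetical minimal record; the framework's real schema is not given in the abstract.
    title: str
    year: int
    source: str  # e.g. "DBLP", "CiteSeerX", "Microsoft Academic Search"


def match_key(pub: Publication) -> tuple[str, int]:
    """Assumed matching key: accent-stripped, lowercased title with punctuation and spaces removed, plus year."""
    text = unicodedata.normalize("NFKD", pub.title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", "", text.lower())
    return (text, pub.year)


def merge_sources(*sources: list[Publication]) -> list[Publication]:
    """Merge records from several libraries, keeping the first record seen for each key."""
    seen: dict[tuple[str, int], Publication] = {}
    for records in sources:
        for pub in records:
            seen.setdefault(match_key(pub), pub)
    return list(seen.values())


dblp = [Publication("A Framework for Integrating Bibliographical Data", 2014, "DBLP")]
mas = [Publication("A framework for integrating bibliographical data.", 2014, "Microsoft Academic Search")]
merged = merge_sources(dblp, mas)
```

Here the two titles differ only in case and a trailing period, so `merged` holds a single record (the DBLP one, since it was seen first). Exact key matching is a simplification; a production checker would likely also compare authors and tolerate small spelling differences.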
Keywords :
bibliographic systems; data integration; digital libraries; electronic publishing; meta data; CiteSeerX; DBLP; Microsoft Academic Search; article linking mining; bibliographical data integration; bibliographical parser; computer science publications; duplicated checker; framework efficiency analysis; heterogeneous digital libraries; heterogeneous sources; literature search; publication collector; publication ranking; research trend identification; Computer science; Crawlers; Data mining; Databases; IEEE Xplore; Libraries; Metasearch; Bibliographical Data; Data Integration; Digital Library; Focused Crawler; OAI-PMH;
Conference_Title :
Computing, Management and Telecommunications (ComManTel), 2014 International Conference on
Conference_Location :
Da Nang
Print_ISBN :
978-1-4799-2904-7
DOI :
10.1109/ComManTel.2014.6825612