Title :
HotDataSpider, an ETL tool for supplementary data of biomedical journals
Author :
Xu, Qing-Wei ; Guo, Jian
Author_Institution :
Comput. Sci. & Technol., Hubei Univ. of Educ., Wuhan, China
Abstract :
Supplementary data published with journal articles play an important role in data analysis and text mining, and they need to be kept in public repositories. We refer to this kind of supplementary material as HotData. In this paper, we developed HotDataSpider to investigate how to extract, annotate and load HotData from 15 top-cited international biomedical journals. As of April 2009, the HotData repository holds over 420 GB (~28,000 items). HotDataSpider is licensed under the Apache License, Version 2.0, and is accessible at http://lifecenter.sgst.cn/hotdata/.
Keywords :
data analysis; data mining; medical information systems; text analysis; ETL tool; HotDataSpider; biomedical journals; public repositories; supplementary data; text mining; Bioinformatics; Biological materials; Biomedical engineering; Computer science; Data engineering; Genomics; Licenses; Biomedicine; ETL; HotData
Conference_Title :
2009 International Conference on Future BioMedical Information Engineering (FBIE 2009)
Conference_Location :
Sanya
Print_ISBN :
978-1-4244-4690-2
Electronic_ISBN :
978-1-4244-4692-6
DOI :
10.1109/FBIE.2009.5405918