DocumentCode :
606370
Title :
Scaling SeerSuite in the Cloud
Author :
Teregowda, P. ; Giles, C. Lee
Author_Institution :
Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
fYear :
2013
fDate :
25-27 March 2013
Firstpage :
146
Lastpage :
155
Abstract :
The SeerSuite digital library search engine framework is used to build tools such as CiteSeerx. It includes a complex metadata extraction system capable of extracting elements such as author names, titles, citations, and citation contexts, which are crucial bibliometric data and are needed for building a citation graph. The workload faced by the extractor is dynamic in nature, and this variability makes CiteSeerx attractive for hosting in a cloud computing environment. Because of its application binary dependencies and its reliance on a specialized infrastructure, the current extractor has several limitations. These limitations motivated the design and implementation of the metadata extraction system proposed in this study. A message-oriented middleware architecture is used with a publish/subscribe pattern to build a scalable, flexible system that can be deployed across a range of cloud infrastructures. To demonstrate the broad applicability of the proposed system, we evaluate its reference implementation across different deployment scenarios and with regard to its scalability.
Keywords :
citation analysis; cloud computing; digital libraries; meta data; middleware; search engines; CiteSeerx; SeerSuite digital library search engine framework; application binary dependencies; bibliometric data; citation graph; cloud computing environment; complex metadata extraction system; message oriented middleware architecture; publish-subscribe pattern; Context; Crawlers; Data mining; Feature extraction; Message-oriented middleware; Portable document format; Cloud Computing; Information Extraction; Information Retrieval; Message Oriented Middleware;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cloud Engineering (IC2E), 2013 IEEE International Conference on
Conference_Location :
Redwood City, CA
Print_ISBN :
978-1-4673-6473-7
Type :
conf
DOI :
10.1109/IC2E.2013.41
Filename :
6529279