DocumentCode :
2217808
Title :
Storage characterization for unstructured data in online services applications
Author :
Sankar, Sriram ; Vaid, Kushagra
Author_Institution :
Global Found. Services (GFS), Microsoft Corp., CA, USA
fYear :
2009
fDate :
4-6 Oct. 2009
Firstpage :
148
Lastpage :
157
Abstract :
Mega datacenters hosting large-scale Web services have unique workload attributes that must be taken into account for optimal service scalability. Provisioning compute and storage resources to provide a seamless user experience is challenging because customer traffic loads vary widely across time and geographies, and the servers hosting these applications have to be rightsized to deliver performance both within a single server and across a scale-out cluster. Typical user-facing Web services have a three-tier hierarchy: front-end Web servers, middle-tier application logic, and a back-end data storage and processing layer. In this paper, we address the challenge of disk subsystem design for back-end servers hosting large amounts of unstructured (also called blob) data. Typical content hosted on such servers includes user-generated content such as photos, email messages, videos, and social networking updates. The specific server applications analyzed in this paper are the message store of a large-scale email application, image tile storage for a large-scale geo-mapping application, and user content storage for Web 2.0 type applications. We analyze the storage subsystems for these Web services in a live production environment and provide an overview of the disk traffic patterns and access characteristics for each of these applications. Next, we explore time-series characteristics and derive probabilistic models showing state transitions between locations on the data volumes for these applications. We then explore how these probabilistic models could be extended into a framework for synthetic benchmark generation for such applications. Finally, we discuss how this framework can be used for storage subsystem rightsizing for optimal scalability of such back-end storage clusters.
Keywords :
Web services; disc storage; electronic mail; file servers; online front-ends; probability; social networking (online); time series; Web 2.0; Web service; back-end data storage; blob data; customer traffic load; disk subsystem design; front-end Web server; geomapping application; image tile storage; large scale email application; middle-tier application logic; online services application; optimal service scalability; probabilistic model; social networking update; state transition; storage characterization; three tiered hierarchy; time-series characteristics; unstructured data; user content storage; workload attribute; Geography; Image storage; Large-scale systems; Logic; Memory; Network servers; Scalability; Telecommunication traffic; Web server; Web services;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4244-5156-2
Electronic_ISBN :
978-1-4244-5157-2
Type :
conf
DOI :
10.1109/IISWC.2009.5306786
Filename :
5306786
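The abstract mentions probabilistic models of state transitions between locations on a data volume, extended into a framework for synthetic benchmark generation. The following is a minimal sketch of that general idea, not the authors' model: it assumes a first-order Markov chain over three hypothetical disk regions with made-up transition probabilities, generates a synthetic sequence of region IDs, and re-estimates the transition probabilities from that sequence by counting consecutive pairs. The names `TRANSITIONS`, `generate_trace`, and `estimate_transitions` are illustrative only.

```python
import random
from typing import Dict, List, Optional

# Hypothetical transition table: TRANSITIONS[i][j] is the probability that the
# next I/O lands in disk region j given the current I/O is in region i.
# Values are illustrative only, not measurements from the paper.
TRANSITIONS: Dict[int, Dict[int, float]] = {
    0: {0: 0.7, 1: 0.2, 2: 0.1},
    1: {0: 0.3, 1: 0.5, 2: 0.2},
    2: {0: 0.1, 1: 0.3, 2: 0.6},
}


def generate_trace(transitions: Dict[int, Dict[int, float]], start: int,
                   length: int, seed: Optional[int] = None) -> List[int]:
    """Walk a first-order Markov chain to produce a synthetic sequence of region IDs."""
    rng = random.Random(seed)
    state = start
    trace = [state]
    for _ in range(length - 1):
        row = transitions[state]
        # Sample the next region according to the current region's outgoing probabilities.
        state = rng.choices(list(row.keys()), weights=list(row.values()), k=1)[0]
        trace.append(state)
    return trace


def estimate_transitions(trace: List[int]) -> Dict[int, Dict[int, float]]:
    """Estimate transition probabilities from an observed trace by counting consecutive pairs."""
    counts: Dict[int, Dict[int, int]] = {}
    for a, b in zip(trace, trace[1:]):
        row = counts.setdefault(a, {})
        row[b] = row.get(b, 0) + 1
    probs: Dict[int, Dict[int, float]] = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: c / total for b, c in row.items()}
    return probs


trace = generate_trace(TRANSITIONS, start=0, length=10000, seed=1)
estimated = estimate_transitions(trace)
```

Re-estimating from the generated trace is a quick sanity check that the generator reproduces the model it was given; in a real characterization the table would instead be estimated from production traces, and a fuller model would likely also capture request sizes, read/write mix, and inter-arrival times (an assumption beyond what this record states).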