DocumentCode
704733
Title
Where does the time go? characterizing tail latency in memcached
Author
Blake, Geoffrey ; Saidi, Ali G.
fYear
2015
fDate
29-31 March 2015
Firstpage
21
Lastpage
31
Abstract
To function correctly, Online, Data-Intensive (OLDI) services require low and consistent service times. Maintaining predictable service times entails requiring 99th or higher percentile latency targets across hundreds to thousands of servers in the data-center. However, to maintain the 99th percentile targets, servers are routinely run well below full utilization. The main difficulty in optimizing a server to run closer to peak utilization and maintain predictable 99th percentile response latencies is identifying and mitigating the causes of a request missing the target service time. In practice, this analysis is challenging as requests and responses overlap their execution with respect to one another and traverse multiple layers of software, user/kernel protection boundaries, and the hardware/software divide. Traditional profiling methods that record the time being spent in each function usually yield few clues as to where a bottleneck may be present, because the many layers of software each consume only a small fraction of the time. In this work we analyze the end-to-end sources of latency in a Memcached server from the wire through the kernel into the application and back again. To do so, we develop a tool that utilizes the Linux SystemTap infrastructure to measure latency throughout the many software layers that make up the complete request and response path for Memcached. While memory copies and the Linux networking stack are often suggested as major contributors to latency, we find that the main cause of missing response latency guarantees is the formation of standing queues and the application's inability to detect and remedy this situation.
Keywords
Linux; cache storage; computer centres; parallel processing; storage management; Linux systemtap infrastructure; Memcached server; OLDI service; data center; end-to-end latency source; online data-intensive service; tail latency; Hardware; IP networks; Kernel; Linux; Probes; Servers;
fLanguage
English
Publisher
IEEE
Conference_Titel
Performance Analysis of Systems and Software (ISPASS), 2015 IEEE International Symposium on
Conference_Location
Philadelphia, PA
Type
conf
DOI
10.1109/ISPASS.2015.7095781
Filename
7095781