Title :
Meld: A Real-Time Message Logic Debugging System for Distributed Systems
Author :
Tu, Xuping ; Jin, Hai ; Fan, Xuepeng ; Ye, Jiang
Author_Institution :
Services Comput. Technol. & Syst. Lab., Huazhong Univ. of Sci. & Technol., Wuhan, China
Abstract :
The key difference between a distributed and a non-distributed system is that the former relies on network messages. Network messages give a distributed system its scalability, but they also add complexity. Testing large-scale distributed systems is a great challenge because some errors occur after a distributed sequence of events that involves machine and network failures. Meld is a checker that allows developers to specify the expected message logic of a deployed distributed system and that verifies this logic while the system is running. When Meld finds a problem, it starts collecting more information about the events that led to the problem, allowing developers to quickly find the root cause. Developers write message logic in Meld, and Meld verifies it by analyzing collected abstracts of messages. By using binary instrumentation, Meld works almost transparently with the systems being debugged, and the logic to be checked can be changed at runtime. An evaluation with a deployed system shows that Meld can detect non-trivial correctness problems at runtime.
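To make the idea of checking message logic against collected message abstracts concrete, the following is a minimal sketch in Python. It is not Meld's specification language or implementation, which the abstract does not describe. The `MessageAbstract` record, its fields, the `check_request_reply` function, and the request/reply rule with a timeout are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class MessageAbstract:
    """Hypothetical summary of one captured message (not Meld's real format)."""
    time: float   # capture time in seconds
    src: str      # sending node
    dst: str      # receiving node
    kind: str     # "request" or "reply"
    msg_id: int   # identifier shared by a request and its reply


def check_request_reply(trace: Iterable[MessageAbstract], timeout: float) -> List[str]:
    """Check an assumed rule: each request is answered by its receiver within `timeout` seconds.

    `trace` is assumed to be sorted by capture time. Returns a list of
    human-readable violations (empty if the rule holds).
    """
    pending: Dict[int, MessageAbstract] = {}
    violations: List[str] = []
    for m in trace:
        # Report requests whose deadline passed before this message was seen.
        expired = [r for r in pending.values() if m.time - r.time > timeout]
        for req in expired:
            violations.append(f"request {req.msg_id} from {req.src} to {req.dst} timed out")
            del pending[req.msg_id]
        if m.kind == "request":
            pending[m.msg_id] = m
        elif m.kind == "reply":
            req = pending.pop(m.msg_id, None)
            if req is None:
                violations.append(f"reply {m.msg_id} from {m.src} has no pending request")
            elif (m.src, m.dst) != (req.dst, req.src):
                violations.append(f"reply {m.msg_id} came from {m.src}, expected {req.dst}")
    # Requests still open at the end of the trace are reported as unanswered.
    for req in pending.values():
        violations.append(f"request {req.msg_id} from {req.src} to {req.dst} unanswered")
    return violations


if __name__ == "__main__":
    trace = [
        MessageAbstract(0.0, "A", "B", "request", 1),
        MessageAbstract(0.1, "B", "A", "reply", 1),
        MessageAbstract(0.2, "A", "C", "request", 2),  # never answered
        MessageAbstract(2.0, "A", "B", "request", 3),
        MessageAbstract(2.1, "B", "A", "reply", 3),
    ]
    for violation in check_request_reply(trace, timeout=1.0):
        print(violation)
```

In this sketch the rule is an ordinary function over the stream of abstracts, so a different rule could be swapped in without touching the monitored system. A real checker would also need to handle messages that arrive out of order across machines, which this sketch ignores.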
Keywords :
distributed programming; program debugging; real-time systems; system recovery; Meld; binary instrumentation; distributed event sequence; large-scale distributed system; machine failure; network failure; network message; real-time message logic debugging system; Computer bugs; Debugging; Monitoring; Payloads; Receivers; Runtime; Servers; distributed debugging; distributed system; message logic
Conference_Title :
Services Computing Conference (APSCC), 2010 IEEE Asia-Pacific
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4244-9396-8
DOI :
10.1109/APSCC.2010.104