Title :
Leveraging thermal dynamics in sensor placement for overheating server component detection
Author :
Wang, Xiaodong ; Wang, Xiaorui ; Xing, Guoliang ; Lin, Cheng-Xian
Author_Institution :
Ohio State Univ., Columbus, OH, USA
Abstract :
Server overheating has become a well-known issue in today's data centers that host a large number of high-density servers. The current practice of server overheating detection is to monitor the server inlet temperature with the temperature sensor on the server enclosure, or the CPU temperature with on-die thermal sensors. However, this is in contrast to the fact that different components in a server may have different overheating thresholds, which are closely related to their respective thermal failure rates and expected lifetimes. Moreover, the thermal correlation between the inlet (or CPU) and other server components can be different for every server model. As a result, relying on the single inlet or CPU temperature for server overheating detection is over-simplistic, which may lead to either degraded detection performance or false alarms that can result in excessive cooling power, leading to unnecessarily low inlet temperature. In this paper, we propose a model-based approach that leverages thermal dynamics to intelligently choose sensor placement locations for precise overheating server component detection. We first formulate the detection problem as a constrained optimization problem. We then adopt Computational Fluid Dynamics (CFD) to establish the thermal model and analyze the thermal status of the server enclosure under various overheating conditions, such as inlet overheating, fan failures and CPU overloading. Based on the CFD analysis, we apply data fusion and advanced optimization techniques to find a near-optimal solution for sensor placement locations, such that the probability of detecting different overheating components is significantly improved. Our empirical results on a real rack server testbed demonstrate the detection performance of our solution. Extensive simulation results also show that the proposed solution outperforms other commonly used overheating monitoring solutions in terms of detection probability and error rate.
Keywords :
computational fluid dynamics; computer centres; cooling; intelligent sensors; multiprocessing systems; performance evaluation; power aware computing; probability; sensor fusion; sensor placement; temperature sensors; CFD analysis; CPU overloading; CPU temperature; computational fluid dynamics; constrained optimization problem; cooling power; data centers; data fusion; detection performance; detection probability; error rate; expected lifetime; fan failures; high-density servers; inlet overheating; intelligent sensor; model-based approach; on-die thermal sensors; overheating server component detection; overheating threshold; real rack server testbed; sensor placement location; server enclosure; server inlet temperature monitoring; temperature sensor; thermal dynamics leveraging; thermal failure rate; thermal status analysis; Computational fluid dynamics; Mathematical model; Monitoring; Servers; Temperature measurement; Temperature sensors;
Conference_Title :
Green Computing Conference (IGCC), 2012 International
Conference_Location :
San Jose, CA
Print_ISBN :
978-1-4673-2155-6
Electronic_ISBN :
978-1-4673-2153-2
DOI :
10.1109/IGCC.2012.6322273