Author_Institution:
University of Illinois, Urbana, IL, USA
Abstract:
Moore's Law, the primary driver of the astonishing advances in computing, is threatened by manufacturing and environmental variations and the resulting non-determinism in circuit behavior. The non-determinism is exacerbated by dynamic voltage and timing variations caused by the unprecedented increase in chip power density and by time- and context-dependent variations in temperature and utilization across the chip. The most immediate impact of such non-determinism is on chip yields and manufacturing costs. Recent predictions suggest that unless chip yields improve or manufacturing costs are tamed, process shrinks beyond 18 nm may become infeasible. Clearly, the status quo cannot continue, and we must find a solution to the non-determinism problem if the semiconductor industry is to remain a viable driver of technological innovation and capabilities in the future. In this talk, I will argue that the problem is not non-determinism per se, but the way computer system designers treat it. Chip components have become stochastic, yet the basic approach to designing and operating computing machines has remained unchanged. Software continues to expect hardware to behave flawlessly for all inputs under all conditions, while hardware is over-designed to meet this software mindset. I will argue that the cost of maintaining the abstraction of flawless hardware will soon become prohibitive and that we need to fundamentally rethink the correctness contract between hardware and software. Instead of computing machines in which hardware variations are hidden from software through over-design, I will present a vision of computing machines in which a) these variations are fully exposed to the highest layers of software in the form of hardware errors, and b) errors are managed through architectural and design techniques to maximize the power savings afforded by relaxed correctness. The hardware would be deliberately under-designed with relaxed constraints to allow errors, especially for rare computations, and would produce stochastically correct results even under nominal conditions. The software would be aware of hardware errors and would proactively self-adapt. We call such under-designed processors, which produce only stochastically correct results even under nominal conditions and rely on software adaptability and architectural resilience to tolerate errors, stochastic processors. We call applications implemented to be adaptively error-tolerant stochastic applications. I will discuss approaches to architecting and designing stochastic processors and stochastic applications.
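To make the software-side adaptation concrete, the following is a minimal sketch in C of the check-and-recover pattern a stochastic application might follow. It assumes a hypothetical error-prone fast path (fast_dot, which emulates computation on under-designed hardware by injecting a rare large error) and a reliable fallback (safe_dot); the application runs a cheap plausibility check and re-executes reliably only when the check fails. The function names, the injected error model, and the check itself are illustrative assumptions, not the specific techniques presented in the talk.

    /* Sketch of a "stochastic application": tolerate rare hardware
     * errors by checking results cheaply and adapting on failure. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    /* Reliable reference path (stands in for guard-banded hardware). */
    static double safe_dot(const double *a, const double *b, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += a[i] * b[i];
        return s;
    }

    /* Hypothetical error-prone path: a rare large error is injected
     * here to emulate a timing fault on under-designed hardware. */
    static double fast_dot(const double *a, const double *b, int n) {
        double s = safe_dot(a, b, n);
        if (rand() % 100 == 0) s *= 2.0;  /* ~1% "hardware" error */
        return s;
    }

    /* Error-aware wrapper: run the fast path, sanity-check the result
     * against a coarse bound (|dot| <= sum |a_i||b_i|), and self-adapt
     * by re-executing on the reliable path when the check fails.
     * Errors that stay within the bound go undetected; an adaptively
     * error-tolerant application is designed to absorb them. */
    static double adaptive_dot(const double *a, const double *b, int n) {
        double r = fast_dot(a, b, n);
        double bound = 0.0;
        for (int i = 0; i < n; i++) bound += fabs(a[i]) * fabs(b[i]);
        if (fabs(r) > bound)              /* impossible result: recover */
            r = safe_dot(a, b, n);
        return r;
    }

    int main(void) {
        double a[4] = {1, 2, 3, 4}, b[4] = {4, 3, 2, 1};
        printf("dot = %f\n", adaptive_dot(a, b, 4));
        return 0;
    }

The design point illustrated here is that correctness is enforced selectively at the application level: cheap checks let the common case run on aggressively under-designed hardware, while recovery logic handles the rare error, trading relaxed hardware correctness for power savings.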