Author :
Esmaeilzadeh, Hadi ; Blem, Emily ; St. Amant, Renée ; Sankaralingam, Karthikeyan ; Burger, Doug
Author_Institution :
Univ. of Washington, Seattle, WA, USA
Abstract :
Since 2005, processor designers have increased core counts to exploit Moore's Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9× average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation.
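As an illustration of the Pareto-optimal frontier idea mentioned in the abstract, the following is a minimal sketch, not the authors' method. It assumes each processor is reduced to a (cost, performance) pair, where cost stands in for power or area, and keeps only the points that no other point beats on both axes. The function name `pareto_frontier` and the sample values are hypothetical and are not data from the paper; the paper's own frontiers are derived from measurements of real processors and may involve curve fitting that this sketch omits.

```python
from typing import List, Tuple


def pareto_frontier(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the non-dominated (cost, performance) points.

    Lower cost (e.g. power or area) and higher performance are both better.
    A point is kept only if no cheaper-or-equal point performs at least as well.
    """
    # Sort by cost ascending; among equal costs, put the best performer first.
    ordered = sorted(points, key=lambda p: (p[0], -p[1]))
    frontier: List[Tuple[float, float]] = []
    best_perf = float("-inf")
    for cost, perf in ordered:
        # Keep a point only if it outperforms every cheaper point seen so far.
        if perf > best_perf:
            frontier.append((cost, perf))
            best_perf = perf
    return frontier


# Hypothetical (power in watts, performance score) pairs, NOT data from the paper.
samples = [(10.0, 5.0), (20.0, 9.0), (15.0, 4.0), (30.0, 9.5), (25.0, 8.0), (10.0, 6.0)]
frontier = pareto_frontier(samples)
print(frontier)
```

Sorting first makes the scan a single pass: once the points are ordered by cost, a point lies on the frontier exactly when it improves on the best performance seen so far.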
Keywords :
graphics processing units; microprocessor chips; multiprocessing systems; network topology; parallel processing; Dennard scaling; ITRS projections; Moore's law scaling; Pareto-optimal frontiers; asymmetric topology; chip topology; composed topology; computing community; dark silicon; device scaling parameters; dynamic topology; fixed-size chip; lower-bound core power; massively threaded GPU-like multicore chip organizations; multicore designs; multicore parts; multicore scaling limits; parallel workloads; performance model; processor designers; single-core performance; single-core scaling; single-threaded CPU-like multicore chip organizations; speedup potential; technology generations; upper-bound performance; Instruction sets; Microarchitecture; Multicore processing; Organizations; Performance evaluation; Topology; Transistors; Dark Silicon; Modeling; Multicore; Power; Technology Scaling;