• DocumentCode
    2283566
  • Title
    A Domain-Specific On-Chip Network Design for Large Scale Cache Systems
  • Author
    Jin, Yuho ; Kim, Eun Jung ; Yum, Ki Hwan
  • Author_Institution
    Dept. of Comput. Sci., Texas A&M Univ., College Station, TX
  • fYear
    2007
  • fDate
    10-14 Feb. 2007
  • Firstpage
    318
  • Lastpage
    327
  • Abstract
    As circuit integration technology advances, the design of efficient interconnects has become critical. On-chip networks have been adopted to overcome the scalability and poor resource-sharing problems of shared buses and dedicated wires. However, using a general-purpose on-chip network for a specific domain may cause underutilization of network resources and large network delays, because the interconnects are not optimized for the domain. Addressing these two issues is challenging because it requires in-depth knowledge of both interconnects and the specific domain. Non-uniform cache architectures (NUCAs) use wormhole-routed 2D mesh networks to improve the performance of on-chip L2 caches. We observe that network resources in NUCAs are underutilized and occupy considerable chip area (52% of cache area). The network delay is also significant (63% of cache access time). Motivated by these observations, we investigate how to optimize cache operations and design the network in large-scale cache systems. First, we propose a single-cycle router architecture that can efficiently support multicasting in on-chip caches. Next, we present fast-LRU replacement, where cache replacement overlaps with data request delivery. Finally, we propose a deadlock-free XYX routing algorithm and a new halo network topology to minimize the number of links in the network. Simulation results show that our networked cache system improves the average IPC by 38% over the mesh network design with multicast promotion replacement while using only 23% of the interconnection area. Specifically, multicast fast-LRU replacement improves the average IPC by 20% compared with multicast promotion replacement. A halo topology design additionally improves the average IPC by 18% over a mesh topology.
  • Keywords
    cache storage; logic design; multiprocessor interconnection networks; network routing; network topology; system-on-chip; cache replacement; circuit integration technology; deadlock-free XYX routing algorithm; domain-specific on-chip network design; fast-LRU replacement; halo network topology; large scale cache systems; nonuniform cache architectures; on-chip L2 caches; single-cycle router architecture; wormhole-routed 2D mesh networks; Delay effects; Integrated circuit interconnections; Large-scale systems; Mesh networks; Network topology; Network-on-a-chip; Resource management; Scalability; System-on-a-chip; Wires;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computer Architecture, 2007. HPCA 2007. IEEE 13th International Symposium on
  • Conference_Location
    Scottsdale, AZ
  • Print_ISBN
    1-4244-0805-9
  • Electronic_ISBN
    1-4244-0805-9
  • Type
    conf
  • DOI
    10.1109/HPCA.2007.346209
  • Filename
    4147672