Title :
Synthesis of heterogeneous distributed architectures for memory-intensive applications
Author :
Huang, Chao ; Ravi, Srivaths ; Raghunathan, Anand ; Jha, Niraj K.
Author_Institution :
Dept. of Electr. Eng., Princeton Univ., NJ, USA
Abstract :
Memory-intensive applications present unique challenges to an ASIC designer in terms of the choice of memory organization, memory size requirements, bandwidth and access latencies, etc. The high potential of single-chip distributed logic-memory architectures in addressing many of these issues has been recognized in general-purpose computing, and more recently in ASIC design. However, such architectures will be adopted widely by designers only when general techniques and tools for efficient high-level synthesis (HLS) of multi-partitioned ASICs become available. The techniques presented in this paper are motivated by the fact that many memory-intensive applications exhibit irregular array data access patterns (due to conditionals in loop nests, etc.). Synthesis should, therefore, be capable of determining a partitioned architecture, wherein array data and computations may have to be heterogeneously distributed for achieving the best performance speedup. Furthermore, the synthesis methodology should not be restricted by the nature of array index functions (affine or otherwise) in a behavior. Therefore, our methodology employs simulation to provide information about the access patterns of array data references in a behavior, which is used by the rest of our analysis. We use a combination of clustering and min-cut style partitioning techniques to partition the behavior into sub-behaviors while considering various factors including data access locality, balanced workloads, inter-partition communication, etc. Finally, we also employ an iterative improvement strategy to determine the best way of distributing array data into physical memory in each partition. Our experiments with several benchmark applications show that the proposed techniques can yield partitioned architectures that can achieve up to 2.2X performance speedup over conventional HLS solutions, while achieving up to 1.6X performance speedup over the best homogeneous partitioning solution feasible.
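The abstract describes min-cut style partitioning driven by array access patterns gathered through simulation. The following is a minimal sketch of that general idea, not the authors' algorithm: it assumes a hypothetical trace format of (operation, array, index) tuples, treats every operation as one unit of workload (so "balanced" means equal operation counts), and substitutes a simple greedy pairwise-swap refinement in the spirit of Kernighan-Lin for the paper's combined clustering and partitioning. The names access_graph, cut_cost, and bipartition are illustrative only.

```python
from collections import Counter
from itertools import combinations


def access_graph(trace):
    """Weight each pair of operations by how many array elements both touch.

    trace: hypothetical list of (op, array, index) tuples recorded during
    simulation. Operations sharing data should ideally land in the same
    partition to avoid inter-partition communication.
    """
    by_element = {}
    for op, array, index in trace:
        by_element.setdefault((array, index), set()).add(op)
    weights = Counter()
    for ops in by_element.values():
        for a, b in combinations(sorted(ops), 2):
            weights[(a, b)] += 1
    return weights


def cut_cost(weights, side):
    """Total weight of edges whose endpoints sit in different partitions."""
    return sum(w for (a, b), w in weights.items() if side[a] != side[b])


def bipartition(ops, weights, passes=10):
    """Greedy pairwise-swap refinement (Kernighan-Lin style, simplified).

    Swapping two operations across the cut keeps partition sizes equal, so
    the unit-workload balance assumption is preserved on every move.
    """
    ops = sorted(ops)
    side = {op: i % 2 for i, op in enumerate(ops)}  # alternating initial split
    for _ in range(passes):
        improved = False
        for a in ops:
            for b in ops:
                if side[a] == side[b]:
                    continue
                before = cut_cost(weights, side)
                side[a], side[b] = side[b], side[a]
                if cut_cost(weights, side) < before:
                    improved = True  # keep the swap
                else:
                    side[a], side[b] = side[b], side[a]  # undo the swap
        if not improved:
            break
    return side


if __name__ == "__main__":
    trace = [("a1", "A", 0), ("a2", "A", 0), ("b1", "B", 0), ("b2", "B", 0)]
    w = access_graph(trace)
    side = bipartition({op for op, _, _ in trace}, w)
    print(side, cut_cost(w, side))
```

In this toy trace the operations touching array A end up in one partition and those touching B in the other, giving a cut cost of zero. A real flow would also weight edges by communication cost, allow unequal (heterogeneous) partition workloads, and then decide how each partition's arrays map onto physical memories.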
Keywords :
application specific integrated circuits; data flow graphs; distributed memory systems; high level synthesis; memory architecture; ASIC design; ASIC designer; HLS; access latencies; array data access patterns; array index functions; balanced workloads; bandwidth; benchmark applications; clustering techniques; data access locality; heterogeneous distributed architectures; high level synthesis; interpartition communication; memory intensive applications; memory organization; memory size requirements; min-cut style partitioning techniques; physical memory; single chip distributed logic memory architectures; Analytical models; Application specific integrated circuits; Bandwidth; Computational modeling; Computer architecture; Delay; Distributed computing; High level synthesis; Information analysis; Memory architecture;
Conference_Title :
Computer Aided Design, 2003. ICCAD-2003. International Conference on
Conference_Location :
San Jose, CA, USA
Print_ISBN :
1-58113-762-1
DOI :
10.1109/ICCAD.2003.159669