Title : 
Level-3 BLAS on the TI C6678 Multi-core DSP
         
        
            Author : 
Ali, Murtaza ; Stotzer, Eric ; Igual, Francisco D. ; Van de Geijn, Robert A.
         
        
        
        
        
        
            Abstract : 
Digital Signal Processors (DSPs) are commonly employed in embedded systems. The increase of processing needs in cellular base-stations, radio controllers and industrial/medical imaging systems has led to the development of multi-core DSPs as well as the inclusion of floating point operations while maintaining low power dissipation. The eight-core DSP from Texas Instruments, codenamed TMS320C6678, provides a peak performance of 128 GFLOPS (single precision) and an effective 32 GFLOPS (double precision) for only 10 watts. In this paper, we present the first complete implementation and report performance of the Level-3 Basic Linear Algebra Subprograms (BLAS) routines for this DSP. These routines are first optimized for a single core and then parallelized over the different cores using OpenMP constructs. The results show that we can achieve about 8 single precision GFLOPS/watt and 2.2 double precision GFLOPS/watt for General Matrix-Matrix multiplication (GEMM). The performance of the rest of the Level-3 BLAS routines is within 90% of the corresponding GEMM routines.
         
        
            Keywords : 
digital signal processing chips; embedded systems; linear algebra; matrix multiplication; message passing; power aware computing; OpenMP construct; TI C6678 multicore DSP; TMS320C6678; Texas Instruments; cellular base-station; digital signal processor; embedded system; floating point operation; general matrix-matrix multiplication; industrial imaging system; level-3 basic linear algebra subprograms routine; low power dissipation; medical imaging system; power 10 W; radio controller; Computer architecture; Digital signal processing; Kernel; Libraries; Linear algebra; Random access memory; System-on-a-chip; BLAS; DSPs
         
        
        
        
            Conference_Title : 
Computer Architecture and High Performance Computing (SBAC-PAD), 2012 IEEE 24th International Symposium on
         
        
            Conference_Location : 
New York, NY
         
        
        
            Print_ISBN : 
978-1-4673-4790-7
         
        
        
            DOI : 
10.1109/SBAC-PAD.2012.26