• DocumentCode
    769342
  • Title

    Vectorized transforms in scalar processors

  • Author

    Trelewicz, J.Q. ; Mitchell, Joan L. ; Brady, Michael T.

  • Author_Institution
    IBM Almaden Res. Center, San Jose, CA, USA
  • Volume
    19
  • Issue
    4
  • fYear
    2002
  • fDate
    7/1/2002
  • Firstpage
    22
  • Lastpage
    31
  • Abstract
    We disclose a generalized approach to creating efficient implementations of linear, orthogonal transforms, with specific examples discussed for the 8 × 8 DCT used in image compression. We connect this with a method for performing signed, parallel processing in scalar, off-the-shelf processors for integer transforms. Uniform data precision may be used, but is not required for the method. The coefficients resulting from the new algorithm converge more quickly than the approximation made to the coefficients. Furthermore, the new algorithm allows more control of the specific representation chosen for the coefficients, as is detailed below. The methods described were designed to address this need with two's-complement arithmetic. Data that can be processed in parallel, because of the algorithm structure, are packed into registers in the "vector" format described. Many signed arithmetic operations can be performed on these vectors, including addition, subtraction, multiplication by scalars, shifting, and others. When the parallel processing is completed, the vectors can be unpacked into scalar values for storage or subsequent processing. The importance of these methods lies in their handling of carries and borrows in the packed vector format. The generalized method is described. Notation is given at the beginning to establish consistency throughout the article. We discuss a generalized approach to integer transforms, using the DCT as a specific example. Then we detail the vector format, which allows vector computation of parallelizable algorithms in scalar processors. The IDCT is used as a numerical example in the discussion of the vector format. The results were developed for high-end printers (e.g., more than 100 pages per minute), where image compression and decompression must be performed in real time, either in FPGAs or in embedded processors; however, the methods are applicable to a broad range of signal processing systems.
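The packing and unpacking of signed values mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the article: the function names `pack` and `unpack`, the 16-bit field width, and the use of an unbounded Python integer in place of a fixed-width register are all assumptions made for illustration. Each element is weighted by a power of two, so a negative element borrows from the fields above it, and unpacking restores those borrows one field at a time.

```python
def pack(values, width):
    """Pack signed integers into one integer; element i is weighted by 2**(i*width).

    Hypothetical helper: a negative element borrows from the fields above it.
    """
    packed = 0
    for i, v in enumerate(values):
        packed += v << (i * width)
    return packed


def unpack(packed, count, width):
    """Recover `count` signed fields, undoing the borrow left by each negative field."""
    half = 1 << (width - 1)
    mask = (1 << width) - 1
    out = []
    for _ in range(count):
        field = packed & mask          # low `width` bits, read as unsigned
        if field >= half:              # reinterpret as two's-complement signed
            field -= 1 << width
        out.append(field)
        packed = (packed - field) >> width  # remove the field and its borrow
    return out


# Assumed example data: four signed elements in 16-bit fields.
a = pack([3, -5, 7, -2], 16)
b = pack([-1, 4, -10, 6], 16)

sum_fields = unpack(a + b, 4, 16)    # one scalar add acts on all four fields
diff_fields = unpack(a - b, 4, 16)   # one scalar subtract
scaled_fields = unpack(3 * a, 4, 16) # multiplication by a scalar
```

The results are only valid while every field result stays within the signed range of its width; choosing field widths large enough for the intermediate values of a given transform is left to the caller in this sketch.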
  • Keywords
    data compression; digital arithmetic; discrete cosine transforms; embedded systems; field programmable gate arrays; image coding; inverse problems; parallel algorithms; parallel architectures; transform coding; DCT; FPGA; addition; algorithm structure; coefficients convergence; embedded processors; high-end printers; image compression; image decompression; integer transforms; linear orthogonal transforms; multiplication; parallelizable algorithms; registers; scalar off-the-shelf processors; signal processing systems; signed arithmetic operations; signed parallel processing; subtraction; two's-complement arithmetic; uniform data precision; vector format; vectorized transforms; Approximation algorithms; Arithmetic; Concurrent computing; Design methodology; Discrete cosine transforms; Image coding; Image converters; Parallel processing; Printers; Signal processing algorithms
  • fLanguage
    English
  • Journal_Title
    Signal Processing Magazine, IEEE
  • Publisher
    IEEE
  • ISSN
    1053-5888
  • Type

    jour

  • DOI
    10.1109/MSP.2002.1012347
  • Filename
    1012347