Regional Information Center for Science and Technology - Design and VLSI Implementation of a Concurrent Solver for N-Coupled Least-Squares Fitting Problems

Abstract:

Most algorithms for high-quality modeling and coding of stochastic sequences (speech or images) make extensive use of matrix operations. Because of the high computational complexity of these operations, conventional implementation techniques and architecture designs would almost certainly rule out such algorithms as candidates for real-time signal processing. In this paper, we present an algorithm and its mapping onto a VLSI architecture for the solution of $N$ systems of linear equations, each of size $(n+1) \times (n+1)$, which arise from a speech coding algorithm. The systems form an ordered set and mutually exhibit rank-1 differences. This property is exploited to obtain the solutions of all systems concurrently. Via an analysis of the algebraic structure of the systems of equations, we succeed in reducing the complexity to that of a single matrix inversion, while enhancing the regularity of the algorithm, e.g., by including the back substitution in the main factorization loop. Next, we map the algorithm onto VLSI hardware, using a systematic, hierarchical temporal/structural decomposition/partitioning approach. To achieve high throughput, we make extensive use of pipelining and show how a pipelined CORDIC processor element supports the desired operations. The complete equation solver is built around two pipelined CORDIC processor elements and two FIFO-type memories. The solver fits on three VLSI chips of size 6.5 × 6.5 mm² in a standard slow-NMOS technology. The chips are of medium complexity and the resulting floorplan is shown. The resulting architecture achieves a very high throughput with minimal dataflow-oriented hardware.
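The abstract does not spell out how a CORDIC processor element operates, so the following is only a minimal sketch of the textbook CORDIC iteration in its two usual modes (rotation and vectoring). It assumes floating-point arithmetic and a sequential loop, whereas the hardware described above is pipelined; the function names `cordic_rotate` and `cordic_vector` and the iteration count are hypothetical choices for illustration, not taken from the paper. Vectoring followed by rotation yields plane rotations of the kind commonly used in matrix factorizations, which is presumably why such an element suits the solver, though the exact operations used in the paper are not stated here.

```python
import math

ITERATIONS = 32  # hypothetical precision setting, not from the paper

# Elementary rotation angles atan(2^-i) and the accumulated scale factor.
ALPHAS = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_rotate(x, y, angle):
    """Rotation mode: rotate (x, y) by `angle` radians.

    Converges for |angle| up to roughly 1.74 rad (the sum of all ALPHAS).
    """
    z = angle
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0      # steer the residual angle z toward 0
        shift = 2.0 ** -i                # a plain bit shift in fixed-point hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * ALPHAS[i]
    return x / GAIN, y / GAIN            # undo the constant CORDIC gain


def cordic_vector(x, y):
    """Vectoring mode: rotate (x, y) onto the positive x-axis (requires x > 0).

    Returns (magnitude, angle) of the input vector.
    """
    z = 0.0
    for i in range(ITERATIONS):
        d = -1.0 if y >= 0 else 1.0      # steer y toward 0
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * ALPHAS[i]
    return x / GAIN, z


if __name__ == "__main__":
    print(cordic_rotate(1.0, 0.0, math.pi / 6))  # close to (cos 30°, sin 30°)
    print(cordic_vector(3.0, 4.0))               # close to (5, atan2(4, 3))
```

Each iteration needs only shifts, additions, and a sign decision, so the loop body maps naturally onto one pipeline stage per iteration.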