The architecture for Centar’s FFT circuits for which the transform size is a power-of-two is based on a new matrix equation form of the discreet Fourier transform (DFT) which is derived using decimation in time and frequency. For a transform length M, this form is
Y=Wb•CM1 X
Z = CM2 Yt
where Wb is an M/4 x M/4 coefficient matrix, CM1 is an M/4 x 4 coefficient matrix, X is a 4 x M/4 input matrix, CM2 is a 4 x M/4 coefficient matrix, Z is a 4 x M/4 output matrix containing the transform outputs, and “ •” means element-by-element multiply. Above, the matrix-matrix products CM1X and CM2Yt involve only exchanges of real and imaginary parts plus additions because the elements of CM1 and CM2 contain only ±1 or ±j. Also, the usual “Z=CX” form of the DFT requires M2 complex multiplications, vs. the (M/4)2 in above. (For more details about the algorithmic basis see the Technology section.)
Functionally, the implementation of this matrix form equation can be mapped to a simple systolic (pipelined) structure containing two 4×4 arrays of adders for the matrix-matrix products and a linear array of multipliers to do the element-by-element multiplies as shown in Fig. 1(a) and (b). Here, CM1 and Z are stored internally (in RAMs) and X and CM2 flow into the array in pipelined fashion from the boundaries. The length of the array is M/4 and therefore scales with transform size M in the north-south direction as shown in the figure. Each of the small boxes (processing elements or “PEs”) contain either a complex adder or multiplier, a couple of registers, a small memory and communicates locally with its neighbor.
Fig.1 (a) Functional operation of base-4 architecture (b=4) and (b) circuit implementation for M=16
showing inputs at different times t and internal matrix values.
The DFT transform is computed using the arrays in Fig. 1 based on the well-known row/column factorization, N= Nr Nc , where N is the desired transform length and Nc and Nr are the number of columns/rows. This approach requires calculation of two sets of DFTs (using the formula above), Nc transforms of length Nr (referred to as “column” transforms) and Nr transforms of length Nc (referred to as “row” transforms). In between column and row transforms it is necessary to multiply each of the N intermediate values by a corresponding twiddle factor,(WN)nk, n=0,1,..,Nc–1, k=0,1,..Nr–1. (Without the twiddle multiplication a 2‑D DFT is performed.) The column and row DFTs are performed sequentially using the same hardware array with M=Nr for the column DFTs and M=Nc for the row DFTs.