To compute the discreet Fourier transform (DFT) the traditional “row/column” approach is used. This factorization assumes the transform size N can be written the product N =N_{1}*N_{2} and requires computation of two sets of smaller DFTs, N_{2} transforms of length N_{1} (referred to as “column” transforms) and N_{1} transforms of length N_{2} (referred to as “row” transforms). The N_{1}xN_{2} “DFT matrix” X contains input samples x_{i} that are arranged x_{1},x_{2},…,x_{N2} on row 1, x_{N2+}_{1},x_{N}_{2+2},…,x_{2*}_{N2}, on row 2, etc. In between column and row transforms it is necessary to multiply each of the N points by the usual twiddle factor, W_{N}^{i,k}, i=0,1,..,N_{1}-1, k=0,1,..N_{2}-1. After the row transforms the DFT output Z resides in the matrix in column major order.
It remains to be shown how the N_{2}-point row and N_{1}-point column DFTs are computed. Centar’s approach is very different from the usual mapping of decimation-in-time and decimation-in-frequency “signal flow” graphs onto linear and parallel circuit structures. Rather a new matrix based algorithm is used to compute the DFT which naturally maps into a succession of radix-b matrix-matrix multiplications that can be done at high speed with minimal complexity using small, locally connected systolic arrays. This derivation starts with the definition of the DFT as
(1) |
where M is the transform size (M=N_{2} for the row DFTs and M=N_{1} for the column DFTs), X(m) are the time domain input values, Z(k) are the frequency domain outputs and W_{M}=e^{-j(2π/M)}. In matrix terms (1) can be represented as
(2) |
where C is a coefficient matrix containing elements W_{M}^{mk}. If M can be factored as M=N_{3}*N_{4}, then using the reindexings m=m_{1}+N_{3}m_{2} and k=k_{1}+N_{3}k_{2} with m_{1}=0,1,…,N_{3}-1, k_{1}=0,1,…,N_{3}-1, m_{2}=0,1,…,N_{4}-1, k_{2}=0,1,…,N_{4}-1 , (1) becomes
(3) |
This expression can be considerably simplified by imposing the restriction that N_{3}/N_{4} be an integer value so that (W_{N4})^{m2k2}^{N3}=exp(-2πm_{2}k_{2}N_{3}/N_{4})=1. For any particular value of m_{1},k_{1} the value of the inner parenthesis in (3), Y(k_{1},m_{1}) can be evaluated from the dot product
so that (3) becomes
(4) |
All the (N_{3})^{2} dot product values Y(k_{1},m_{1}) can be collected in the N_{3}xN_{3} matrix Y_{b} by performing the matrix multiplication
(5) |
where W_{M} is an N_{3}xN_{3} matrix with elements W_{M}[k_{1},m_{1}]=W_{M}^{m1k1}, C_{M1} is an N_{3}xN_{4} coefficient matrix with elements C_{M1}[k_{1},m_{2}]=W_{M}^{m2k1} , X_{b} is an N_{4}xN_{3} matrix with elements X_{b}[m_{2},m_{1}]=X(m_{1}+N_{3}m_{2}), Y_{b} is a N_{3}xN_{3} matrix with elements Y_{b}[k_{1},m_{1}]=Y(k_{1},m_{1}) and “•” means element-by-element multiply.
In a similar way for a particular k_{1},k_{2} the corresponding Z(k_{1},N_{3}k_{2}) can be calculated from the dot product
(6) |
and by collecting the dot products as before, a matrix expression for calculating Z is obtained as
(7) |
where C_{M2} is an N_{4}xN_{3} coefficient matrix with elements C_{M2}[k_{2},m_{1}]=W_{M}^{m1k2} , and Z_{b} is an N_{4}xN_{3} matrix containing the transform outputs Z_{b}[k_{2},k_{1}]=Z(k_{1}+k_{2}N_{3}).
The character of (7) is determined largely by the value of N_{4} or the “base” b (b=N_{4}) because this sets the reachable values of M and the structure of the coefficient matrices C_{M1} and C_{M2}. In (7) C_{M1} and C_{M2} contain M/b^{2} sub-matrices C_{B}=[c_{1}|c_{2}|...|c_{b}] with the form C_{M1}=[C_{B}^{t}|C_{B}^{t}|...|C_{B}^{t}] ^{t } and C_{M2}=[C_{B}|C_{B}|...|C_{B}] due to the periodicity of b or W_{N4}. Also, values of M are constrained to be integer multiples of b^{2}, since it was assumed in (3) that N_{3}/N_{4} is an integer. Although the choice of b is application dependent, for power-of-two designs only “base-4” (b=4) designs are chosen because this choice provides good architectures that are arithmetically efficient. This selection results in
and
(8) |
where C_{B} in (8) is the coefficient matrix for a 4-point DFT and also describes a radix-4 decimation in time butterfly. Therefore in (7) Y_{b} can be seen as resulting from a series of 4-point transforms of a bit-reversed input X followed by a twiddle multiplication and Z_{b} is obtained from summations of the results of 4-point transforms of (Y_{b})^{t}.