A “variable” FFT is a circuit that can compute a number of different transform sizes, with the size chosen at “runtime”. For example, 4G wireless protocols usually require the ability to compute sizes from 128 to 2048 points.
Here, performance and resource usage data are provided for a 2048-point variable FFT example and compared to an equivalent Intel radix-2^{2} single delay feedback version (IP v10.1), modified to permit extraction of results for smaller transform sizes earlier in the pipeline. Both the Centar and Intel circuits were compiled using Intel’s software tools (Quartus II) and the same target hardware (Stratix III EP3SE50F484C2 FPGA).
The TimeQuest static timing analyzer was used to determine maximum clock frequencies (Fmax) at 1.1 V and 85 °C (worst-case settings). Both circuits support data “streaming”, that is, continuous normal-order input and output of data. The signal-to-quantization-noise ratio (SQNR) estimates were obtained from Matlab bit-accurate RTL simulations based on “single tone” full-scale (real) input data sets with random frequencies and phases. Intel’s Matlab models were used for their circuits.
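The SQNR measurement described above can be illustrated with a small floating-point sketch. It is not the bit-accurate RTL model used for the reported figures: it assumes the only error source is rounding of the input to a fixed word length, and the function names (`fft`, `quantize`, `sqnr_db`, `single_tone_sqnr`) are hypothetical.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def quantize(v, bits):
    """Round v in [-1, 1] to a signed fixed-point grid with the given word length."""
    scale = 2 ** (bits - 1)
    q = round(v * scale)
    q = max(-scale, min(scale - 1, q))  # saturate at the word limits
    return q / scale


def sqnr_db(reference, measured):
    """10*log10(signal power / error power), summed over all output bins."""
    signal = sum(abs(r) ** 2 for r in reference)
    noise = sum(abs(r - m) ** 2 for r, m in zip(reference, measured))
    return 10 * math.log10(signal / noise)


def single_tone_sqnr(n=128, bits=16, rng=None):
    """SQNR of an FFT fed with a quantized full-scale real tone (assumed test setup)."""
    rng = rng or random.Random(0)
    freq = rng.uniform(1, n / 2 - 1)     # random frequency, in bins
    phase = rng.uniform(0, 2 * math.pi)  # random phase
    tone = [math.cos(2 * math.pi * freq * i / n + phase) for i in range(n)]
    ideal = fft(tone)
    quantized = fft([quantize(v, bits) for v in tone])
    return sqnr_db(ideal, quantized)
```

Averaging `single_tone_sqnr(n)` over several transform sizes (for instance the five powers of two from 128 to 2048) mirrors the “average, 5 transform sizes” entry in the tables. Because only input rounding is modeled, the result sits near the textbook 6.02b + 1.76 dB limit for a b-bit quantizer and is higher than what a real fixed-point datapath achieves.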
Intel’s radix-2^{2} single delay feedback architecture uses natural word growth to accommodate dynamic range. Simulations showed that a 16-bit fixed-point input with a 30-bit fixed-point output results in an SQNR equivalent to that of the Centar design with a 16-bit input, 16-bit output and 4-bit exponent.
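The relationship between the two output formats can be sketched as follows. This is only an illustration under stated assumptions: it treats the 4-bit exponent as a right-shift count shared by every sample in a block (block floating point) and uses truncation, neither of which is confirmed by the text above. The helper names `to_block_float` and `from_block_float` are hypothetical.

```python
def to_block_float(samples, mantissa_bits=16, exponent_bits=4):
    """Fit a block of signed integers into mantissa_bits using one shared shift."""
    limit = 2 ** (mantissa_bits - 1)
    peak = max(abs(s) for s in samples)
    shift = 0
    while (peak >> shift) >= limit:
        shift += 1
    if shift >= 2 ** exponent_bits:
        raise OverflowError("block needs a larger exponent field")
    # Arithmetic right shift: truncates toward negative infinity.
    mantissas = [s >> shift for s in samples]
    return mantissas, shift


def from_block_float(mantissas, shift):
    """Rebuild approximate full-width integers from mantissas and the shared shift."""
    return [m * (2 ** shift) for m in mantissas]


# Values spanning a 30-bit signed range, as produced by natural word growth.
block = [2 ** 29 - 1, -(2 ** 29), 12345, -7]
mantissas, shift = to_block_float(block)
restored = from_block_float(mantissas, shift)
```

Under these assumptions a 16-bit mantissa with a shift count of at most 15 is enough to represent the range of a 30-bit fixed-point word, at the cost of losing the low-order bits of small samples that share a block with large ones.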
Table 1. Comparison of variable FFT, streaming (normal order in/out) circuits using a Stratix III EP3SE50F484C2 FPGA as target device.
| | Centar | Intel |
|---|---|---|
| Technology | 65 nm | 65 nm |
| FPGA | Stratix III | Stratix III |
| ALMs | 4331 | 3826 |
| LUTs | 6426 | 4101 |
| Registers | 6718 | 6549 |
| Memory (Kbits) | 290 | 208 |
| Memory (M9K) | 42 | 30 |
| Multipliers (18-bit) | 33 | 36 |
| Fmax (data rate, MHz) | 523 | 315 |
| SQNR (average, 5 transform sizes) | 83.9 | 77.3 |
In Table 1, the adaptive logic module (ALM) is the basic unit of a Stratix III FPGA (one 8-input LUT, two registers plus other logic). Comparison with Xilinx Virtex 5 devices can be made by noting that two M9K memories are equivalent to a Xilinx BRAM and that an ALM is equivalent to between 1.2 and 1.8 LEs, since benchmark studies show 1 ALM = 1.2 LEs (Xilinx white paper WP284 v1.0, December 19, 2007) and 1 ALM = 1.8 LEs (Intel white paper), respectively.
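As a worked example of these equivalences, the conversion can be written as a short calculation. The function names are hypothetical, and the factors are simply the two benchmark ratios and the two-M9K-per-BRAM rule quoted above, so the results are rough bounds rather than measured Xilinx resource counts.

```python
def alms_to_le_range(alms, low=1.2, high=1.8):
    """Return the (low, high) LE-equivalent range for a given ALM count."""
    return alms * low, alms * high


def m9k_to_bram(m9k_blocks):
    """Convert M9K blocks to Xilinx BRAM equivalents (two M9Ks per BRAM)."""
    return m9k_blocks / 2


# ALM and M9K counts taken from Table 1.
for name, alms, m9k in (("Centar", 4331, 42), ("Intel", 3826, 30)):
    le_low, le_high = alms_to_le_range(alms)
    print(f"{name}: {le_low:.0f} to {le_high:.0f} LEs, {m9k_to_bram(m9k):.0f} BRAMs")
```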
The memory size (Kbits) indicates the total used memory in the M9Ks and is a measure of how fully utilized they are.
Operation at 500 MHz has been verified using an Intel Stratix III development kit, which included an EP3SL150F1152C2 FPGA.
The Fmax values were based on the best of ~20 seeds for each circuit.
As Table 1 shows, the Centar circuit provides much higher data rates (>500 MHz vs. ~300 MHz). It also generates a much smaller 16-bit output (vs. 30 bits for the Intel circuit), so “downstream” processing will use less memory, potentially run faster and consume less logic hardware. The Centar circuit can be programmed to do any power-of-two computation. Non-power-of-two sizes can be included as well, something that isn’t possible with traditional FFT circuits.
A second set of resource usage figures, plus a comparison to an Intel equivalent (IP v17.1), is shown in Table 2 below for a Stratix IV EP4SGX530KH40C3 FPGA (speed grade 3). It’s important to note that the Centar Fmax of 490 MHz in this table is limited not by the FPGA fabric but by the maximum operating speed of the simple dual-port embedded RAMs used. The TimeQuest-generated SA Fmax associated with the LUT/register fabric is >500 MHz.
Table 2. Comparison of variable FFT, streaming (normal order in/out) circuits using a Stratix IV EP4SGX530KH40C3 FPGA as target device.
| | Centar | Intel |
|---|---|---|
| Technology | 40 nm | 40 nm |
| FPGA | Stratix IV | Stratix IV |
| ALMs | 4785 | 6089 |
| LUTs | 7020 | 5453 |
| Registers | 7044 | 9752 |
| Memory (Kbits) | 290 | 203 |
| Memory (M9K) | 42 | 28 |
| Multipliers (18-bit) | 33 | 68 |
| Fmax (data rate, MHz) | 490 | 283 |
| SQNR (average, 5 transform sizes) | 84 | 90 |
| FFT 2048 pts (µs) | 4.2 | 7.2 |
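The 2048-point transform times in Table 2 are consistent with a simple throughput calculation. The sketch below assumes a streaming circuit that accepts one sample per clock at the listed data rate, so the time per transform is the transform size divided by Fmax; the function name is hypothetical.

```python
def transform_time_us(points, fmax_mhz):
    """Time to stream one transform, in microseconds, at one sample per clock."""
    return points / fmax_mhz  # samples / (samples per microsecond)


# Fmax values from Table 2 (Stratix IV).
centar_us = transform_time_us(2048, 490)
intel_us = transform_time_us(2048, 283)
print(f"Centar: {centar_us:.1f} us, Intel: {intel_us:.1f} us")  # prints 4.2 and 7.2
```

Under the same assumption, the Table 1 data rates of 523 MHz and 315 MHz would correspond to roughly 3.9 µs and 6.5 µs per 2048-point transform.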