N=2^n: Fixed Size FFT

This section gives performance and resource usage data for two fixed-point FFT examples and compares them with equivalent Intel versions. Both the Centar and Intel circuits were compiled with Intel's software tools (Quartus II) for the same target hardware (Stratix III EP3SE50F484C2 FPGA). The comparisons are with Intel's radix-4, 20-bit pipelined FFTs (v10.1), which use block floating point (BFP, a single exponent per FFT block) to achieve a signal-to-quantization-noise ratio (SQNR) similar to that of the Centar FFT circuits while keeping the input and output word lengths the same. The TimeQuest static timing analyzer was used to determine maximum clock frequencies (Fmax) at 1.1 V and 85 °C (worst-case settings).

Both circuits support data "streaming", that is, continuous normal-order input and output of data. The SQNR estimates were obtained from MATLAB bit-accurate RTL simulations based on "single tone" full-scale (real) input data sets with random frequencies and phases. Intel's MATLAB models were used for their circuits.
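The SQNR metric described above can be illustrated with a minimal sketch. This is not the bit-accurate model of either circuit: it assumes a double-precision radix-2 FFT as the reference and models only 16-bit quantization of the input samples, so the result reflects input quantization noise alone. The helper names (fft, quantize, single_tone, sqnr_db) are hypothetical and introduced only for this example.

```python
import cmath
import math
import random

def fft(x):
    """Recursive radix-2 FFT in double precision (reference model only)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def quantize(v, bits):
    """Round a value in [-1, 1] to a signed fixed-point grid with `bits` bits."""
    scale = 2 ** (bits - 1)
    q = max(-scale, min(scale - 1, round(v * scale)))  # saturate at full scale
    return q / scale

def single_tone(n, rng):
    """Full-scale real tone with random frequency and phase."""
    freq = rng.uniform(0, n / 2)
    phase = rng.uniform(0, 2 * math.pi)
    return [math.cos(2 * math.pi * freq * i / n + phase) for i in range(n)]

def sqnr_db(reference, test):
    """Ratio of reference signal energy to error energy, in dB."""
    signal = sum(abs(r) ** 2 for r in reference)
    noise = sum(abs(r - t) ** 2 for r, t in zip(reference, test))
    return 10 * math.log10(signal / noise)

rng = random.Random(0)          # fixed seed so the run is repeatable
n, bits = 256, 16
x = single_tone(n, rng)
reference = fft(x)
quantized = fft([quantize(v, bits) for v in x])
result = sqnr_db(reference, quantized)
print(f"SQNR with {bits}-bit input quantization only: {result:.1f} dB")
```

A full estimate would replace the second fft call with a bit-accurate model of the circuit's internal arithmetic and average over many random tones, which is why the figures in Table 1 are lower than the input-quantization limit this sketch reports.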

Table 1. Comparison of fixed-point, streaming (normal order in/out) circuits using a Stratix III EP3SE50F484C2 FPGA as the target device.

Transform size            256 points                          1024 points
Circuit                   Intel     Centar v1   Centar v2     Intel     Centar
Word length               20 bits   16 bits     16 bits       20 bits   16 bits
ALMs                      4262      3982        4339          4394      4357
Memory (Kbits)            49        40.6        31.6          195       145
M9Ks                      38        31          15            38        31
Multipliers (18-bit)      24        33          33            24        33
Fmax (data rate, MHz)     387       533         566           382       533
SQNR (dB)                 87.8      86.7        86.7          81.3      82.9
µJ/FFT                    1.29      1.12        -             6.36      4.31

In Table 1 the adaptive logic module (ALM) is the basic unit of a Stratix III FPGA (one 8-input LUT, two registers, plus other logic). Comparison with Xilinx Virtex 5 devices can be made by noting that two M9K memories are equivalent to one Xilinx BRAM and that one ALM is equivalent to between 1.2 and 1.8 LEs: benchmark studies give 1 ALM = 1.2 LEs (Xilinx white paper WP284 v1.0, December 19, 2007) and 1 ALM = 1.8 LEs (Intel white paper).
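The equivalences above amount to simple arithmetic, sketched below for the 1024-point Centar circuit in Table 1 (4357 ALMs, 31 M9Ks). The constant and function names are hypothetical and the results are rough guides only, since the two white papers disagree on the ALM-to-LE factor.

```python
# Conversion factors quoted in the text above.
ALM_TO_LE_LOW = 1.2    # Xilinx white paper WP284 figure
ALM_TO_LE_HIGH = 1.8   # Intel white paper figure
M9K_PER_BRAM = 2       # two M9K memories per Xilinx BRAM

def alm_to_le_range(alms):
    """Return the (low, high) LE-equivalent range for an ALM count."""
    return alms * ALM_TO_LE_LOW, alms * ALM_TO_LE_HIGH

def m9k_to_bram(m9ks):
    """Return the BRAM-equivalent count for a number of M9K blocks."""
    return m9ks / M9K_PER_BRAM

le_low, le_high = alm_to_le_range(4357)
brams = m9k_to_bram(31)
print(f"4357 ALMs ~ {le_low:.0f} to {le_high:.0f} LEs")
print(f"31 M9Ks ~ {brams:.1f} BRAMs")
```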

The memory size (Kbits) is the total memory actually used in the M9Ks and so indicates how fully they are utilized. For the 256-point transform two design examples are provided, one of which (v2) uses LUTs to replace some inefficiently populated M9Ks.

Power estimates (microjoules/FFT) were obtained from the PowerPlay analyzer tool, using value change dump (VCD) files from ModelSim simulations to obtain accurate toggle rates.
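To put the Table 1 figures in context, the sketch below computes the time per transform and the relative energy per FFT. It assumes that a streaming circuit accepts one sample per clock, so that an N-point transform occupies N cycles at the quoted data rate; that assumption and the function names are illustrative and not taken from the source.

```python
def transform_time_us(points, data_rate_mhz):
    """Time to stream one transform, in microseconds (samples / (samples per µs))."""
    return points / data_rate_mhz

def energy_ratio(centar_uj, intel_uj):
    """Centar energy per FFT as a fraction of the Intel energy per FFT."""
    return centar_uj / intel_uj

# Values copied from Table 1: (points, Intel Fmax, Centar Fmax, Intel µJ, Centar µJ).
# The 256-point Centar entries are the v1 design.
rows = [
    (256, 387, 533, 1.29, 1.12),
    (1024, 382, 533, 6.36, 4.31),
]

for points, intel_mhz, centar_mhz, intel_uj, centar_uj in rows:
    t_intel = transform_time_us(points, intel_mhz)
    t_centar = transform_time_us(points, centar_mhz)
    ratio = energy_ratio(centar_uj, intel_uj)
    print(f"{points} points: Intel {t_intel:.3f} us, Centar {t_centar:.3f} us, "
          f"energy ratio {ratio:.2f}")
```

Under that assumption, the 1024-point Centar circuit streams a transform in about 1.92 µs and uses roughly two thirds of the energy per FFT of the Intel circuit.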

Circuit operation at 500 MHz has been verified on an Intel Stratix III development kit, which includes an EP3SL150F1152C2 FPGA.

The Fmax values were based on the best of ~20 seeds for each circuit.