Centar
N=2^n: Variable FFT

A “variable” FFT is a circuit that can compute any of several transform sizes, with the size chosen at run time. For example, 4G wireless protocols usually require transform sizes from 128 to 2048 points.

Here, performance and resource-usage data are provided for a 2048-point variable FFT example and compared with an equivalent Intel radix-2^2 single delay feedback version (IP v10.1), modified to permit extraction of results for smaller transform sizes earlier in the pipeline. Both the Centar and Intel circuits were compiled using Intel’s software tools (Quartus II) for the same target hardware (Stratix III EP3SE50F484C2 FPGA).

The TimeQuest static timing analyzer was used to determine maximum clock frequencies (Fmax) at 1.1 V and 85 °C (worst-case settings). Both circuits support data “streaming”, that is, continuous normal-order input and output of data. The signal-to-quantization-noise ratio (SQNR) estimates were obtained from Matlab bit-accurate RTL simulations based on “single tone” full-scale (real) input data sets with random frequencies and phases. Intel’s Matlab models were used for their circuits.
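The single-tone SQNR measurement described above can be illustrated with a minimal Python sketch. It is not the bit-accurate Matlab model used for these results: it models only 16-bit quantization of the input and runs a floating-point FFT on both the ideal and quantized tones, so it says nothing about either circuit's internal arithmetic. The function names, the 256-point size and the frequency range are assumptions chosen for illustration.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (len(x) must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def sqnr_db(reference, test):
    """Reference signal energy divided by error energy, expressed in dB."""
    signal = sum(abs(r) ** 2 for r in reference)
    noise = sum(abs(r - t) ** 2 for r, t in zip(reference, test))
    return 10.0 * math.log10(signal / noise)


def single_tone_sqnr(n_points=256, bits=16, seed=1):
    """SQNR of an FFT whose real, full-scale single-tone input is rounded to `bits` bits."""
    rng = random.Random(seed)
    full_scale = 2 ** (bits - 1) - 1
    freq = rng.uniform(0.05, 0.45)            # random frequency, cycles per sample (assumed range)
    phase = rng.uniform(0.0, 2.0 * math.pi)   # random phase
    ideal = [full_scale * math.sin(2.0 * math.pi * freq * i + phase)
             for i in range(n_points)]
    quantized = [float(round(v)) for v in ideal]  # input quantization only
    return sqnr_db(fft(ideal), fft(quantized))


if __name__ == "__main__":
    print(f"SQNR: {single_tone_sqnr():.1f} dB")
```

Averaging this figure over several seeds and transform sizes would mirror the "average, 5 transform sizes" row in the tables, although a real fixed-point datapath adds rounding noise at every stage and will report a lower value.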

Intel’s radix-2^2 single delay feedback architecture uses natural word growth to accommodate dynamic range. Simulations showed that a 16-bit fixed-point input with a 30-bit fixed-point output gives SQNR equivalent to that of the Centar design with a 16-bit input, 16-bit output and 4-bit exponent.

Table 1. Comparison of variable FFT, streaming (normal order in/out) circuits using a Stratix III EP3SE50F484C2 FPGA as target device.

                                      Centar        Intel
Technology                            65nm          65nm
FPGA                                  Stratix III   Stratix III
ALMs                                  4331          3826
LUTs                                  6426          4101
Registers                             6718          6549
Memory (Kbits)                        290           208
Memory (M9K)                          42            30
Multipliers (18-bit)                  33            36
Fmax (data rate, MHz)                 523           315
SQNR (dB, average, 5 transform sizes) 83.9          77.3

In Table 1, the adaptive logic module (ALM) is the basic unit of a Stratix III FPGA (one 8-input LUT and two registers, plus other logic). A comparison with Xilinx Virtex 5 devices can be made by noting that two M9K memories are equivalent to one Xilinx BRAM and that an ALM is equivalent to between 1.2 and 1.8 LEs: benchmark studies show 1 ALM = 1.2 LEs (Xilinx white paper WP284 v1.0, December 19, 2007) and 1 ALM = 1.8 LEs (Intel white paper).
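As a rough illustration of that conversion, the sketch below applies the 1.2 to 1.8 LE-per-ALM range and the two-M9K-per-BRAM rule to the Table 1 figures. The function names are hypothetical, and the results are only as good as the quoted equivalence factors.

```python
def alm_to_le_range(alms, low=1.2, high=1.8):
    """Approximate LE count range for a given ALM count (factors quoted in the text)."""
    return alms * low, alms * high


def m9k_to_bram(m9k_blocks):
    """Approximate Xilinx BRAM count, taking two M9K blocks as one BRAM."""
    return m9k_blocks / 2


if __name__ == "__main__":
    # ALM and M9K counts from Table 1
    for name, alms, m9k in (("Centar", 4331, 42), ("Intel", 3826, 30)):
        lo, hi = alm_to_le_range(alms)
        print(f"{name}: {lo:.0f} to {hi:.0f} LEs, {m9k_to_bram(m9k):.0f} BRAMs")
```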

The memory size (Kbits) is the total memory actually used within the M9Ks, and so indicates how fully they are utilized.
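One way to read the two memory rows together is as a utilization ratio. The sketch below assumes each M9K block holds a nominal 9 Kbits and that the table's Kbits use the same unit; both are assumptions, and the helper name is hypothetical.

```python
M9K_KBITS = 9  # assumed nominal capacity of one M9K block, in Kbits


def m9k_utilization(used_kbits, m9k_blocks):
    """Fraction of the allocated M9K capacity that actually holds data."""
    return used_kbits / (m9k_blocks * M9K_KBITS)


if __name__ == "__main__":
    # Memory (Kbits) and Memory (M9K) values from Table 1
    for name, kbits, blocks in (("Centar", 290, 42), ("Intel", 208, 30)):
        print(f"{name}: {100 * m9k_utilization(kbits, blocks):.1f}% of M9K capacity used")
```

Under these assumptions both circuits fill roughly three quarters of the M9K capacity they occupy.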

Operation at 500MHz has been verified using an Intel Stratix III development kit, which included an EP3SL150F1152C2 FPGA.

The Fmax values were based on the best of ~20 seeds for each circuit.

As can be seen in Table 1, the Centar circuit provides much higher data rates (>500MHz vs. ~300MHz). It also generates a much smaller 16-bit output (vs. 30 bits for the Intel circuit), so “downstream” processing will use less memory, potentially run faster, and consume less logic hardware. The Centar circuit can be programmed to do any power-of-two computation. Non-power-of-two sizes can also be included, something that isn’t possible with traditional FFT circuits.

A second set of resource-usage figures, with a comparison to an Intel equivalent (IP v17.1), is shown in Table 2 below for a Stratix IV EP4SGX530KH40C3 FPGA (-3 speed grade). Note that the Centar Fmax of 490MHz in this table is limited not by the FPGA fabric but by the maximum operating speed of the simple dual-port embedded RAMs used. The TimeQuest generated SA Fmax associated with the LUT/register fabric is >500MHz.

Table 2. Comparison of variable FFT, streaming (normal order in/out) circuits using a Stratix IV EP4SGX530KH40C3 FPGA as target device.

                                      Centar        Intel
Technology                            40nm          40nm
FPGA                                  Stratix IV    Stratix IV
ALMs                                  4785          6089
LUTs                                  7020          5453
Registers                             7044          9752
Memory (Kbits)                        290           203
Memory (M9K)                          42            28
Multipliers (18-bit)                  33            68
Fmax (data rate, MHz)                 490           283
SQNR (dB, average, 5 transform sizes) 84            90
2048-point FFT time (µs)              4.2           7.2
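The 2048-point transform times in Table 2 are consistent with a streaming circuit that accepts one sample per clock at the listed data rate. The sketch below makes that assumption explicit; the function name is hypothetical, and the calculation covers only the time to stream one block, not pipeline latency.

```python
def transform_time_us(n_points, data_rate_mhz):
    """Time to stream one n_points block at one sample per clock, in microseconds."""
    return n_points / data_rate_mhz


if __name__ == "__main__":
    # Fmax (data rate, MHz) values from Table 2
    for name, fmax in (("Centar", 490), ("Intel", 283)):
        print(f"{name}: {transform_time_us(2048, fmax):.1f} us per 2048-point FFT")
```

The same function can be applied to the smaller transform sizes a variable FFT supports, for example `transform_time_us(128, 490)`.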