FFT Circuitry for a 4G Age
Centar LLC is a provider of fast Fourier transform (FFT) intellectual property (IP) for use in FPGA- and ASIC-based embedded applications. It has developed a novel parallel, matrix-based formulation of the discrete Fourier transform (DFT), which decomposes it into structured sets of b-point discrete Fourier transforms. All FFT circuits are constructed from synchronous, fine-grained, locally connected, regular arrays of small processing elements (PEs), each consisting of a few registers, some multiplexers and an arithmetic element. Salient features of this technology are:
  • Speed: The only FPGA FFT circuits with clock rates >500MHz using 65nm technology (e.g., Altera Stratix III).
  • Throughput: Data rates as high as ~10G complex samples per second
  • Dynamic Range: A combined block floating-point and floating-point architecture means smaller word lengths can be used for post-processing operations such as equalization.
  • Programmability: Easy customization of FFT properties, functionality and I/O interface.
  • Non-power-of-two transform sizes: A single ROM can store the control parameters needed to support any number and size of FFTs
  • Scalability: Faster transforms can be implemented without architectural changes by increasing the array size along one dimension or duplicating the array structure
  • Power: Interconnects are entirely local, reducing parasitic routing capacitances to keep power dissipation low and speed high
  • Cyclic Prefix: Circuit architecture is designed to support any prefix value (most FFT circuits require additional circuits to perform this function)
  • Floating Point: IEEE 754 single-precision floating-point, fixed-size streaming circuits use much less memory and far fewer LUTs (the 1024-point circuit uses about half as many LUTs as Altera's equivalent)
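The idea of building a large DFT out of b-point DFTs can be illustrated with a short sketch. The code below is not Centar's matrix formulation, which is not described here; it uses the textbook Cooley-Tukey index mapping as a stand-in, and the function names `dft` and `dft_via_b_point` are hypothetical. It only shows that an N-point transform with N = b·m can be assembled from b-point DFTs, twiddle-factor multiplications and smaller transforms.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT; serves as the small b-point building block."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dft_via_b_point(x, b):
    """N-point DFT assembled from b-point DFTs (textbook Cooley-Tukey mapping).

    Assumption: this is a generic illustration, not the vendor's algorithm.
    Falls back to the direct DFT when the length is not a multiple of b
    or is already no larger than b.
    """
    n = len(x)
    if n <= b or n % b != 0:
        return dft(x)
    m = n // b

    # Step 1: view x as a b-by-m array (row n1, column n2, element x[m*n1 + n2])
    # and take a b-point DFT down every column. cols[n2][k1] holds the result.
    cols = [dft([x[m * n1 + n2] for n1 in range(b)]) for n2 in range(m)]

    out = [0j] * n
    for k1 in range(b):
        # Step 2: multiply by the twiddle factors W_N^(n2 * k1).
        row = [
            cols[n2][k1] * cmath.exp(-2j * cmath.pi * n2 * k1 / n)
            for n2 in range(m)
        ]
        # Step 3: m-point DFT along the row, itself decomposed recursively.
        row_dft = dft_via_b_point(row, b)
        for k2 in range(m):
            out[k1 + b * k2] = row_dft[k2]
    return out
```

For example, `dft_via_b_point(x, 4)` on a 64-sample input uses three layers of 4-point DFTs and should agree with `dft(x)` to within floating-point rounding.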


  • Power-of-two FFT: Fixed and variable (run-time selectable) size, 16 to 16,384 points.
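| Transform size | Design | ALMs/M9Ks | Transform time (µs) | SQNR or mean error | µJ/FFT | FPGA |
| --- | --- | --- | --- | --- | --- | --- |
| 256 pts (fixed point) | Altera | 4414/38 | 0.68 | 76.6 | 1.29 | Stratix III |
| 256 pts (fixed point) | Centar v1 | 4024/31 | 0.48 | 86.7 | 1.12 | Stratix III |
| 256 pts (fixed point) | Centar v2 | 5063/15 | 0.45 | 86.7 | | Stratix III |
| 1024 pts (fixed point) | Altera | 4770/38 | 2.72 | 81.3 | 6.36 | Stratix III |
| 1024 pts (fixed point) | Centar | 4357/31 | 1.92 | 82.9 | 4.31 | Stratix III |
| 256 pts (IEEE754) | Altera | 10545/57 | 0.83 | 8.60E-08 | | Stratix IV |
| 256 pts (IEEE754) | Centar | 7424/30 | 0.49 | 3.10E-08 | | Stratix IV |
| 1024 pts (IEEE754) | Altera | 12883/90 | 3.4 | 8.80E-08 | | Stratix IV |
| 1024 pts (IEEE754) | Centar | 7019/62 | 2.17 | 4.20E-08 | | Stratix IV |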

Example: comparative metrics for "streaming" circuits targeted to Stratix III and IV FPGAs (C2 speed grade).
  • Non-power-of-two FFT: The reachable transform sizes for this class of designs are those that factor into a product of small integers, each no larger than about 10. For example, the LTE SC-FDMA requirements use the integers {2, 3, 5} to compute all 35 transform sizes, i.e., N = 2^n · 3^m · 5^q, where n, m and q are non-negative integers. Transform sizes can be fixed or variable and selectable at run time.
| Design | FPGA | LUT | Registers | Block RAM (9/18K) | Fmax (MHz) | RB Average Throughput (cycles) | Throughput (Normalized) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Centar | Virtex-6 | 2915 | 2581 | 19 | 401 | 16.6N | 1 |
| Xilinx | Virtex-6 | 3851 | 4326 | 10 | 403 | 23.4N | 0.71 |
| Centar | Stratix III | 3816 | 3188 | 29 | 417 | 16.6N | 1 |
| Altera | Stratix III | 2600 | N/A | 17 | 260 | 32.9N | 0.33 |

Example: LTE SC-FDMA relative performance, averaged over 35 different DFT sizes.
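The 35 LTE SC-FDMA transform sizes are not listed here, but they can be enumerated under a common reading of the LTE uplink rules. The sketch below assumes each DFT size is 12 subcarriers times a resource-block count that factors into {2, 3, 5} only, with at most 110 resource blocks; those two parameters and the function name `lte_sc_fdma_dft_sizes` are assumptions for illustration, not figures taken from this page.

```python
def lte_sc_fdma_dft_sizes(max_rb=110, subcarriers_per_rb=12):
    """List DFT sizes N = 12 * RB where RB has no prime factor other than 2, 3, 5.

    Assumptions (not stated in the text above): 12 subcarriers per resource
    block and a maximum allocation of 110 resource blocks.
    """
    sizes = []
    for rb in range(1, max_rb + 1):
        remainder = rb
        # Strip out every factor of 2, 3 and 5.
        for p in (2, 3, 5):
            while remainder % p == 0:
                remainder //= p
        # If nothing is left, rb = 2^a * 3^b * 5^c.
        if remainder == 1:
            sizes.append(rb * subcarriers_per_rb)
    return sizes


sizes = lte_sc_fdma_dft_sizes()
print(len(sizes), sizes[0], sizes[-1])
```

Under these assumptions the enumeration yields 35 sizes ranging from 12 to 1296 points, each of the form N = 2^n · 3^m · 5^q, which matches the count quoted above.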


Because many different parallel circuit architectures can be derived from a single algorithm specification, Centar has developed an automated CAD tool, the Symbolic Parallel Algorithm Development Environment (SPADE), to make the best choices. SPADE is the only such tool in existence that can find latency-optimal circuits.