FFT (1D) Off-Chip Design Example

This benchmark demonstrates an OpenCL implementation of a 1D Fast Fourier Transform (1D FFT) on Intel® FPGAs. The benchmark can process up to 16 million complex single-precision floating-point values, and the data size can be changed dynamically.

The algorithm used to process such large data sets has six stages. For example, to process 1 million (1M) points:

  1. Treat the 1M points as a 1K x 1K matrix, read it from external memory, and transpose it on the fly.
  2. Run a 1K-point 1D FFT on each row of the transposed matrix.
  3. Multiply the resulting values by the adjustment twiddle factors.
  4. Transpose the matrix and write it to a temporary buffer in external memory.
  5. Run a 1K-point 1D FFT on each row.
  6. Transpose the matrix and write the output to external memory.
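
The six stages above can be sketched in a few lines of plain Python. This is a minimal software model for checking the arithmetic, not the OpenCL kernel code shipped with the example: the function names (dft, transpose, six_step_fft) are hypothetical, a direct O(n^2) DFT stands in for the FFT engine, and a small 4 x 4 matrix replaces the 1K x 1K one.

```python
import cmath

def dft(row):
    """Direct O(n^2) DFT of one row; a stand-in for the FFT engine."""
    n = len(row)
    return [sum(v * cmath.exp(-2j * cmath.pi * j * k / n) for j, v in enumerate(row))
            for k in range(n)]

def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]

def six_step_fft(x, n1, n2):
    """DFT of x (length n1*n2), following the six stages listed above."""
    n = n1 * n2
    assert len(x) == n
    m = [x[r * n2:(r + 1) * n2] for r in range(n1)]      # view input as n1 x n2
    m = transpose(m)                                     # 1. transposed read
    m = [dft(row) for row in m]                          # 2. FFT of each row
    m = [[v * cmath.exp(-2j * cmath.pi * i * k / n)      # 3. twiddle factors
          for k, v in enumerate(row)] for i, row in enumerate(m)]
    m = transpose(m)                                     # 4. transpose (temporary buffer)
    m = [dft(row) for row in m]                          # 5. FFT of each row
    m = transpose(m)                                     # 6. transpose (output)
    return [v for row in m for v in row]

if __name__ == "__main__":
    x = [complex(i % 5, -(i % 3)) for i in range(16)]
    err = max(abs(a - b) for a, b in zip(six_step_fft(x, 4, 4), dft(x)))
    print("max error vs direct DFT:", err)
```

The twiddle factor in stage 3 is exp(-2*pi*i*row*col/N), which is the term that turns two batches of short row FFTs into one long 1D FFT. Without it, the same six stages would compute a 2D FFT of the matrix.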

The whole system consists of three kernels connected by channels. The host enqueues the set of three kernels twice to perform the full computation: the first enqueue performs steps 1-4 above, and the second performs steps 5-6. The design is essentially a 2D FFT core with an extra transposition and a twiddle multiplication.
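
The two-enqueue structure can be modeled in software as follows. This is a rough sketch under stated assumptions: the three stage names (fetch, fft_rows, store_transposed) and the first_pass flag are hypothetical and are not taken from the example's source code, Python generators stand in for the kernel channels, and a direct DFT stands in for the FFT engine.

```python
import cmath

def dft(row):
    """Direct O(n^2) DFT of one row; a stand-in for the FFT engine."""
    n = len(row)
    return [sum(v * cmath.exp(-2j * cmath.pi * j * k / n) for j, v in enumerate(row))
            for k in range(n)]

def fetch(buf, rows, cols, transposed):
    """Stage 1 (hypothetical): stream rows of the rows x cols matrix stored in buf."""
    if transposed:
        for c in range(cols):                      # read column c as a row
            yield [buf[r * cols + c] for r in range(rows)]
    else:
        for r in range(rows):
            yield buf[r * cols:(r + 1) * cols]

def fft_rows(stream, n_total, twiddle):
    """Stage 2 (hypothetical): transform each row, optionally applying twiddles."""
    for i, row in enumerate(stream):
        out = dft(row)
        if twiddle:
            out = [v * cmath.exp(-2j * cmath.pi * i * k / n_total)
                   for k, v in enumerate(out)]
        yield out

def store_transposed(stream, n_rows, n_cols):
    """Stage 3 (hypothetical): write row i of the stream as column i of the result."""
    buf = [0j] * (n_rows * n_cols)
    for i, row in enumerate(stream):
        for k, v in enumerate(row):
            buf[k * n_rows + i] = v
    return buf

def enqueue(buf, rows, cols, first_pass):
    """One enqueue of the three-stage set; the generators act as the channels."""
    stream = fetch(buf, rows, cols, transposed=first_pass)
    stream = fft_rows(stream, rows * cols, twiddle=first_pass)
    n_rows, n_cols = (cols, rows) if first_pass else (rows, cols)
    return store_transposed(stream, n_rows, n_cols)

def fft_two_pass(x, n1, n2):
    """Full transform: the same three stages are run twice by the 'host'."""
    temp = enqueue(x, n1, n2, first_pass=True)      # steps 1-4
    return enqueue(temp, n1, n2, first_pass=False)  # steps 5-6
```

In this model only the first pass reads its input transposed and applies twiddle factors, while both passes transpose on the way out. For example, fft_two_pass(x, 4, 4) on a 16-element list x returns the same values as dft(x), up to rounding.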

The code is easily parameterized to support different FFT sizes as well as different performance requirements.

FFT Performance

The performance of the core depends on the number of points processed in parallel, the data layout used, and the number and speed of the external memory banks. The measurements below were taken on a BittWare S5-PCIe-HQ D8 board with two DDR3-1600 memory banks, using a 1M-point FFT for 8 points in parallel and a 4M-point FFT for 4 points in parallel.

Points processed    Input/Output Data Layout
in parallel         Natural        Optimized
4                   117 MSPS       217 MSPS
8                   292 MSPS       457 MSPS

MSPS is “millions of samples per second.”
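
To put the MSPS figures in the table above into more familiar units, the short calculation below converts them to time per transform and external memory traffic. It is only a sketch under stated assumptions: 1M and 4M are taken to mean 2**20 and 2**22 points, each complex single-precision sample occupies 8 bytes, each of the two enqueues reads and writes every sample once, and the function names are hypothetical.

```python
BYTES_PER_SAMPLE = 8  # complex single precision: two 4-byte floats

def transform_time_ms(points, msps):
    """Time in milliseconds for one transform of `points` samples at `msps` MSPS."""
    return points / (msps * 1e6) * 1e3

def memory_traffic_gb_per_s(msps, passes=2):
    """External memory traffic in GB/s, assuming one read and one write
    of every sample per pass (an assumption, not a measured figure)."""
    return msps * 1e6 * BYTES_PER_SAMPLE * 2 * passes / 1e9

# (points in parallel, layout) -> (FFT size, measured MSPS from the table)
measurements = {
    (4, "natural"):   (2**22, 117),
    (4, "optimized"): (2**22, 217),
    (8, "natural"):   (2**20, 292),
    (8, "optimized"): (2**20, 457),
}

if __name__ == "__main__":
    for (parallel, layout), (points, msps) in measurements.items():
        print(f"{parallel} points, {layout:9s}: "
              f"{transform_time_ms(points, msps):7.2f} ms per transform, "
              f"{memory_traffic_gb_per_s(msps):5.2f} GB/s memory traffic")
```

Under these assumptions, the fastest configuration (8 points in parallel, optimized layout) completes a 1M-point transform in roughly 2.3 ms.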


This design example demonstrates the following techniques:

  • Single work-item kernels
  • Kernel channels
  • Optimized matrix transposition


The design example provides source code for the OpenCL device (.cl) as well as the host application. For compiling the host application, the Linux package includes a Makefile and the Windows package includes a Microsoft Visual Studio 2010 project.


The use of this design is governed by, and subject to, the terms and conditions of the hardware reference design license agreement.

Software and Hardware Requirements

This design example requires the following tools:

  • Intel® FPGA Software v16.1 or later
  • Intel® FPGA SDK for OpenCL™ v16.1 or later
  • On Linux: GNU Make and gcc
  • On Windows: Microsoft Visual Studio 2010

To download the Intel design tools, visit the OpenCL download page. Only the Linux operating system is supported by this design example.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

* Product is based on a published Khronos Specification, and has passed the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance.

Design Examples Disclaimer

These design examples may only be used within Altera Corporation devices and remain the property of Altera. They are being provided on an “as-is” basis and as an accommodation; therefore, all warranties, representations, or guarantees of any kind (whether express, implied, or statutory) including, without limitation, warranties of merchantability, non-infringement, or fitness for a particular purpose, are specifically disclaimed. Altera expressly does not recommend, suggest, or require that these examples be used in combination with any other product not provided by Altera.