Intel® FPGA SDK for OpenCL™ provides the tools you need to get started with developing your designs. Browse through the developer zone to find examples, development platforms, and partners that can make your design a success.
Optical Flow and Pedestrian Detection Implemented with OpenCL™
Object Detection and Recognition with Neural Networks
Achieve Power Efficiency in a Single Chip Solution Using OpenCL™ on Our SoCs
Unified Heterogeneous Programmability of OpenCL™
Watch how OpenCL™ provides a unified platform for heterogeneous computing. In this demo, we retarget NVIDIA code written for a graphics processing unit (GPU) to a Stratix® V FPGA.
Accelerating Algorithm Performance with OpenCL™ by Offloading to an FPGA
Watch how OpenCL™ accelerates the performance of the Mandelbrot algorithm, an iterative, arithmetically intensive floating-point algorithm.
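For readers unfamiliar with the algorithm, here is a minimal Python sketch of the Mandelbrot escape-time iteration for a single point. It is not the kernel used in the demo; the function name `mandelbrot_iterations` and the `max_iter` default are illustrative assumptions.

```python
def mandelbrot_iterations(c_re, c_im, max_iter=100):
    """Count iterations of z = z*z + c (starting at z = 0) before |z| exceeds 2.

    Returns max_iter if the point has not escaped by then.
    """
    z_re, z_im = 0.0, 0.0
    for i in range(max_iter):
        # |z|^2 > 4 is equivalent to |z| > 2 and avoids a square root.
        if z_re * z_re + z_im * z_im > 4.0:
            return i
        # Complex multiply-add: (z_re + i*z_im)^2 + (c_re + i*c_im)
        z_re, z_im = z_re * z_re - z_im * z_im + c_re, 2.0 * z_re * z_im + c_im
    return max_iter
```

Each pixel of the fractal is an independent call of this kind, and each call is a chain of floating-point multiplies and adds, which is why the workload suits offloading to an accelerator.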
The Intel FPGA OpenCL Application Developers provide services, software, and other intellectual property (IP), which can enhance your OpenCL-based product development. Intel's development partners have expertise in a wide range of application areas including:
- High-performance computing
- Financial trading, simulations, and analysis
- Defense, intelligence, and aerospace
- Image processing, medical systems, and bio-informatics
- Computing for oil and gas exploration
The following examples demonstrate how to describe various applications in OpenCL along with their respective host applications, which you can compile and execute on a host with an FPGA board that supports the Intel FPGA SDK for OpenCL.
||This simple design example demonstrates a basic OpenCL kernel containing a printf call and its corresponding host program.|
||This simple design example demonstrates a basic vector addition OpenCL kernel and its corresponding host program.|
||Two host threads launch two simultaneous kernels.|
||Example designs that use OpenCL libraries containing Verilog and VHDL code to implement custom functions.|
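The kernel/host split used by the vector addition example can be sketched in plain Python. This only illustrates the OpenCL execution model and is not the SDK's example code; `vector_add_kernel` and `run_vector_add` are hypothetical names, and the loop stands in for what an OpenCL runtime would launch as independent work-items.

```python
def vector_add_kernel(gid, a, b, c):
    """Body of one work-item: add the two elements at global index gid."""
    c[gid] = a[gid] + b[gid]


def run_vector_add(a, b):
    """Stand-in for the host program: allocate the output buffer and
    invoke the kernel once per element."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    c = [0.0] * len(a)
    # An OpenCL runtime may execute these work-items concurrently;
    # here they simply run one after another.
    for gid in range(len(a)):
        vector_add_kernel(gid, a, b, c)
    return c
```

For example, `run_vector_add([1, 2, 3], [10, 20, 30])` yields `[11, 22, 33]`.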
Network Platform Examples
|OPRA FAST Parser|This design example demonstrates a streaming parser commonly used in high-frequency trading algorithms. The parser accepts an OPRA FAST data stream and decompresses the fields for use upstream. It illustrates how you can process streaming messages efficiently to achieve 10G link saturation.|
HPC Platform Examples
||This design example demonstrates a high-performance channelizer design using OpenCL. The channelizer combines a polyphase filter bank (PFB) with a fast Fourier transform to reduce the effects of spectral leakage on the resulting frequency spectrum.|
||This design example demonstrates the use of a Bloom filter for high-performance document filtering.|
|Finite Difference Computation (3D)|This design example demonstrates a high-performance 3D finite-difference stencil-only computation using OpenCL. It shows how to efficiently describe a sliding window data reuse pattern.|
||This design example demonstrates a high-performance 1D radix-4 complex fast Fourier transform (FFT) or inverse fast Fourier transform (IFFT) engine using OpenCL. This example takes advantage of the efficient sliding window data reuse pattern.|
|FFT Off-Chip (1D)|This design example is a high-performance implementation of a one-million-point FFT. FFTs this large cannot be computed entirely on the FPGA, so this example demonstrates how to manage the memory accesses efficiently.|
||This design example demonstrates a high-performance 2D radix-4 complex FFT/IFFT engine using OpenCL. This engine is targeted at large problem sizes (1024x1024 by default) and uses global memory to store the intermediate transposition. One aspect highlighted by this example is how to efficiently perform matrix transposition in global memory.|
||This design example showcases a high-performance Gzip compression implementation using OpenCL for Intel FPGAs.|
||This design example showcases a high-performance JPEG decoding solution.|
|Mandelbrot Fractal Rendering|This design example includes a kernel that implements the Mandelbrot fractal convergence algorithm and displays the results on the screen.|
||This example shows the optimization of the fundamental matrix multiplication operation using loop tiling to take advantage of the data reuse inherent in the computation.|
|Monte Carlo Black-Scholes Asian Options Pricing|This design example implements the Monte Carlo Black-Scholes simulation for Asian option pricing. This example shows how to run multiple kernels simultaneously, with each performing different parts of the simulation (random number generation, path simulation, and accumulation) and communicating using our channels vendor extension.|
||This design example implements a Sobel filter in OpenCL to perform edge detection on an image and displays the filtered image on the screen.|
|Time-Domain FIR Filter|This design implements the time-domain finite impulse response (FIR) filter benchmark from the HPEC Challenge Benchmark Suite. It shows how FPGAs can provide far better performance than a GPU architecture for floating-point FIR filters.|
||This design example implements a video downscaler that takes 1080p input video and outputs 720p video at 110 frames per second. This example uses multiple kernels to efficiently read from and write to global memory.|
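The loop-tiling idea mentioned for the matrix multiplication example in the table above can be sketched as follows. This is a minimal Python illustration of the technique, not the design example's kernel; the function name `matmul_tiled` and the `tile` parameter are assumptions made for the sketch.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply matrices a (n x k) and b (k x m) one tile at a time.

    Within each (i0, j0, k0) tile, the same small blocks of a and b are
    reused for every output element of the tile, which is the data reuse
    that loop tiling is meant to expose.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Work on one tile; min() handles sizes not divisible by tile.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

On an accelerator, the blocks touched inside one tile would typically be held in fast local memory so that each value fetched from global memory is used several times.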
Cyclone SoC Platform Examples
|Multifunction Printer Error Diffusion|This design is part of a core printer pipeline. It implements a variant of the Floyd-Steinberg error diffusion algorithm. The kernel takes a CMYK image and produces an equivalent image with every pixel half-toned. Such an output is the final stage of image processing inside a printer before it is sent to the laser system. A whitepaper, “FPGA Acceleration of Multifunction Printer Image Processing Using OpenCL”, is also available for this example.|
||This design example is an OpenCL implementation of the Lucas-Kanade optical flow algorithm. A dense, non-iterative, and non-pyramidal version with a window size of 52x52 is shown to run at over 80 frames per second on the Cyclone V SoC Development Kit.|
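The error diffusion step in the multifunction printer example above can be illustrated with the classic Floyd-Steinberg algorithm. The sketch below is the textbook version applied to a single grayscale channel, not the variant used in the design example (which operates on CMYK images); the function name `floyd_steinberg` and the `threshold` default are assumptions.

```python
def floyd_steinberg(image, threshold=128):
    """Halftone a grayscale image (rows of 0..255 values) to 0 or 255.

    Each pixel's quantization error is pushed to its unprocessed
    neighbours with the classic 7/16, 3/16, 5/16, 1/16 weights.
    """
    h, w = len(image), len(image[0])
    work = [[float(v) for v in row] for row in image]  # mutable working copy
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = work[y][x]
            new = 255 if old >= threshold else 0
            out[y][x] = new
            err = old - new
            if x + 1 < w:
                work[y][x + 1] += err * 7 / 16      # right
            if y + 1 < h:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16  # below left
                work[y + 1][x] += err * 5 / 16          # below
                if x + 1 < w:
                    work[y + 1][x + 1] += err * 1 / 16  # below right
    return out
```

Because each pixel depends on errors from the pixel to its left and the row above, the computation has a serial dependency chain.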
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
1 Product is based on a published Khronos Specification, and has passed the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance.
To leverage the flexibility of FPGAs and the variety of customizable hardware architectures that can be implemented, we have introduced the concept of reference platforms to provide a better out-of-the-box experience for performing OpenCL evaluations, creating custom OpenCL applications, and creating custom boards with an FPGA accelerator on them. The reference platforms contain both the hardware and software layers for our reference designs to communicate with the board.
The traditional OpenCL model has a host that passes data to the accelerator system over PCI Express* (PCIe*). For the High-Performance Computing (HPC) platform, the system requires a large amount of local bulk storage for processing the data that the host sends to the accelerator. These applications require large amounts of memory bandwidth, and computing power matters most. This platform is the standard platform for OpenCL accelerators.
To get started evaluating the standard HPC platform architecture, you can:
- Download a reference design that runs on the HPC platform and learn the OpenCL application development flow
- Download an HPC platform from one of our Intel FPGA Preferred Board vendors
- Purchase a commercial off-the-shelf (COTS) board that supports the HPC platform from one of our Intel FPGA Preferred Board vendors by clicking their logo below
To learn how to use the reference platforms to create your own custom board, refer to the "Custom" tab below.
The Network platform deviates from the traditional OpenCL model by separating the datapath from the PCIe command and status path. Data is streamed into the kernels through I/O channels over two 10 Gb user datagram protocol (UDP) ports, without host interaction. This streaming architecture allows the host to configure the datapath pipeline and then step out of the picture, giving the much lower-latency data processing path that traditional FPGA developers are used to. Applications using this platform are primarily concerned with achieving low latency.
To start evaluating the low-latency Network platform architecture, you can:
- Download a reference design that runs on the Network platform and learn the OpenCL application development flow
- Download a Network platform from one of our Intel FPGA Preferred Board vendors
- Purchase a COTS board that supports the Network platform from one of our Intel FPGA Preferred Board vendors by clicking on their logo below
The SoC platform resembles the traditional OpenCL model, with a shared global memory that is used to pass data between the ARM host and the FPGA accelerator, which in this case are in the same package. There is also an optional version of the architecture that adds a scratch DDR3 SDRAM interface on the FPGA accelerator side.
To start evaluating the Cyclone V SoC platform, you can:
- Download a reference design that runs on the SoC platform and learn the OpenCL application development flow
- Download the Intel FPGA SDK for OpenCL, which ships with the SoC reference platform for the Cyclone V SoC
- Purchase a COTS board that supports the SoC platform from one of our Intel FPGA Preferred Board vendors listed below
While it is convenient if the architecture of the FPGA accelerator you want falls into one of these existing categories, it is not required. These reference platforms are a starting point to aid in building your own custom FPGA accelerator board. Start with the existing SoC or Network platform, remove or modify the component interfaces to match the ones you need, and rebuild it. This uses traditional FPGA design to create the “I/O ring” through which the OpenCL kernels communicate with the I/O interfaces on your custom board.
To build your own custom FPGA accelerator board, you will need a few things. To build a custom board support package from a blank template, start with the custom platform toolkit, which includes the following:
- Raw template for a platform
- Board Test kernels to exercise the I/O interfaces
- MMD header file to get started building drivers
- HPC platform migration text file (from version 13.1)
To start with an existing platform and modify it, here are the current reference platforms available.
Arria 10 GX FPGA Development Kit Reference Platform
Stratix V Network Reference Platform: s5_net (w/ PLDA UDP stack):
Cyclone V SoC Reference Platform
- Intel FPGA SDK for OpenCL Cyclone V SoC Getting Started Guide (PDF)
- Cyclone V SoC Development Board Reference Manual (PDF)
- Cyclone V SoC Development Kit Reference Platform User Guide (PDF)
- Cyclone V SoC Development Board Getting Started Video
Arria 10 Custom Platform for OpenCL
Several of Intel's partners have already created boards and ported the reference platforms to their boards for purchase, in either evaluation mode or full production. These third-party production boards are tested according to the strict requirements of the Intel FPGA Preferred Board for OpenCL Partner Program. Only preferred boards are optimized for the most current Intel FPGA device architectures and design software. Intel works closely with these selected partners to ensure that their boards continually meet these standards by running over 9,000 regression tests. An Intel FPGA Preferred Board for OpenCL typically contains the following items:
- Specific OpenCL board design files (including the platform)
- Quartus® Prime Development Kit Edition software (one-year evaluation license)
- A license for Intel FPGA SDK for OpenCL
- Reference designs
Third-party preferred boards are purchased directly from Intel FPGA Preferred Board partners. The terms and conditions for the license of each certified board may vary from partner to partner. Intel FPGA Preferred Boards for OpenCL are carefully developed by third-party partners to ensure the highest possible quality. If a problem is traced to a preferred board, the partner is responsible for resolving the problem. If a problem arises in Intel FPGA SDK for OpenCL, Intel will provide the appropriate engineering support.
Third-party preferred boards for Intel FPGA SDK for OpenCL are provided without warranty from Intel. Intel disclaims all warranties, express and implied, with respect to the board supplied by the partner, including, but not limited to, implied warranties of merchantability, fitness for a particular purpose, title and non-infringement. The Intel FPGA Preferred Board partners may offer guarantees or warranties for design performance or functionality. Please contact the individual partners for details.
This training provides a simple overview of a methodology for optimizing an OpenCL implementation for an FPGA, using the Secure Hash Algorithm (SHA-1) as an example.
This training provides a simple overview of an architectural optimization approach for targeting OpenCL on an FPGA for image processing algorithms.
Single-Threaded vs. Multi-Threaded Kernels (17 minutes)
Understand the differences between loop pipelining and parallel threads, and know when to use single-threaded (Task) and multi-threaded (NDRange) pipelining.
See how you can optimize your FPGA-accelerated applications with the emulator and detailed optimization report features.
How to Do Reductions (PDF)
This instructor-led training focuses on writing kernel functions that are optimized for Intel FPGAs, and includes hands-on exercises.
OpenCL Training Courses
This training describes how you can use OpenCL to target an FPGA and create custom accelerated systems with an average of one-fifth the power of competing accelerators, the trends that are making FPGAs an important resource for accelerating software execution, and how OpenCL makes FPGAs accessible to software developers.
FPGA vs GPGPU (21 minutes)
Watch this short video to learn how FPGAs provide power-efficient acceleration with far fewer restrictions and far more flexibility than GPGPUs. We compare how each approaches a problem: the FPGA by leveraging this flexibility, the GPGPU within its fixed architecture.
Part 1 – Tools Download and Setup (5 minutes)
Part 4 – Setup of the Runtime Environment (7 minutes)
These training courses walk you through getting started with OpenCL on an SoC in a Linux environment.
Introduction to Parallel Computing with OpenCL (30 minutes)
Get an overview of the OpenCL standard and the advantages of using Intel's OpenCL solution.
Understand the basics of the OpenCL standard and learn to write simple programs.
Running OpenCL on Altera FPGAs (30 minutes)
Get to know the Intel FPGA SDK for OpenCL and learn to compile and run OpenCL programs on Intel FPGAs.
Learn how to create a custom board support package for use with your board and the Intel FPGA SDK for OpenCL.
Get an overview of parallel computing, the OpenCL standard, and the OpenCL for FPGA design flow in this instructor-led training. The focus of the training is not on writing kernels, but rather on the FPGA-specific portion of creating an OpenCL environment for hardware acceleration.