Creating Heterogeneous Memory Systems in Intel FPGA SDK for OpenCL Custom Platforms
Creating Heterogeneous Memory Systems in Intel FPGA SDK for OpenCL Custom Platforms
This application note provides guidance on creating heterogeneous memory systems in a Custom Platform for use with the Intel® FPGA SDK for OpenCL™ 1. Intel® assumes that you are an experienced FPGA designer who is developing Custom Platforms that contains heterogeneous memory systems.
Prior to creating the heterogeneous memory systems, familiarize yourself with the Intel® FPGA SDK for OpenCL™ documents specified below.
Verifying the Functionality of the FPGA Board and the EMIF Interfaces
- Verify each memory interface using hardware designs that can test the speed and stability of each interface.
-
Instantiate your Custom Platform using global memory.
For example, if you have three DDR interfaces, one of them must be mapped as heterogeneous memory. In this case, verify the functionality of the OpenCL stack with each DDR interface independently.
Alternatively, if you have two DDR interfaces and one quad data rate (QDR) interface, verify the functionality of the OpenCL stack of the two DDR interfaces and the QDR interface independently.
Modifying the board_spec.xml File
- Browse to the board_spec.xml file in the hardware directory of your Custom Platform.
-
Open the board_spec.xml file in a text editor and modify
the XML accordingly.
For example, if your hardware system has two DDR memories as default global memory and two QDR banks that you model as heterogeneous memory, modify the memory sections of the board_spec.xml file to resemble the following:
<!-- DDR3-1600 --> <global_mem name="DDR" max_bandwidth="25600" interleaved_bytes="1024" config_addr="0x018"> <interface name="board" port="kernel_mem0" type="slave" width="512" maxburst="16" address="0x00000000" size="0x100000000" latency="240"/> <interface name="board" port="kernel_mem1" type="slave" width="512" maxburst="16" address="0x100000000" size="0x100000000" latency="240"/> </global_mem> <!-- QDRII --> <global_mem name="QDR" max_bandwidth="17600" interleaved_bytes="8" config_addr="0x100"> <interface name="board" type="slave" width="64" maxburst="1" address="0x200000000" size="0x1000000" latency="1 addpipe="1"> <port name="kernel_qdr0_r" direction="r"/> <port name="kernel_qdr0_w" direction="w"/> </interface> <interface name="board" type="slave" width="64" maxburst="1" address="0x201000000" size="0x1000000" latency="150" addpipe="1"> <port name="kernel_qdr1_r" direction="r"/> <port name="kernel_qdr1_w" direction="w"/> </interface> </global_mem>
Setting Up Multiple Memory Dividers in Qsys
You must create multiple OpenCL Memory Bank Dividers when you have a true heterogeneous memory system. Consider a system with one DDR memory interface and one QDR memory interface. Because the two banks have different memory topologies, you cannot combine them under a single global memory.
Modifying the Boardtest Program and the Host Code for Your Heterogeneous Memory Solution
The boardtest program is an OpenCL kernel that allows you to test host-to-device bandwidth, memory bandwidth, and general functionality of your Custom Platform.
- Browse to the <path to SDK installation>/board/custom_platform_toolkit/tests/boardtest directory.
-
Open the boardtest.cl file in a text editor and assign a
buffer location to each global memory argument.
For example:
__kernel void mem_stream (__global__attribute__((buffer_location("DDR"))) uint *src, __global __attribute__((buffer_location("QDR"))) uint *dst, uint arg, uint arg2)
Here, uint *src is assigned to DDR memory, and uint *dst is assigned to QDR memory. The board_spec.xml file specifies the characteristics of both memory systems.
-
To leverage your heterogeneous memory solution in your OpenCL
system, modify your host code by adding the CL_MEM_HETEROGENEOUS_ALTERA flag to your clCreateBuffer call.
For example:
ddatain = clCreateBuffer(context, CL_MEM_READ_WRITE | memflags | CL_MEM_HETEROGENEOUS_ALTERA, sizeof(unsigned) * vectorSize, NULL, &status);
Intel® strongly recommends that you set the buffer location as a kernel argument before writing the buffer. When using a single global memory, you can write the buffers either before or after assigning them to a kernel argument. In heterogeneous memory systems, the host sets the buffer location before writting the buffer. In other words, the host will call the clSetKernelArgument function before calling the clEnqueueWriteBuffer function.
In your host code, invoke the clCreateBuffer, clSetKernelArg, and clEnqueueWriteBuffer calls in the following order:
ddatain = clCreateBuffer(context, CL_MEM_READ_WRITE | memflags | CL_MEM_HETEROGENEOUS_ALTERA, sizeof(unsigned) * vectorSize, NULL, &status); … status = clSetKernelArg(kernel[k], 0, sizeof(cl_mem), (void*)&ddatain); … status = clEnqueueWriteBuffer(queue, ddatain, CL_FALSE, 0, sizeof(unsigned) * vectorSize,hdatain, 0, NULL, NULL);
The ALTERAOCLSDKROOT/board/custom_platform_toolkit/tests/boardtest/host/memspeed.cpp file presents a similar order of these function calls.
-
After you modify the boardtest.cl file and the host code, compile the host and kernel
code and verify their functionality.
When compiling your kernel code, you must disable burst-interleaving of all memory systems by including the --no-interleaving <global_memory_type> option in the aoc command.
Verifying the Functionality of Your Heterogeneous Memory System
In OpenCL™ systems with homogeneous memory, you have to option to set the CL_CONTEXT_COMPILER_MODE_ALTERA=3 flag in your host code to disable the reading of the .aocx file and the reprogramming of the FPGA. Setting the CL_CONTEXT_COMPILER_MODE_ALTERA=3 flag is useful when instantiating your board to verify the functionality of your Custom Platform without designing the floorplan and specifying the LogicLock™ regions.
With heterogeneous memory systems, the runtime environment must read the buffer locations of each buffer, described in the .aocx file, to verify the memory systems' functionality. However, you might want to verify the functionality of your Custom Platform without implementing the final features of the board design, such as designing the floorplan and specifying the LogicLock™ regions.
- Verify that the CL_CONTEXT_COMPILER_MODE_ALTERA flag is unset in your host code.
- Browse to the board/<board name>/source/host/mmd directory of your Custom Platform.
- Open the acl_pcie_device.cpp memory-mapped device (MMD) file in a text editor.
-
Modify the reprogram function in the
acl_pcie_device.cpp file by adding a return
0; line, as shown below:
int ACL_PCIE_DEVICE::reprogram(void *data, size_t data_size) { return 0; // assume failure int reprogram_failed = 1; // assume no rbf or hash in fpga.bin int rbf_or_hash_not_provided = 1; // assume base and import revision hashes do not match int hash_mismatch = 1; ... }
- Recompile the acl_pcie_device.cpp file.
- Verify that the CL_CONTEXT_COMPILER_MODE_ALTERA flag remains unset.
Document Revision History
Date | Version | Changes |
---|---|---|
December 2016 | 2016.12.13 | Initial release. |