OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission of the Khronos Group™.
Prerequisites for the altera_s10pciedk Reference Platform:
-based accelerator card with working
) and memory interfaces
Test these interfaces together in the same design using the same version of the Intel® Quartus® Prime Pro Edition software that you will use to develop your Custom Platform.
Attention: The native Stratix 10 GX FPGA Development Kit does not automatically work with the SDK. Before using the Stratix 10 GX FPGA Development Kit with the SDK, you must first contact your field applications engineer or regional support center representative to configure the development kit for you.
Alternatively, contact support for assistance.
- Intel® Quartus® Prime Pro Edition software
- Designing with Logic Lock regions
- FPGA architecture, including clocking, global routing, and I/Os
- High-speed design
- Timing analysis
- Platform Designer design and Avalon® interfaces
- Tcl scripting
- DDR4 external memory
This document also assumes that you are familiar with the following Intel® FPGA SDK for OpenCL™-specific tools and documentation:
- Custom Platform Toolkit and the Intel® FPGA SDK for OpenCL™ Custom Platform Toolkit User Guide
Arria® 10 Reference Platform (a10_ref) and the
Intel® FPGA SDK for OpenCL™
Arria® 10 GX FPGA Development Kit Reference Platform Porting
The whole software stack in the s10_ref is derived from the a10_ref Reference Platform.
The Intel® Stratix® 10 GX FPGA Development Kit Reference Platform targets a subset of the hardware features available in the Intel® Stratix® 10 GX FPGA Development Kit.
Features of the s10_ref Reference Platform:
- OpenCL Host
The s10_ref Reference Platform uses a PCIe-based host that connects to the Intel® Stratix® 10 PCIe Gen2 x8 hard IP core.
- OpenCL Global Memory
The hardware provides one 2-gigabyte (GB) DDR4 SDRAM daughtercard that is mounted on the HiLo connector (J14 in Figure 1).
- FPGA Programming
Via external cable and the Intel® Stratix® 10 GX FPGA Development Kit's on-board Intel® FPGA Download Cable II interface.
- Guaranteed Timing
The s10_ref Reference Platform relies on the Intel® Quartus® Prime Pro Edition compilation flow to provide guaranteed timing closure. The timing-clean s10_ref Reference Platform is preserved in the form of a precompiled post-fit netlist (that is, the base.qdb Intel® Quartus® Prime Database Export File). The Intel® FPGA SDK for OpenCL™ Offline Compiler imports this preserved post-fit netlist into each OpenCL kernel compilation.
To compile your OpenCL kernel for a specific board variant, include the -board=<board_name> option in your aoc command (for example, aoc -board=s10gx_ea_htile myKernel.cl).
|Windows File or Folder||Linux File or Directory||Description|
|board_env.xml||board_env.xml||eXtensible Markup Language (XML) file that describes the Reference Platform to the Intel® FPGA SDK for OpenCL™ .|
Quartus® Prime project templates for the s10gx_ea_htile
See Contents of the s10gx_ea_htile Directory for a list of files in this directory.
|windows64||linux64||Contains the MMD library, kernel mode driver, and executable files of the SDK utilities (that is, install, uninstall, flash, program, diagnose) for your 64-bit operating system.|
For Windows, the source_windows64 folder contains the source code for the MMD library and SDK utilities. The compiled MMD library and SDK utilities are in the windows64 folder.
For Linux, the source directory contains the source code for the MMD library and SDK utilities. The compiled MMD library and SDK utilities are in the linux64 directory.
|acl_ddr4_s10.qsys||Platform Designer system that, together with the .ip files in the ip/acl_ddr4_s10/ subdirectory, implements the mem component.|
Settings File for the base project revision. This file includes, by
reference, all the settings in the flat.qsf file.
Use this revision when porting the s10_ref Reference Platform to your own Custom Platform. The Intel® Quartus® Prime Pro Edition software compiles this base project revision from source code.
Quartus® Prime Archive File that
and pr_base.id. This file is
generated by the scripts/post_flow_pr.tcl file during base revision
compile, and is used during import revision compilation.
|board.qsys||Platform Designer system that implements the board interfaces (that is, the static region) of the OpenCL hardware system.|
|board_spec.xml||XML file that provides the definition of the board hardware interfaces to the SDK.|
|device.tcl||Tcl file that is included in all revisions and contains all device-specific information (for example, device family, ordering part number (OPN), and voltage settings).|
Settings File for the flat project revision. This file includes all
the common settings, such as pin location assignments, that are used
in the other revisions of the project (that is, base, top, and
top_synth). The base.qsf,
top.qsf, and top_synth.qsf files include, by
reference, all the settings in the flat.qsf file.
The Intel® Quartus® Prime software compiles the flat revision with minimal location constraints. The flat revision compilation does not generate a base.qar file that you can use for future import compilations and does not implement the guaranteed timing flow.
|import_compile.tcl||Tcl script for the SDK-user compilation flow (that is, import revision compilation).|
|max5_116.pof||Programming file for the
MAX® V device on the
Stratix® 10 GX FPGA Development Kit
that sets the memory reference clock to 116 MHz by default at
You must program the max5_116.pof file onto your s10gx_ea_htile board.
Quartus® Prime Settings File that
collects all the required .ip
files in a unique location.
During flat and base revision compilations, the board.qsys and acl_ddr4_s10.qsys Platform Designer files are added to the opencl_bsp_ip.qsf file.
|quartus.ini||Contains any special Intel® Quartus® Prime software options that you need to compile OpenCL kernels for the s10_ref Reference Platform.|
|top.qpf||Intel® Quartus® Prime Project File for the OpenCL hardware system.|
|top.qsf||Intel® Quartus® Prime Settings File for the SDK-user compilation flow.|
|top.sdc||Synopsys Design Constraints File that contains board-specific timing constraints.|
|top.v||Top-level Verilog Design File for the OpenCL hardware system.|
|top_post.sdc||Platform Designer and Intel® FPGA SDK for OpenCL™ IP-specific timing constraints.|
|top_synth.qsf||Intel® Quartus® Prime Settings File for the Intel® Quartus® Prime revision in which the OpenCL kernel system is synthesized.|
|ip/acl_ddr4_s10/<file_name>||Directory containing the .ip
files that the
Quartus® Prime Pro Edition
software needs to parameterize the mem component.
You must provide both the acl_ddr4_s10.qsys file and the corresponding .ip files in this directory to the Intel® Quartus® Prime Pro Edition software.
|ip/board/<file_name>||Directory containing the .ip files that the
Quartus® Prime Pro Edition software needs to
parameterize the board instance.
You must provide both the board.qsys file and the corresponding .ip files in this directory to the Intel® Quartus® Prime Pro Edition software.
|ip/freeze_wrapper.v||Verilog Design File that implements the freeze logic.|
|ip/irq_controller/<file_name>||IP that receives interrupts from the OpenCL
kernel system and sends message signaled interrupts (MSI) to the
Refer to the Message Signaled Interrupts section for more information.
|scripts/create_fpga_bin_pr.tcl||Tcl script that generates the fpga.bin file. The fpga.bin file contains all the
necessary files for configuring the FPGA.
For more information on the fpga.bin file, refer to the Define the Contents of the fpga.bin File for the Intel Stratix 10 GX FPGA Development Kit Reference Platform section.
|scripts/helpers.tcl||Tcl script with helper functions used by the qar_ip_files.tcl script.|
|scripts/post_flow_pr.tcl||Tcl script that implements the guaranteed timing closure flow, as described in the Guaranteed Timing Closure of the Intel Stratix 10 GX FPGA Development Kit Reference Platform Design section.|
|scripts/pre_flow_pr.tcl||Tcl script that executes before the invocation of the Intel® Quartus® Prime software compilation. Running the script generates the Platform Designer HDL for board.qsys and kernel_system.qsys.|
|scripts/qar_ip_files.tcl||Tcl script that packages up base.qdb and pr_base.id during base revision compile.|
|scripts/adjust_plls_s10.tcl||PLL adjustment script for the kernel clock PLL to guarantee timing closure on the kernel clock, by setting it to the maximum allowed frequency.|
|scripts/kernel_system_update.tcl||Tcl script to update the kernel system.|
|root_partition.qdb||Database export of the base revision compile of the postfit netlist of the static board interface region.|
Developing your Custom Platform requires in-depth knowledge of the contents in the following documents and tools:
- Intel® FPGA SDK for OpenCL™ Custom Platform User Guide
- Contents of the SDK Custom Platform Toolkit
- Intel® FPGA SDK for OpenCL™ Intel® Arria® 10 GX FPGA Development Kit Reference Platform Porting Guide
- Documentation for all the Intel® FPGA IP in your Custom Platform
- Intel® FPGA SDK for OpenCL™ Getting Started Guide
- Intel® FPGA SDK for OpenCL™ Programming Guide
In addition, you must independently verify all IP on your computing card (for example, PCIe® controllers and DDR4 external memory).
- Copy the s10_ref Reference Platform from the drop directory.
- Paste the s10_ref directory into a directory that you own (that is, not a system directory) and then rename it ( <your_custom_platform> ).
- Choose the s10gx_ea_htile board variant in the <your_custom_platform>/hardware directory to match the production silicon for the Intel® Stratix® 10 FPGA as the basis of your design.
- Rename s10gx_ea_htile board variant to match the name of your FPGA board ( <your_custom_platform>/hardware/<board_name> ).
- Modify the <your_custom_platform>/board_env.xml file so that the name and default fields match the changes you made in step 2 and step 4, respectively.
Modify the my_board name in
file to match the change you made in step 2.
- In the SDK, invoke the command aoc -list-boards to confirm that the Intel® FPGA SDK for OpenCL™ Offline Compiler displays the board name in your Custom Platform.
> aoc -list-boards
Board list:
  my_board
You can add a component in Platform Designer and connect it to the existing system, or add a Verilog file to the available system. After adding the custom components, connect those components in Platform Designer.
- Instantiate your PCIe controller, as described in Host-to-Intel Stratix 10 Communication over PCIe section.
Instantiate any memory controllers and I/O channels. You can
add the board interface hardware either as Platform Designer
components in the board.qsys
Platform Designer system or as HDL in the top.v file.
The board.qsys file and the top.v file are in the <your_custom_platform>/hardware/<board_name> directory.
- Modify the device.tcl file to match all the correct settings for the device on your board.
Modify the <your_custom_platform>/hardware/<board_name>/flat.qsf
file to use only the pin-outs and settings for your system. The
include all the settings from the flat.qsf
The top.qsf file and top_synth.qsf file are in the <your_custom_platform>/hardware/<board_name> directory.
- Update the <your_custom_platform>/hardware/<board_name>/board_spec.xml file. Ensure that there is at least one global memory interface, and all the global memory interfaces correspond to the exported interfaces from the board.qsys Platform Designer System File.
Set the environment variable ACL_DEFAULT_FLOW to
Setting this environment variable instructs the SDK to compile the flat revision corresponding to the <your_custom_platform>/hardware/<board_name>/flat.qsf file without the partitions or Logic Lock regions.
Tip: Intel recommends that you compile a timing-clean flat revision before proceeding to the base revision compilations. You can also invoke the following command with the -bsp-flow=<revision_type> option to run different revisions of your project (for example, flat or base compiles).
aoc -bsp-flow=flat boardtest.cl -o=bin/boardtest.aocx
Set the environment variable ACL_DEFAULT_FLOW to base.
Setting this environment variable instructs the SDK to compile the base revision corresponding to the <your_custom_platform>/hardware/<board_name>/base.qsf file.
Perform the steps outlined in the
file to compile the
OpenCL kernel source file.
The environment variable INTELFPGAOCLSDKROOT points to the location of the SDK installation.
If compilation fails because of timing failures, fix the
errors, or compile
with different seeds. To compile the kernel with a different seed, include the
-seed=<N> option in the aoc
command (for example,
You might be able to fix minor timing issues by simply compiling your kernel with a different seed.
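A seed sweep like the one described above can be scripted. The following Python sketch only builds the aoc command lines; it does not run them. The -board, -seed, and -bsp-flow options come from this document, while the function name, the option ordering, and the idea of combining all three options in one command are assumptions made for illustration.

```python
def seed_sweep_commands(kernel, board, seeds, bsp_flow=None):
    """Return one aoc argument list per seed (nothing is executed)."""
    commands = []
    for seed in seeds:
        cmd = ["aoc", f"-board={board}", f"-seed={seed}"]
        if bsp_flow is not None:
            # Optional: select the project revision (for example, flat or base).
            cmd.append(f"-bsp-flow={bsp_flow}")
        cmd.append(kernel)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in seed_sweep_commands("boardtest.cl", "s10gx_ea_htile",
                                   range(1, 4), bsp_flow="base"):
        print(" ".join(cmd))
```

Each returned list can be passed to subprocess.run() once the SDK environment is set up. Stop the sweep at the first seed that closes timing.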
Following is the XML code of an example board_spec.xml file:
<?xml version="1.0"?>
<board version="17.1" name="s10gx_ea_htile">
  <compile name="top" project="top" revision="top" qsys_file="none" generic_kernel="1">
    <generate cmd="quartus_sh -t scripts/pre_flow_pr.tcl"/>
    <synthesize cmd="quartus_cdb -t import_compile.tcl"/>
    <auto_migrate platform_type="s10_ref" >
      <include fixes=""/>
    </auto_migrate>
  </compile>
  <compile name="base" project="top" revision="base" qsys_file="none" generic_kernel="1">
    <generate cmd="quartus_sh -t scripts/pre_flow_pr.tcl base"/>
    <synthesize cmd="quartus_sh --flow compile top -c base"/>
    <auto_migrate platform_type="s10_ref" >
      <include fixes=""/>
    </auto_migrate>
  </compile>
  <compile name="flat" project="top" revision="flat" qsys_file="none" generic_kernel="1">
    <generate cmd="quartus_sh -t scripts/pre_flow_pr.tcl flat"/>
    <synthesize cmd="quartus_sh --flow compile top -c flat"/>
    <auto_migrate platform_type="s10_ref" >
      <include fixes=""/>
    </auto_migrate>
  </compile>
  <device device_model="1sg280lu3f50e1vgs1_dm.xml">
    <used_resources>
      <alms num="6566"/> <!-- ALMs used in final placement - ALMs used for registers -->
      <ffs num="20030"/>
      <dsps num="0"/>
      <rams num="112"/>
    </used_resources>
  </device>
  <!-- DDR4-1866 -->
  <global_mem name="DDR" max_bandwidth="14928" interleaved_bytes="1024" config_addr="0x018">
    <interface name="board" port="kernel_mem0" type="slave" width="512" maxburst="16" address="0x00000000" size="0x80000000" latency="240" addpipe="1"/>
  </global_mem>
  <host>
    <kernel_config start="0x00000000" size="0x0100000"/>
  </host>
  <interfaces>
    <interface name="board" port="kernel_cra" type="master" width="64" misc="0"/>
    <interface name="board" port="kernel_irq" type="irq" width="1"/>
    <interface name="board" port="acl_internal_snoop" type="streamsource" enable="SNOOPENABLE" width="31" clock="board.kernel_clk"/>
    <kernel_clk_reset clk="board.kernel_clk" clk2x="board.kernel_clk2x" reset="board.kernel_reset"/>
  </interfaces>
</board>
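The global_mem values in the example can be sanity-checked with a short script. The Python sketch below parses a trimmed copy of the example and reports the memory size and bandwidth. It also shows that max_bandwidth="14928" equals 1866 x 8, which matches a DDR4-1866 device if you assume a 64-bit data bus and units of MB/s. Both the bus width and the units are assumptions that this document does not state, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example board_spec.xml (global_mem element only).
BOARD_SPEC = """<?xml version="1.0"?>
<board version="17.1" name="s10gx_ea_htile">
  <global_mem name="DDR" max_bandwidth="14928" interleaved_bytes="1024" config_addr="0x018">
    <interface name="board" port="kernel_mem0" type="slave" width="512" maxburst="16"
               address="0x00000000" size="0x80000000" latency="240" addpipe="1"/>
  </global_mem>
</board>
"""


def summarize_global_mem(xml_text):
    """Return one summary dict per global_mem element; fail if there is none."""
    root = ET.fromstring(xml_text)
    summary = []
    for mem in root.findall("global_mem"):
        ifaces = mem.findall("interface")
        total_bytes = sum(int(i.get("size"), 16) for i in ifaces)
        summary.append({
            "name": mem.get("name"),
            "interfaces": len(ifaces),
            "size_gib": total_bytes / 2**30,
            "max_bandwidth": int(mem.get("max_bandwidth")),
        })
    if not summary:
        raise ValueError("board_spec.xml needs at least one global_mem element")
    return summary


def ddr_peak_bandwidth(mega_transfers_per_s, bus_width_bits=64):
    """Peak bandwidth in MB/s; the 64-bit default bus width is an assumption."""
    return mega_transfers_per_s * bus_width_bits // 8


if __name__ == "__main__":
    for entry in summarize_global_mem(BOARD_SPEC):
        print(entry)
    print("DDR4-1866 peak:", ddr_peak_bandwidth(1866))
```

The size attribute 0x80000000 corresponds to the 2 GB DDR4 memory that the Reference Platform provides.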
To compile the MMD layer for Windows, perform the following tasks:
- Install the GNU make utility on your development machine.
- Install a version of Microsoft Visual Studio that has the ability to compile 64-bit software (for example, Microsoft Visual Studio version 2010 Professional).
- Set the development environment so that SDK users can invoke commands and utilities at the command prompt.
- Modify the <your_custom_platform>/source/Makefile.common file so that TOP_DEST_DIR points to the top-level directory of your Custom Platform.
- In the Makefile.common file or the development environment, set the JUNGO_LICENSE variable to your Jungo WinDriver license.
- To check that you have set up the software development environment properly, invoke the gmake or gmake clean command.
To compile the MMD layer for Linux, perform the following tasks:
- Ensure that you use a Linux distribution and compiler that Intel® supports (for example, GNU Compiler Collection (GCC) version 4.4.7).
- Modify the <your_custom_platform>/source/Makefile.common file so that TOP_DEST_DIR points to the top-level directory of your Custom Platform.
- To check that you have set up the software environment properly, invoke the make or make clean command.
Program your FPGA device with the
and then reboot your system.
You should have created the base.sof file when integrating your Custom Platform with the Intel® FPGA SDK for OpenCL™ . Refer to the Integrating Your Intel® Stratix® 10 Custom Platform with the Intel® FPGA SDK for OpenCL™ section for more information.
Confirm that your operating system recognizes a
with your vendor and device IDs.
- For Windows, open the Device Manager and verify that the correct device and IDs appear in the listed information.
- For Linux, invoke the lspci command and verify that the correct device and IDs appear in the listed information.
- Run the aocl install <path_to_customplatform> utility command to install the kernel driver on your machine.
For Windows, set the PATH environment variable. For Linux, set the LD_LIBRARY_PATH environment variable.
For more information about the settings for PATH and LD_LIBRARY_PATH, refer to Setting the Intel® FPGA SDK for OpenCL™ User Environment Variables in the Intel® FPGA SDK for OpenCL™ Getting Started Guide.
- Modify the version_id_test function in your <your_custom_platform>/source/host/mmd/acl_pcie_device.cpp MMD source code file to exit after reading from the version ID register.
- Run the aocl diagnose utility command and confirm that the version ID register reads back the ID successfully. You may set the environment variables ACL_HAL_DEBUG and ACL_PCIE_DEBUG to a value of 1 to visualize the result of the diagnostic test on your terminal.
- In the software development environment available with the s10_ref Reference Platform, replace all references of "s10_ref" with the name of your Custom Platform.
- Modify the PACKAGE_NAME and MMD_LIB_NAME fields in the <your_custom_platform>/source/Makefile.common file.
- Modify the name, linklib, and mmlibs elements in <your_custom_platform>/board_env.xml file to your custom MMD library name.
In your Custom Platform, modify the following lines of code in
the hw_pcie_constants.h file to include
information of your Custom Platform:
#define ACL_BOARD_PKG_NAME "s10_ref"
#define ACL_VENDOR_NAME "Intel Corporation"
#define ACL_BOARD_NAME "Stratix 10 Reference Platform"
For Windows, the hw_pcie_constants.h file is in the <your_custom_platform>\source_windows64\include folder. For Linux, the hw_pcie_constants.h file is in the <your_custom_platform>/linux64/driver directory.
Note: The ACL_BOARD_PKG_NAME variable setting must match the name attribute of the board_env element that you specified in the board_env.xml file.
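The requirement that ACL_BOARD_PKG_NAME match the name attribute of the board_env element is easy to check automatically. The following Python sketch is one possible check, not part of the SDK. The function names are hypothetical, and the board_env.xml text is a minimal stand-in that contains only the name attribute.

```python
import re
import xml.etree.ElementTree as ET

# The three string constants from hw_pcie_constants.h, one per line.
HEADER = '''#define ACL_BOARD_PKG_NAME "s10_ref"
#define ACL_VENDOR_NAME "Intel Corporation"
#define ACL_BOARD_NAME "Stratix 10 Reference Platform"
'''

# Hypothetical, minimal board_env.xml; only the name attribute matters here.
BOARD_ENV = '<board_env name="s10_ref"></board_env>'


def read_string_defines(header_text):
    """Map each '#define NAME "value"' line to a NAME -> value entry."""
    pattern = re.compile(r'^\s*#define\s+(\w+)\s+"([^"]*)"', re.MULTILINE)
    return dict(pattern.findall(header_text))


def package_name_matches(header_text, board_env_text):
    """True if ACL_BOARD_PKG_NAME equals the board_env name attribute."""
    defines = read_string_defines(header_text)
    env_name = ET.fromstring(board_env_text).get("name")
    return defines.get("ACL_BOARD_PKG_NAME") == env_name


if __name__ == "__main__":
    print(package_name_matches(HEADER, BOARD_ENV))
```

In practice you would read both files from your Custom Platform directory instead of using the inline strings.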
Define the Device ID, Subsystem Vendor ID, Subsystem Device ID,
and Revision ID, as defined in the Device Identification
Registers for Intel Stratix
PCIe Hard IP
Note: The PCIe® IDs in the hw_pcie_constants.h file must match the parameters in the PCIe® controller hardware.
- Update your Custom Platform's board.qsys Platform Designer system and the hw_pcie_constants.h file with the IDs defined in step 5.
For Windows, update the DeviceList fields in the
file to match your PCIe ID values and then rename the file to acl_board_<your_custom_platform>.inf.
Note: The <your_custom_platform> string in acl_board_<your_custom_platform>.inf must match the string you specify for the name field in the board_env.xml file.
- Run make in the <your_custom_platform>/source directory to generate the driver.
Update the device part number in the following files within the <your_custom_platform>/hardware/<board_name> directory:
In the device.tcl file,
change the device part number in the set global
assignment -name DEVICE 1SG280LU3F50E3VGS2 QSF assignment.
The updated device number will appear in the base.qsf, top.qsf, and top_synth.qsf files.
- In the board.qsys and acl_ddr4_s10.qsys files, change all occurrences of 1SG280LU3F50E3VGS2.
- In your Custom Platform, instantiate your external memory IP based on the information in the DDR4 as Global Memory for OpenCL Applications section. Update the information pertaining to the global_mem element in the <your_custom_platform>/hardware/<board_name>/board_spec.xml file.
- Remove the boardtest hardware configuration file that you created during the integration of your Custom Platform with the Intel® FPGA SDK for OpenCL™ .
kernel source file.
The environment variable INTELFPGAOCLSDKROOT points to the location of the SDK installation.
- Reprogram the FPGA with the new boardtest hardware configuration file and then reboot your machine.
Modify the wait_for_uniphy function in the
acl_pcie_device.cpp MMD source code
file to exit after checking the UniPHY status register. Rebuild the MMD
For Windows, the acl_pcie_device.cpp file is in the <your_custom_platform>\source\host\mmd folder. For Linux, the acl_pcie_device.cpp file is in the <your_custom_platform>/source/host/mmd directory.
SDK utility and confirm that the host
reads back both the version ID and the value 0 from the uniphy_status component.
The utility should return the message Uniphy are calibrated.
Consider analyzing your design in the Signal Tap logic analyzer to confirm the
successful calibration of all memory controllers.
Note: For more information on Signal Tap logic analyzer, download the Signal Tap II Logic Analyzer tutorial from the University Program Tutorial page.
- In the <your_custom_platform>/hardware/<board_name>/board.qsys file, update the REF_CLK_RATE parameter value on the kernel_clk_gen IP module.
- In the <your_custom_platform>/hardware/<board_name>/top.sdc file, update the create_clock assignment for kernel_pll_refclk.
- [Optional] In the <your_custom_platform>/hardware/<board_name>/top.v file, update the comment for the kernel_pll_refclk input port.
Perform the steps outlined in
file to build the hardware configuration file from the
kernel source file.
The environment variable INTELFPGAOCLSDKROOT points to the location of the Intel® FPGA SDK for OpenCL™ installation.
- Program your FPGA device with the hardware configuration file you created in step 1 and then reboot your machine.
Remove the early-exit modification in the version_id_test function in the acl_pcie_device.cpp file that you implemented when you
established communication between the board and the host interface.
For Windows, the acl_pcie_device.cpp file is in the <your_custom_platform>\source\host\mmd folder. For Linux, the acl_pcie_device.cpp file is in the <your_custom_platform>/source/host/mmd directory.
where <device_name> is the string you
define in your Custom Platform to identify each board.
By default, <device_name> is the acl number (for example, acl0 to acl31) that corresponds to your FPGA device. In this case, invoke the aocl diagnose acl0 command.
Build the boardtest host
application using the .sln file (Windows)
or Makefile (Linux) in the SDK's Custom Platform Toolkit.
For Windows, the .sln file is in the INTELFPGAOCLSDKROOT\board\custom_platform_toolkit\tests\boardtest\host folder. For Linux, the Makefile is in the INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/boardtest directory.
Set the environment variable CL_CONTEXT_COMPILER_MODE_INTELFPGA to a value of
3 and run the boardtest host application.
For more information on CL_CONTEXT_COMPILER_MODE_INTELFPGA, refer to Troubleshooting Intel® Stratix® 10 GX FPGA Development Kit Reference Platform Porting Issues.
Establish the floorplan of your design.
Important: Consider all design criteria outlined in the FPGA System Design section of the Intel® FPGA SDK for OpenCL™ Custom Platform Toolkit User Guide.
Compile several seeds of the
file until you generate a design that closes timing cleanly.
To specify the seed number, include the -seed=<N> option in your aoc command.
- Copy the base.qar file from the INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx_ea_htile directory into your Custom Platform.
Use the flat.qsf file in the s10_ref Reference Platform as a reference to determine the type of information you must include in the flat.qsf file for your Custom Platform.
The base.qsf, top.qsf, and top_synth.qsf files automatically inherit all the settings in the flat.qsf file. However, if you need to modify a Logic Lock Plus region, make the change only in the base.qsf file.
- Confirm that you can use the .aocx file to reprogram the FPGA by invoking the aocl program acl0 boardtest.aocx command.
- Remove the ACL_DEFAULT_FLOW environment variable that you added when integrating your Custom Platform with the Intel® FPGA SDK for OpenCL™ .
- Ensure that the environment variable CL_CONTEXT_COMPILER_MODE_INTELFPGA is not set.
- Run the boardtest_host executable.
- Port the system design and the flat.qsf file to your computing card.
kernel source file using the base revision. Fix any timing failures and
recompile the kernel until timing is clean. You can add the -bsp-flow=base argument to the aoc command to generate
base.qar and root_partition.qdb files during the kernel
INTELFPGAOCLSDKROOT points to the location of the Intel® FPGA SDK for OpenCL™ installation.
- Copy the generated base.qar and root_partition.qdb files into your Custom Platform.
Using the default compilation flow, test
root_partition.qdb files across several OpenCL™ design examples and confirm that the
following criteria are satisfied:
- All compilations close timing.
- The OpenCL design examples achieve satisfactory Fmax.
- The OpenCL design examples function on the accelerator board.
|ACL_HAL_DEBUG||Set this variable to a value of 1 to 5 to enable increasing debug output from the Hardware Abstraction Layer (HAL), which interfaces directly with the MMD layer.|
|ACL_PCIE_DEBUG||Set this variable to a value of 1 to 10000 to enable increasing debug output from the MMD. This variable setting is useful for confirming that the version ID register was read correctly and the UniPHY IP cores are calibrated.|
|ACL_PCIE_JTAG_CABLE||Set this variable to override the default quartus_pgm argument that specifies the cable number. The default is cable 1. If there are multiple Intel® FPGA Download Cables, you can specify a particular one here.|
|ACL_PCIE_JTAG_DEVICE_INDEX||Set this variable to override the default quartus_pgm argument that specifies the FPGA device index. By default, this variable has a value of 2. If the FPGA is not the first device in the JTAG chain, you can customize the value.|
|ACL_PCIE_USE_JTAG_PROGRAMMING||Set this variable to force the MMD to reprogram the FPGA using the JTAG cable.|
|ACL_PCIE_DMA_USE_MSI||Set this variable if you want to use MSI for DMA transfers on Windows.|
|CL_CONTEXT_COMPILER_MODE_INTELFPGA||Unset this variable or set it to a value of 3. The OpenCL host runtime reprograms the FPGA as needed, which it does at least once during initialization. To prevent the host application from programming the FPGA, set this variable to a value of 3.|
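When scripting diagnostic runs, it can help to validate the debug levels before launching a utility. The Python sketch below builds an environment dictionary using the value ranges from the table above. The helper name and the DEBUG_RANGES dictionary are hypothetical, and the sketch does not launch any SDK utility itself.

```python
import os

# Allowed ranges taken from the environment variable table above.
DEBUG_RANGES = {"ACL_HAL_DEBUG": (1, 5), "ACL_PCIE_DEBUG": (1, 10000)}


def debug_environment(levels, base_env=None):
    """Return a copy of the environment with validated debug levels set."""
    env = dict(os.environ if base_env is None else base_env)
    for name, level in levels.items():
        low, high = DEBUG_RANGES[name]
        if not low <= level <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {level}")
        env[name] = str(level)
    return env


if __name__ == "__main__":
    env = debug_environment({"ACL_HAL_DEBUG": 1, "ACL_PCIE_DEBUG": 1}, base_env={})
    print(env)
```

You can pass the returned dictionary as the env argument of subprocess.run() when invoking a command such as aocl diagnose.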
- Intel® Stratix® 10 PCIe® hard IP core
- Parameter Settings section of the Intel® Stratix® 10 Avalon® -MM DMA Interface for PCIe Solutions User Guide
|Application interface type||
-MM with DMA
This Avalon® Memory-Mapped ( Avalon® -MM) interface instantiates the embedded DMA of the PCIe® hard IP core.
|Hard IP mode||
Gen2x8, Interface: 256-bit, 125 MHz
Number of Lanes: x8
Lane Rate: Gen2 (5.0 Gbps)
Note: This is not the fastest configuration, but it was chosen to obtain reasonable timing closure rates even when running flat compile flow on large designs.
|Rx Buffer credit allocation||
Note: This setting is derived experimentally.
|Intel® Stratix® 10 Avalon® -MM Settings|
|Export MSI/MSI-X conduit interfaces||Enabled
Export the MSI interface in order to connect the interrupt sent from the kernel interface to the MSI.
|Instantiate Internal Descriptor Controller||Enabled
Instantiates the descriptor controller in the Avalon® -MM DMA bridge. Use the 128-entry descriptor controller that the PCIe® hard IP core provides.
Disabled for a10gx_hostch board variant
The descriptor controller is implemented in the ip/host_channel subdirectory.
|Address width of accessible PCIe memory space||
This value is machine dependent. To avoid truncation of the MSI memory address, 64-bit machines should allot 64 bits to access the PCIe® address space.
|Base Address Register (BAR) Settings|
|Base Address Registers (BARs)||This design uses two BARs.
For BAR 0, set Type to 64-bit prefetchable memory. The Size parameter setting is disabled because the Instantiate Internal Descriptor Controller parameter is enabled in the Avalon® -MM system settings.
BAR 0 is only used to access the DMA Descriptor Controller, as described in the Intel® Stratix® 10 Avalon® -MM DMA for PCI Express section of the Intel® Stratix® 10 Avalon® -MM DMA Interface for PCIe® Solutions User Guide.
For BAR 4, set Type to 64-bit prefetchable memory, and set Size to 256 KBytes - 18 bits.
BAR 4 is used to connect PCIe to the OpenCL kernel systems and other board modules.
|ID Register Name||ID Provider||Description||Parameter Name in PCIe IP Core|
|Vendor ID||PCI-SIG®||Identifies the FPGA manufacturer.
Always set this register to 0x1172, which is the Intel® vendor ID.
|Device ID||Intel®||Describes the PCIe configuration on the FPGA according
Set the device ID to the device code of the FPGA on your accelerator board.
For the Intel® Stratix® 10 GX FPGA Development Kit Reference Platform, set the Device ID register to 0x2494, which signifies Gen 3 speed, 8 lanes, Intel® Stratix® 10 device family, and Avalon® -MM interface, respectively.
Refer to Table 2 for more information.
|Revision ID||When setting this ID, ensure that it matches the
following revision IDs:
The Intel® FPGA SDK for OpenCL™ utility checks the base class value to verify whether the board is an OpenCL™ device.
Do not modify the class code settings.
|Subsystem Vendor ID||Board vendor||Identifies the manufacturer of the accelerator board.
Set this register to the vendor ID of the manufacturer of your accelerator board. For the s10_ref Reference Platform, the subsystem vendor ID is 0x1172.
If you are a board vendor, set this register to your vendor ID.
|Subsystem Device ID||Board vendor||Identifies the accelerator board.
The SDK uses this ID to identify the board because the software might perform differently on different boards. If you create a Custom Platform that supports multiple boards, use this ID to distinguish between the boards. Alternatively, if you have multiple Custom Platforms, each supporting a single board, you can use this ID to distinguish between the Custom Platforms.
Important: Make this ID unique to your Custom Platform. For example, for the s10_ref Reference Platform, the ID is 0x5170.
You can find these PCIe ID definitions in the PCIe controller instantiated in the INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx_ea_htile/board.qsys Platform Designer System File. These IDs are necessary in the driver and the SDK's programming flow. The kernel driver uses the Vendor ID, Subsystem Vendor ID, and Subsystem Device ID to identify the boards it supports. The SDK's programming flow checks the Device ID to ensure that it programs a device with a .aocx Intel® FPGA SDK for OpenCL™ Offline Compiler executable file targeting that specific device.
|Location in ID||Definition|
|10:8||Number of lanes
|3||1 — Soft IP (SIP)
This ID indicates that the PCIe protocol stack is implemented in soft logic. If unspecified, the IP is considered a hard IP.
PCIe interface type
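The bit positions in the table can be applied to the Reference Platform Device ID 0x2494 with simple shifts and masks. The Python sketch below extracts only the two fields whose positions appear above (bits 10:8 and bit 3) and prints their raw values. It does not translate the lane code into a lane count, because that mapping is not shown here. The helper name bit_field is hypothetical.

```python
def bit_field(value, high, low):
    """Return bits high..low (inclusive) of value as an integer."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


DEVICE_ID = 0x2494  # Device ID of the s10_ref Reference Platform

lanes_code = bit_field(DEVICE_ID, 10, 8)  # raw "number of lanes" field
soft_ip = bit_field(DEVICE_ID, 3, 3)      # 1 means soft IP; 0 means hard IP

print(f"lanes field (bits 10:8): {lanes_code}")  # prints 4
print(f"soft IP flag (bit 3):    {soft_ip}")     # prints 0
```

A soft IP flag of 0 is consistent with the Reference Platform's use of the PCIe hard IP core.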
The version ID for the s10_ref Reference Platform is 0xA0C7C1E4.
Before communicating with any part of the FPGA system, the host first reads from this version_id register to confirm the following:
- The PCIe can access the FPGA fabric successfully
- The address map matches the map in the MMD software
Update the VERSION_ID parameter in the version_id component to a new value with every slave addition or removal from the PCIe BAR 4 bus, or whenever the address map changes.
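The version ID handshake amounts to one 32-bit read and one comparison. The Python sketch below models that check with a caller-supplied read function. The read32 callable and the offset argument are hypothetical placeholders for whatever register access the MMD layer provides; only the expected value 0xA0C7C1E4 comes from this document.

```python
EXPECTED_VERSION_ID = 0xA0C7C1E4  # version ID of the s10_ref Reference Platform


def check_version_id(read32, offset):
    """Read the version_id register and compare it with the expected value.

    read32 is any callable that takes a register offset and returns an
    integer; it stands in for the real PCIe register access.
    """
    value = read32(offset) & 0xFFFFFFFF
    return value == EXPECTED_VERSION_ID


if __name__ == "__main__":
    # Fake register reads standing in for real hardware access.
    print(check_version_id(lambda offset: 0xA0C7C1E4, 0x0))
    print(check_version_id(lambda offset: 0xFFFFFFFF, 0x0))
```

A mismatch suggests either that the host cannot reach the FPGA fabric or that the hardware address map no longer matches the MMD software, which is why VERSION_ID should change whenever the address map does.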
The two header files that describe the hardware design to the software are in the following locations:
- For Windows systems, the header files are in the INTELFPGAOCLSDKROOT\board\s10_ref\source\include folder, where INTELFPGAOCLSDKROOT is the path to the SDK installation.
- For Linux systems, the header files are in the INTELFPGAOCLSDKROOT/board/s10_ref/linux64/driver directory.
|Header File Name||Description|
|hw_pcie_constants.h||Header file that defines most of the hardware constants for the board design.
This file includes constants such as the IDs described in PCIe Device Identification Registers, BAR number, and offset for different components in your design. In addition, this header file also defines the name strings of ACL_BOARD_PKG_NAME, ACL_VENDOR_NAME, and ACL_BOARD_NAME.
Update the information in this file whenever you change the board design.
Header file that defines DMA-related hardware constants.
Update these addresses whenever you change the board design. Refer to the Direct Memory Access section for more information.
Use the Intel® FPGA SDK for OpenCL™ install utility to install the kernel driver.
The s10_ref Reference Platform
- For Windows systems, the driver is in the
The kernel driver, the WinDriver application programming interface (API), is a third-party driver from Jungo Connectivity Ltd. For more information about the WinDriver, refer to the Jungo Connectivity Ltd. website or contact a Jungo Connectivity representative.
- For Linux, an open-source MMD-compatible kernel driver is in the <path_to_s10pciedk>/linux64/driver directory. The table below highlights some of the files that are available in this directory.
|pcie_linux_driver_exports.h||Header file that defines the special commands that the kernel driver supports.
The installed kernel driver works as a character device. The basic operations to the driver are open(), close(), read(), and write().
To execute a complicated command, create a variable as an acl_cmd struct type, specify the command with the proper parameters, and then send the command through a read() or write() operation. This header file defines the interface of the kernel driver, which the MMD layer uses to communicate with the device.
|aclpci.c||File that implements the Linux kernel driver's basic structures and functions, such as the init, remove, and probe functions, as well as hardware design-specific functions that handle interrupts.
For more information on the interrupt handler, refer to the Message Signaled Interrupts section.
|aclpci_fileio.c||File that implements the kernel driver's file I/O operations.
The kernel driver that is available with the s10_ref Reference Platform supports four file I/O operations: open(), close(), read(), and write(). Implementing these file I/O operations allows the OpenCL™ user program to access the kernel driver through the file I/O system calls (that is, open, read, write, or close).
|aclpci_cmd.c||File that implements the specific commands defined in the pcie_linux_driver_exports.h file.
These special commands include SAVE_PCI_CONTROL_REGS, LOAD_PCI_CONTROL_REGS, and GET_PCI_SLOT_INFO.
|aclpci_dma.c||File that implements DMA-related routines in the kernel driver.
Refer to the Direct Memory Access section for more information.
|aclpci_queue.c||File that implements a queue structure for use in the kernel driver to simplify programming.|
The instantiation process exports the DMA controller slave ports (that is, rd_dts_slave and wr_dts_slave) and master ports (that is, rd_dcm_master and wr_dcm_master) into the PCIe module. Two additional master ports, dma_rd_master and dma_wr_master, are exported for DMA read and write operations, respectively. For the DMA interface to function properly, all these ports must be connected correctly in the board.qsys Platform Designer system, where the PCIe hard IP is instantiated.
At the start of DMA transfer, the DMA Descriptor Controller reads from the DMA descriptor table in user memory, and stores the status and the descriptor table into a FIFO address. There are two FIFO addresses: Read Descriptor FIFO address and Write Descriptor FIFO address. After storing the descriptor table into a FIFO address, DMA transfer into the FIFO address can occur. The dma_rd_master port, which moves data from user memory to the device, must connect to the rd_dts_slave and wr_dts_slave ports. Because the dma_rd_master port connects to DDR4 memory also, the locations of the rd_dts_slave and wr_dts_slave ports in the address space must be defined in the hw_pcie_dma.h file.
The rd_dcm_master and wr_dcm_master ports must connect to the txs port. At the end of the DMA transfer, the DMA controller writes the MSI data and the done status into the user memory via the txs slave. The txs slave is part of the PCIe® hard IP in board.qsys.
All modules that use DMA must connect to the dma_rd_master and dma_wr_master ports. For DDR4 memory connection, Intel® recommends implementing an additional pipeline to connect the two 256-bit PCIe DMA ports to the 512-bit memory slave. For more information, refer to the DDR4 Connection to PCIe® Host section.
The MMD layer uses DMA to transfer data if it receives a data transfer request that satisfies both of the following conditions:
- A transfer size that is greater than 1024 bytes
- The starting addresses for both the host buffer and the device offset are aligned to 64 bytes
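The two conditions above can be expressed as a single predicate. The sketch below is illustrative only; the function and constant names are hypothetical and are not taken from the MMD source.

```python
DMA_MIN_BYTES = 1024   # a transfer must be strictly larger than this
DMA_ALIGNMENT = 64     # required alignment of both addresses, in bytes


def uses_dma(host_addr, device_offset, size):
    """Return True when a transfer request meets both DMA conditions."""
    aligned = (host_addr % DMA_ALIGNMENT == 0
               and device_offset % DMA_ALIGNMENT == 0)
    return size > DMA_MIN_BYTES and aligned
```

For example, a 4096-byte transfer from host address 0x1000 to device offset 0x40 qualifies, whereas a transfer of exactly 1024 bytes, or one that starts at host address 0x1001, does not.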
For Windows, the Jungo WinDriver imposes a limit of 5000 to 10000 on the number of interrupts received per second in user mode. This limit translates to a DMA bandwidth of 2.5 gigabytes per second (GBps) to 5 GBps when a full 128-entry table of 4 KB pages is transferred per interrupt.
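The bandwidth range follows from the amount of data moved per interrupt. The short calculation below reproduces it under the stated assumption of one full descriptor table per interrupt; the names are illustrative only.

```python
TABLE_ENTRIES = 128       # descriptors in a full table
PAGE_BYTES = 4 * 1024     # one 4 KB page per descriptor


def dma_bandwidth_gbps(interrupts_per_second):
    """Bandwidth in GB/s (10**9 bytes/s) with one full table per interrupt."""
    bytes_per_interrupt = TABLE_ENTRIES * PAGE_BYTES   # 524288 bytes
    return bytes_per_interrupt * interrupts_per_second / 1e9


low = dma_bandwidth_gbps(5000)     # lower interrupt limit
high = dma_bandwidth_gbps(10000)   # upper interrupt limit
```

Each interrupt accounts for 512 KB, so the two limits give roughly 2.6 and 5.2 GB/s in decimal units (about 2.4 and 4.9 in binary units), in line with the rounded figures quoted above.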
On Windows, polling is the default method for maximizing PCIe DMA bandwidth at the expense of CPU run time. To use interrupts instead of polling, assign a non-NULL value to the ACL_PCIE_DMA_USE_MSI environment variable.
To implement a DMA transfer:
- Verify that the previous DMA transfer sent all the requested bytes of data.
- Map the virtual memories that are requested for DMA transfer to physical addresses.
Note: The amount of virtual memory that can be mapped at a time is system dependent. Large DMA transfers will require multiple mapping or unmapping operations. For a higher bandwidth, map the virtual memory ahead in a separate thread that is in parallel to the transfer.
- Set up the DMA descriptor table on local memory.
- Write the location of the DMA descriptor table, which is in user memory, to the DMA control registers (that is, RC Read Status and Descriptor Base and RC Write Status and Descriptor Base).
- Write the Platform Designer address of descriptor FIFOs to the DMA control registers (that is, EP Read Descriptor FIFO Base and EP Write Status and Descriptor FIFO Base).
- Write the start signal to the RD_DMA_LAST_PTR and WR_DMA_LAST_PTR DMA control registers.
- After the current DMA transfer finishes, repeat the procedure to implement the next DMA transfer.
Two different modules generate the signal for the MSI line. The DMA controller in the PCIe hard IP core generates the DMA's MSI. The PCI Express interrupt request (IRQ) module (that is, the INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx_ea_htile/ip/irq_controller directory) generates the kernel interface's MSI.
For more information on the PCI Express IRQ module, refer to the Handling PCIe Interrupts webpage.
In INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx_ea_htile/board.qsys, the DMA MSI is connected internally; however, you must connect the kernel interface interrupt manually. For the kernel interface interrupt, the PCI Express IRQ module is instantiated as pcie_irq_0 in board.qsys. The kernel interface interrupt connections are as follows:
- The kernel_irq_to_host port from the OpenCL Kernel Interface (kernel_interface) connects to the interrupt receiver, which allows the OpenCL kernels to signal the PCI Express IRQ module to send an MSI.
- The PCIe hard IP's msi_intfc port connects to the MSI_Interface port in the PCI Express IRQ module. The kernel interface interrupt receives the MSI address and the data necessary to generate the interrupt via msi_intfc.
- The IRQ_Gen_Master port on the PCI Express IRQ module, which is used to write the MSI, connects to the txs port on the PCIe hard IP.
- The IRQ_Read_Slave and IRQ_Mask_Slave ports connect to the pipe_stage_host_ctrl module on BAR 4. After receiving an MSI, the user driver can read the IRQ_Read_Slave port to check the status of the kernel interface interrupt, and write to the IRQ_Mask_Slave port to mask the interrupt.
The interrupt service routine in the Linux driver checks which module generates the interrupt. For the DMA's MSI, the driver reads the DMA descriptor table's status bit in local memory, as specified in the Read DMA Example section of the Intel Stratix 10 Avalon-MM DMA Interface for PCIe Solutions User Guide. For the kernel interface's MSI, the driver reads the interrupt line sent by the kernel interface.
The interrupt service routine involves the following tasks:
- Check DMA status on the DMA descriptor table.
- Read the kernel status from the IRQ_READ_SLAVE port on the PCI Express IRQ module.
- If a kernel interrupt was triggered, mask the interrupt by writing to the IRQ_MASK_SLAVE port on the PCI Express IRQ module. Then, execute the kernel interrupt service routine.
- If a DMA interrupt was triggered, reset the DMA descriptor table and execute the DMA interrupt service routine.
- If applicable, unmask a masked kernel interrupt.
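The ordering of these tasks can be sketched with a small software model. This is not the driver code: `FakeDevice` and all of its attributes are hypothetical stand-ins for the descriptor table status bit and the IRQ_READ_SLAVE and IRQ_MASK_SLAVE ports.

```python
class FakeDevice:
    """Stand-in for the hardware; every attribute here is hypothetical."""

    def __init__(self, dma_done=False, kernel_irq=False):
        self.dma_done = dma_done        # models the descriptor table status bit
        self.kernel_irq = kernel_irq    # models the value read from IRQ_READ_SLAVE
        self.kernel_masked = False      # models the state set via IRQ_MASK_SLAVE
        self.log = []                   # records which service routines ran


def interrupt_service_routine(dev):
    # 1. Check DMA status on the DMA descriptor table.
    dma_triggered = dev.dma_done
    # 2. Read the kernel status.
    kernel_triggered = dev.kernel_irq

    # 3. Kernel interrupt: mask it first, then run the kernel routine.
    if kernel_triggered:
        dev.kernel_masked = True
        dev.log.append("kernel_isr")
        dev.kernel_irq = False          # assumption: servicing clears the source

    # 4. DMA interrupt: reset the table status, then run the DMA routine.
    if dma_triggered:
        dev.dma_done = False
        dev.log.append("dma_isr")

    # 5. Unmask the kernel interrupt if this routine masked it.
    if dev.kernel_masked:
        dev.kernel_masked = False

    return dev.log
```

Masking before servicing, and unmasking only at the end, keeps a second kernel interrupt from being raised while the first one is still being handled.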
The Intel® Stratix® 10 GX FPGA Development Kit Reference Platform automatically tries to detect the cable by default when programming the FPGA via the Intel FPGA Download Cable.
You can set the ACL_PCIE_JTAG_CABLE or ACL_PCIE_JTAG_DEVICE_INDEX environment variables to disable the auto-detect feature and use values that you define.
Cable autodetect is useful when you have multiple devices connected to a single host.
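As an illustration of how the two environment variables might override auto-detection, consider the sketch below. The function name and return format are hypothetical, and the assumption that setting either variable disables auto-detection is a reading of the text above; the actual MMD logic may differ.

```python
import os


def jtag_override(environ=None):
    """Return user-defined cable settings, or None to keep auto-detection."""
    env = os.environ if environ is None else environ
    cable = env.get("ACL_PCIE_JTAG_CABLE")
    index = env.get("ACL_PCIE_JTAG_DEVICE_INDEX")
    if cable is None and index is None:
        return None   # neither variable is set: auto-detect stays enabled
    return {"cable": cable, "device_index": index}
```

Passing a dictionary instead of reading `os.environ` directly makes the behavior easy to check without changing the process environment.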
The memory-mapped device (MMD) uses in-system sources and probes to identify the cable connected to the target board. You must instantiate the cade_id register block and connect it to BAR 4 with the correct address map. You must also instantiate board_in_system_sources_probes_cade_id, which is an in-system sources and probes component, and connect it to the cade_id register.
The MMD must be updated to take in the relevant changes. Add the scripts/find_jtag_cable.tcl script to your Custom Platform.
When the FPGA is being programmed via the Intel FPGA Download Cable, the MMD invokes quartus_stp to execute the find_jtag_cable.tcl script. The script identifies the cable and index number, which are then used to program the FPGA through the quartus_pgm command.
If you have a Custom Platform that is ported from a previous version of the s10_ref Reference Platform, you have the option to modify your Custom Platform as described above. This modification is not mandatory.
DDR4 external memory interfaces
For more information on the DDR4 external memory interface IP, refer to the DDR3 Board Design Guidelines and DDR4 Board Design Guidelines sections in Intel Stratix 10 External Memory Interfaces IP User Guide.
|IP Parameter||Configuration Setting|
|Timing Parameters||As per the computing card's data specifications.|
|Avalon Width Power of 2||Currently, OpenCL™ does not support non-power-of-2 bus widths. As a result, the s10_ref Reference Platform uses the option that forces the DDR4 controller to a power-of-2 width. Use the additional pins of this x72 core for error checking between the memory controller and the physical module.|
|Byte Enable Support||Enabled
Byte enable support is necessary in the core because the Intel® FPGA SDK for OpenCL™ requires byte-level granularity to all memories.
|Performance||Enabling the reordering of DDR4 memory accesses and a deeper command queue look-ahead depth might provide increased bandwidth for some OpenCL kernels. For a target application, adjust these and other parameters as necessary.
Note: Increasing the command queue look-ahead depth allows the DDR4 memory controller to reorder more memory accesses to increase efficiency, which improves overall memory throughput.
|Debug||Disabled for production.|
The DDR4 IP core has one bank where its width and address configurations match those of the DDR4 SDRAM. Intel® tunes the other parameters such as burst size, pending reads, and pipelining. These parameters are customizable for an end application or board design.
The Avalon® master interfaces from the OpenCL Memory Bank Divider component connect to their respective memory controllers. The Avalon® slave connects to the PCIe® and DMA IP core. Implementations of appropriate clock crossing and pipelining are based on the design floorplan and the clock domains specific to the computing card. The OpenCL Memory Bank Divider section in the Intel® FPGA SDK for OpenCL™ Custom Platform Toolkit User Guide specifies the connection details of the snoop and memorg ports.
The INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx_ea_htile/board.qsys Platform Designer system uses a custom UniPHY Status to AVS IP component to aggregate different UniPHY status conduits into a single Avalon® slave port named s. This slave port connects to the pipe_stage_host_ctrl component so that the PCIe host can access it.
A clock crosser is necessary because the kernel interface for the compiler must be clocked in the kernel clock domain. In addition, the width, address width, and burst size characteristics of the kernel interface must match those specified in the OpenCL Memory Bank Divider connecting to the host. Appropriate pipelining also exists between the clock crosser and the memory controller.
The Intel Stratix 10 FPGA Development Kit Reference Platform has one DDR4 memory bank. As a result, the Reference Platform instantiates the OpenCL Kernel Interface component and sets the Number of global memory systems parameter to 1.
Examples of design complexities:
- Designing a robust reset sequence
- Establishing a design floorplan
- Managing global routing
Optimizations of these design complexities occur in tandem with one another to meet timing and board hardware optimization requirements.
These clock domains include:
- 125 MHz PCIe® clock
- 233 MHz DDR4 clock
- 50 MHz general clock (config_clk)
- Kernel clock that can have any clock frequency
With the exception of the kernel clock, the s10_ref Reference Platform is responsible for the timing closure of these clocks. However, because the board design must clock cross all interfaces in the kernel clock domain, the board design also has logic in the kernel clock domain. It is crucial that this logic is minimal and achieves an Fmax higher than typical kernel performance.
These reset drivers include:
- The por_reset_counter in the INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx_ea_htile/board.qsys Platform Designer system implements the power-on-reset. The power-on-reset resets all the hardware on the device by issuing a reset for a number of cycles after the FPGA completes configuration.
- The PCIe® bus issues a perst reset that resets all hardware on the device.
- The OpenCL™ Kernel Interface component issues the kernel_reset that resets all logic in the kernel clock domain.
The power-on-reset and the perst reset are combined into a single global_reset; therefore, there are only two reset sources in the system (that is, global_reset and kernel_reset). However, these resets are explicitly synchronized across the various clock domains, resulting in several reset interfaces.
Important Considerations Regarding Resets
- Synchronizing resets to different clock domains might cause several high fan-out resets.
Platform Designer automatically synchronizes resets to the clock domain of each connected component. In doing so, the Platform Designer instantiates new reset controllers with derived names that might change when the design changes. This name change makes it difficult to make and maintain global clock assignments to some of the resets. As a result, for each clock domain, there are explicit reset controllers. For example, global_reset drives reset_controller_pcie and reset_controller_ddr4; however, they are synchronized to the PCIe and DDR4 clock domains, respectively.
- Resets and clocks must work together to propagate reset to all logic.
Resetting a circuit in a given clock domain involves asserting the reset over a number of clock cycles. However, your design may apply resets to the PLLs that generate the clocks for a given clock domain. This means a clock domain can hold in reset without receiving the clock edge that is necessary for synchronous resets. In addition, a clock holding in reset might prevent the propagation of a reset signal because it is synchronized to and from that clock domain. Avoid such situations by ensuring that your design satisfies the following criteria:
- Generate the global_reset signal off the free-running config_clk.
- The ddr4_calibrate IP resets the External Memory Interface controller separately.
- Apply resets to both reset interfaces of a clock-crossing bridge or FIFO component.
FIFO content corruption might occur if only part of a clock-crossing bridge or a dual-clock FIFO component is reset. These components typically provide a reset input for each clock domain; therefore, reset both interfaces or none at all. For example, in the s10_ref Reference Platform, kernel_reset resets all the kernel clock-crossing bridges between DDR on both the m0_reset and s0_reset interfaces.
- Chip Planner
- Logic Lock Plus regions
Intel® performed the following tasks iteratively to derive the floorplan of the s10_ref Reference Platform:
- Compile a design without any region or floorplanning constraints.
Intel® recommends that you compile the design with several seeds.
- Examine the placement of the IP cores (for example, PCIe®, DDR4, Avalon® interconnect pipeline stages and adapters) for candidate locations, as determined by the Intel® Quartus® Prime Pro Edition software's Fitter. In particular, Intel® recommends examining the seeds that meet or almost meet the timing constraints.
For the s10_ref Reference Platform, the PCIe® I/O is located in the lower left corner of the Intel Stratix 10 FPGA. The DDR4 I/O is located on the top part of the left I/O column of the device. Because the placements of the PCIe and DDR4 IP components tend to be close to the locations of their respective I/Os, you can apply Logic Lock Plus regions to constrain the IP components to those candidate regions.
As shown in this Chip Planner view of the floorplan, the Logic Lock Plus regions are spread out between the PCIe I/O and the top region of the left I/O column (that is, the DDR4 I/O area). The Logic Lock Plus region (Region 1) covers the PCIe I/O and contains most of the static board interface logic.
You must create a dedicated Logic Lock Plus region for the OpenCL™ kernel system. Furthermore, do not place kernel logic in the board's Logic Lock Plus regions (that is, static region). As shown in Figure 1, the logic for the boardtest.cl OpenCL kernel, that is, the scatter area, can be placed anywhere except within the seven Logic Lock Plus regions.
Intel® recommends the following strategies to maximize the available FPGA resources for the OpenCL kernel system to improve kernel routability:
- The size of a Logic Lock Plus region should be just large enough to contain the board logic and to meet timing constraints of the board clocks. Oversized Logic Lock Plus regions consume FPGA resources unnecessarily.
- Avoid creating tightly-packed Logic Lock Plus regions that cause very high logic utilization and high routing congestion.
High routing congestion within the Logic Lock Plus regions might decrease the Fitter's ability to route OpenCL kernel signals through the regions.
In the case where the board clocks are not meeting timing and the critical path is between the Logic Lock Plus regions (that is, across region-to-region gap), insert back-to-back pipeline stages on paths that cross the gap. For example, if the critical path is between Region 1 and Region 2, lock down the first pipeline stage (an Avalon-MM Pipeline Bridge component) to Region 1, lock down the second pipeline stage to Region 2, and connect the two pipeline stages directly. This technique ensures that pipeline registers are on both sides of the region-to-region gap, thereby minimizing the delay of paths crossing the gap.
Refer to the Pipelining section for more information.
There is no restriction on the placement location of the OpenCL™ kernel on the device. As a result, the kernel clocks and kernel reset must distribute high fan-out signals globally.
In the Platform Designer, you can implement pipelines via an Avalon®-MM Pipeline Bridge component by setting the following pipelining parameters within the Avalon®-MM Pipeline Bridge dialog box:
- Select Pipeline command signals
- Select Pipeline response signals
- Select both Pipeline command signals and Pipeline response signals
Examples of Pipeline Implementation
- Signals that traverse long distances because of the floorplan's shape or the region-to-region gaps require additional pipelines.
The DMA at the bottom of the FPGA must connect to the DDR4 memory at the top of the FPGA. To achieve timing closure of the board interface logic at a DDR4 clock speed of 233 MHz, additional pipeline stages between the OpenCL™ Memory Bank Divider component and the DDR4 controller IP are necessary. In the Intel® Stratix® 10 GX FPGA Development Kit Reference Platform's board.qsys Platform Designer system, the pipeline stages are named pipe_stage_ddr4a_dimm_*.
The middle pipeline stage, pipe_stage_ddr4a_dimm, combines both the direct kernel DDR4 accesses and the accesses through the OpenCL Memory Bank Divider. The multistage pipeline approach ensures that the kernel entry point to the pipeline is geared towards neither the OpenCL Memory Bank Divider, which is close to the PCIe® IP core, nor the DDR4 IP core, which is at the very top of the FPGA.
The SDK provides the IP to generate the kernel clock, and a post-flow script that ensures this clock is configured with a safe operating frequency confirmed by timing analysis. The Custom Platform developer imports a post-fit netlist that has already achieved timing closure on all non-kernel clocks.
The REF_CLK_RATE parameter specifies the frequency of the reference clock that connects to the kernel PLL (pll_refclk). For the s10_ref Reference Platform, the REF_CLK_RATE frequency is 50 MHz.
The KERNEL_TARGET_CLOCK_RATE parameter specifies the frequency that the Intel® Quartus® Prime Pro Edition software attempts to achieve during compilation. The board hardware contains some logic that the kernel clock clocks. At a minimum, the board hardware includes the clock crossing hardware. To prevent this logic from limiting the Fmax achievable by a kernel, the KERNEL_TARGET_CLOCK_RATE must be higher than the frequency that a simple kernel can achieve on your device. For the Intel® Stratix® 10 GX FPGA Development Kit that the s10_ref Reference Platform targets, the KERNEL_TARGET_CLOCK_RATE is 1 GHz.
In the import revision compilation, the compilation script import_compile.tcl invokes the INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx_ea_htile/scripts/post_flow.tcl Tcl script in the s10_ref Reference Platform after every Intel® Quartus® Prime Pro Edition software compilation using quartus_cdb.
The post_flow.tcl script also determines the kernel clock and configures it to a functional frequency.
Intel® Quartus® Prime Pro Edition compiler
Intel® Quartus® Prime software provides several mechanisms for preserving the placement and routing of some previously compiled logic and importing this logic into a new compilation. For Intel® Stratix® 10 devices, the previously compiled logic is imported into the compilation flow.
The Intel® Quartus® Prime Pro Edition compilation flow can preserve the placement and routing of the board interface partition via the exported Intel® Quartus® Prime Archive File. The base.qar and root_partition.qdb files contain all the database files for the base compilation of root_partition. The s10_ref Reference Platform is configured with the project revisions and partitioning that are necessary to implement the compilation flow. By default, the SDK invokes the Intel® Quartus® Prime Pro Edition software on the top revision. This revision is configured to import and restore the base.qdb file, which has been precompiled and exported from a base revision compilation.
When developing your Custom Platform from the s10_ref Reference Platform, it is essential to maintain the flat.qsf, base.qsf, top.qsf, and top_synth.qsf Intel® Quartus® Prime Settings Files.
The s10_ref Reference Platform includes two additional partitions: the Top partition and the kernel partition. The Top partition contains all the static board logic, and the kernel partition contains the OpenCL kernel logic.
Invoke the Intel® Quartus® Prime compilation flow by calling the following quartus_sh executables:
- The board developer runs the quartus_sh --flow compile top -c base command to execute the base revision compilation. This compilation closes timing, locks down the static region, and generates the base.qdb file.
- The user of the Intel® Stratix® 10 FPGA Development Kit Reference Platform or a Custom Platform runs the quartus_sh -t import_compile.tcl command to execute the import revision compilation. This compilation generates programming files that are guaranteed to be timing closed.
The script performs the necessary tasks to ensure that the import revision compilations are timing-closed.
Running the quartus_sh --flow compile top -c base command executes the following tasks:
- Runs quartus_syn to execute the Analysis and Synthesis stage of the Intel® Quartus® Prime compilation flow.
- Runs quartus_fit to execute the Place and Route stage of the Intel® Quartus® Prime compilation flow.
- Runs quartus_sta to execute the Static Timing Analysis stage of the Intel® Quartus® Prime compilation flow.
- Runs the
The post_flow_pr.tcl script determines the maximum frequency at which the OpenCL™ kernel can run and generates the corresponding PLL settings. The script then reruns static timing analysis. The script also exports the compilation database of the base revision compilation results as a forward-compatible Partition Database File (.qdb). Refer to the QDB File Generation section for more information.
- Runs quartus_asm to generate the .sof file with updated embedded PLL settings. Updating the .sof file allows it to run safely on the board with the maximum kernel frequency.
- Generates the fpga.bin file, which contains the full-chip programming file. The full-chip programming file (base.sof) is in the .acl.sof section of the fpga.bin file.
The .aocx file that the base revision compilation flow generates only contains the .sof full-chip programming file. The Intel® FPGA SDK for OpenCL™ program utility automatically uses JTAG programming when it programs with a .aocx file from the base revision compilation. Only the import revision compilation flow, executed by the SDK user, generates a .aocx file.
The import_compile.tcl script executes the following tasks:
- Runs the
file. The pre_flow_pr.tcl script generates the board.qsys and the kernel_system.qsys Platform Designer System Files.
Refer to the Platform Designer System Generation section for more information.
- Imports the base revision compilation results as a .qdb file.
Refer to the QDB File Generation section for more information.
- Runs quartus_fit and quartus_asm to verify that the .qdb file is forward compatible.
- Runs quartus_syn to execute the Analysis and Synthesis stage of the Intel® Quartus® Prime compilation flow for the kernel partition only.
- Runs quartus_fit to execute the Place and Route stage of the Intel® Quartus® Prime compilation flow for the entire design.
- Runs quartus_sta to execute the static timing analysis stage of the Intel® Quartus® Prime compilation flow.
- Runs the INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx_ea_htile/scripts/post_flow_pr.tcl file. The post_flow_pr.tcl script determines the maximum frequency at which the OpenCL™ kernel can run and generates the corresponding PLL settings. The script then reruns the static timing analysis.
- Runs quartus_asm to generate the full-chip programming files for the base revision.
- Runs quartus_asm to generate the full-chip programming files for the import revision.
- Generates the fpga.bin file, which contains the top.sof full-chip programming file.
Before quartus_asm generates the .sof file in an import revision compilation, the static region of the import revision compilation is compared to the static region of the base revision compilation to check for errors. To prevent a mismatch error in the I/O configuration shift register (IOCSR) bits, the PLL settings in the base.sof and top.sof files must be identical. When designing the Intel® Stratix® 10 FPGA Development Kit Reference Platform, Intel® ensured in the import_compile.tcl Tcl script that the PLL settings in both the base.sof file and the top.sof file are identical, resulting in an additional quartus_asm execution step to regenerate the base.sof file.
The board.qsys Platform Designer system represents the bulk of the static region. The pre_flow_pr.tcl script generates both Platform Designer systems on the fly before the beginning of the Intel® Quartus® Prime compilation flow in both the base and import revision compilations.
The INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx_ea_htile/scripts/post_flow_pr.tcl script creates the root_partition.qdb file. The .tcl file invokes the export_design command to export the entire base revision compilation database to the base.qar file that also contains the root_partition.qdb and pr_base.id files. For your Custom Platform, you do not need to add the root_partition.qdb and pr_base.id files to the board directory (that is, INTELFPGAOCLSDKROOT/board/<custom_platform>/hardware/<board_name> ) separately.
The order in which the timing constraints are applied is based on the order of appearance of the top.sdc and top_post.sdc files in the top.qsf file.
One noteworthy constraint in the s10_ref Reference Platform is the multicycle constraint for the kernel reset in the top_post.sdc file. Using global routing saves routing resources and provides more balanced skew. However, the delay across the global route might cause recovery timing issues that limit kernel clock speed. Therefore, it is necessary to include a multicycle path on the global reset signal.
The following sections describe the implementation of these files for the Intel® Stratix® 10 GX FPGA Development Kit Reference Platform.
In the s10_ref Reference Platform, Intel® uses the bin folder for Windows dynamic link libraries (DLLs), the lib directory for delivering libraries, and the libexec directory for delivering the SDK utility executables. This directory structure allows the PATH environment variable to point to the location of the DLLs (that is, bin) in isolation of the SDK utility executables.
The device section contains the name of the device model file available in the INTELFPGAOCLSDKROOT/share/models/dm directory of the SDK and in the board_spec.xml file. The used_resources element accounts for all logic outside of the kernel partition. The value of used_resources for alms equals the difference between the total number of adaptive logic modules (ALMs) used in final placement and the total number of ALMs available to the kernel partition. You can derive this value from the Partition Statistics section of the Fitter report after a compilation. Consider the following ALM categories within an example Fitter report:
+----------------------------------------------------------------------------------+
; Fitter Partition Statistics                                                      ;
+----------------------+-----------------+-----------------------------------------+
; Statistic            ; l               ; freeze_wrapper_inst|kernel_system_inst  ;
+----------------------+-----------------+-----------------------------------------+
; ALMs needed [=A-B+C] ; 0 / 427200 (0%) ; 0 / 385220 (0%)                         ;
The value of used_resources equals the total number of ALMs in l minus the total number of ALMs in freeze_wrapper_inst|kernel_system_inst. In the example above, used_resources = 427200 - 385220 = 41980 ALMs.
You can derive used_resources for rams and dsps in the same way using M20Ks and DSP blocks, respectively. The used_resources value for ffs is four times the used_resources value for alms because there are two primary and two secondary logic registers per ALM.
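The derivation can be checked with a short calculation. The helper below is illustrative only; the ALM totals are taken from the example Fitter report, and the function name is hypothetical.

```python
def used_resources(total_available, kernel_partition_available):
    """Resources outside the kernel partition: device total minus kernel share."""
    return total_available - kernel_partition_available


alms = used_resources(427200, 385220)   # totals from the example Fitter report
ffs = 4 * alms                          # four logic registers per ALM
```

With the example totals, this gives 41980 for alms and 167920 for ffs. The same helper applies to rams and dsps when you substitute the M20K and DSP block totals from the report.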
In the board_spec.xml file, there is one global_mem section for DDR memory. Assign the string DDR to the name attribute of the global_mem element. The board instance in Platform Designer provides all of these interfaces. Therefore, the string board is specified in the name attribute of all the interface elements within global_mem.
Because DDR memory serves as the default memory for the board that the s10_ref Reference Platform targets, its address attribute begins at zero. Its config_addr is 0x018 to match the memorg conduit used to connect to the corresponding OpenCL Memory Bank Divider for DDR.
Attention: The width and burst sizes must match the parameters in the OpenCL Memory Bank Divider for DDR (memory_bank_divider).
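A consistency check for these rules might look like the following Python sketch. The XML fragment is hypothetical: only the DDR and board names, the zero start address, and the 0x018 config_addr come from the text above. The port, size, width, and maxburst attribute names and values are assumptions made for illustration, so compare them against your own board_spec.xml before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment. Attribute names other than name, address, and
# config_addr, and all numeric values except 0x018 and address zero,
# are assumptions for this sketch.
FRAGMENT = """
<global_mem name="DDR" config_addr="0x018">
  <interface name="board" port="kernel_mem0" width="512" maxburst="16"
             address="0x00000000" size="0x80000000"/>
</global_mem>
"""


def check_global_mem(xml_text, divider_width, divider_maxburst):
    """Return a list of problems; an empty list means the fragment is consistent."""
    mem = ET.fromstring(xml_text)
    problems = []
    if mem.get("name") != "DDR":
        problems.append("global_mem name should be DDR")
    if int(mem.get("config_addr", "0"), 16) != 0x018:
        problems.append("config_addr should be 0x018")
    interfaces = mem.findall("interface")
    for iface in interfaces:
        if iface.get("name") != "board":
            problems.append("interface name should be board")
        if int(iface.get("width", "0")) != divider_width:
            problems.append("width does not match the Memory Bank Divider")
        if int(iface.get("maxburst", "0")) != divider_maxburst:
            problems.append("burst size does not match the Memory Bank Divider")
    if interfaces and int(interfaces[0].get("address", "0"), 16) != 0:
        problems.append("default memory must start at address zero")
    return problems


print(check_global_mem(FRAGMENT, divider_width=512, divider_maxburst=16))
```

The divider_width and divider_maxburst arguments stand in for the values you configured on memory_bank_divider in Platform Designer.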
The interfaces section describes kernel clocks, reset, CRA, and snoop interfaces. The OpenCL Memory Bank Divider for the default memory (in this case, memory_bank_divider) exports the snoop interface described in the interfaces section. The width of the snoop interface should match the width of the corresponding streaming interface.
In order from the longest to the shortest configuration time, the two FPGA programming methods are as follows:
- To replace both the FPGA periphery and the core while maintaining the programmed state after power cycling, use Flash programming.
- To replace both the FPGA periphery and the core, use the Intel® Quartus® Prime Programmer command-line executable (quartus_pgm) to program the device via cables such as the Intel® FPGA Download Cable (formerly USB-Blaster).
|.acl.sof||The full programming bitstream for the compiled design. This section appears in the fpga.bin files generated from both the base revision and the import revision compilations.|
The source code of an MMD library that demonstrates good performance is available in the INTELFPGAOCLSDKROOT/board/s10_ref/source/host/mmd directory. Refer to the Host-to-Device MMD Software Implementation section in the Stratix V Network Reference Platform Porting Guide for more information.
For more information on the MMD API functions, refer to the MMD API Descriptions section of the Intel® FPGA SDK for OpenCL™ Custom Platform Toolkit User Guide.
The install.bat script is located in the <your_custom_platform>\windows64\libexec directory, where <your_custom_platform> points to the top-level directory of your Custom Platform. This install.bat script triggers the install executable from Jungo Connectivity Ltd. to install the WinDriver on the host machine.
The install script is located in the <your_custom_platform>/linux64/libexec directory. This install script first compiles the kernel module in a temporary location and then performs the necessary setup to enable automatic driver loading after reboot.
The uninstall.bat script is located in the <your_custom_platform>\windows64\libexec directory, where <your_custom_platform> points to the top-level directory of your Custom Platform. This uninstall.bat script triggers the uninstall executable from Jungo Connectivity Ltd. to uninstall the WinDriver on the host machine.
The uninstall script is located in the <your_custom_platform>/linux64/libexec directory. This uninstall script removes the driver module from the kernel.
Without an argument, the utility returns the overall information of all the devices installed in a host machine. If a specific device name is provided as an argument (that is, aocl diagnose <device_name>), the diagnose utility runs a memory transfer test and then reports the host-device transfer performance.
You can run the diagnose utility for multiple devices (that is, aocl diagnose <device_name1> <device_name2> <device_name3>). If you want to run the diagnose utility for all devices, use the all option (that is, aocl diagnose all).
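The three invocation forms can be scripted, for example from a bring-up test harness. The following Python sketch only assembles the command lines described above; the helper names are hypothetical, and the device names acl0 and acl1 are placeholders for whatever names your Custom Platform reports.

```python
import shutil
import subprocess


def diagnose_command(*device_names):
    """Build the argument list for the aocl diagnose utility."""
    return ["aocl", "diagnose", *device_names]


def run_diagnose(*device_names):
    """Run the utility if aocl is on PATH; return its text output, else None."""
    if shutil.which("aocl") is None:
        return None
    result = subprocess.run(diagnose_command(*device_names),
                            capture_output=True, text=True, check=False)
    return result.stdout


print(diagnose_command())                # overall information for all devices
print(diagnose_command("acl0", "acl1"))  # memory transfer test on two devices
print(diagnose_command("all"))           # memory transfer test on every device
```

Keeping command construction separate from execution lets you inspect or log the command on a machine where the SDK is not installed.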
The list-devices utility is similar to the diagnose utility. It first verifies the installation of the kernel driver and then lists all the devices.
|create_fpga_bin_pr.tcl||Creates the ELF binary file, fpga.bin, from the .sof file, the .rbf file, and the pr_base.id file.|
|post_flow_pr.tcl||This script runs after every Intel® Quartus® Prime Pro Edition software compilation. It facilitates the guaranteed timing flow by setting the kernel clock PLL, generating a small report in the acl_quartus_report.txt file, and rerunning STA with the modified kernel clock settings.|
|pre_flow_pr.tcl||This script generates the RTL of the top-level board.qsys Platform Designer system for the static region.|
|adjust_plls_s10.tcl||PLL adjustment script for the kernel clock PLL to guarantee timing closure on the kernel clock, by setting it to the maximum allowed frequency.|
|qar_ip_files.tcl||Tcl script that packages up base.qdb and pr_base.id during base revision compile.|
|helpers.tcl||Tcl script with helper functions used by qar_ip_files.tcl.|
|device.tcl||Tcl file that is included in all revisions and contains all device-specific information (for example, device family, ordering part number (OPN), and voltage settings).|
|scripts/kernel_system_update.tcl||Tcl script to update the kernel system.|
- The quartus_syn executable reads the SDC files. However, it does not support the Tcl command get_current_revision. Therefore, in the top_post.sdc file, a check is in place to determine whether quartus_syn has read the file before checking the current revision.
In addition to these workarounds, take into account the following considerations:
- Intel® Quartus® Prime compilation is only ever performed after the Intel® FPGA SDK for OpenCL™ Offline Compiler embeds an OpenCL kernel inside the system.
- Perform Intel® Quartus® Prime compilation after you install the Intel® FPGA SDK for OpenCL™ and set the INTELFPGAOCLSDKROOT environment variable to point to the SDK installation.
- The name of the directory where the Intel® Quartus® Prime project resides must match the name field in the board_spec.xml file within the Custom Platform. The match is case sensitive.
- The PATH or LD_LIBRARY_PATH environment variable must point to the MMD library in the Custom Platform.
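Two of the considerations above lend themselves to an automated check before you start a compilation. The Python sketch below is one possible way to do it, under two assumptions that you should verify against your own Custom Platform: that board_spec.xml sits directly inside the project directory, and that the name field is an attribute of the root element. The preflight function name is hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path


def preflight(project_dir, env=None):
    """Return a list of problems found before starting a compilation."""
    env = os.environ if env is None else env
    problems = []

    # The SDK must be installed and INTELFPGAOCLSDKROOT must point to it.
    if not env.get("INTELFPGAOCLSDKROOT"):
        problems.append("INTELFPGAOCLSDKROOT is not set")

    # Assumption: board_spec.xml lives directly in the project directory
    # and carries the board name as an attribute of its root element.
    project_dir = Path(project_dir)
    spec = project_dir / "board_spec.xml"
    if not spec.is_file():
        problems.append(f"{spec} not found")
    else:
        board_name = ET.parse(spec).getroot().get("name")
        # Plain string comparison, so the check is case sensitive.
        if board_name != project_dir.name:
            problems.append(
                f"directory name {project_dir.name!r} does not match "
                f"board_spec.xml name {board_name!r}"
            )
    return problems
```

A call such as preflight("path/to/project") returns an empty list when both checks pass. The sketch does not verify that PATH or LD_LIBRARY_PATH points to the MMD library.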
|November 2017||17.1||Initial release.|