This document serves as a hardware developer's guide for developing Accelerator Functional Units (AFUs) for the Acceleration Stack for Intel® Xeon® CPU with FPGAs product, hereafter referred to as the Acceleration Stack.
The intended audience consists of FPGA RTL designers developing AFUs for the Acceleration Stack on the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (referred to as Intel® PAC with Arria® 10 throughout this document) hardware platform.
|#||Precedes a command that indicates the command is to be entered as root.|
|$||Indicates a command is to be entered as a user.|
|This font||Filenames, commands, and keywords are printed in this font. Long command lines are printed in this font. Although long command lines may wrap to the next line, the return is not part of the command; do not press enter.|
|<variable_name>||Indicates the placeholder text that appears between the angle brackets must be replaced with an appropriate value. Do not enter the angle brackets.|
|Intel Acceleration Stack Quick Start Guide for Intel Programmable Acceleration Card with Intel Arria 10 GX FPGA||
This document describes the Acceleration Stack and provides instruction for hardware and software installation and setup required for development with the stack.
|Acceleration Stack for Intel Xeon CPU with FPGAs Core Cache Interface (CCI-P) Reference Manual||
This document describes the CCI-P protocol and requirements placed on AFUs.
|Acceleration Stack for Intel® Xeon® CPU with FPGAs AFU Simulation Environment User’s Guide||
This document provides instructions on how to use the Intel® Accelerator Functional Unit (AFU) Simulation Environment (ASE).
|OPAE Tools User Guide||
This user guide documents the utilities provided in the Open Programmable Acceleration Engine (OPAE) software component of the Acceleration Stack.
|Acceleration Stack for Intel Xeon CPU with FPGAs Release Notes||
This document lists the key features, limitations and changes from the previous release.
|Intel® Quartus® Prime Pro Edition Handbook Volume 3: Verification, Chapter 8 Design Debugging with the Signal Tap Logic Analyzer||
This documentation describes Signal Tap and its use for general FPGA debug and provides a baseline reference for remote Signal Tap debug of AFUs.
|AFU||Accelerator Functional Unit (AFU)|
|API||Application Programming Interface|
|ASE||Intel® Accelerator Functional Unit (AFU) Simulation Environment (ASE)|
|CCI-P||Core Cache Interface (CCI-P)|
|DMA||Direct Memory Access|
|DRAM||Dynamic Random-Access Memory|
|FIM||FPGA Interface Manager (FIM)|
|FIU||FPGA Interface Unit (FIU)|
|FPGA||Field Programmable Gate Array|
|Intel® PAC with Arria® 10||Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA, referred to as Intel® PAC with Arria® 10 throughout this document|
|NLB||Native Loopback, a reference to the Native Loopback (NLB) example AFU|
|OPAE||Open Programmable Acceleration Engine (OPAE)|
|TCP||Transmission Control Protocol|
|Acceleration Stack for Intel® Xeon® CPU with FPGAs||Acceleration Stack||
A collection of software, firmware and tools that provides performance-optimized connectivity between an Altera® FPGA and an Intel® Xeon® processor.
|Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA||Intel® PAC with Arria® 10||
PCIe® accelerator card with an Intel® Arria® 10 FPGA. Programmable Acceleration Card is abbreviated PAC.
Contains an FPGA Interface Manager (FIM) that pairs with an Intel® Xeon® processor over the PCIe® bus.
|Intel® Xeon® Processor with Integrated FPGA||Integrated FPGA Platform||
Intel® Xeon® plus FPGA platform with the Intel® Xeon® and an FPGA in a single package and sharing a coherent view of memory via Quick Path Interconnect (QPI).
This chapter outlines the prerequisites for AFU development.
Before using this guide, refer to the Intel Acceleration Stack Quick Start Guide for Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA, referred to as the Quick Start Guide throughout this document. The Quick Start Guide provides an overview of the Acceleration Stack and instructions for installing and setting up the hardware and software components of the stack. Before continuing, familiarize yourself with the concepts developed for the Acceleration Stack and complete the installation and setup procedures covered in the Quick Start Guide.
This guide for AFU development builds on the concepts and environment setup established in the Quick Start Guide.
Throughout this guide, the following two references taken from the Quick Start Guide are used to refer to the Acceleration Stack installation:
- $DCP_LOC—This reference points to the directory where the Acceleration Stack release tarball was unarchived as instructed in the Quick Start Guide, which is also referred to in this document as the Acceleration Stack installation.
- $OPAE_LOC—This reference points to the directory where the OPAE software source code included in the Acceleration Stack release tarball was unarchived as instructed in the Quick Start Guide.
All code and command line references and examples in this guide assume your PATH environment variable includes the following locations to executables in the Acceleration Stack installation:
the utilities and helper scripts included in the Acceleration Stack release
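Before running the examples in this guide, it can help to confirm that the PATH assumption above actually holds. The following is a minimal sketch, not part of the Acceleration Stack release: the function name check_stack_tools is hypothetical, and the tool names you pass to it are whichever executables you expect to resolve (for example, the run.sh and clean.sh scripts used later in this guide).

```shell
# Hypothetical helper (not shipped with the Acceleration Stack):
# report which of the named executables cannot be found through PATH.
check_stack_tools() {
    cst_missing=0
    for cst_tool in "$@"; do
        if command -v "$cst_tool" >/dev/null 2>&1; then
            echo "found:   $cst_tool"
        else
            echo "missing: $cst_tool"
            cst_missing=1
        fi
    done
    # Non-zero status when at least one tool was not found.
    return "$cst_missing"
}
```

A typical call would be `check_stack_tools run.sh clean.sh`; a non-zero return status means at least one name did not resolve.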
The following alternative references exist in the Acceleration Stack installation and documentation:
- Discrete Configurable Platform (DCP): This term is an alternative reference to the Acceleration Stack with the Intel® PAC with Arria® 10 hardware platform.
- Green-Bits, Green BitStream, GBS: These terms are used as alternative references to an AFU and associated compiled AFU images such as an AFU PR bitstream or loadable AFU image for OPAE. The references are commonly found in the Acceleration Stack installation directory tree and in source code comments.
- Blue-Bits, Blue BitStream, BBS: These terms are used as alternative references to the base configuration, FIM and associated base configuration bitstream. The references are commonly found in the Acceleration Stack installation directory tree and in source code comments.
Generating AFU PR bitstreams and loadable AFU images for OPAE requires the following software and IP:
- Intel® Quartus® Prime Pro Edition software version 17.0.0 (only version supported)
- Altera® FPGA PCI Express SR-IOV Block IP license
- python2-jsonschema package from the epel repository
For requirements when using ASE for AFU functional verification, refer to the ASE User Guide.
The Acceleration Stack is an advanced application of FPGA technology. Most of the platform-level complexity has been abstracted away for the AFU developer by the FPGA Interface Manager (FIM) in the FPGA static region. This guide assumes the following FPGA logic design-related knowledge and skills:
- Familiarity with Partial Reconfiguration (PR) compilation flows, including the Intel® Quartus® Prime Pro Edition PR flow, concepts of physical and logical partitioning in the FPGA, module boundary best practices, and resource restrictions.
The physical and logical partitioning of the FIM static region and the PR regions for AFUs has already been done. User AFUs plug in to the structure defined by the Acceleration Stack through a well-structured set of standard interface signals to the FIM. This level of abstraction allows developers of AFUs to concentrate on their area of expertise in their end application space by minimizing time and effort on the PR flow itself. The Acceleration Stack provides helper scripts to automate the PR flow during compilation of the AFU RTL for generating a loadable AFU image for use by OPAE. The Acceleration Stack has already laid out the structure, and familiarity with PR flows is a plus for design of an AFU within this predetermined structure.
- Knowledge and skills in static timing closure, including familiarity and skill with the TimeQuest tool in Intel® Quartus® Prime Pro Edition, applying timing constraints, Synopsys® Design Constraints (.sdc) language and Tcl scripting, and design methods to close on timing critical paths.
- Knowledge and skills with industry standard RTL simulation tools supported by the Acceleration Stack. For more information, refer to the Intel® Accelerator Functional Unit (AFU) Simulation Environment (ASE) Quick Start Guide.
- Knowledge and skill with the Signal Tap II Logic Analyzer tool in the Intel® Quartus® Prime Pro Edition software.
This chapter guides you through the process to generate a loadable AFU image for the nlb_mode_0 example AFU provided in the Acceleration Stack installation. Successful completion of the steps in this chapter quickly verifies your AFU development environment using a known-good design.
Build the nlb_mode_0 example AFU by invoking the run.sh script from a terminal window as shown in Code 3-1.
Code 3-1: Compile nlb_mode_0 Example AFU
$ cd $DCP_LOC/hw/samples/nlb_mode_0
$ $DCP_LOC/bin/run.sh
When the shell script completes, it will indicate successful generation of the loadable AFU image: $DCP_LOC/hw/samples/nlb_mode_0/nlb_400.gbs.
You can optionally repeat the steps in the Quick Start Guide to run the hello_fpga host application with the newly generated AFU image.
You can optionally restore the fresh state of the nlb_mode_0 example AFU design by invoking the clean.sh script from a terminal window as shown in Code 3-2.
Code 3-2: Restore the nlb_mode_0 Example AFU Design
$ cd $DCP_LOC/hw/samples/nlb_mode_0
$ $DCP_LOC/bin/clean.sh
Successfully compiling the nlb_mode_0 example AFU verifies that your environment is set up and ready for developing your own custom AFUs.
To facilitate dynamically loading AFUs, the Acceleration Stack utilizes a partial reconfiguration (PR) scheme. The FIM contains one or more PR regions for loading AFUs and a static region that provides services and resources to loaded AFUs.
The FIM includes the static region and one or more PR region partitions for loading AFUs from OPAE. The static region provides services to AFUs loaded in PR regions that include a host connection via CCI-P protocol over PCIe SR-IOV, a local pool of SDRAM memory, and clock and reset resources. The FIM static region also provides services to OPAE for dynamically loading AFUs and performing system management tasks (for example, version identification).
The FIM is part of the Intel® PAC with Arria® 10 hardware platform and is not modifiable.
At power up, the PR regions in the FIM do not contain a defined AFU. Host applications must use OPAE to load AFUs into the PR regions.
The beta release of the FIM supports one PR region.
The FIM bitstream is included in the Acceleration Stack installation and initially configures the FPGA at power up from configuration flash residing on the Intel® PAC with Arria® 10.
For instructions on flashing the on-board configuration flash with the FIM bitstream, refer to the Quick Start Guide.
Host software uses OPAE utilities and APIs to load an AFU into a PR region in the FIM by reference to a loadable AFU image. A loadable AFU image is the combination of an AFU PR bitstream and associated AFU metadata. The AFU PR bitstream is the output from Intel® Quartus® Prime Pro Edition PR compilation of your AFU RTL design with the FIM design database provided in the Acceleration Stack installation. The AFU metadata is used to provide OPAE information on AFU characteristics and operational parameters and is defined in a separate JSON file. The Packager utility included in the Acceleration Stack installation generates the loadable AFU image from the AFU PR bitstream and AFU metadata. It is possible to have several variations of loadable AFU images for a given AFU revision by combining its PR bitstream with unique metadata using the Packager utility.
The beta release supports dynamically swapping multiple AFUs within a single PR region per Intel® PAC with Arria® 10 installed in a system.
The rest of this chapter describes how to design an AFU within the platform and services provided by the FIM static region, compile an AFU PR bitstream compatible with the FIM, and generate a loadable AFU image for use by OPAE.
For usage information on the Packager utility and JSON file metadata format, supported keyword parameters, and minimum metadata requirements, refer to the OPAE Tools User Guide.
This release of the Acceleration Stack is designed to work with Intel® Quartus® Prime Pro Edition, version 17.0.0.
The Intel® Quartus® Prime Pro Edition partial reconfiguration (PR) compilation flow is used to compile AFUs in combination with the FIM design database to generate AFU PR bitstreams. The PR compilation flow is supported only at the command line using the scripts provided with the Acceleration Stack – you cannot use the Intel® Quartus® Prime Pro Edition GUI to generate a PR bitstream for the AFU.
The shell script, $DCP_LOC/bin/run.sh, implements the necessary command line steps.
You can use the GUI point tools in Intel® Quartus® Prime Pro Edition for tasks such as TimeQuest timing analysis, Chip Planner view, adding debug instances and nodes, and viewing compilation reports.
To generate an AFU PR bitstream, AFU developers should perform synthesis, place and route, and timing closure on their AFU while importing the FIM design database from the library as part of the PR compilation performed by the run.sh script.
The Acceleration Stack installation includes example AFU designs in the $DCP_LOC/hw/samples directory.
The overall flow to generate an AFU PR bitstream is as follows:
- The Acceleration Stack installation provides the compiled database for the FIM and an Intel® Quartus® Prime PR build project structure to support integrating your AFU within the framework provided by the FIM static region.
FIM design database
Quartus® Prime PR build project structure reside in
Note: Do not modify these library files.
project contains two revisions:
- The revision afu_synth is used to synthesize the user AFU.
- The revision afu_fit is used to generate an AFU PR bitstream by importing the qdb of the FIM from the library and the synthesized snapshot of your AFU from the afu_synth revision.
- AFU developers should close timing on their AFU on the afu_fit revision.
- Only script-based steps are supported for your AFU synthesis, PAR, and bitstream generation. GUI-based steps can be used for timing analysis, adding debug instances, and viewing compilation results.
- To run synthesis of multiple AFUs or PAR jobs in parallel, create multiple copies of the AFU’s project directory.
- ALMs: 382,590
- M20Ks: 2450
- DSP Blocks: 1402
- A subdirectory named "hw".
- A Quartus settings file (.qsf) located in the “hw” subdirectory, named “afu.qsf”.
For example, the AFU project directory for the hello_afu example included in the Acceleration Stack installation is located at $DCP_LOC/hw/samples/hello_afu. The required settings file is located at $DCP_LOC/hw/samples/hello_afu/hw/afu.qsf.
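The layout requirement above (a "hw" subdirectory containing a settings file named afu.qsf) is easy to check before starting a long compile. The following is a minimal sketch under that assumption only; the function name check_afu_project is hypothetical and is not part of the Acceleration Stack installation.

```shell
# Hypothetical helper: verify the minimum AFU project layout described
# above, that is, a "hw" subdirectory containing a file named "afu.qsf".
check_afu_project() {
    cap_dir=${1:-.}   # AFU project directory; defaults to the current one
    if [ ! -d "$cap_dir/hw" ]; then
        echo "error: $cap_dir/hw is not a directory" >&2
        return 1
    fi
    if [ ! -f "$cap_dir/hw/afu.qsf" ]; then
        echo "error: $cap_dir/hw/afu.qsf not found" >&2
        return 1
    fi
    echo "ok: $cap_dir"
}
```

For the hello_afu example, the call would be `check_afu_project $DCP_LOC/hw/samples/hello_afu`.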
Place all Intel® Quartus® Prime settings for compiling your AFU design in the afu.qsf file.
At minimum, the afu.qsf settings file must point to all AFU design files, including RTL source, Qsys subsystems (.qsys), IP variations (.ip), timing constraint files (.sdc), Signal Tap files (.stp), and Tcl scripts (.tcl). AFU design files can be located anywhere in the filesystem that can be resolved at Intel® Quartus® Prime PR compile time from the path references in the afu.qsf file.
For example, the AFU design files for the hello_afu example are located in $DCP_LOC/hw/samples/hello_afu/hw/rtl. The hello_afu example’s afu.qsf file points to the top level AFU RTL source file, afu.sv, with the following setting: set_global_assignment -name SYSTEMVERILOG_FILE "../hw/rtl/afu.sv". In this example, <path-to-design-file-relative-to-afu-proj-dir> is hw/rtl/afu.sv.
Your AFU top level module must be named “AFU” and must define a module port list as shown in the hello_afu example located at $DCP_LOC/hw/samples/hello_afu/hw/rtl/afu.sv. Your AFU is restricted to this top-level port list. There are no restrictions to the design hierarchy beneath the AFU module.
The AFU module must be instantiated in the ccip_std_afu module, which is defined in the ccip_std_afu.sv file included in the Acceleration Stack installation.
Your AFU project must include the ccip_std_afu.sv and ccip_interface_reg.sv files by referring to them in the afu.qsf file. These two files are included in the design file sets of the example AFUs included in the Acceleration Stack installation. You can either directly refer to them or copy them over to your AFU design file set from the hello_afu example’s design file directory: $DCP_LOC/hw/samples/hello_afu/hw/rtl.
The hello_afu example provides a simple example of where to instance the AFU module in the ccip_std_afu.sv file along with the interface port listings for the CCI-P interface for host memory and MMIO accesses and the Avalon® memory mapped slave interfaces for accessing the local pools of SDRAM memory on the Intel® PAC with Arria® 10.
The nlb_mode_0 example AFU provides an example of a more involved AFU that includes multiple RTL source files and Synopsys® Design Constraints (.sdc) files for the AFU.
Intel® FPGA Basic Building Blocks (BBBs) are reference designs of common functions that can be used in AFU designs. These references are provided as-is - they are not validated by Altera® PSG. The available BBBs, including documentation, are maintained at the GitHub site.
- Generate the bitstreams used for partial reconfiguration with the scripted method provided by the $DCP_LOC/bin/run.sh script.
- Partial reconfiguration switches the PR region from one AFU to another AFU. Any software application exercising an AFU in the PR region should be terminated before initiating PR with OPAE to switch in a new AFU. This includes the remote debug feature.
- LAB/MLAB with initial content is not supported either in RAM mode or in ROM mode within the AFU. There are no such restrictions on M20K block usage.
- When using on-chip memory blocks (M20K or MLAB), implement clock enable logic in the AFU to avoid spurious writes into the memories upon exit of PR.
- Logic in the AFU should not depend on initial values or states coded through initial statements.
- After partial reconfiguration, the registers in the PR region (AFU) come up in an indeterminate state. To restore initial condition, a reset pulse is generated at the CCI-P interface after PR. AFUs must use this reset to restore all initial conditions.
- The PR region must contain only core resources like LABs, RAMs and DSPs. PLLs and Clock control blocks cannot be instantiated in the PR region.
Follow these guidelines when designing a custom AFU:
- Reset and initialize all output registers.
- On the CCI-P interface, drive host reads and writes on Virtual Channel 0 (VH0). This is the only channel available on the Intel® PAC with Arria® 10 platform.
- MMIO addressing must be quad word (qword) aligned. A qword is eight bytes.
- Regenerate the AFU ID for new AFUs using the third-party tool, UUID Generator. For more information on generating an AFU ID, refer to the CCI-P Reference Manual.
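The qword alignment guideline above reduces to a simple arithmetic test: an MMIO address is qword aligned when it is a multiple of eight bytes. The following is a minimal sketch, assuming the address is expressed as a byte offset; the function name is_qword_aligned is hypothetical.

```shell
# Hypothetical helper: succeed when an MMIO byte address is qword
# aligned, that is, a multiple of eight bytes. Accepts decimal or
# 0x-prefixed hexadecimal values.
is_qword_aligned() {
    [ $(( $1 % 8 )) -eq 0 ]
}
```

For example, `is_qword_aligned 0x18` succeeds because 0x18 is 24, while `is_qword_aligned 0x1C` fails because 0x1C is 28.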
The Acceleration Stack for Intel® Xeon® CPU with FPGAs CCI-P Reference Manual documents all the requirements on an AFU interfacing with the FPGA Interface Unit (FIU) in the FIM over the CCI-P protocol as well as requirements for CSR and address mapping. An AFU design must meet all the requirements specified in the following sections of the CCI-P reference manual:
- Section 22.214.171.124: CCI-P Guidance
- Section 1.6: AFU Requirements
- Section 1.8: Device Feature List
The above sections in the CCI-P reference manual include requirements unique to the Intel® Xeon® Processor with Integrated FPGA (referred to as Integrated FPGA Platform throughout this document) hardware platform, but most of the information applies to the Intel® PAC with Arria® 10 platform. The notable differences between the two platforms are that the PAC does not have a UPI channel and that no accelerator cache is implemented in the FIM.
The hello_afu example AFU included with the Acceleration Stack provides an example implementation of a simple Device Feature List that meets the requirements for an AFU as specified by the CCI-P reference manual. The nlb_400_0 and dma_afu example AFUs provide example implementations of more featured Device Feature Lists.
For more information about the Avalon® -MM interface, refer to the Avalon Interface Specifications.
The Acceleration Stack installation includes two scripts to facilitate Intel® Quartus® Prime Pro Edition PR compilation of AFUs. These scripts are located in the $DCP_LOC/bin directory.
The run.sh script performs AFU synthesis on the afu_synth revision, fits the synthesis snapshot of the AFU and the final snapshot of the FIM design database, and invokes the Packager to generate the loadable AFU image file (.gbs).
For example, the command sequence shown in Code 4-1 compiles the hello_afu example AFU and generates a loadable AFU image.
Code 4-1: Compile hello_afu Example AFU
$ cd $DCP_LOC/hw/samples/hello_afu
$ run.sh
By default, the run.sh script generates the loadable AFU image using the first matching .json metadata file found relative to the AFU project directory: hw/rtl/*.json, hw/*.json, *.json. You can explicitly specify a particular .json file to use for generating the AFU image by passing a positional argument as shown in Code 4-2.
Code 4-2: Pass an Argument to a Specific .json File
$ cd $DCP_LOC/hw/samples/hello_afu
$ run.sh <my-alternate-json-location>/<filename>.json
If the .json filename starts with the "-" character, pass the argument with a preceding "--".
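To predict which metadata file a default run.sh invocation would pick up, you can walk the same three locations in the documented order. The following is only a sketch of that order and is not taken from the run.sh source: the function name find_afu_json is hypothetical, and choosing the alphabetically first file within one location is an assumption.

```shell
# Sketch of the documented search order, NOT the actual run.sh source:
# print the first .json file found under hw/rtl, then hw, then the
# project directory itself. Returns non-zero when none exists.
find_afu_json() {
    faj_dir=${1:-.}   # AFU project directory; defaults to the current one
    for faj_file in "$faj_dir"/hw/rtl/*.json "$faj_dir"/hw/*.json "$faj_dir"/*.json; do
        # An unmatched pattern stays literal and fails the -f test.
        if [ -f "$faj_file" ]; then
            echo "$faj_file"
            return 0
        fi
    done
    return 1
}
```

If the printed file is not the one you intend to package, pass the intended file to run.sh explicitly as a positional argument.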
The run.sh script supports the following options:
By default, run.sh relies on your PATH environment variable to point to the Packager binary located in $DCP_LOC/bin. If PATH has not been set up in this way, use the --packager|-p option to explicitly point to the location of the Packager binary for generating the loadable AFU image.
By default, run.sh finds the FIM design database using its relative path in the Acceleration Stack installation. As installed, the FIM design database is located at $DCP_LOC/hw/lib, but if this path or the run.sh script’s relative location to it has been altered, use the --bbs-lib|-l option to point to the FIM design database.
When run from the AFU’s project directory, the clean.sh script will restore the AFU design by deleting all Intel® Quartus® Prime Pro Edition compilation output from an invocation of the run.sh script. The clean.sh script takes no arguments or options. Code 4-3 shows an example of using the clean.sh script.
Code 4-3: Clean Up the AFU Design Directory
$ cd $DCP_LOC/hw/samples/hello_afu
$ clean.sh
The run.sh script invokes the Packager after compiling an AFU to generate a loadable AFU image. For situations where you want to either update the metadata in an existing image or create an additional image with unique metadata without recompiling the AFU, run the Packager standalone.
The Packager utility is located in the $DCP_LOC/bin directory.
For more information, refer to the OPAE Tools User Guide.
The ASE supports functional verification of AFU RTL code using host application C code developed for the OPAE API without the need for Intel® PAC with Arria® 10 hardware. The ASE virtualizes the AFU’s physical link with the host, models certain aspects of the OPAE host memory model, and supports communication between the OPAE host application and supported RTL simulation tools used to emulate the AFU running on actual Intel Programmable Acceleration Card (PAC) hardware.
ASE is useful for verifying your AFU’s interoperability with the rest of the Acceleration Stack using a quick, iterative functional debug environment to minimize time spent in subsequent portions of the AFU development flow that involve more time-intensive steps (for example, PAR, timing closure). ASE also enables a more cost-efficient development environment by removing the dependency on PAC hardware for early functional debug of AFU interoperability within the Acceleration Stack.
For more information about ASE, refer to the following documentation:
- Acceleration Stack for Intel® Xeon® CPU with FPGAs ASE Quick Start Guide
- Acceleration Stack for Intel® Xeon® CPU with FPGAs ASE User Guide
The Acceleration Stack provides a remote Signal Tap facility. Use remote Signal Tap to debug an AFU in-system. The Signal Tap II Logic Analyzer, included in the Intel® Quartus® Prime Pro Edition, allows you to trigger on AFU signal events and capture traces of signals in your AFU design. The remote capability allows for control of trigger conditions and upload of captured signal traces from a networked workstation running the Signal Tap GUI.
Signal Tap is an in-system logic analyzer that you can use to debug FPGA logic. Conventional (non-remote) Signal Tap uses the physical FPGA JTAG interface and an Intel® FPGA Download Cable II to bridge the Intel® Quartus® Prime Signal Tap application running on a host system with the Signal Tap controller instances embedded in the FPGA logic. With Remote Signal Tap, you can achieve the same result without physically connecting to JTAG, which enables signal-level, in-system debug of AFUs deployed in servers where physical access is limited.
For more information about Signal Tap, refer to the "Related Documentation" section.
In addition to Signal Tap, the remote debug facility in OPAE supports the following in-system debug tools included with the Intel® Quartus® Prime Pro Edition:
- In-system sources and probes
- In-system Memory Content Editor
- Signal Probe
- System Console
This section describes how to generate a loadable AFU image with remote Signal Tap enabled. This section then describes how to debug a user AFU using OPAE’s mmlink utility, the System Console utility, and Intel® Quartus® Prime Pro Edition.
The nlb_mode_0_stp variation of the nlb_400 example AFU is used to illustrate how to enable and use remote Signal Tap.
To add Signal Tap trigger and data nodes from signals in your AFU, follow the method documented in the related information for Signal Tap.
For working within the PR compilation flow for compiling the Signal Tap-enabled AFU, follow these flow steps:
1. If the run.sh script has been run on the AFU project, skip to step 2. Otherwise, copy the Intel® Quartus® Prime PR build project from the Acceleration Stack installation to your AFU project directory:
$ cp -rLf $DCP_LOC/hw/lib/build <path-to-afu-proj-dir>
2. Open the Intel® Quartus® Prime PR build project from the Intel® Quartus® Prime Pro Edition GUI, then select the afu_synth revision.
3. If you have already run the run.sh script or otherwise ran your AFU through the synthesis step, skip to step 4. Otherwise, from the Quartus GUI, perform an Analysis & Elaboration on your AFU RTL to generate a netlist from which to add debug nodes with the Signal Tap tool.
4. Invoke the Signal Tap tool from the Intel® Quartus® Prime Pro Edition GUI and add your AFU signals for trigger/data debug nodes as usual.
5. When done adding debug nodes, save the .stp file and optionally choose to add the .stp file to the Intel® Quartus® Prime Pro Edition project and enable Signal Tap for the project.
6. Exit Signal Tap.
7. Exit the Intel® Quartus® Prime Pro Edition GUI.
In the nlb_mode_0_stp example, <path-to-afu-proj-dir> is $DCP_LOC/hw/samples/nlb_mode_0_stp.
The nlb_mode_0_stp example already has a .stp file: $DCP_LOC/hw/samples/nlb_mode_0_stp/hw/par/stp_basic.stp.
Signal Tap must be enabled in the AFU afu.qsf file. You must add the following settings to the afu.qsf file even if you enabled Signal Tap when saving the .stp file and exiting the Signal Tap GUI. Figure 6-1 shows the required Quartus settings in afu.qsf.
The following shows the required Intel® Quartus® Prime settings in afu.qsf:
Quartus Settings for Enabling
set_global_assignment -name VERILOG_MACRO INCLUDE_REMOTE_STP
set_global_assignment -name SIGNALTAP_FILE \
    ../<path-relative-to-afu-proj-dir>/<stp-filename>.stp
set_global_assignment -name ENABLE_SIGNALTAP ON
set_global_assignment -name USE_SIGNALTAP_FILE \
    ../<path-relative-to-afu-proj-dir>/<stp-filename>.stp
The nlb_mode_0_stp example already has the above settings added to its afu.qsf file located in $DCP_LOC/hw/samples/nlb_mode_0_stp/hw.
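Because a missing assignment is only discovered after a full compile, it can be worth screening afu.qsf for the four settings first. The following is a minimal sketch, not a tool from the Acceleration Stack: the function name check_stp_settings is hypothetical, and it only looks for the assignment names on uncommented lines without validating the .stp paths.

```shell
# Hypothetical helper: confirm that a settings file contains the four
# Signal Tap assignments listed above. It matches assignment names only
# and does not check that the referenced .stp file exists.
check_stp_settings() {
    css_qsf=$1
    css_status=0
    for css_pattern in \
        'VERILOG_MACRO.*INCLUDE_REMOTE_STP' \
        'SIGNALTAP_FILE' \
        'ENABLE_SIGNALTAP[[:space:]]+ON' \
        'USE_SIGNALTAP_FILE'
    do
        # Anchoring on "-name <NAME>" keeps SIGNALTAP_FILE from matching
        # the USE_SIGNALTAP_FILE line.
        if ! grep -Eq "^[[:space:]]*set_global_assignment[[:space:]]+-name[[:space:]]+$css_pattern" "$css_qsf"; then
            echo "missing: $css_pattern" >&2
            css_status=1
        fi
    done
    return "$css_status"
}
```

A typical call would be `check_stp_settings <path-to-afu-proj-dir>/hw/afu.qsf`, run before invoking run.sh.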
After adding the above settings to the AFU's afu.qsf file, generate the remote debug enabled AFU image:
$ cd <afu-proj-dir>
$ run.sh
The nlb_mode_0_stp example already has a remote debug enabled AFU image: $DCP_LOC/hw/samples/nlb_mode_0_stp/bin/nlb_mode_0_stp.gbs.
Copy the following files from the Acceleration Stack installation over to a convenient working directory on the remote debug host:
- The Signal Tap .stp file compiled with your AFU. In the case of the nlb_mode_0_stp example AFU, the .stp file is located in the Acceleration Stack installation as $DCP_LOC/hw/samples/nlb_mode_0_stp/hw/par/stp_basic.stp.
- The following two files support establishing a connection on the remote debug host to the AFU Signal Tap instances on the Intel® PAC with Arria® 10. These files are part of the Acceleration Stack release. Do not modify them.
Follow these steps on the debug target host with the PAC installed:
- If not already done, load the Signal Tap-enabled AFU.
$ sudo fpgaconf $DCP_LOC/hw/samples/nlb_mode_0_stp/bin/nlb_mode_0_stp.gbs
- Open a TCP port to accept incoming connection requests from remote debug hosts.
$ sudo mmlink -P 3333
Follow these steps on the remote debug host:
- Use System Console to connect to the debug target host’s TCP port for the Signal Tap debug connection on the target AFU. If the remote debug host is a Windows platform, open a command shell to run the commands below.
$ cd <path-to-debug-working-directory>
$ system-console --rc_script=mmlink_setup_profiled.tcl remote_debug.sof <IP-address-of-debug-target-host> 3333
The above command assumes your PATH environment variable on the remote debug host is set up to point to the following location in the Intel® Quartus® Prime Pro Edition installation:
<installation-path>/<q-edition>/sopc_builder/bin
where <q-edition> is "quartus" for Intel® Quartus® Prime Pro Edition or Intel® Quartus® Prime Standard Edition. For an Intel® Quartus® Prime Programmer Edition installation, <q-edition> is qprogrammer.
- After issuing the above commands, the System Console window appears. Wait for the “Remote system ready” message in the Tcl Console pane.
Perform these steps on the remote debug host:
- Invoke the Signal Tap GUI.
For more information, refer to the "Related Documentation" section.
- From the File menu, navigate to and open the .stp file that you copied to the remote debug host in the "Prepare the Remote Debug Host" section.
- Complete connecting to the Signal Tap controller instances in the target AFU by selecting “System Console on … Sid Hub Controller System” from the Hardware drop-down option box in the JTAG Chain Configuration pane.
- Wait for the “JTAG ready” response.
At this point, you are ready to perform in-system debug with the Signal Tap GUI in the same manner as with the conventional target connection method.
For more information about Signal Tap features and usage, refer to the "Related Documentation" section.
Use host application C code designed for the OPAE API to stimulate the AFU and verify proper operation within the Acceleration Stack. Leave the mmlink tool running in a separate terminal window on the debug target host while the remote debug host is connected. The mmlink process continuously outputs status to the terminal window. Invoke the OPAE host application or test software from separate terminal windows on the debug target host.
When using OPAE application/test code running on the debug target host to stimulate the AFU for the purposes of in-system debug, both the mmlink tool and your host application/test code must have simultaneous access to the AFU. For this to happen, any user space code calls to the fpgaOpen() OPAE API function must pass the FPGA_OPEN_SHARED flag. The Acceleration Stack installation uses the FPGA_OPEN_SHARED flag with calls to fpgaOpen() in the source code for the mmlink tool and the hello_fpga sample application, which enables remote debug as delivered in the installation for the nlb_mode_0_stp example AFU stimulated by the hello_fpga sample application without modification.
fpgaOpen(afc_token, &afc_handle, FPGA_OPEN_SHARED);
Any other sample applications included in the Acceleration Stack installation or host code of your own design must use the shared flag when used to stimulate the AFU during in-system remote debug where mmlink is required to run simultaneously.
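One way to screen your own host code for this requirement is a text search for fpgaOpen() calls that do not mention the shared flag. The following is a rough sketch only: the function name find_unshared_opens is hypothetical, and because it matches line by line, calls that are split across lines or that pass the flag through a variable still need manual review.

```shell
# Hypothetical helper: list fpgaOpen() calls in C sources that do not
# mention FPGA_OPEN_SHARED on the same line. This is a line-based text
# match, so it is only a first-pass screen and not a parser.
find_unshared_opens() {
    grep -n 'fpgaOpen[[:space:]]*(' "$@" | grep -v 'FPGA_OPEN_SHARED'
}
```

The function prints each suspect line with its line number and returns success when it finds at least one; no output means every fpgaOpen() call it matched names FPGA_OPEN_SHARED.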
When you are finished debugging, follow these steps to gracefully end the debug connection:
First, on the remote debug host…
- Save trace captures and exit the Signal Tap GUI.
- From the System Console File menu, click Exit to disconnect from the target AFU.
On the debug target host…
You can either keep the mmlink instance active and host debug sessions from other remote debug hosts, or you can terminate mmlink with the <Ctrl-C> key sequence from its terminal window. If you choose to keep mmlink active, you can only debug the currently loaded AFU. If you want to debug another AFU, you must first terminate the active mmlink process. Before loading another AFU, make sure to terminate any OPAE host application code accessing the current AFU.
- The Signal Tap debug feature becomes non-functional when the mmlink or System Console applications are closed.
- When performing PR, the AFU is non-existent and cannot be debugged. Therefore, System Console and mmlink applications should be terminated before attempting a partial reconfiguration of the AFU. Failing to do so might cause both PR and Signal Tap utilities to fail, taking the system into an unknown state. The system might have to be rebooted to restore the initial condition.
- The time to upload Signal Tap trace captures increases exponentially with sample depth. It is recommended to use sample depths less than "2K" for better Signal Tap user experience. Remote debug would still be functional even for larger depths but the time to upload the captured samples is significantly higher.
- System Console must be started after launching the mmlink application. If System Console returns an error, close the mmlink application, re-invoke mmlink, and launch System Console again.
If you get a Failed to connect message after invoking System Console, consider adding port tunneling. Do this when the debug target host is behind a firewall and your remote debug host is not.
On the debug target host, run mmlink as before. Note that mmlink provides an option to specify a port number. Port 3333 is the default.
$ mmlink --port=3333
Set up port tunneling on the remote debug host. This example shows how to do so on a Windows remote debug host using PuTTY.
Use a PuTTY configuration screen as shown in the "SSH Tunneling with PuTTY" figure. For <SDP>, enter the name of the debug target host. This forwards local port 4444 on your Windows host to port 3333 on the debug target host.
Then, click Session, specify the name of the debug target host, click Save, and then Open. Log in to the debug target host. This is your tunneling session.
Once the tunneling session is set up, the port forwarding is in place. Open a Windows Command Window and issue the system-console command as shown in the "Save and Open the Tunneling Session" figure.
$ system-console --rc_script=mmlink_setup_profiled.tcl remote_debug.sof localhost 4444
As before, the Quartus System Console comes up. Wait for the “Remote system ready” message in the Tcl Console pane of System Console.
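On a remote debug host that has an OpenSSH client instead of PuTTY, the same forwarding can be expressed as a single ssh command. The following is a sketch under that assumption and is not taken from the Acceleration Stack documentation: the function name tunnel_command is hypothetical, the ports default to the 4444 and 3333 values used in this section, and the login string is whatever account you use on the debug target host.

```shell
# Hypothetical helper: print an OpenSSH command roughly equivalent to the
# PuTTY forwarding described above. -L forwards a local port to a port on
# the target, and -N keeps the session open without a remote command.
tunnel_command() {
    tc_target=$1          # login for the debug target host, e.g. user@host
    tc_local=${2:-4444}   # local port on the remote debug host
    tc_remote=${3:-3333}  # mmlink port on the debug target host
    echo "ssh -N -L ${tc_local}:localhost:${tc_remote} ${tc_target}"
}
```

Run the printed command in its own terminal and leave it open; the system-console command then connects to localhost on the local port, as in the PuTTY case.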
|2017.12.22||First release of comprehensive developer's guide to replace the Accelerator Functional Unit (AFU) Information Brief.|
|2017.10.02||Initial Release named "Accelerator Functional Unit (AFU) Information Brief".|