The purpose of this document is to provide a set of design guidelines and recommendations, as well as a list of factors to consider, for designs that use the Cyclone® V SoC and Arria V SoC FPGA devices. This document assists you in the planning and early design phases of the SoC FPGA design, Qsys sub-system design, board design and software application design.
Stages of the HPS Design Flow
Hardware and Software Partitioning
Determine your system topology and use it as a starting point for your HPS to FPGA interface design.
|Background: Hardware Software Partitioning|
HPS Pin Multiplexing and I/O Configuration Settings
Plan configuration settings for the HPS system including I/O multiplexing options, interface to FPGA and SDRAM, clocks, peripheral settings
|Design Considerations for Connecting Device I/O to HPS Peripherals and Memory|
HPS Clocks and Reset Considerations
HPS clocks and cold and warm reset considerations
|HPS Clocking and Reset Design Considerations|
HPS EMIF Considerations
Usage of the HPS EMIF controller and related considerations
|HPS EMIF Design Considerations|
FPGA Accelerator Design Considerations
Design considerations to manage coherency between FPGA accelerators and the HPS
|Design Considerations for FPGA based Accelerators|
Recommended Tools for IP Development
SignalTap II, BFMs, System Console
|IP Debug Tools|
Stages of the Board Design Flow
HPS Power design considerations
Power on board bring up, early power estimation, design considerations for HPS and FPGA power supplies, power analysis and power optimization
|HPS Power Design Considerations|
Board design guidelines for HPS interfaces
Includes EMAC, USB, QSPI, SD/MMC, NAND, UART and I2C
|Design Guidelines for HPS Interfaces|
Stages of the Embedded Software Design Flow
Operating System (OS) considerations
OS considerations to meet your application needs, including real time, software reuse, support and ease of use considerations
|Selecting an Operating System for your application|
Boot Loader considerations
Boot loader considerations to meet your application needs, including GPL requirements and features
|Choosing Boot Loader Software|
Boot and Configuration Design Considerations
Boot source, boot clock, boot fuses, configuration flows
|Boot and Configuration Design Considerations|
HPS ECC Considerations
ECC for external SDRAM interface, L2 cache data memory, flash memory
|HPS ECC Design Considerations|
HPS SDRAM Considerations
Using the Preloader to debug the HPS SDRAM, accessing the HPS SDRAM
|HPS SDRAM Considerations|
For design guidelines for the FPGA portion of your design, please refer to the Arria® V and Cyclone® V Device Design Guidelines.
While the HPS subsystems in Cyclone® V SoC and Arria® V SoC devices are architecturally similar, there are a few differences in features, as listed below.
| ||Cyclone® V SoC||Arria® V SoC|
|Maximum MPU Frequency||Up to 925 MHz||Up to 1.05 GHz|
|Controller Area Network (CAN)|| || |
|Total HPS Dedicated I/O with Loaner capability||Up to 67||Up to 94|
|Automotive Grade Option|| || |
|Maximum supported DDR3 Frequency for HPS SDRAM|| || |
The first step in SoC FPGA design is to determine how to partition your application into functions that will be implemented in the FPGA user logic and functions that will be implemented in the Hard Processor System (HPS). The resulting system topology guides the design considerations for the interface between the HPS and the FPGA logic.
By following the provided recommendations, you can select a configuration that meets the throughput, latency, coherency and Quality of Service (QoS) requirements of your application.
As a prerequisite to understanding the design considerations for interfacing between the Cyclone® V SoC and Arria V SoC HPS and FPGA logic, you should familiarize yourself with the HPS-to-FPGA interface architecture.
Lightweight (LW) HPS-to-FPGA Bridge
This bridge is primarily used by the MPU in the HPS to access control and status registers of IP implemented in FPGA user logic.
HPS-to-FPGA Bridge
This bridge is used by the MPU or L3 masters in the HPS to access data implemented by memory or interfaces available to the FPGA user logic. This bridge is also used to access an optional boot memory for the HPS that is located within the FPGA fabric.
FPGA-to-HPS Bridge
This bridge is used by masters implemented in the FPGA user logic to access the memory space within the HPS. This bridge also allows FPGA masters to access cache coherent data when the masters perform cacheable accesses.
The behavior of all these bridges is controlled by global programmers view (GPV) registers. Access to the GPV registers of all three bridges is provided through the lightweight HPS-to-FPGA bridge.
The FPGA fabric accesses the CPU subsystem, the HPS peripherals and the HPS SDRAM memory through the interconnect. This interconnect allows masters in the FPGA, L3 and the MPU to access slaves in the HPS (peripherals or memories) and in the FPGA. The system interconnect comprises the L3 interconnect and L4 buses. The L3 interconnect itself consists of the L3 Main Switch, the L3 Slave Peripheral Switch and the L3 Master Peripheral Switch as shown in the figure below.
- FPGA-to-HPS bridge non-cacheable
- FPGA-to-HPS bridge cacheable
You can configure the FPGA-to-SDRAM interface with up to six Avalon-MM interfaces or three AXI interfaces. These interfaces can be configured to support 32-, 64-, 128-, and 256-bit data. Refer to the SDRAM Controller Subsystem chapter of the Cyclone® V SoC / Arria® V SoC HPS Technical Reference Manual for more information on FPGA-to-SDRAM usage.
For cacheable accesses, the only option available for access is FPGA-to-HPS bridge cacheable.
Designers can connect soft logic components to the HPS using the Cyclone® V/ Arria® V HPS component in Qsys.
The architecture of SoC FPGAs supports a number of system topologies between the HPS, FPGA core and external memory interfaces (EMIF). Depending on your application, the HPS-FPGA system topology may fall under one or more of the following categories:
Loosely coupled systems with Independent HPS and FPGA Memory
- In this topology, the hard processor system and the FPGA core operate relatively independently with their own separate memories. The HPS has a dedicated hard memory controller that the HPS has exclusive access to and the FPGA fabric has access to one or more internal or external memories. The HPS and FPGA subsystems are coupled with the HPS subsystem controlling and monitoring hardware in the FPGA subsystem.
- An example of this type of system is a typical control path-data path networking system. Here the bulk of the data flow streams into the FPGA core via High-Speed Serial Interface (HSSI) and is buffered on chip or in external memory. The HPS is responsible for management and control functions and accesses its program code from its own external memory. There are occasionally accesses between the two subsystems for exception handling, simple control, status reads/writes.
Tightly coupled systems with shared internal memory
- In this topology, the HPS and FPGA subsystems are characterized by high-throughput traffic between them. The traffic comprises small packets that can be stored in the on-chip memory of the FPGA and HPS subsystems, thus avoiding the need to store the shared data in an external SDRAM memory.
- An example of this system is in a control path-data path networking or packet processing system where the processor accesses and modifies packet headers in the datapath for Layer 2 MAC processing, Operations Administration and Maintenance (OAM), header processing etc.
Tightly coupled systems with Shared External Memory (Co-processing)
- In this topology, the HPS and the FPGA perform co-processing on the same data set. The data set is large, must be stored off chip in an external memory, and is accessed by both the HPS and the FPGA.
- An example of this system is a DSP application where both the processor and the FPGA implement the algorithm but the data is primarily off-chip.
Depending on your topology, you can choose one of the two hardware reference designs as a starting point for your hardware design.
GUIDELINE: Altera® recommends that you use the Golden System Reference Design (GSRD) as a starting point for a loosely coupled system.
The Golden System Reference Design (GSRD) is an example of a loosely coupled system. The hardware design has the optimum default settings and timing that you can use as a basis of your "getting started" system.
GUIDELINE: Altera recommends that you use the Cyclone® V Datamover Design Example design to optimize your hardware design and software solutions to achieve high performance real time application with HPS ARM processor.
If your design resembles a tightly-coupled system that requires high throughput of data via the FPGA-HPS bridges, you can refer to the Cyclone® V Datamover example to optimize the hardware and software portion of your design. This example includes software design examples for Linux, VxWorks and Baremetal.
To determine which system topology best suits your application, you must first determine how to best partition your application into hardware and software.
GUIDELINE: Profile your software using any profiling tool (for example, the DS-5 Streamline profiler) to identify functions that are good candidates for hardware acceleration and to isolate those functions that are best implemented in software.
- HPS Column I/O: Contains the HPS Dedicated Function Pins and HPS Dedicated I/O with loaner capability
- HPS Row I/O: Contains the HPS External Memory Interface (EMIF) I/O and HPS General Purpose Input (GPI) pins
- HPS Dedicated Function Pins: These I/O have only one function and cannot be used for other purposes.
- HPS Dedicated I/O with loaner capability: These I/O are primarily used by the HPS, but can be used on an individual basis by the FPGA if the HPS is not using them.
- HPS External Memory Interface (EMIF) I/O: These I/O are used for connecting to the HPS external memory interface (EMIF). Refer to the “External Memory Interface in Cyclone® V Devices” or “External Memory Interface in Arria® V Devices” chapter in the respective device handbook for more information regarding the layout of these I/O pins.
- HPS General Purpose Input (GPI) Pins: These pins are also known as HLGPI pins. These input-only pins are located in the same bank as the HPS EMIF I/O. Note that the smallest Cyclone V SoC package U19 (484 pins) does not have any HPS GPI pins.
- FPGA I/O: There are general purpose I/O that can be used for FPGA logic, FPGA External Memory Interfaces and High Speed Serial Interfaces.
The table below summarizes the characteristics of each I/O type.
|HPS Dedicated Function Pins||HPS Dedicated I/O with loaner capability||HPS External Memory Interface||HPS General Purpose Input||FPGA I/O|
|Number of Available I/O||11||Up to 67 ( Cyclone® V SoC) and 94 ( Arria® V SoC)||Up to 86||14 (except for Cyclone® V SoC U19 package )||Up to 288 ( Cyclone® V SoC) and Up to 592 ( Arria® V SoC)|
|Voltages Supported||3.3V, 3.0V, 2.5V, 1.8V, 1.5V||3.3V, 3.0V, 2.5V, 1.8V, 1.5V||LVDS I/O for DDR3, DDR2 and LPDDR2 protocols||Same as the I/O bank voltage used for HPS EMIF||3.3V, 3.0V, 2.5V, 1.8V, 1.5V, 1.2V|
|Purpose||Clock, Reset, HPS JTAG||Boot source, High speed HPS peripherals||Connect to SDRAM||General Purpose Input||General Purpose I/O|
|Timing Constraints||Fixed||Fixed||Fixed for legal combinations||Fixed||User defined|
|Recommended Peripherals||JTAG||QSPI, NANDx8, eMMC, SD/MMC, UART, USB, EMAC||DDR3, DDR2 and LPDDR2 SDRAM||GPI||Slow speed peripherals (I2C, SPI, EMAC-MII)|
Because the HPS contains more peripherals than can all be connected to the HPS Dedicated I/O, the HPS component in Qsys offers pin multiplexing settings as well as the option to route most of the peripherals into the FPGA fabric. Meanwhile, any unused HPS Dedicated I/O with loaner capability can be used as general purpose I/O by the FPGA.
Note that an HPS I/O Bank can only support a single supply voltage of 1.2V, 1.35V, 1.5V, 1.8V, 2.5V, 3.0V, or 3.3V, depending on the I/O standard required by the specified bank. 1.35V is supported for the HPS Row I/O bank only.
GUIDELINE: Ensure that you route USB, EMAC and Flash interfaces to HPS Dedicated I/O first, starting with USB.
It is recommended that you start by routing high speed interfaces such as USB, Ethernet, and flash to the HPS Dedicated I/O first. USB in particular must be routed to HPS Dedicated I/O because it is not available to the FPGA fabric. The flash boot source must also be routed to the HPS Dedicated I/O (and not any FPGA I/O), since these are the only I/O that are functional before the FPGA I/O have been configured.
GUIDELINE: Enable the HPS GPI pins in the Qsys HPS Component if needed
By default, the HPS GPI interface is not enabled in Qsys. To enable this interface, you must select the checkbox "Enable HLGPI interface" in the Qsys HPS Component for Cyclone® V/ Arria® V. These pins are then exposed as part of the Qsys HPS Component Conduit Interface and can be individually assigned at the top level of the design.
GUIDELINE: Ensure that you apply the correct I/O settings for the HPS Dedicated I/O (drive strength, I/O standard, weak pull-up enable, etc.)
The HPS pin location assignments are managed automatically when you generate the Qsys system containing the HPS. For the HPS SDRAM, the I/O standard and termination settings are applied when you run the “hps_sdram_p0_pin_assignments.tcl” script, which is created once the Qsys HPS Component has been generated.
The only HPS I/O constraints you must manage are for HPS Dedicated Function Pins and HPS Dedicated I/O. Constraints such as drive strength, I/O standards, and weak pull-up enables are added to the Quartus® Prime project just like FPGA constraints and are applied to the HPS at boot time when the second stage bootloader configures the I/O. For FPGA I/O, the I/O constraints are applied to the FPGA configuration file.
The main clock and resets for the HPS subsystem are HPS_CLK1, HPS_CLK2, HPS_nPOR, HPS_nRST and HPS_PORSEL. HPS_CLK1 sources the Main PLL that generates the clocks for the MPU, L3/L4 sub-systems, debug sub-system and the Flash controllers. It can also be programmed to drive the Peripheral and SDRAM PLLs. HPS_CLK2 meanwhile can be used as an alternative clock source to the Peripheral and the SDRAM PLLs.
HPS_nPOR provides a cold reset input, and HPS_nRST provides a bidirectional warm reset resource. HPS_PORSEL is an input pin that selects either a standard POR delay or a fast POR delay for the HPS block.
GUIDELINE: Verify MPU and peripheral clocking using Qsys
Use Qsys to initially define your HPS component configuration. Set the HPS input clocks, and peripheral source clocks and frequencies. Note any Qsys warning or error messages and address them by modifying clock settings or verifying that a particular warning will not adversely affect your application.
GUIDELINE: Choose an I/O voltage level for the HPS Dedicated Function I/O
HPS_CLK1, HPS_CLK2, HPS_nPOR and HPS_nRST are powered by VCCRSTCLK_HPS. These HPS Dedicated Function Pins are LVCMOS/LVTTL at either 3.3V, 3.0V, 2.5V or 1.8V. The I/O signaling voltage for these pins are determined by the supply level applied to VCCRSTCLK_HPS.
GUIDELINE: With the HPS in use (powered), supply a free running clock on HPS_CLK1 for SoC device HPS JTAG access.
Access to the HPS JTAG requires an active clock source driving HPS_CLK1.
GUIDELINE: When daisy chaining the FPGA and HPS JTAG for a single device, ensure that the HPS JTAG is the first device in the chain (located before the FPGA JTAG).
Placing the HPS JTAG before the FPGA JTAG allows the ARM DS-5 debugger to initiate a warm reset to the HPS. However, in the case of a cold reset, the entire JTAG chain is broken until the cold reset completes, as discussed in the next section.
GUIDELINE: Consider board design to isolate HPS JTAG interface
Due to an erratum on Cyclone® V/ Arria® V SoC, the HPS Test Access Port (TAP) controller is reset by a cold reset. If the HPS JTAG and FPGA JTAG are daisy chained together, the entire JTAG chain is broken until the cold reset completes. As such, it is recommended that the board is designed to allow the HPS JTAG to be bypassed. Note that this is not required if access to the JTAG chain is not required during HPS cold reset.
GUIDELINE: HPS_nRST is an open-drain, bidirectional dedicated reset I/O.
HPS_nRST is an active low, open-drain-type, bidirectional I/O. Externally asserting a logic low to the HPS_nRST pin initiates a warm reset of the HPS subsystem. HPS warm and cold reset can also be asserted from internal sources such as software-initiated resets and reset requests from the FPGA fabric. When the HPS is internally placed under warm or cold reset, the HPS component becomes a reset source and drives the HPS_nRST pin low, resetting any connected board-level components. Externally asserting the HPS_nPOR pin also results in the HPS asserting reset on the HPS_nRST pin.
GUIDELINE: Observe the minimum assertion time specifications of HPS_nPOR and HPS_nRST.
Reset signals on the HPS_nPOR and HPS_nRST pins must be asserted for a minimum number of HPS_CLK1 cycles as specified in the HPS section of the Cyclone® V/ Arria® V Device Datasheet.
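As a sketch of the arithmetic, the minimum reset pulse width in nanoseconds follows from the HPS_CLK1 period multiplied by the datasheet cycle count. The cycle count in the example call below is a placeholder, not a datasheet value; substitute the number from the Cyclone® V/ Arria® V Device Datasheet.

```c
#include <assert.h>
#include <stdio.h>

/* Minimum reset pulse width in nanoseconds, given the HPS_CLK1 frequency
 * in MHz and the minimum number of HPS_CLK1 cycles from the datasheet.
 * The cycle count passed in below is a placeholder, not a datasheet value. */
static double min_pulse_ns(double clk1_mhz, unsigned min_cycles)
{
    double period_ns = 1000.0 / clk1_mhz; /* e.g. 25 MHz -> 40 ns period */
    return period_ns * min_cycles;
}

static void print_example(void)
{
    /* Example: 25 MHz HPS_CLK1, hypothetical 6-cycle minimum -> 240 ns. */
    printf("min pulse: %.1f ns\n", min_pulse_ns(25.0, 6));
}
```

Designing the external reset circuit (supervisor or push-button debounce) to hold the pin well beyond this minimum provides margin across clock tolerance.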
GUIDELINE: Avoid cascading PLLs between the HPS and FPGA
Cascading PLLs between the FPGA and HPS has not been characterized. Unless you perform the jitter analysis, do not chain the FPGA and HPS PLLs together as a stable clock coming out of the last PLL in the FPGA cannot be guaranteed. Output clocks from the HPS are not intended to be fed into PLLs in the FPGA.
A critical component of the HPS subsystem is the external SDRAM memory. For Cyclone® V and Arria® V SoC device, the HPS has a dedicated SDRAM Subsystem that interfaces with the HPS External Memory Interface I/O.
The following design considerations will help with properly designing the interface between the memory and HPS subsystem. The following documentation is essential to successfully connecting external SDRAM to the HPS subsystem.
External Memory Interface Handbook, Volume 3: Reference Material
The External Memory Interface (EMIF) Handbook: Chapter 5 includes the functional description of the HPS memory controller. The supported interface options are listed for DDR3, DDR2 and LPDDR2.
GUIDELINE: Ensure that the HPS memory controller Data Mask (DM) pins are enabled
In the HPS Component in Qsys, select the checkbox that enables the data mask (DM) pins. If this control is not enabled, data corruption occurs any time a master accesses data in SDRAM that is smaller than the native word size of the memory.
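The toy model below illustrates why the DM pins matter: without byte-lane masking, a sub-word store drives every lane of the 64-bit SDRAM word, destroying its neighbors, whereas with masking only the addressed lane is updated. This is a software illustration only, not a model of the actual controller behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of an SDRAM word written without data-mask (DM) support:
 * every write drives all byte lanes, so a one-byte store clobbers the
 * other seven bytes of the 64-bit word with whatever is on the bus. */
static uint64_t write_byte_no_dm(uint64_t word, unsigned lane, uint8_t value)
{
    (void)word;       /* previous contents are lost: all lanes are driven */
    uint64_t bus = 0; /* undriven lanes modeled as zero garbage           */
    return bus | ((uint64_t)value << (8 * lane));
}

/* With DM enabled, only the addressed byte lane is updated. */
static uint64_t write_byte_dm(uint64_t word, unsigned lane, uint8_t value)
{
    uint64_t mask = (uint64_t)0xFF << (8 * lane);
    return (word & ~mask) | ((uint64_t)value << (8 * lane));
}
```

The second function models the intended behavior when the DM checkbox is enabled; the first models the corruption described above.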
Determine your SDRAM Memory type and bit width. Cyclone® V and Arria® V SoC devices offer DDR3, DDR2 and LPDDR2 SDRAM support for the HPS.
GUIDELINE: Ensure that you choose only DDR3, DDR2, or LPDDR2 components or modules in configurations supported by the Cyclone® V or Arria® V HPS EMIF for your specific device/package combination.
Altera® 's External Memory Interface Spec Estimator is a parametric tool that allows you to compare supported external memory interface types, configurations and maximum performance characteristics in Altera® FPGA and SoC devices.
First, filter the “Family” column to select only Cyclone® V / Arria® V SoC devices. Then, use the filter on “Interface Type” to choose only “HPS Hard Controller”.
GUIDELINE: Ensure that in the Qsys HPS Component, the Memory Clock Frequency is supported by the device speed grade.
Refer to the Altera® 's External Memory Interface Spec Estimator to obtain the maximum supported memory clock frequency for the device speed grade.
The Cyclone® V and Arria® V SoC HPS External Memory Interface I/O locations are fixed, depending on the type of memory used. You can refer to the device Pin Out files, under the “HMC Pin Assignment for DDR3/DDR2” and “HMC Pin Assignment for LPDDR2” for exact I/O pins used by the respective memory interface pins.
Consider the following when integrating the Cyclone® V / Arria® V SoC HPS EMIF with the rest of the SoC system design.
GUIDELINE: Follow the guidelines for optimizing bandwidth for all masters accessing the HPS SDRAM
Accesses to SDRAM connected to the HPS EMIF go through the L3 Interconnect (except for FPGA-to-SDRAM bridge). When designing and configuring high bandwidth DMA masters and related buffering in the FPGA core, refer to Design Considerations for FPGA based Accelerators. The principles covered in that section apply to all high bandwidth DMA masters (e.g. Qsys DMA Controller components, integrated DMA controllers in custom peripherals) and related buffering in the FPGA core that access HPS resources (e.g. HPS SDRAM) through the FPGA-to-SDRAM and FPGA-to-HPS bridge ports, not just tightly-coupled HPS hardware accelerators.
The Cyclone® V / Arria® V HPS EMIF do not support the external memory interface toolkit. To debug the HPS EMIF, you can change the settings inside the preloader software to enable Runtime Calibration Report and Debug Level info. In addition, you can use the preloader software to check the status of HPS SDRAM PLL.
Refer to Using Preloader To Debug the HPS SDRAM for more information.
The diagram below illustrates the HPS Bandwidth based on the Cyclone® V Datamover design example. Note that you may obtain different bandwidth figures depending on factors such as device speed grade, clock settings and data width.
Choose the DMA implementation best suited to your design
- HPS DMA: primarily used to move data to and from other slow-speed HPS modules, such as SPI and I2C, as well as to do memcopy functions to and from HPS memories.
- Soft DMAs: primarily used to move data to and from peripherals in the FPGA.
FPGA DMA masters have access to HPS resources through the FPGA-to-HPS Bridge and FPGA-to-SDRAM Interface, configurable in the HPS Qsys Component. The HPS SDRAM controller multi-port front end (MPFE) provides arbitration for these resources and enforces Quality of Service (QoS) settings. When planning for and designing DMA masters and related buffering that access resources through the HPS interconnect, study the architecture of the HPS interconnect and consider the following guidance and resources available for optimizing bandwidth through the interconnect.
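A common software-side buffering structure for such DMA masters is a descriptor ring that the CPU fills and the DMA engine drains. The sketch below is generic and illustrative; the field names and ring size are hypothetical, not taken from any Altera IP.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal descriptor ring for a hypothetical FPGA DMA master: the CPU
 * fills descriptors and advances head; the DMA engine consumes them and
 * advances tail. Power-of-two size lets unsigned indices wrap cleanly. */
#define RING_SIZE 8u

struct dma_desc { uint32_t src, dst, len; };

struct dma_ring {
    struct dma_desc d[RING_SIZE];
    unsigned head, tail; /* head: next free slot, tail: next to consume */
};

static int ring_full(const struct dma_ring *r)
{
    return r->head - r->tail == RING_SIZE;
}

static int ring_push(struct dma_ring *r, struct dma_desc desc)
{
    if (ring_full(r))
        return 0;
    r->d[r->head % RING_SIZE] = desc;
    r->head++;
    return 1;
}

static int ring_pop(struct dma_ring *r, struct dma_desc *out)
{
    if (r->head == r->tail)
        return 0;
    *out = r->d[r->tail % RING_SIZE];
    r->tail++;
    return 1;
}
```

Sizing the ring and descriptor lengths to match the burst sizes favored by the MPFE helps sustain bandwidth through the bridges.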
GUIDELINE: Utilize the Cyclone® V Datamover Example to tune for performance
The Cyclone® V Datamover Design Example is a useful reference for optimizing your hardware design to achieve high performance real time application with HPS ARM processor.
GUIDELINE: Exploit interleaved accesses when accessing the HPS SDRAM through the FPGA-to-HPS Bridge or FPGA-to-SDRAM Interface
Where possible in your application, interleave read and write accesses to maximize bandwidth to the SDRAM.
The HPS bridges and FPGA-to-SDRAM interfaces exposed to the FPGA are synchronous, and clock crossing is performed within the interface itself. As a result, you only need to ensure that both the FPGA-facing logic and your user design close timing in TimeQuest. Interrupts are considered asynchronous by the HPS; the HPS logic resynchronizes them to the internal HPS clock domain, so there is no need to close timing for them.
Conduits carry signals that do not fit into any standard interface supported by Qsys. Examples of these are HPS peripheral external interfaces routed into the FPGA fabric or the HPS DMA peripheral request interfaces.
Data shared between the HPS and the FPGA logic can be modified at any point. If the application requires that any changes to the data be propagated throughout the entire system, so that every master observes the most up-to-date value of the data, then the transaction is said to require coherency.
The first design consideration to take into account is to understand which data transfers need to be coherent. By default, all accesses between the FPGA and HPS are assumed to be non-coherent unless coherency is explicitly managed by software or by using the coherent hardware features of the HPS (SCU and ACP).
- Will data generated by my FPGA peripheral need to be accessed by the MPU?
- Will data generated by the MPU need to be accessed by the peripheral in the FPGA?
If either is true then the data must be coherent. You can use the ACP to keep the FPGA coherent with cacheable data in the HPS.
There are several mechanisms by which coherency is maintained throughout the system:
The HPS maintains cache coherency at the level 1 memory subsystem within the MPU subsystem. The snoop control unit (SCU) built into the MPU subsystem maintains cache coherency between the two L1 data caches using the modified-exclusive-shared-invalid (MESI) coherency protocol.
The accelerator coherency port (ACP) of the SCU provides a means for other masters in the system, including logic implemented in the FPGA fabric, to perform cache coherent accesses. Accesses to the ACP are only unidirectional in terms of cache coherency meaning at the time of the access the data is up to date, but the SCU is not responsible for maintaining coherency of that data over time. For example, if a master in the FPGA reads data from the ACP and then a processor updates that same data in memory, then the FPGA no longer contains the most up to date copy of the data.
Performance explorations of accelerators using the ACP show that as the size of packets transferred by the AXI master via the ACP port increases, accelerator performance increases, but only up to a point. Beyond that point the entire data packet can no longer be cached, and accelerator performance degrades.
GUIDELINE: Use the ACP to manage coherency for small data-size accesses; manage coherency for large data in software.
The AXI protocol allows masters to issue cacheable accesses, whereas the Avalon-MM protocol does not support this feature. For a master in the FPGA to perform a cacheable access, the FPGA master must adhere to the AXI protocol and be capable of performing cacheable accesses (ARCACHE/AWCACHE set to 1 and ARUSER/AWUSER set to 1).
The L2 cache performs error detection and correction in groups of 64 bits without the use of byte enables.
GUIDELINE: Accesses to the ACP must be 64-bit aligned and must be full 64-bit accesses, with no byte lanes disabled on write accesses. The main L3 switch and the ACP port are both 64 bits wide, so it is only necessary to provide 64-bit aligned cache coherent accesses that are 64 bits wide after resizing.
Data resizing can occur in the L3 interconnect between requesting master and the ACP. As a result, a 32-bit access can be compatible with the L2 cache ECC logic if the access is aligned to 8-byte boundaries and the master performs bursts of size 2, 4, 8, or 16. Data resizing can also occur within the FPGA-to-HPS bridge.
GUIDELINE: The simplest way to ensure that accesses from the FPGA meet the L2 cache ECC requirements is to implement 64-bit masters in the FPGA fabric and configure the FPGA-to-HPS bridge to expose a 64-bit slave port. This ensures that no resizing of AXI transactions is necessary. The logic in the FPGA must also make full 64-bit accesses.
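The ACP rules above can be captured as a simple software check, for example in a bus-functional testbench. The helper below is an illustrative sketch: real AXI read transactions carry no write strobe, so the `wstrb` check applies to write accesses only.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative check of the ACP access rules described above: the
 * address must be 64-bit (8-byte) aligned, the access must cover whole
 * 64-bit words, and no byte lane may be disabled on writes. This is a
 * software sanity check for a testbench, not hardware code. */
static int acp_access_ok(uint64_t addr, unsigned size_bytes, uint8_t wstrb)
{
    if (addr % 8 != 0)       /* 64-bit aligned address      */
        return 0;
    if (size_bytes % 8 != 0) /* whole 64-bit beats only     */
        return 0;
    if (wstrb != 0xFF)       /* no byte lane disabled       */
        return 0;
    return 1;
}
```

Running every generated transaction through such a predicate during simulation catches ECC-incompatible accesses before they reach hardware.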
The Altera® Complete Design Suite (ACDS) contains many IP and system-level debug tools used in FPGA hardware designs.
- SignalTap II - On-chip logic analyzer constructed from FPGA resources
- Bus functional models
- Avalon-MM v2 protocol
- AXI v3 protocol
- System Console - Services-based API for controlling soft logic and moving data to/from the FPGA
- IP Creation in RTL
- Testbench and BFM verification of the IP
- In-silicon testing of the IP using System Console to drive stimuli into memory-mapped or streaming interfaces
- In-silicon testing of the IP using low-level software run on the processor in the HPS
If both SignalTap II and System Console use the FPGA JTAG interface to communicate data, they can be used simultaneously. For example, you may instrument a trigger condition in SignalTap II and cause the trigger condition to occur via the JTAG-to-Avalon bridge IP controlled by System Console. These tools can also be used simultaneously with the HPS tools that communicate over JTAG.
There are two JTAG interfaces on the Cyclone® V/ Arria® V SoC device. The first interface is connected to the FPGA side of the device, while the second interface is connected to the HPS debug access port (DAP).
On initial power on, the Cyclone® V/ Arria® V HPS Boot ROM samples the BSEL pins to determine which boot flash interface to use. Then, the Boot ROM programs only the default I/O used for the boot flash. All other I/O is left in tristate.
GUIDELINE: Ensure that you assign your boot flash device to the correct I/O assignments based on the device Pin Out file.
For design considerations and recommendations on power consumption and thermal analysis, SoC device pin connections, supply design and decoupling, refer to the Arria® V and Cyclone® V Device Design Guidelines.
The following sections are supplemental for SoC devices.
Follow the guidelines in the Early Power Estimation website for using the PowerPlay Early Power Estimation (EPE) spreadsheets.
In addition, consider the following guidelines for Cyclone® V/ Arria® V SoC devices when using the EPE spreadsheet.
GUIDELINE: Select “Maximum” for the Power Characteristics setting.
When estimating power consumption for the purposes of designing an adequate power supply that can meet the maximum power requirements across process, voltage and temperature (PVT), use the device maximum power characteristics.
GUIDELINE: Add HPS peripherals assigned to FPGA I/O.
Use the IO-IP tab of the EPE spreadsheet to describe the various configurations of I/O Elements (IOEs) in your application and the controller IP behind each set of I/O.
For HPS peripherals assigned to FPGA I/O, add rows to the spreadsheet as necessary to describe the different HPS peripheral I/O characteristics in your design.
GUIDELINE: Select the Frequency, Application, and if applicable, the Application Mode for each CPU in the HPS tab of the spreadsheet.
The Application/Application Mode settings for each CPU allow you to select from a list of industry standard benchmarks to model CPU utilization in your application. You can also select “Custom” for defining a unique set of CPU utilization parameters across the ALUs and cache memories.
GUIDELINE: Update the sheet with HPS SDRAM Type, Frequency and Width.
Note that the selection of SDRAM type also updates the I/O voltage for Banks 6A and 6B.
GUIDELINE: Update the sheet with HPS I/O Bank Voltage and Peripheral Usage
Before you select the peripheral voltage from the drop-down list, ensure that at least one HPS I/O Bank is configured to the same voltage.
GUIDELINE: Use a separate programmable regulator for FPGA supply in order to support the ability to power down the FPGA while keeping the HPS running.
Cyclone® V/ Arria® V SoC devices offer the ability to power down the FPGA while keeping the HPS running. In order to do this, the FPGA VCC must be sourced from a programmable regulator that supports a control interface such as I2C. The Cyclone® V SoC Development Kit is an example of a development board that supports this feature.
You can refer to Cyclone® V SoC Smart Configuration design example to understand how to control the FPGA power supply regulator using the I2C connection from the HPS.
Cyclone® V / Arria® V SoC devices support an HPS boot clock from 10-50 MHz in PLL Bypass mode, and up to 400 MHz in PLL Locked mode. During power up or cold reset, the boot ROM samples the value of the CSEL pins and, if needed, configures the HPS PLL to provide a faster boot clock frequency.
Refer to the table with CSEL Options and corresponding External Oscillator Frequency in the Booting and Configuration Chapter of the Cyclone® V or Arria® V HPS Technical Reference Manual.
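The stated ranges can be captured in a small sanity check for a board bring-up script. Note that the lower frequency bound for PLL Locked mode is not restated here, so the sketch below only enforces the 400 MHz upper limit; that simplification is an assumption of this example.

```c
#include <assert.h>

/* Sanity-check a proposed HPS boot clock against the ranges stated
 * above: 10-50 MHz with the PLL bypassed, and up to 400 MHz with the
 * PLL locked. The lower bound in locked mode is not checked because it
 * is not restated in this document (sketch assumption). */
static int boot_clock_ok(double mhz, int pll_bypassed)
{
    if (pll_bypassed)
        return mhz >= 10.0 && mhz <= 50.0;
    return mhz > 0.0 && mhz <= 400.0;
}
```

Consult the CSEL table referenced above for the exact oscillator frequencies that each CSEL setting expects.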
Power-Up and Power-Down Sequencing
Cyclone® V/ Arria® V SoC devices have the following additional power rails to consider for power sequencing.
GUIDELINE: Consider ramp times for maximum transient currents on supplies when designing the Power Distribution Network (PDN).
When using the PDN Tool to calculate the required target impedance of your application’s PDN for the core fabric’s VCC supply, model the ramp time of the maximum transient current on VCC using the Core Clock Frequency and Current Ramp Up Period parameters. This procedure relaxes the target impedance requirements relative to the default step function analysis, resulting in a more efficient PDN with fewer decoupling capacitors.
Initial transient current estimates can be obtained from the EPE Spreadsheet, and more accurate analysis is possible with the PowerPlay Power Analysis Tool in Quartus Prime once the design is closer to completion.
Refer to AN 750: Using the Altera PDN Tool to Optimize Your Power Delivery Network Design.
Follow the guidelines in the Power Analysis and Optimization section of the Arria® V and Cyclone® V Device Design Guidelines. In addition, consider the following options for the HPS portion of the device.
Processor and memory clock speeds
The biggest contributors to HPS power consumption are the processor clock speed and the type, size and speed of the external SDRAM program memory. Careful selection of these system parameters to satisfy the functional and performance requirements of the application helps to minimize system power consumption.
MPU Standby Modes and Dynamic Clock Gating
CPU standby modes and dynamic clock gating logic can be utilized throughout the MPU subsystem. Each CPU can be placed in a standby mode, Wait for Interrupt or Wait for Event, to further minimize power consumption.
For more information on standby modes, refer to the Cortex-A9 Processor Power Control section in the Cortex-A9 Technical Reference Manual. Power Optimization Examples are available at the SoC Design Examples web page.
Managing Peripheral Power
When configuring the HPS component in Qsys, enable only those peripherals your application will use. Configure the peripherals for the lowest clock speed while maintaining functional and performance requirements. Additional power can be saved under software control by placing inactive peripherals in reset and gating off their clock sources.
Managing Power by Shutting Down Supplies
Cyclone® V SoC and Arria® V SoC support the ability to power down the FPGA portion of the device, while keeping the HPS running. Refer to the Cyclone® V SoC Smart Configuration design example on how to control the FPGA power supply regulator using the I2C connection from the HPS.
GUIDELINE: Ensure that the HPS is powered up and held in reset before performing a boundary scan test of the FPGA and HPS I/O.
The HPS JTAG does not support boundary scan tests (BST). In order to perform boundary scan testing on HPS I/O pins, you will need to use the FPGA JTAG.
This section outlines the design guidelines for HPS Interfaces like EMAC PHY, USB, QSPI, SD/MMC, NAND Flash, UART, I2C and SPI.
- Reduced Gigabit Media Independent Interface (RGMII) using Shared I/O
- Media Independent Interface (MII) interface to FPGA fabric
- Gigabit Media Independent Interface (GMII) interface to FPGA fabric
Any combination of supported PHY interface types can be configured across multiple HPS EMAC instances. For RGMII using HPS Dedicated I/O, develop an early I/O floor-planning template design to ensure that there are enough HPS Dedicated I/O to accommodate the chosen PHY interfaces in addition to other HPS peripherals planned for HPS Dedicated I/O usage.
It is possible to adapt the MII/GMII PHY interfaces exposed to the FPGA fabric by the HPS component to other PHY interface standards such as RMII, RGMII, SGMII, MII and GMII through the use of soft adaptation logic in the FPGA and features in the general-purpose FPGA I/O and transceiver FPGA I/O.
- Desired Ethernet rate, available I/O and available transceivers
- PHY devices that offer the skew control feature
- Device Driver availability
The Cyclone® V/ Arria® V SoC Hard Processor System (HPS) can connect its embedded Ethernet MAC (EMAC) PHY interfaces directly to industry standard Gigabit Ethernet PHYs using the RGMII interface at any supported I/O voltage using the HPS Dedicated I/O pins. These voltages typically include 1.8V, 2.5V and 3.0V. If the HPS Dedicated I/O pins are used for the PHY interface, then no FPGA routing resources are used and timing is fixed, simplifying timing on the interface. This document describes the design guidelines for RGMII, the most typical interface.
You can also connect PHYs to the HPS EMACs through the FPGA fabric using the GMII and MII bus interfaces for Gigabit and 10/100 Mbps access respectively. A GMII-to-SGMII adapter is also available to automatically adapt to transceiver-based SGMII optical modules.
This section discusses design considerations for RGMII PHY interface through the HPS Dedicated I/O.
Reduced Gigabit Media Independent Interface (RGMII) is the most common interface because it supports 10 Mbps, 100 Mbps, and 1000 Mbps connection speeds at the PHY layer. RGMII uses four-bit-wide transmit and receive datapaths, each with its own source-synchronous clock. All transmit data and control signals are source synchronous to TX_CLK, and all receive data and control signals are source synchronous to RX_CLK.
For all speed modes, TX_CLK is always sourced by the MAC, and RX_CLK is always sourced by the PHY. In 1000 Mbps mode, TX_CLK and RX_CLK are 125 MHz, and Dual Data Rate (DDR) signaling is used. In 10 Mbps and 100 Mbps modes, TX_CLK and RX_CLK are 2.5 MHz and 25 MHz, respectively, and rising-edge Single Data Rate (SDR) signaling is used.
I/O Pin Timing
This section addresses RGMII interface timing from the perspective of meeting requirements in 1000 Mbps mode. The interface timing margins are most demanding in 1000 Mbps mode, so it is the only scenario considered here.
At 125 MHz, the period is 8 ns, but because both edges are used, the effective period is only 4 ns. The TX and RX busses are completely separate and source synchronous, which simplifies timing. The RGMII specification calls for CLK to be delayed from DATA at the receiver, in either direction, by a minimum of 1.0 ns and a maximum of 2.6 ns.
In other words, TX_CLK must arrive at the PHY input delayed relative to the data the MAC outputs, and RX_CLK must arrive at the MAC input delayed relative to the data the PHY outputs. The signals are transmitted source-synchronously within the +/-500 ps RGMII skew specification in each direction, as measured at the output pins. The minimum delay needed in each direction is 1 ns, but it is recommended to target a delay of 1.5 ns to 2 ns to preserve timing margin.
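As a rough numeric aid, an inserted CLK-to-DATA delay can be checked against the 1.0 to 2.6 ns window required at the receiver. This sketch is illustrative only: it simply assumes the worst-case +/-0.5 ns pin-level skew adds to and subtracts from the nominal inserted delay, and ignores board trace mismatch.

```python
def rgmii_clk_delay_ok(added_delay_ns: float, pin_skew_ns: float = 0.5) -> bool:
    """Worst-case check of an RGMII CLK-to-DATA delay against the
    1.0-2.6 ns window the RGMII spec requires at the receiver, after
    allowing for the +/-0.5 ns pin-level skew budget."""
    worst_min = added_delay_ns - pin_skew_ns
    worst_max = added_delay_ns + pin_skew_ns
    return worst_min >= 1.0 and worst_max <= 2.6
```

Note that the recommended 1.5 to 2 ns target passes this check, while the bare 1 ns minimum leaves no margin for pin-level skew.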
Transmit path setup/hold
Only setup and hold for TX_CLK to TX_CTL and TXD[3:0] matter for transmit. The Cyclone® V/ Arria® V HPS Dedicated I/O does not feature programmable delay.
For TX_CLK from the Cyclone® V/ Arria® V SoC, you must introduce a delay to meet the 1.0 ns PHY minimum input setup time in the RGMII spec. It is strongly recommended to increase this delay to 1.5 ns to 2.0 ns. Many PHYs offer programmable skew, and some support RGMII 2.0, which defaults to skew enabled on both transmit and receive datapaths.
Between the PHY delay and FPGA I/O delay features, you must ensure either 2 ns of delay on CLK versus CTL and D[3:0], or the 1.2 ns minimum setup skew typical of most PHYs. Consult your PHY vendor's datasheet for more details.
GUIDELINE: Ensure your design includes the necessary Quartus settings to configure the HPS EMAC outputs for the required delays.
On the Cyclone® V/ Arria® V SoC Development Kit and the associated Golden Hardware Reference Design (the GHRD is the hardware component of the GSRD), PHY skew is implemented in the Micrel PHY. Refer to the board .xml file and PHY driver code in the Golden System Reference Design (GSRD).
Receive path setup/hold
Only setup and hold for RX_CLK to RX_CTL and RXD[3:0] need to be considered for receive timing. For Cyclone® V/ Arria® V SoC HPS Dedicated I/O, no other consideration on the PHY side or board trace delay is required.
GUIDELINE: Hardware developers should specify the required FPGA skew so that software developers can add the skew to the device driver code.
The aforementioned board .xml file will be used to compile the Linux device tree for the Cyclone® V / Arria® V SoC GSRD.
Using FPGA I/O for an HPS EMAC PHY interface can be helpful when there is not enough HPS Dedicated I/O left to accommodate the PHY interface or when you want to adapt to a PHY interface not natively supported by the HPS EMAC.
GUIDELINE: Specify the PHY interface transmit clock frequency when configuring the HPS component within Qsys.
For either GMII or MII, including adapting to other PHY interfaces, specify the maximum transmit path clock frequency for the HPS EMAC PHY interface: 125 MHz for GMII, 25 MHz for MII. This configuration results in the proper clock timing constraints being applied to the PHY interface transmit clock upon Qsys system generation.
MII and GMII are only available in Cyclone® V/ Arria® V SoC by driving the EMAC signals into the FPGA core routing logic and then ultimately to FPGA I/O pins or to internal registers in the FPGA core.
GUIDELINE: Apply timing constraints and verify timing with TimeQuest.
Because routing delays can vary widely in the FPGA core and I/O structures, it is important to read the timing reports and, especially for GMII, create timing constraints. GMII has a 125 MHz clock and, unlike RGMII, is single data rate. GMII does not have the same CLK-to-DATA skew considerations, however; its signals are centered by design, being launched on the negative edge and captured on the rising edge.
GUIDELINE: Register interface I/O at the FPGA I/O boundary.
With core and I/O delays easily exceeding 8 ns, it is recommended to register these buses in each direction in I/O Element (IOE) registers so that they remain aligned as they travel across the FPGA core logic fabric. On the transmit side, maintain the clock-to-data/control relationship by latching these signals on the falling edge of the emac[0,1,2]_gtx_clk output from the HPS EMAC. Latch the receive data and control at the FPGA I/O inputs on the rising edge of the RX_CLK sourced by the PHY.
GUIDELINE: Consider transmit timing in MII mode.
MII is 25 MHz when the PHY is in 100 Mbps mode and 2.5 MHz when the PHY is in 10 Mbps mode, so the shortest period is 40 ns. The PHY sources the clock for both transmit and receive directions. Because the transmit timing is relative to the TX_CLK clock provided by the PHY, the turnaround time may be of concern, but this is usually not an issue due to the long 40 ns period.
Keep in mind that the clock travels into the FPGA and the data then travels back out, so the round-trip delay must be less than 25 ns given the 15 ns input setup time. Note that the transmit data and control are launched into the FPGA fabric by the HPS EMAC transmit path logic on the negative edge of the PHY-sourced TX_CLK, which removes 20 ns of the 40 ns clock-to-setup timing budget.
The round-trip clock path delay on data arrival includes the PHY-to-SoC board propagation delay plus the internal path delay from the SoC pin to and through the HPS EMAC transmit clock mux, all of which subtracts from the remaining 20 ns setup budget. It may therefore be necessary to retime the MII-mode transmit data and control to the rising edge of the phy_txclk_o clock output, in registers in the FPGA fabric.
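A simplified view of this budget: half the 40 ns period remains after the negative-edge launch, from which the 15 ns PHY setup time and the round-trip path delay are subtracted. The helper below is an illustrative budget calculation only, not a substitute for static timing analysis.

```python
def mii_tx_margin_ns(round_trip_delay_ns: float,
                     period_ns: float = 40.0,
                     phy_setup_ns: float = 15.0) -> float:
    """Worst-case MII transmit margin: data is launched on the negative
    edge (half a period), from which the PHY input setup time and the
    round-trip clock+data path delay are subtracted."""
    return period_ns / 2 - phy_setup_ns - round_trip_delay_ns
```

With zero path delay only 5 ns of margin remains, which is why retiming to the rising edge may be necessary once board and internal delays are included.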
GUIDELINE: Use the GMII-to-RGMII Adapter IP available in Qsys.
Configure the HPS component in Qsys for an EMAC as “FPGA” I/O instance. Do not export the resulting HPS component GMII signals in Qsys. Instead, add the Altera® GMII to RGMII Adapter IP to the Qsys subsystem and connect to the HPS component’s GMII signals. The GMII to RGMII Adapter IP makes use of the Altera® HPS EMAC Interface Splitter IP in Qsys to split out the “emac” conduit from the HPS component for use by the GMII to RGMII Adapter. See the Embedded Peripherals IP User Guide for information on how to use the Altera® GMII-to-RGMII Adapter IP.
GUIDELINE: Provide a glitch-free clock source for the 10/100 Mbps modes.
In an RGMII PHY interface, the TX_CLK is always sourced by the MAC, but the HPS component’s GMII interface expects TX_CLK to be provided by the PHY device in 10/100 Mbps modes. The GMII to RGMII adaptation logic must provide the 2.5/25 MHz TX_CLK on the GMII’s emac[0,1]_tx_clk_in input port, and the switch between 2.5 MHz and 25 MHz must be accomplished in a glitch-free manner as required by the HPS EMAC. An FPGA PLL can be used to provide the 2.5 MHz and 25 MHz TX_CLK along with an ALTCLKCTRL block to select between counter outputs glitch-free.
It is possible to adapt the MII HPS EMAC PHY signals to an RMII PHY interface at the FPGA I/O pins using logic in the FPGA.
GUIDELINE: Provide a 50MHz REF_CLK source.
An RMII PHY uses a single 50 MHz reference clock (REF_CLK) for both transmit and receive data and control. Provide the 50 MHz REF_CLK either with a board-level clock source, a generated clock from the FPGA fabric, or from a PHY capable of generating the REF_CLK.
GUIDELINE: Adapt the transmit and receive data and control paths.
The HPS EMAC PHY interface exposed in the FPGA fabric is MII, which requires separate transmit and receive clock inputs of 2.5 MHz and 25 MHz for 10 Mbps and 100 Mbps modes of operation, respectively. Both transmit and receive datapaths are 4-bits wide. The RMII PHY uses the 50 MHz REF_CLK for both its transmit and receive datapaths and at both 10 Mbps and 100 Mbps modes of operation. The RMII transmit and receive datapaths are 2-bits wide. At 10 Mbps, transmit and receive data and control are held stable for 10 clock cycles of the 50 MHz REF_CLK. You must provide adaptation logic in the FPGA fabric to adapt between the HPS EMAC MII and external RMII PHY interfaces: 4-bits @ 25 MHz/2.5 MHz to/from 2-bits@ 50 MHz, 10x oversampled in 10 Mbps mode.
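As a behavioral illustration of the width adaptation described above (not synthesizable logic), the sketch below serializes each MII nibble into two RMII dibits, with the 10x hold applied in 10 Mbps mode. The low-dibit-first ordering is an assumption to verify against your PHY's documentation.

```python
def mii_nibbles_to_rmii_dibits(nibbles, oversample: int = 1):
    """Model of the MII-to-RMII transmit adaptation: each 4-bit nibble
    becomes two 2-bit symbols (low dibit first, assumed ordering); in
    10 Mbps mode each dibit is additionally held for 10 REF_CLK cycles
    (oversample=10)."""
    dibits = []
    for n in nibbles:
        for d in (n & 0b11, (n >> 2) & 0b11):
            dibits.extend([d] * oversample)  # hold value for N REF_CLK cycles
    return dibits
```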
GUIDELINE: Provide a glitch-free clock source on the HPS EMAC MII tx_clk_in clock input.
The HPS component’s MII interface requires a 2.5/25 MHz transmit clock on its emac[0,1,2]_tx_clk_in input port, and the switch between 2.5 MHz and 25 MHz must be done glitch free as required by the HPS EMAC. An FPGA PLL can be used to provide the 2.5 MHz and 25 MHz transmit clock along with an ALTCLKCTRL block to select between counter outputs glitch-free.
It is possible to adapt the GMII HPS EMAC PHY signals to an SGMII PHY interface at the FPGA transceiver I/O pins using logic in the FPGA and the multi-gigabit transceiver I/O. While it is possible to design custom logic for this adaptation, this section describes using Qsys adapter IP.
GUIDELINE: Use the GMII to SGMII Adapter IP available in Qsys.
Configure the HPS component in Qsys for an EMAC as “FPGA” I/O instance. Do not export the resulting HPS component GMII signals in Qsys. Instead, add the Altera® GMII to SGMII Adapter IP to the Qsys subsystem and connect to the HPS component’s GMII signals. The GMII to SGMII Adapter IP makes use of the Altera® HPS EMAC Interface Splitter IP in Qsys to split out the “emac” conduit from the HPS component for use by the GMII to SGMII Adapter. The adapter IP instantiates the Altera® Triple Speed Ethernet (TSE) MAC IP, configured in 1000 BASE-X/SGMII PCS PHY-only mode (i.e., no soft MAC component). See the Embedded Peripherals IP User Guide for information on how to use the Altera GMII to SGMII Adapter IP.
The MDIO PHY management bus has two signals per MAC: MDC and MDIO. MDC is the clock output, which is not free running. At 2.5 MHz, it has a 400 ns minimum period. MDIO is a bi-directional data signal with a High-Z bus turnaround period.
When the MAC writes to the PHY, the data is launched on the falling edge, meaning there is 200 ns -10 ns = 190 ns for flight time, signal settling, and setup at the receiver. Because data is not switched until the following negative edge, there is also 200 ns of hold time. These requirements are very easy to meet with almost any board topology. When the MAC reads from the PHY, the PHY is responsible for outputting the read data from 0 to 300 ns back to the MAC, leaving 100 ns less 10 ns setup time, or 90 ns for flight time, signal settling, and setup at the receiver. This requirement is also very easy to meet.
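The margins above reduce to simple arithmetic. The helpers below restate the 190 ns write budget and the 90 ns read budget and are illustrative only.

```python
def mdio_write_margin_ns(flight_settle_ns: float,
                         period_ns: float = 400.0,
                         setup_ns: float = 10.0) -> float:
    """MDIO write margin: data is launched on the MDC falling edge and
    captured half a period later, so 200 ns - 10 ns = 190 ns remains for
    flight time, settling, and setup."""
    return period_ns / 2 - setup_ns - flight_settle_ns

def mdio_read_margin_ns(flight_settle_ns: float,
                        phy_access_ns: float = 300.0,
                        period_ns: float = 400.0,
                        setup_ns: float = 10.0) -> float:
    """MDIO read margin: the PHY may take up to 300 ns to drive read
    data, leaving 400 - 300 - 10 = 90 ns for flight time and settling."""
    return period_ns - phy_access_ns - setup_ns - flight_settle_ns
```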
GUIDELINE: Implement pull-up resistor on board for MDC/MDIO.
Both signals require an external pull-up resistor, typically 1 kΩ, although PHY datasheets may vary.
GUIDELINE: Ensure interface timing is met.
There is a 10 ns setup and hold requirement for MDIO data with respect to MDC.
GUIDELINE: Use appropriate board-level termination on PHY outputs.
Not many PHYs offer I/O tuning for their outputs to the Cyclone® V/ Arria® V SoC, so it is wise to double check this signal path with a simulator. Place a series resistor on each signal near the PHY output pins to reduce the reflections if necessary.
GUIDELINE: Ensure reflections at PHY TX_CLK and EMAC RX_CLK inputs are minimized to prevent double-clocking.
Be cognizant of clock connections routed as a “T”: signal integrity must be maintained so that no double edges are seen at the clock loads, and reflections must be minimized to prevent double-clocking.
GUIDELINE: Use a Signal Integrity (SI) simulation tool.
It is fairly simple to run SI simulations on these unidirectional signals. These signals are almost always point-to-point, so simply determining an appropriate series resistor to place on each signal is usually enough. Many times, this resistor is not necessary, but the device drive strength and trace lengths as well as topology should be studied when making this determination.
The Cyclone® V/ Arria® V SoC Hard Processor System can connect its embedded USB MACs directly to industry-standard USB 2.0 ULPI PHYs using the HPS Dedicated I/O, which support 1.8V, 2.5V, 3.0V and 3.3V I/O standards. No FPGA routing resources are used and timing is fixed, which simplifies design. This section describes the design guidelines covering all supported speeds of PHY operation: High-Speed (HS) 480 Mbps, Full-Speed (FS) 12 Mbps, and Low-Speed (LS) 1.5 Mbps.
GUIDELINE: Design the board to support both USB PHY clocking modes: one where the PHY supplies the clock, and one where an external clock is the source.
The interface between the ULPI MAC and PHY on the Cyclone® V/ Arria® V SoC consists of the bidirectional DATA[7:0], DIR and NXT from the PHY to the MAC, and STP from the MAC to the PHY. Lastly, a 60 MHz clock driven from the PHY is required for operation, including for some register accesses from the HPS to the USB MAC. Ensure the PHY manufacturer's recommendations for RESET and power-up are followed.
GUIDELINE: Ensure that the USB signal trace lengths are minimized.
At 60 MHz, the period is 16.67 ns, and in that time, for example, the clock must travel from the external PHY to the MAC, and then the data and control signals must travel from the MAC to the PHY. Because there is a round-trip delay, the maximum length of the CLK and ULPI signals is important. Based on timing data, the recommended maximum length is less than 7 inches. This assumes a PHY with a 5 ns Tco specification; if the PHY's Tco is slower, shorten the total length accordingly.
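One way to sanity-check a layout against this budget: subtract the PHY Tco and the MAC-side setup from the 16.67 ns period, then divide the remainder by twice the per-inch trace propagation delay (clock in, data back out). In the sketch below, mac_setup_ns and prop_delay_ns_per_in are assumed values for illustration, not device specifications.

```python
def usb_ulpi_max_trace_in(phy_tco_ns: float = 5.0,
                          mac_setup_ns: float = 9.0,
                          prop_delay_ns_per_in: float = 0.17,
                          period_ns: float = 1e3 / 60) -> float:
    """Illustrative ULPI trace-length budget: the clock travels PHY->MAC
    and data returns MAC->PHY within one 16.67 ns period, so the flight
    budget is split across twice the trace length. Defaults other than
    phy_tco_ns are assumptions, not datasheet numbers."""
    flight_budget_ns = period_ns - phy_tco_ns - mac_setup_ns
    return flight_budget_ns / (2 * prop_delay_ns_per_in)
```

As the text notes, a slower PHY Tco directly shortens the allowable trace length.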
GUIDELINE: Ensure that signal integrity is considered.
Signal integrity is also important, mostly on the CLK signal driven from the PHY to the MAC in the HPS subsystem. Because these signals are point-to-point with a bounded length, they can usually run unterminated, but it is recommended to simulate the traces to make sure reflections are minimized. Using the 50-ohm output setting from the FPGA is typically recommended unless the simulations show otherwise. Use a similar setting from the PHY vendor if possible.
GUIDELINE: Design properly for OTG operation.
When On-the-Go (OTG) functionality is used, the SoC can become a host or endpoint. When in host mode, consider power delivery, such as when you are supporting a USB Flash drive or potentially a USB hard drive. These power requirements and reverse currents must be accounted for, typically through the use of external diodes and current limiters such as those used on the Cyclone® V SoC and Arria® V SoC development kits.
Up to four QSPI chip selects can be used with the Cyclone® V/ Arria® V SoC. The device can boot only from a QSPI flash connected to chip select zero (SS0).
GUIDELINE: Ensure that the QSPI_SS signals are used in numerical order.
Quartus® Prime assumes that the QSPI_SS signals are used in order. It is not possible to use SS0 and SS2, for example, without using SS1.
GUIDELINE: Include a voltage translator if you plan to support the SD 1.8V feature. A translator is necessary because the HPS I/O cannot change voltage levels dynamically the way the SD card can.
SD cards initially operate at 3.3V, and some cards can switch to 1.8V after initialization. In addition, some MMC cards can operate at both 1.8V and 3.3V. Because the BSEL values are constant during the boot process, voltage translators are required to provide level shifting and isolation for cards that can operate at 1.8V.
Follow the guidelines in the Voltage Switching section of the SD/MMC Controller chapter for Cyclone® V SoC and Arria® V SoC. Some MMC cards can operate with only 1.8V I/O operation and initial operation at 3.3V is not required. In this situation, a level shifter is not needed.
(Table: level shifter requirement as a function of HPS I/O Bank Voltage and SD Card Voltage.)
GUIDELINE: Ensure that timing is considered for the initial ID mode and data transfer mode as well as normal operation.
SD cards initially operate at a maximum of 400 kHz while they are going through the ID process. After that comes a data transfer mode, during which the clock can operate at up to 12.5 MHz. In normal operation, the clock can operate at up to 50 MHz. The Boot ROM ensures that clocking is properly configured during the ID and transfer modes.
Refer to the CSEL Settings for the SD/MMC Controller table in the Booting and Configuration appendix for Cyclone® V SoC and Arria® V SoC.
GUIDELINE: Ensure that the selected NAND flash device is an 8-bit ONFI 1.0 (or later) compliant device.
- The external flash device must be 8-bit ONFI 1.0 compliant.
- Single-level cell (SLC) or multi-level cell (MLC)
- Page size: 512 bytes, 2 KB, 4 KB or 8 KB
- Pages per block: 32, 64, 128, 256, 384 or 512
- Error correction code (ECC) sector size can be programmed to 512 bytes (for 4, 8 or 16 bit correction) or 1024 bytes (24-bit correction)
You cannot export the NAND interface to the FPGA fabric.
GUIDELINE: Properly connect flow control signals when routing the UART signals through the FPGA fabric.
GUIDELINE: Instantiate the open-drain buffer when routing I2C signals through the FPGA fabric.
When routing I2C signals through the FPGA, note that the I2C pins from the HPS to the FPGA fabric (i2c*_out_data, i2c*_out_clk) are not open-drain and are logic-level inverted. Thus, when you want to drive a logic level zero onto the I2C bus, these pins are high. This implementation is useful because the pins can directly drive the output enable of a tri-state buffer. You must use the altiobuf IP to implement the open-drain buffer.
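The inverted polarity can be summarized with a small behavioral model (not HDL): the HPS pin acts as an active-high output enable, so a high pin enables the tri-state buffer to drive the bus low, and a low pin releases the bus to the external pull-up.

```python
def i2c_bus_level(out_data_pin: int) -> int:
    """Behavioral model of the inverted HPS I2C output used as an
    active-high output enable: pin high -> buffer drives the bus low;
    pin low -> buffer released, external pull-up returns the bus high."""
    return 0 if out_data_pin else 1
```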
GUIDELINE: Ensure that the pull-ups are added to the external SDA and SCL signals in the board design.
Because the I2C signals are open drain, pull-ups are required to make sure that the bus is pulled high when no device on the bus is pulling it low.
GUIDELINE: Consider routing SPI slave signals to the FPGA fabric.
Due to an erratum in the Cyclone® V/ Arria® V SoC device, the SPI output enable is not connected to the SPI HPS pins. As a result, the HPS SPIS_TXD pin cannot be tri-stated by setting the slv_oe bit (bit 10) in the ctrlr0 register to 1.
Routing the SPI Slave signals to FPGA exposes the output enable signal and allows you to connect it to an FPGA tri-state pin.
To successfully build your software development platform, it is recommended that you start with a baseline project, a known good configuration of an HPS system, and then modify the baseline project to suit your end application.
The following diagram presents the recommended procedure to follow in order to determine the software development platform components.
In summary, the flow consists of the following steps:
- Select the desired device
- Use the Golden Hardware Reference Design (GHRD) as a hardware project starting point
- Select the operating system: bare-metal, Linux, or a partner real-time operating system
- Write and/or update end application and/or drivers
The GHRD is a Quartus® Prime project that contains a full HPS design for the Cyclone® V SoC / Arria® V SoC Development Kit. The GHRD has connections to a boot source, SDRAM memory and other peripherals on the development board.
The GHRD is included with every released version of the SoC EDS tools. It is regression tested with every major release of the Altera® Complete Design Suite (ACDS) and includes the latest bug fixes for known hardware issues. As such, the GHRD serves as a known good configuration of an SoC FPGA hardware system.
The GHRD has a minimal set of peripherals in the FPGA fabric, because the HPS provides a substantial selection of peripherals. HPS-to-FPGA and FPGA-to-HPS interfaces are configured to a 64-bit data width.
GUIDELINE: It is recommended that you use the latest GHRD as a baseline for new SoC FPGA hardware projects. You may then modify the design to suit your application needs.
- GSRD for Linux page, for the latest version, which is the best known configuration
- <SoC EDS installation directory>\examples\hardware\cv_soc_devkit_ghrd - for the version supported by the corresponding SoC EDS version, used as a basis for the provided HWLIBS design examples in SoC EDS.
There are a number of operating systems that support the Cyclone® V SoC / Arria® V SoC, including Linux. For more information on Altera®'s SoC partner OS ecosystem, refer to the SoC Partner Ecosystem link below.
Partner OS providers offer board support packages and commercial support for the SoC FPGA devices. The Linux community also offers board support packages and community support for the SoC FPGA device.
Many factors go into the selection of an operating system for SoC FPGAs, including the features of the operating system, licensing terms, collaborative software projects and frameworks based on the OS, available device drivers and reference software, in-house legacy code and familiarity with the OS, real-time requirements of your system, and functional safety and other certifications required for your application.
To select an appropriate OS for your application, it is recommended that you familiarize yourself with the features and support services offered by the commercial and open source operating systems available for the SoC FPGA. Altera®'s OS partners and industry websites are good sources of information to help make your selection.
There are a number of misconceptions about the real-time performance of operating systems versus bare-metal applications. For a Cortex-A class processor, real-time operating systems provide a number of features that make efficient use of the processor's resources, in addition to the facilities provided to manage the run-time application. You may find that these efficiencies result in sufficient real-time performance for your application, enabling you to inherit a large body of available device drivers, middleware packages, software applications and support services. It is important to take this into account when selecting an operating system.
The HPS can be used in a bare-metal configuration (without an OS), and Altera offers the Hardware Libraries (HWLibs), which consist of both high-level APIs and low-level macros for most of the HPS peripherals.
However, to use a bare-metal application on the HPS, you must be familiar with developing run-time capabilities to ensure that your bare-metal application makes efficient use of the resources available in the CPU subsystem.
- A typical bare-metal application uses only a single core; to fully utilize the CPU subsystem, you must develop run-time capabilities to manage processes across both cores and the cache subsystem.
- As your application increases in complexity you may need to build capabilities to manage and schedule processes, handle inter-process communication and synchronize between events within your application.
For these reasons, even a small, lightweight RTOS offers simple scheduling, inter-process communication and interrupt handling capabilities that make efficient use of the resources in your CPU subsystem.
The Dual Core ARM Cortex-A9 MPCore in the Cyclone® V / Arria® V HPS can support both Symmetrical Multi-processing (SMP) and Asymmetrical Multi-processing (AMP) configuration modes.
In SMP mode, a single OS instance controls both cores. The SMP configuration is supported by a wide variety of OS manufacturers and is the most common and straightforward configuration mode for multiprocessing.
Commercially developed operating systems offer features that take full advantage of the CPU cores' resources and use them efficiently, resulting in optimum performance and ease of use. For instance, SMP-enabled operating systems offer the option of setting processor affinity, meaning that each task or thread can be assigned to run on a specific core. This feature allows the software developer to better control the workload distribution for each Cortex-A9 core and make the system more responsive, as an alternative to AMP.
GUIDELINE: Familiarize yourself with the performance and optimizations available in commercial operating systems to see if an SMP-enabled OS or RTOS meets your performance and real time requirements.
In the AMP (Asymmetrical Multi-Processing) configuration, two different operating systems, or two instances of a single operating system, run on the two cores. Because the two OS instances have no inherent knowledge of how they share CPU resources, several complexities must be taken into account to ensure that the applications make efficient use of the resources available in the CPU subsystem.
This section presents design guidelines to be used when you have selected Linux as the OS for your end application.
- GHRD (Golden Hardware Reference Design) - A Quartus Prime project
- Reference U-Boot based Bootloader
- Reference Linux BSP
- Sample Linux Applications
The GSRD for Linux is a well-tested known good design showcasing a system using both HPS and FPGA resources, intended to be used as a baseline project.
GUIDELINE: To successfully build your software development platform, it is recommended that you use the GSRD as a baseline project, then modify it to suit your application needs.
The GSRDs target the Altera® SoC Development Boards and are provided both in source and pre-compiled form. They can be obtained from the GSRD User Manuals link given below.
GUIDELINE: It is recommended that all new projects use the latest version of GSRD as a baseline.
The figure below presents a high level view of the development flow for projects based on the GSRD. Refer to the GSRD User Manuals link given below for more details.
The figure below presents a detailed build flow for the GSRD. Refer to the GSRD User Manuals link given below for more details.
The above build flow is the one used for the GSRD for Linux but it can be tweaked to match the individual needs of each project. For example:
- Linux kernel could be built separately without using Yocto Bitbake.
- Linux filesystem could be built separately without using Yocto Project.
- Linux Device Tree could be managed without using the Device Tree Generator. For example, it can be manually edited.
The Linux Device Tree is a data structure that describes the underlying hardware to the Linux operating system kernel. By passing this data structure to the OS kernel, a single OS binary may be able to support many variations of hardware. This flexibility is particularly important when the hardware includes an FPGA.
- Start with the SoC FPGA reference Device Trees provided in the Linux kernel source code, which target the Altera SoC Development Kits. They cover the HPS portion of the device but do not cover the FPGA portion, which changes on a per-project basis. SD/MMC, QSPI and NAND versions are provided with the kernel source code.
- Edit the Device Tree as necessary to accommodate any board changes as compared to the Altera SoC Development Kit.
- Edit the Device Tree as necessary to accommodate the Linux drivers targeting FPGA Soft IP.
Refer to the DeviceTree Generator User Guide link given below for more details about the Linux Device Tree Generator.
Altera® hardware libraries (HWLibs) are low-level bare-metal software libraries, provided with the SoC EDS, for accessing the various components of the HPS. The HWLibs are also typically used by Altera’s OS partners to build board support packages for operating systems.
- SoC Abstraction Layer (SoCAL): A symbolic register abstraction layer that enables direct access to, and control of, HPS device registers within the address space.
- Hardware Manager (HWMgr): APIs that provide more complex functionality and drivers for higher-level use-case scenarios.
Note that not all hardware is covered by SoCAL and HWMgr, so writing custom code may be necessary, depending on the application. Software applications that use HWLibs should have run-time provisions to manage the resources of the CPU subsystem, the caches and memory. Such provisions are typically provided by an operating system.
GUIDELINE: It is recommended to use HWLibs only if you are familiar with developing the run-time provisions needed to manage your application.
GUIDELINE: Use the HWLIBS examples from <SoC EDS installation folder>/embedded/examples/software/ as a starting point for your bare-metal development.
Partner OS providers offer board support packages and commercial support for the SoC FPGA devices. Typically the support includes example getting started projects and associated documentation.
- Real Time Operating System or Baremetal Application
The BootROM and Preloader stages are needed for all Cyclone® V SoC / Arria® V SoC applications. U-Boot and Linux are used by the GSRD, but a custom application may implement a different flow, such as using the Preloader to load a bare-metal application directly.
- Perform additional HPS initialization
- Bring up SDRAM
- Load the next boot stage from Flash to SDRAM and jump to it
- SPL - part of U-Boot. Provided with SoC EDS under GPL (Open Source) License
- MPL - provided with SoC EDS as an example using the HWLibs (Altera bare-metal libraries). Uses BSD license.
The Bootloader has responsibilities similar to the Preloader's, except that it does not need to bring up the SDRAM. Because the Bootloader already resides in SDRAM, it is not limited by the size of the OCRAM and can therefore provide many more features, such as network stack support.
- RBF file(s) - containing the register settings for the SDRAM, as well as the dedicated I/O and FPGA pin configuration.
- U-Boot source code - for the rest of the settings.
GUIDELINE: Decide which software development tools (compiler, assembler, linker, archiver and so on) will be used, and identify which version of the tools will be used.
- ARMCC Bare-metal Compiler
- Mentor Graphics CodeSourcery Lite GCC-based bare-metal Compiler
- Linux Linaro Compiler
There are also other development tool offerings from third-party providers.
GUIDELINE: Decide which software debug tools will be used. The ARM DS-5 Altera edition includes a fully featured Eclipse-based debugging environment. There are also other debugging tool offerings from third-party providers, such as Lauterbach T32.
- An embedded USB-Blaster II chip could be available on-board such as on the Cyclone® V SoC / Arria® V SoC Development Kit.
- External JTAG hardware may be required when using the Lauterbach T32 tools.
- Non-real-time: by storing trace data in system memory (e.g. SDRAM) or the embedded trace buffer, then stopping the system, downloading the trace information and analyzing it.
- Real-time: by using an external adapter to capture trace data from the trace port. The target board needs to support this scenario.
Typically, the debug tools also offer tracing of the embedded software program execution, but external hardware may be required. For example, the DS-5 provided with the SoC EDS supports both non-real-time and real-time tracing. When used for real-time tracing, an additional external trace unit called “DSTREAM” is required. Lauterbach T32 is similar, in that it also needs additional external hardware for real-time tracing.
This section describes the considerations that are useful for board bring up.
During the initial stages of bring-up, if a JTAG connection cannot be established to the target, it may be beneficial to set BSEL to the 0x0 “Reserved” setting to prevent the BootROM from trying to boot from a specific boot source. A test program can then be downloaded and run with a debugger.
GUIDELINE: Determine which boot source is to be supported.
- SD/MMC Flash
- QSPI Flash
- NAND Flash
- FPGA Fabric
- SD cards are cheap, universally available, and have large storage capacities. Industrial versions are available, with improved reliability. They are managed NAND flash, so wear leveling and bad block management are performed internally.
- eMMC devices have smaller packages, are available in large capacities, and can be more reliable than SD cards. They are not removable, which can be a plus, allowing more rugged operation.
- QSPI devices are very reliable, typically rated for a minimum of 100,000 erase cycles per sector. However, they have a reduced capacity compared to the other options. They are typically used as a boot source, but not for an application filesystem.
- NAND devices are available in large sizes, but they are unmanaged NAND, which means that techniques such as wear leveling and bad block management need to be implemented in software.
- FPGA boot allows the HPS to boot without the need for an external flash device. The FPGA boot memory can be synthesized out of FPGA resources (typically pre-initialized embedded memory blocks) or can be memory connected to the FPGA, such as an external SRAM or SDRAM. In order to boot from the FPGA, the FPGA must first be configured using a traditional configuration mechanism.
GUIDELINE: Select the boot flash device.
- Will the flash device work with the HPS BootROM? The HPS can only boot from flash devices supported by the BootROM.
- Is the device verified to work and supported by software such as the Preloader, U-Boot and Linux? For supported devices, Altera® provides the Preloader, U-Boot and Linux software. For other devices, this software must be developed by the user.
- Is the flash device supported by the HPS Flash Programmer? The HPS Flash Programmer enables writing to flash using a JTAG connection, primarily to program the initial Preloader/Bootloader image. If the device is not supported by the HPS Flash Programmer, other flash programming methods may be used, such as programming the flash from the HPS itself; for example, the flash programming capabilities of U-Boot can be used.
Refer to the Supported Flash Devices for Cyclone® V SoC / Arria® V SoC webpage for more information.
GUIDELINE: Configure the BSEL pins for the selected boot source.
The boot source is selected by means of BSEL pins.
It may be beneficial to be able to change the boot source for debugging purposes, even if the board does not have another boot source available. For example, on a board booting from QSPI, it may be useful to select the reserved boot option so that the BootROM does not attempt to boot, or to select boot from FPGA and put a test image in the FPGA fabric.
If the system allows it (space constraints and so on), plan to provide either switches, or at least resistors, so that BSEL can be changed as needed.
GUIDELINE: Determine boot clock source.
- Value of external clock to the HPS (i.e. OSC1 clock)
- Boot flash source interface operation frequency
The possible combinations are described in the Cyclone® V HPS Technical Reference Manual / Arria® V HPS Technical Reference Manual and are selected with the CSEL pins. Note that CSEL pins are not used when booting from FPGA fabric.
GUIDELINE: Provide a method to configure CSEL options.
For debugging purposes, it may be beneficial to allow various CSEL values to be set, even if the end product will require just one CSEL setting. If possible, design the board so that the CSEL configuration can be varied, for example with resistors, jumpers or switches.
GUIDELINE: Ensure that the board is configured properly to support flash programming.
The HPS Flash Programmer is a tool provided with the SoC EDS that can be used to program QSPI and NAND flash devices on Cyclone® V / Arria® V SoC boards. The tool is intended to write relatively small amounts of data (for example, the Preloader), since it works over JTAG and has limited speed.
If the HPS Flash Programmer tool is to be used, confirm that it supports the device you are planning to use. The supported devices are listed in the SoC EDS User Guide: Altera® SoC Embedded Design Suite User Guide.
- Program Flash using a debugger (for example DS-5)
- Program Flash from U-Boot
- Program Flash from Linux (or other OS) console
- Program Flash by means of dedicated hardware
GUIDELINE: Select a NAND flash that is ONFI 1.0 compliant.
When booting from NAND, ensure that the selected device is ONFI 1.0 compliant.
The NAND device used for booting must also have an x8 interface, and only a single pair of ce# and rb# pins.
Although some non-ONFI 1.0 compliant devices are compatible with the BootROM, the HPS Flash Programmer only supports ONFI compliant devices.
GUIDELINE: Ensure that the QSPI and SD/MMC/eMMC devices have a mechanism to be reset when the HPS is reset.
The QSPI and SD/MMC/eMMC flash devices can potentially be put in a state by software where the BootROM cannot access them successfully, which may trigger a boot failure on the next reset. This problem can occur because the HPS is reset, but the flash part is not reset.
It is therefore required to reset the QSPI and SD/MMC/eMMC boot flash devices each time there is an HPS reset (warm or cold).
Note that some devices do not have a reset pin. In that case, you need to power cycle the flash, for example by using a MOSFET. Pay attention to the minimum required reset pulse duration.
- Traditional FPGA configuration
- HPS-initiated FPGA configuration
HPS-initiated configuration uses fast passive parallel (FPP) mode, allowing the HPS to configure the FPGA from storage locations accessible to the HPS, such as QSPI, SD/MMC and NAND flash. The FPGA configuration flows for the Cyclone® V / Arria® V SoC are the same as for the Cyclone® V / Arria® V FPGA devices, where an external configuration data source is connected to the control block in the FPGA.
The traditional FPGA configuration flow is where the FPGA is configured by an external source such as JTAG, active serial or fast passive parallel.
When the device is powered and the HPS begins executing the software in the boot ROM, all the device I/O default to an input tri-state mode of operation. The boot ROM configures the dedicated boot I/O based on the sampled BSEL pins.
Refer to the following reference materials for additional information.
The SoC FPGAs support the following types of flash devices: QSPI, NAND, SD/MMC/eMMC.
ECC is implemented throughout the entire HPS subsystem on all RAMs, including the external HPS EMIF, L2 cache data RAMs and all peripheral RAMs. The controller ECC employs standard Hamming logic to detect and correct single-bit errors and detect double-bit errors. Parity protection is provided for the Cortex-A9 MPCore L1 cache memories and L2 tag RAM. ECC can be selectively enabled on the HPS EMIF and internal HPS RAMs. Diagnostic test modes and error injection capability are available under software control. ECC is disabled by default upon power-up or cold reset.
The generated boot code configures, initializes and enables ECC according to user options selected during BSP generation. Custom firmware and bare metal application code access to the ECC features is facilitated with the Altera® -provided HWLIBS library, which provides a simple API for programming HPS software features.
For more information, refer to Chapter 7: Boot Tools User Guide and Chapter 8: Hardware Library of the Altera® SoC Embedded Design Suite User Guide.
Each RAM in the HPS subsystem has its own ECC controller with a unique set of features and requirements; however there are some general system integration design considerations.
The System Manager contains a set of ECC-related registers for system-level control and status for all the ECC controllers in the HPS subsystem. ECC-related interrupts are also managed through this set of registers.
Refer to System Manager chapter of the Cyclone® V HPS Technical Reference Manual / Arria® V HPS Technical Reference Manual
The L2 cache memory is ECC protected, while the tag RAMs are parity protected. L2 cache ECC is enabled through a control register in the System Manager.
GUIDELINE: The L1 and L2 cache must be configured as write-back and write-allocate for any cacheable memory region with ECC enabled.
For BSPs supported through the Altera® SoC EDS, you can configure your BSP for ECC support with the bsp-editor utility. For baremetal firmware, refer to the Cyclone® V HPS Technical Reference Manual / Arria® V HPS Technical Reference Manual, Chapter 9 on the Cortex-A9 Microprocessor Unit Subsystem, L2 Cache Controller Address Map for Cyclone® V / Arria® V section.
GUIDELINE: Cache-coherent accesses through the L3 interconnect using the ACP must perform 64-bit wide, 64-bit aligned write accesses when ECC is enabled in the L2 cache controller.
Enabling ECC does not affect the performance of the L2 cache, but accesses using the ACP must be 64 bits wide and 64-bit aligned in memory. This includes FPGA masters accessing the ACP over the FPGA-to-HPS bridge. Refer to the Cyclone® V HPS Technical Reference Manual / Arria® V HPS Technical Reference Manual, Chapter 8 covering HPS-FPGA Bridges, FPGA-to-HPS Access to ACP section, Table 8-3, for a list of possible combinations of bridge width, FPGA master width, alignment, and burst size and length.
All peripheral RAMs in the HPS subsystem are ECC protected. The NAND flash controller ECC hardware is not used when a read-modify-write operation is performed from the flash device’s page buffer. Software must update the ECC during such read-modify-write operations. For a read-modify-write operation to work with hardware ECC, the entire page must be read into system memory, modified, then written back to flash without relying on the flash device’s read-modify-write feature.
The NAND flash controller cannot do ECC validation during a copy-back command. The flash controller copies the ECC data but does not validate it during the copy operation.
To debug the HPS EMIF, you can change settings in the Preloader to enable the runtime calibration report and debug-level information, and to check the status of the HPS SDRAM PLL.
To enable the runtime calibration report, open the <project_folder>\software\spl_bsp\uboot-socfpga\board\altera\socfpga\sdram\sequencer_defines.h file in your preferred editor and set the RUNTIME_CAL_REPORT value to 1.
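The edit described above amounts to changing a single macro in sequencer_defines.h:

```c
/* In <project_folder>\software\spl_bsp\uboot-socfpga\board\altera\
 * socfpga\sdram\sequencer_defines.h: enable the runtime
 * calibration report by setting the macro to 1. */
#define RUNTIME_CAL_REPORT 1
```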
- Enable the SDRAM hardware diagnostic test with the Hardware Diagnostic option in bsp-editor. Note that the example driver is only available in Quartus® Prime version 14.0 and later.
- The test writes a PRBS31 data pattern to a random address, then reads it back from that address.
- Different coverage can be selected by changing a parameter in spl.c.
- The path for sdram_test.c is <project_folder>\software\spl_bsp\uboot-socfpga\arch\arm\cpu\armv7\socfpga\sdram_test.c; change the test_rand_address function to customize the test.
- Use writel to write to an HPS register: writel(value, address). Use readl to read from an HPS register.
- To check the HPS SDRAM PLL, read the HPS PLL status register in clock_manager.c and print it out in spl.c. Define a global variable in clock_manager.c and declare it as an extern variable in spl.c; the value cannot be printed directly in clock_manager.c because the UART has not been initialized yet at that point.
The HPS bridges can be enabled from the Preloader (SPL/MPL) or U-Boot and in some cases from Linux.
To enable the HPS FPGA2SDRAM bridge from the Preloader or U-Boot, note the following:
- The Preloader checks the status of the FPGA and will automatically enable bridges configured in the QSYS / BSP if the FPGA is configured.
- The Preloader supports programming the FPGA before running automatic bridge enable tests and code.
- The bridge_enable_handoff command can be run from the U-boot command prompt to enable bridges.
- This function puts the HPS and SDRAM into a safe state, then enables all bridges after the appropriate checks.
- “run bridge_enable_handoff”
For more information, refer to the KDB solution: How can I enable the FPGA2SDRAM bridge on Cyclone® V SoC and Arria® V SoC Devices?
Technical support for operating system board support packages (BSPs) that are ported to the Altera® SoC development kit is provided by the operating system provider.
Support for the Altera® SoC Embedded Design Suite (EDS) and the design tools for FPGA development is provided by Altera®. The EDS includes the ARM Development Studio 5 (DS-5) Altera® Edition Toolkit.
Support of the Altera® development kit is provided by Altera.
Technical Support for other boards is provided by the respective board provider or distributor.
- Hardware libraries (bare-metal)
- DS-5 Altera® Edition
- FPGA design tools
- Open source / Linux
For additional information, please refer to these links given below.
Driven by the feature-updating nature of today’s open-source embedded software, most of our software documentation is also hosted on community webpages.
For more information, please refer to the links given below.
|February 2017|2017.02.20|Initial Release|