Intel Stratix 10 Hard Processor System Technical Reference Manual
Intel Stratix 10 Hard Processor System Technical Reference Manual Revision History
Chapter | Date of Last Update |
---|---|
Table 2 | November 6, 2017 |
Table 3 | November 6, 2017 |
Table 4 | November 6, 2017 |
Table 5 | November 6, 2017 |
Table 6 | November 6, 2017 |
Table 7 | November 6, 2017 |
Table 8 | November 6, 2017 |
Table 9 | November 6, 2017 |
Table 10 | November 6, 2017 |
Table 11 | November 6, 2017 |
Table 12 | March 2, 2018 |
Table 13 | November 6, 2017 |
Table 14 | November 6, 2017 |
Table 15 | November 6, 2017 |
Table 16 | November 6, 2017 |
Table 17 | March 2, 2018 |
Table 18 | November 6, 2017 |
Table 19 | March 2, 2018 |
Table 20 | November 6, 2017 |
Table 21 | November 6, 2017 |
Table 22 | November 6, 2017 |
Table 23 | November 6, 2017 |
Table 24 | November 6, 2017 |
Table 25 | March 2, 2018 |
Table 26 | November 6, 2017 |
Version | Changes |
---|---|
2017.11.06 | Added S10 Address Map and Register Definitions to the "Introduction to the Hard Processor System Address Map" section. |
2017.06.20 | Corrected the FPGA-to-SDRAM data width in the "Features of the HPS", "HPS-FPGA Memory-Mapped Interfaces", and "Stratix 10 HPS SDRAM L3 Interconnect" sections. The corrected data width is 32, 64, or 128 bits, not a fixed 128 bits. |
2017.05.08 | Maintenance release |
2016.10.28 | |
2016.08.01 | Initial release |
Version | Changes |
---|---|
2017.11.06 | Added address map and register description links for the Cortex-A53 MPCore™ Processor in the Address Map and Register Descriptions section. |
2017.05.08 | Renamed "Arm® Cortex-A53 Timers" section to "Generic Timers" and renamed "Global Timer" section to "System Counter." Content in each section was updated. |
2016.10.28 | |
2016.08.01 | Initial release |
Version | Changes |
---|---|
2017.11.06 | |
2017.05.08 | Added the following sections in the Cache Coherency: |
2016.10.28 | Enhanced Cache Coherency System Diagram |
2016.08.01 | Initial Release |
Version | Changes |
---|---|
2017.11.06 | |
2017.05.08 | |
2016.10.28 | Added the following sections: |
2016.08.01 | Initial release |
Document Version | Changes |
---|---|
2017.11.06 | |
2017.05.08 | Added the following information: |
2016.10.31 | Maintenance release |
2016.08.01 | Initial beta release |
Document Version | Changes |
---|---|
2017.11.06 | Added address map and register description links for the HPS-FPGA bridges. |
2017.05.08 | Added: |
2016.10.28 | Maintenance release |
2016.08.01 | Initial release. |
Version | Changes |
---|---|
2017.11.06 | |
2017.05.08 | Added the Programming Model |
2016.10.28 | Added a top-level system diagram |
2016.08.01 | Initial release |
Version | Changes |
---|---|
2017.11.06 | Added S10 Address Map and Register Definitions to the "On-Chip RAM Address Map and Register Definitions" section. |
2017.05.08 | Maintenance release |
2016.10.28 | Added information about exclusive access support |
2016.08.01 | Initial release |
Version | Changes |
---|---|
2017.11.06 | Added address map and register description links for the Error Checking and Correction Controller in the Address Map and Register Descriptions section. |
2017.05.08 | Maintenance release |
2016.10.28 | |
2016.08.01 | Initial Release |
Version | Changes |
---|---|
2017.11.06 | |
2017.05.08 | New sections added: |
2016.10.28 | Maintenance release |
2016.08.01 | Initial release |
Document Version | Changes |
---|---|
2018.03.02 | Added the clarifying footnote for HPS_COLD_RESET and f2s_bridge_rst_n in Table: HPS Reset Domains and section Reset Signals respectively. |
2017.11.06 | |
2017.05.08 | Maintenance release |
2016.10.28 | Maintenance release |
2016.08.01 | Initial release |
Version | Changes |
---|---|
2017.11.06 | |
2017.05.08 | New topic added: Preloader Handoff Information |
2016.10.28 | Updated Figure 45 |
2016.08.01 | Initial release |
Document Version | Changes |
---|---|
2017.11.06 | |
2017.05.08 | Maintenance release |
2016.10.28 | Initial release |
Version | Changes |
---|---|
2017.11.06 | Added address map and register description links for NAND Flash Controller. |
2017.05.08 | Added the Programming Model. |
2016.10.28 | |
2016.08.01 | Initial release |
Version | Changes |
---|---|
2017.11.06 | Added address map and register description links for SD/MMC Controller. |
2017.05.08 | Added the Programming Model. |
2016.10.28 | |
2016.08.01 | Initial release |
Document Version | Changes |
---|---|
2018.03.02 | Added the missing step in section EMAC FPGA Interface Initialization. |
2017.11.06 | Added address map and register description links for Ethernet Media Access Controller. |
2017.05.08 | Maintenance release |
2016.10.28 | Maintenance release |
2016.08.01 | Initial release |
Version | Changes |
---|---|
2017.11.06 | Added address map and register description links for USB 2.0 OTG Controller. |
2017.05.08 | Maintenance release |
2016.10.28 | Sections added: |
2016.08.01 | Initial release |
Document Version | Changes |
---|---|
2018.03.02 | Corrected Figure: SSP Serial Format Continuous Transfer. |
2017.11.06 | Added address map and register description links for SPI Controller. |
2017.05.08 | Section added: |
2016.10.28 | Maintenance release |
2016.08.01 | Initial release |
Version | Changes |
---|---|
2017.11.06 | Added address map and register description links for I2C Controller. |
2017.05.08 | Section added: |
2016.10.28 | Maintenance release |
2016.08.01 | Initial release |
Version | Changes |
---|---|
2017.11.06 | Added address map and register description links for UART Controller. |
2017.05.08 | Maintenance release |
2016.10.28 | Maintenance release |
2016.08.01 | Initial release |
Version | Changes |
---|---|
2017.11.06 | Added address map and register description links for General-Purpose I/O Interface. |
2017.05.08 | Maintenance release |
2016.10.28 | Maintenance release |
2016.08.01 | Initial release |
Version | Changes |
---|---|
2017.11.06 | Added address map and register description links for Timer. |
2017.05.08 | Maintenance release |
2016.10.28 | Maintenance release |
2016.08.01 | Initial release |
Version | Changes |
---|---|
2017.11.06 | Added address map and register description links for Watchdog Timer. |
2017.05.08 | Updated sections: |
2016.10.28 | Maintenance release |
2016.08.01 | Initial release |
Version | Changes |
---|---|
2017.11.06 | |
2017.05.08 | Added the Programming Model section. |
2016.10.28 | |
2016.08.01 | Initial release |
Version | Changes |
---|---|
2017.11.06 | |
2017.05.08 | Initial release |
Introduction to the Hard Processor System
The Intel® Stratix® 10 system-on-a-chip (SoC) is composed of two distinct portions: a 64-bit quad core Arm® Cortex® -A53 hard processor system (HPS) and an FPGA. The HPS architecture integrates a wide set of peripherals that reduce board size and increase performance within a system.
- Dedicated I/O interfaces
- FPGA fabric interfaces
- FPGA secure device manager (SDM) interfaces
- Quad core Arm® Cortex® -A53 MPCore™ processor
- Level 3 (L3) interconnect
- Cache Coherency Unit (CCU)
- System Memory Management Unit (SMMU)
- SDRAM L3 Interconnect, consisting of an SDRAM scheduler and an SDRAM adapter
- DMA Controller
- On-chip RAM
- Debug components
- PLLs
- Flash memory controllers
- Support peripherals
- Interface peripherals
The HPS incorporates third-party intellectual property (IP) from several vendors.

The FPGA portion of the device contains:
- FPGA fabric
- PLLs
- User I/O
- Hard memory controllers
- Secure Device Manager (SDM)
The HPS and FPGA portions of the device each have their own pins. The HPS has dedicated I/O pins. You can also route most of the HPS peripherals into the FPGA fabric to use the FPGA I/O. You can configure pin placement assignments when you instantiate the HPS component in Intel® Platform Designer System Integration Tool.
- FPGA configures first and then optionally boots the HPS (also called FPGA Configuration First).
- HPS boots first and then configures the FPGA (also called HPS Boot First or Early I/O Configuration).
In HPS Boot first mode, the SDM boots the HPS and configures the HPS and SDM I/O. The HPS loads the FPGA core image from HPS memory.
For more information, refer to the "Boot and Configuration" appendix.
Features of the HPS
The main modules of the HPS are:
- Quad-core Arm® Cortex-A53 MPCore processor
- Cache Coherency Unit (CCU)
- System Memory Management Unit (SMMU)
- System interconnect that includes:
- Three memory-mapped interfaces between the HPS and FPGA:
- HPS-to-FPGA bridge: 32-, 64-, or 128-bit wide Arm® Advanced Microcontroller Bus Architecture (AMBA®) Advanced eXtensible Interface (AXI®)-4
- Lightweight HPS-to-FPGA bridge: 32-bit wide AXI®-4
- FPGA-to-HPS bridge: 128-bit wide AXI Coherency Extensions-Lite (ACE-Lite)
- Three memory-mapped FPGA-to-SDRAM AXI®-4 interfaces, 32, 64, or 128 bits wide, allow the FPGA to directly share the HPS-connected SDRAM
- General-purpose direct memory access (DMA) controller
- 256 KB on-chip RAM
- Error checking and correction controllers for on-chip RAM and peripheral RAMs
- Clock manager
- Reset manager
- System manager
- Dedicated I/O pin multiplexer (MUX)
- NAND flash controller
- Secure digital/multimedia card (SD/MMC) controller
- Three Ethernet media access controllers (EMACs)
- Two USB 2.0 on-the-go (OTG) controllers
- Two serial peripheral interface (SPI) master controllers
- Two SPI slave controllers
- Five inter-integrated circuit (I2C) controllers:
- Three can provide support for EMAC
- Two for general purpose
- Two UARTs
- Two general-purpose I/O (GPIO) interfaces with a total of 48 dedicated I/O
- Four system timers
- Four watchdog timers
- Arm® CoreSight™ debug components:
- Debug access port (DAP)
- Trace port interface unit (TPIU)
- System trace macrocell (STM)
- Embedded trace macrocell (ETM)
- Embedded trace router (ETR)
- Embedded cross trigger (ECT)
HPS Block Diagram and System Integration
HPS Block Diagram
Cortex-A53 MPCore Processor
The Cortex® -A53 MPCore™ supports high-performance applications and provides the capability for secure processing and virtualization. Each CPU in the processor has the following features:
- Support for 32- and 64-bit instruction sets
- In-order pipeline with symmetric dual-issue of most instructions
- Arm® NEON™ single instruction, multiple data (SIMD) coprocessor with a floating point unit (FPU)
- Single- and double-precision IEEE-754 floating point math support
- Integer and polynomial math support
- Symmetric multiprocessing (SMP) and asymmetric multiprocessing (AMP) modes
- Arm® v8 Cryptography Extension
- Level 1 (L1) cache
- 32 KB two-way set associative instruction cache
- Single Error Detect (SED) and parity checking support for L1 instruction cache
- 32 KB four-way set associative data cache
- Error checking and correction (ECC), Single Error Correct, Double Error Detect (SECDED) protection for L1 data cache
- Memory Management Unit (MMU) that communicates with the system MMU (SMMU)
- Generic timer
- Governor module that controls clock and reset
- Debug modules
- Performance Monitor Unit
- Embedded Trace Macrocell (ETMv4)
- CoreSight cross trigger interface
The four CPUs share a 1 MB L2 cache with ECC, SECDED protection. A snoop control unit (SCU) maintains coherency between the CPUs and communicates with the system cache coherency unit (CCU).
At a system level, the Cortex® -A53 MPCore™ interfaces to a generic interrupt controller (GIC), CCU, and system memory management unit (SMMU).
Cache Coherency Unit
- Coherency directory to track the state of the 1 MB L2 cache
- Snooping support for tracking coherent lines and sending coherency transaction requests, including cache maintenance operations
- Support for distributed virtual memory (DVM) using the Arm® AXI Coherency Extensions (ACE) protocol. Distributed virtual memory broadcast messages are sent to the Cortex® -A53 MPCore™ and translation control unit (TCU) in the system memory management unit (SMMU)
- Quality-of-service (QoS) support for transaction prioritization using a weight bandwidth allocation
- Interconnect debug capability through master and slave bridge status registers
- Interrupt support for CCU transaction and counter events
System Memory Management Unit
The system MMU features include:
- A central TCU that supports five distributed TBUs for the following masters:
- FPGA
- DMA
- EMAC0-2, collectively
- USB0-1, NAND, SD/MMC, ETR, collectively
- Secure Device Manager (SDM)
- Caches for storing page table entries and intermediate table walk data:
- 512-entry macro translation lookaside buffer (TLB) page table entry cache in the TCU
- 128-entry micro TLB for table walk data in the FPGA TBU and 32-entry micro TLB for all other distributed TBUs
- Single-bit error detection and invalidation on error detection for caches
- Communication with the MMU of the Arm® Cortex® -A53 MPCore™
- System-wide address translation
- Address virtualization
- Support for 32 contexts
- Two stages of translation or combined (stage 1 and stage 2) translation
- Support for up to 49-bit virtual addresses and up to 48-bit physical and intermediate physical addresses
- Programmable QoS to support page table walk arbitration
- Fault handling, logging and interrupts for translation errors
- Debug support
HPS Interfaces
HPS–FPGA Memory-Mapped Interfaces
The HPS–FPGA memory-mapped interfaces provide the major communication channels between the HPS and the FPGA fabric. The HPS–FPGA memory-mapped interfaces include:
- FPGA–to–HPS bridge—a high–performance bus with a fixed data width of 128 bits, allowing the FPGA fabric to master transactions to the slaves in the HPS. This interface allows the FPGA fabric to have full visibility into the HPS address space. This interface supports single-direction I/O coherency with the HPS MPU.
- HPS–to–FPGA bridge—a high–performance interface with a configurable data width of 32, 64, or 128 bits, allowing the HPS to master transactions to slaves in the FPGA fabric.
- Lightweight HPS–to–FPGA bridge—an interface with a 32–bit fixed data width, allowing the HPS to master transactions to slaves in the FPGA fabric. This bridge is primarily used for control and status register accesses.
- FPGA-to-SDRAM port—three high–performance AXI-4 interfaces with data widths of 32, 64, or 128 bits, allowing the user logic in the FPGA to access SDRAM through the HPS SDRAM L3 Interconnect.
Other HPS Interfaces
- TPIU trace—sends trace data created in the HPS to the FPGA fabric.
- FPGA System Trace Macrocell (STM)—an interface that allows the FPGA fabric to send hardware events to be stored in the HPS trace data.
- FPGA cross–trigger—an interface that allows the CoreSight trigger system to send triggers to IP cores in the FPGA, and vice versa.
- DMA peripheral interface—multiple peripheral–request channels.
- Interrupts—allow soft IP cores to supply interrupts directly to the MPU interrupt controller.
- MPU standby and events—signals that notify the FPGA fabric that the MPU is in standby mode and signals that wake up Cortex–A53 processors from a wait for event (WFE) state.
- HPS debug interface – an interface that allows the HPS debug control domain (debug APB) to extend into FPGA.
System Interconnect
The system interconnect is implemented as a network-on-chip (NoC) with the following components:
- Network interface units (NIUs) connect to the master and slave interfaces throughout the NoC
- Datapath switches transport data across the network, from initiator NIUs to target NIUs
- Service network allows you to update master and slave peripheral security features and access NoC registers
The interconnect is divided into the L3 domain and L4 domain. The L3 interconnect is the high performance tier of the NoC, used to move high-bandwidth data between masters and slaves in the HPS. The L4 interconnect is a lower-performance tier of the NoC used to connect mid-to-low performance peripherals.
The interconnect is also connected to the Cache Coherency Unit (CCU). The CCU provides additional routing between the MPU, FPGA-to-HPS bridge, L3 interconnect, and SDRAM L3 interconnect.
In addition to providing routing connectivity and arbitration between masters and slaves in the HPS, the NoC features firewall security, QoS mechanisms, and observation probe points throughout the interconnect.
Stratix 10 HPS SDRAM L3 Interconnect
The SDRAM L3 interconnect connects the HPS to the hard memory controller (HMC) that is located in the FPGA portion of the device. The SDRAM L3 interconnect is composed of the SDRAM adapter and the SDRAM scheduler, which are secured by firewalls. It supports AMBA® AXI® QoS for the FPGA fabric interfaces.
The SDRAM L3 interconnect implements the following high-level features:
- Support for double data rate 4 (DDR4), DDR3, and low power double data rate 3 (LPDDR3) SDRAM devices
- Software-configurable priority scheduling per port
- 8-bit Single Error Correction, Double Error Detection (SECDED) ECC with write-back, and error counters
- Fully-programmable timing parameter support for all JEDEC®‑specified timing parameters
- All ports support memory protection and mutual-exclusive accesses
- FPGA-to-SDRAM interface—a configurable interface from the FPGA to the SDRAM scheduler, consisting of three ports
Stratix 10 HPS SDRAM Scheduler
Stratix 10 HPS SDRAM Adapter
On-Chip RAM
The on-chip RAM offers the following features:
- 256 KB size
- 64-bit slave interface
- ECC support provides detection of single–bit and double–bit errors and correction for single-bit errors
- Memory scrambling on tamper events
Flash Memory Controllers
- NAND Flash Controller
- SD/MMC Controller
NAND Flash Controller
The NAND flash controller is based on the Cadence® Design IP® NAND Flash Memory Controller and offers the following functionality and features:
- Supports up to two chip selects
- Integrated descriptor-based direct memory access (DMA) controller
- Supports Open NAND Flash Interface (ONFI) 1.0
- Programmable page sizes of 512 bytes, 2 KB, 4 KB, or 8 KB
- Supports 32, 64, or 128 pages per block
- Programmable hardware ECC
- Supports 8- and 16-bit data width
SD/MMC Controller
The Secure Digital/Multimedia Card (SD/MMC) and CE-ATA host controller is based on the Synopsys® DesignWare® Mobile Storage Host controller and offers the following features:
- Supports eMMC
- Integrated descriptor-based DMA
- Supports CE-ATA digital protocol commands
- Supports only single card
- Single data rate (SDR) mode only
- Programmable card width: 1-, 4-, and 8-bit
- Programmable card types: SD, SDIO, or MMC
- Up to 64 KB programmable block size
- Supports up to 50 MHz flash operating frequency
Support Peripherals
Clock Manager
- Manages clocks for HPS
- Supports clock gating at the signal level
- Supports dynamic clock tuning
Reset Manager
The reset domains and sequences support several security features. The SDM brings the reset manager out of reset; the reset manager then brings the rest of the HPS out of reset. The reset manager performs the following functions:
- Manages resets for HPS
- Controls the HPS sequencing during resets
System Manager
The System Manager provides configuration of system-level functions that are required by other modules.
- Peripheral control registers
- ECC interrupt registers
- FPGA interface and general purpose configuration signals
- Boot scratch registers
- Combined ECC status and interrupts from different modules
- Memory-mapped control signals to other modules
- Watchdog stop functionality on debug request
- FPGA interface disable and enable control signals
- AXI/AHB® control signals (hprot, awcache, arcache) to master ports of SD/MMC, NAND, USB and EMAC
Timers
- Free-running timer mode
- Supports a time-out period of up to 43 seconds when the timer clock frequency is 100 MHz (see the calculation after this list)
- Interrupt generation
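The 43-second figure follows directly from a 32-bit down-counter clocked at the timer frequency; a quick check, assuming a full 32-bit load value:

$$ t_{max} = \frac{2^{32}}{f_{timer}} = \frac{4294967296}{100 \times 10^{6}\ \text{Hz}} \approx 42.9\ \text{s} $$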
Watchdog Timers
The HPS provides four watchdogs connected to the L4 busses in addition to the watchdogs built into the MPU. The four watchdog timers have a 32-bit timer resolution and are based on the Synopsys DesignWare APB Watchdog Timer peripheral.
A watchdog timer can be programmed to generate a reset request on a timeout. Alternatively, the watchdog can be programmed to assert an interrupt request on a timeout, and if the interrupt is not serviced by software before a second timeout occurs, generate a reset request.
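As a rough illustration of the interrupt-then-reset mode, the sketch below programs a DesignWare-style APB watchdog. The base address, register offsets, and field encodings are assumptions for illustration only; take the real values from the HPS address map and register definitions.

```c
#include <stdint.h>

/* Hypothetical base address and DesignWare-style APB watchdog register
 * offsets, for illustration only. */
#define WDT0_BASE    0xFFD00200u
#define WDT_CR       0x00u          /* control: enable and response mode */
#define WDT_TORR     0x04u          /* timeout range                     */
#define WDT_CRR      0x0Cu          /* counter restart ("kick") register */

#define WDT_CR_EN    (1u << 0)      /* watchdog enable                   */
#define WDT_CR_RMOD  (1u << 1)      /* 1 = interrupt first, then reset   */
#define WDT_KICK_KEY 0x76u          /* restart key expected by the IP    */

static inline void wdt_write(uint32_t offset, uint32_t value)
{
    *(volatile uint32_t *)(uintptr_t)(WDT0_BASE + offset) = value;
}

/* Arm watchdog 0 in interrupt-then-reset mode with the selected timeout range. */
void wdt0_arm(uint32_t timeout_range)
{
    wdt_write(WDT_TORR, timeout_range);
    wdt_write(WDT_CR, WDT_CR_RMOD | WDT_CR_EN);
}

/* Service ("kick") the watchdog before the timeout period elapses. */
void wdt0_kick(void)
{
    wdt_write(WDT_CRR, WDT_KICK_KEY);
}
```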
DMA Controller
The DMA controller provides high-bandwidth data transfers for modules without integrated DMA controllers. The DMA controller is based on the Arm® CoreLink™ DMA Controller (DMA-330) and offers the following features:
- Micro-coded to support flexible transfer types
- Memory-to-memory
- Memory-to-peripheral
- Peripheral-to-memory
- Scatter-gather
- Supports up to eight channels
- Supports up to 32 peripheral request interfaces
Error Checking and Correction Controller
ECC controllers provide single- and double-bit error memory protection for integrated on-chip RAM and peripheral RAMs within the HPS.
- USB OTG controllers
- SD/MMC controller
- EMAC controllers
- DMA controller
- NAND flash controller
- On-chip RAM
- Single-bit error detection and correction
- Double-bit error detection
- Interrupts generated on single- and double-bit errors
Interface Peripherals
EMACs
- IEEE 802.3-2008 compliant
- Supports 10, 100, and 1000 Mbps standard
- Supports full and half duplex modes
- IEEE 1588-2002 and 2008 precision networked clock synchronization
- IEEE 802.3-az, version D2.0 of Energy Efficient Ethernet (EEE)
- Supports IEEE 802.1Q Virtual local area network (VLAN) tag detection for reception frames
- VLAN insertion, replacement, or deletion
- Supports a variety of flexible address filtering modes
- Programmable frame length support for full jumbo frames up to 9.6 KB
- The Gigabit media independent interface/Media independent interface (GMII/MII) interface includes optional FIFO loopback to support debugging
- Network statistics with RMON/MIB counters (RFC2819/RFC2665)
- PHY interface support for Reduced Gigabit Media Independent Interface (RGMII) and Reduced Media Independent Interface (RMII) on HPS I/O pins
- PHY interface support for GMII and MII on FPGA I/O pins
- Additional PHY interface support on FPGA I/O pins using adapter logic in the FPGA fabric to adapt the GMII/MII interface from the HPS to interfaces such as Serial Gigabit Media Independent Interface (SGMII), RGMII or RMII
- PHY Management control through Management data input/output (MDIO) interface or I2C interface
- Integrated DMA controller
USB Controllers
The HPS provides two USB 2.0 Hi-Speed On-the-Go (OTG) controllers from Synopsys DesignWare. The USB controller signals cannot be routed to the FPGA like those of other peripherals; instead they are routed to the dedicated I/O.
Each of the USB controllers offers the following features:
- Complies with the following specifications:
- USB OTG Revision 1.3
- USB OTG Revision 2.0
- Embedded Host Supplement to the USB Revision 2.0 Specification
- Supports software-configurable modes of operation between OTG 1.3 and OTG 2.0
- Supports all USB 2.0 speeds:
- High speed (HS, 480 Mbps)
- Full speed (FS, 12 Mbps)
- Low speed (LS, 1.5 Mbps)
Note: In host mode, all speeds are supported; however, in device mode, only high speed and full speed are supported.
- Local buffering with Error Correction Code (ECC) support
Note: The USB 2.0 OTG controller does not support the following interface standards:
- Enhanced Host Controller Interface (EHCI)
- Open Host Controller Interface (OHCI)
- Universal Host Controller Interface (UHCI)
- Supports USB 2.0 Transceiver Macrocell Interface Plus (UTMI+) Low Pin Interface (ULPI) PHYs (SDR mode only)
- Supports up to 16 bidirectional endpoints, including control endpoint 0
Note: Only seven periodic device IN endpoints are supported.
- Supports up to 16 host channels
Note: In host mode, when the number of device endpoints is greater than the number of host channels, software can reprogram the channels to support up to 127 devices, each having 32 endpoints (IN + OUT), for a maximum of 4,064 endpoints.
- Supports generic root hub
- Supports automatic ping capability
I2C Controllers
- Support both 100 kbps and 400 kbps modes
- Support both 7-bit and 10-bit addressing modes
- Support master and slave operating mode
- Direct access for host processor
- DMA controller may be used for large transfers
UARTs
- 16550-compatible UART
- Support automatic flow control as specified in 16750 standard
- Direct access for host processor
- DMA controller may be used for large transfers
- Separate thresholds for DMA request and handshake signals to maximize throughput
- 128-byte transmit and receive FIFO buffers
- Programmable baud rate up to 6.25 MBaud (with a 100 MHz reference clock; see the calculation after this list)
- Programmable character properties, such as number of data bits per character (5-8), optional parity bit (with odd or even select) and number of stop bits (1, 1.5 or 2)
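The 6.25-MBaud ceiling follows from the standard 16550 divisor relationship with the usual 16x oversampling; a quick check with a divisor of 1:

$$ \text{baud rate} = \frac{f_{ref}}{16 \times \text{divisor}} = \frac{100\ \text{MHz}}{16 \times 1} = 6.25\ \text{MBaud} $$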
SPI Master Controllers
- Programmable data frame size of 4 - 32 bits
- Supports full- and half-duplex modes
- Supports up to four chip selects
- Direct access for host processor
- DMA controller may be used for large transfers
- Programmable master serial bit rate
- Support for receive sample delay
- Support for multi-master mode
- Choice of Motorola® SPI, Texas Instruments® Synchronous Serial Protocol or National Semiconductor® Microwire protocol
SPI Slave Controllers
- Programmable data frame size from 4 - 32 bits
- Support for full- and half-duplex modes
- Direct access for host processor
- DMA controller may be used for large transfers
GPIO Interfaces
- Digital de-bounce
- Configurable interrupt mode
- Configurable hardware and software control for each signal
- Level and edge interrupts
CoreSight Debug and Trace
- Real-time program flow instruction trace through a separate Embedded Trace Macrocell (ETM) for each processor
- Host debugger JTAG interface
- Connections for cross-trigger and STM-to-FPGA interfaces, which enable soft IP cores to generate triggers and system trace messages
- Custom message injection through STM into trace stream for delivery to host debugger
- Capability to route trace data to any slave accessible to the ETR master, which is connected to the L3 interconnect
Hard Processor System I/O Pin Multiplexing
The Intel® Stratix® 10 SoC has a total of 48 flexible I/O pins that are used for HPS operation, external flash memories, and external peripheral communication. A pin multiplexing mechanism allows the SoC to use the flexible I/O pins in a wide range of configurations.
Endian Support
The HPS is natively a little–endian system. All HPS slaves are little endian.
The processor masters are software configurable to interpret data as little endian, big endian, or byte–invariant (BE8). All other masters, including the USB 2.0 interface, are little endian. Registers in the MPU and L2 cache are little endian regardless of the endian mode of the CPUs.
The FPGA–to–HPS, HPS–to–FPGA, FPGA–to–SDRAM, and lightweight HPS–to–FPGA interfaces are little endian.
If a processor is set to BE8 mode, software must convert endianness for accesses to peripherals and DMA linked lists in memory. The processor provides instructions to swap byte lanes for various sizes of data.
The ARM DMA controller is software configurable to perform byte lane swapping during a transfer.
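A minimal sketch of the software-side conversion for a CPU running in BE8 mode that accesses a little-endian peripheral register; the register address is hypothetical, and the compiler builtin stands in for the byte-lane swap instructions the processor provides:

```c
#include <stdint.h>

/* Hypothetical little-endian peripheral register address (illustration only). */
#define PERIPH_REG_ADDR 0xFF800000u

/* Read a 32-bit little-endian register from a CPU running in BE8 mode.
 * __builtin_bswap32 compiles to the AArch64 REV instruction, which performs
 * the byte-lane swap the processor provides for this purpose. */
static inline uint32_t periph_read_le32(void)
{
    uint32_t raw = *(volatile uint32_t *)(uintptr_t)PERIPH_REG_ADDR;
    return __builtin_bswap32(raw);
}
```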
Introduction to the Hard Processor System Address Map
- You can access the complete address map and register definitions for this IP and the entire HPS through the Intel® Stratix® 10 Hard Processor System Programmer's Reference Manual.
- You can also access an HTML webhelp version of the Intel® Stratix® 10 Hard Processor System Address Map and Register Definitions by clicking either of these links:
Cortex-A53 MPCore Processor
The Arm® Cortex® -A53 MPCore™ is composed of four Arm® v8-A architecture central processing units (CPUs), a level 2 (L2) cache, and debugging modules. Advanced functions, such as floating point operations and cryptographic extensions, are supported.
Features of the Cortex-A53 MPCore
The Arm® Cortex® -A53 MPCore™ Processor contains four CPUs that implement the Arm® v8-A architecture instruction set. Each CPU has identical integration.
- Support for 32- and 64-bit instruction sets
- In-order pipeline with symmetric dual-issue of most instructions
- Arm® NEON™ single instruction, multiple data (SIMD) coprocessor with a floating point unit (FPU)
- Single- and double-precision IEEE-754 floating point math support
- Integer and polynomial math support
- Symmetric multiprocessing (SMP) and asymmetric multiprocessing (AMP) modes
- Arm® v8 Cryptography Extension
- Level 1 (L1) cache
- 32 KB two-way set associative instruction cache
- Single Error Detect (SED) and parity checking support for L1 instruction cache
- 32 KB four-way set associative data cache
- ECC, Single Error Correct, Double Error Detect (SECDED) protection for L1 data cache
- Memory Management Unit (MMU) that communicates with system MMU (SMMU)
- 10-entry fully-associative instruction micro translation lookaside buffer (TLB)
- 10-entry fully-associative data micro TLB
- 512‑entry unified TLB
- Generic timer
- Governor module that controls clock and reset
- Debug modules
- Performance Monitor Unit
- Embedded Trace Macrocell (ETMv4)
- CoreSight cross trigger interface
Some integration is also shared among the four CPUs in the Cortex® -A53 MPCore processor.
- 1 MB Arm® L2 cache controller with ECC, SECDED protection
- Snoop Control Unit (SCU) that maintains coherency between CPUs and communicates with the system CCU
- Global timer
- Generic Interrupt Controller (GIC-400, version r0p1)
- System cache coherency unit (CCU)
- System memory management unit (SMMU, ARM MMU-500, version r2p0)
Processor | Version |
---|---|
Cortex-A53 MPCore | r0p4 |
Advantages of Cortex-A53 MPCore
The Cortex® -A53 MPCore™ processor seamlessly supports 32-bit and 64-bit instruction sets. It implements the full Arm® v8-A architecture and has a highly efficient 8-stage in-order pipeline enhanced with advanced fetch and data access techniques that provide high performance and low power.
Cortex-A53 MPCore Block Diagram
Cortex-A53 MPCore System Integration
- Requests from the Cortex® -A53 MPCore™ processor are sent to the cache coherency unit (CCU) by the 128-bit ACE bus master. The CCU supports memory read and write requests and I/O memory-mapped read and write requests. The CCU allows masters to maintain I/O coherency with the Cortex® -A53 MPCore™ subsystem.
- The System MMU (SMMU) resides outside of the Cortex® -A53 MPCore™ . It consists of a translation control unit (TCU) which controls and manages the address translations of each master's translation buffer unit (TBU). The TLB data of the Cortex® -A53 MPCore™ is managed by the SMMU.
- The debug access port (DAP) interfaces directly to the processor and can perform invasive or non-invasive debug.
- The Generic Interrupt Controller (GIC) resides outside of the Cortex® -A53 MPCore™ and sends interrupt requests to the processor through a dedicated bus.
Cortex-A53 MPCore Functional Description
Feature | Configuration |
---|---|
Arm® v8-A architecture, Cortex®-A53 CPUs | 4 |
Instruction cache size per CPU | 32 KB, 2-way set associative with a line size of 64 bytes per line |
Data cache size per CPU | 32 KB, 4-way set associative with a line size of 64 bytes per line |
L2 cache size shared among four CPUs | 1 MB, 16-way set associative with a line size of 64 bytes per line |
Media Processing Engine with NEON™ technology in each CPU | Included with support for floating-point operations |
Arm® v8-A cryptographic extensions in each CPU | Included |
Embedded Trace Macrocell (ETMv4) in each CPU | Included |
Cache protection | Included for L1 and L2 cache. See "Cache Protection" section for more information. |
Virtualization
The processor supports virtualization. A virtualized system includes:
- A hypervisor, running in EL2, that is responsible for switching between virtual machines. A virtual machine comprises non-secure EL1 and non-secure EL0.
- A number of guest operating systems, each running in non-secure EL1 on a virtual machine
- For each guest operating system, applications that usually run in non-secure EL0 on a virtual machine

To support virtualization, the hardware provides the ability to:
- Present virtual values of a small number of identification registers. A read of one of these registers by a guest OS or the applications for a guest OS returns the virtual value.
- Trap various operations, including memory management operations and accesses to other registers. A trapped operation generates an exception that is taken to EL2.
- Route interrupts to:
- The current guest OS
- A guest OS that is not currently running
- The hypervisor
Address translation for a virtual machine occurs in two stages:
- Stage 1 maps the virtual address (VA) to an intermediate physical address (IPA). This translation is managed at EL1, usually by a guest OS. The guest OS believes that the IPA is the physical address (PA).
- Stage 2 maps the IPA to the PA. This translation is managed at EL2. The guest OS might be completely unaware of this stage. For more information on the translation regimes, see the System Memory Management Unit chapter.
- Hypervisor call (HVC) exception
- Traps to EL2
- All of the virtual interrupts:
- Virtual SError
- Virtual IRQ
- Virtual FIQ
The Cortex-A53 MPCore™ processor contains virtualization registers that allow you to configure translation tables, hypervisor operations, exception levels, and virtual interrupts. For more information, refer to the Arm® Cortex-A53 MPCore™ Processor Technical Reference Manual.
Virtual Interrupts
When a virtual interrupt is enabled, its corresponding physical exception is taken to EL2, unless EL3 has configured that physical exception to be taken to EL3.
Physical Interrupt | Corresponding Virtual Interrupt |
---|---|
SError | Virtual SError |
IRQ | Virtual IRQ |
FIQ | Virtual FIQ |
Software executing in EL2 can use virtual interrupts to signal physical interrupts to non-secure EL1 and non-secure EL0.
- Software executing at EL2 routes a physical interrupt to EL2.
- When a physical interrupt of that type occurs, the exception handler executing in EL2 determines whether the interrupt can be handled in EL2 or requires routing to a guest OS in EL1. If an interrupt requires routing to a guest OS and the guest OS is running, the hypervisor asserts the appropriate virtual interrupt to signal the physical interrupt to the guest OS. If the guest OS is not running, the physical interrupt is marked as pending for the guest OS. When the hypervisor next switches to the virtual machine that is running that guest OS, the hypervisor uses the appropriate virtual interrupt type to signal the physical interrupt to the guest OS.
Memory Management Unit
The MMUs support 40-bit physical address size and two stages of translation. You can enable or disable each stage of address translation independently.
The Cortex® -A53 MPCore™ communicates with the system memory management unit (SMMU) when pages are invalidated in a CPU's MMU.
For more information regarding the SMMU, refer to the System Memory Management Unit chapter.
Translation Lookaside Buffers
TLB Type | Memory Type | Number of Entries | Associativity |
---|---|---|---|
Micro TLB | Instruction | 10 | Fully associative |
Micro TLB | Data | 10 | Fully associative |
Main TLB | Instruction and Data | 512 | Four-way set-associative |
The MMU also includes the following caches:
- 4-way set associative 64-entry walk cache that holds the result of a stage 1 translation. The walk cache holds entries fetched from the secure and non-secure state.
- 4-way set associative 64-entry intermediate physical address (IPA) cache. This cache holds map points between intermediate physical addresses and physical addresses. Only non-secure exception level 1 (EL1) and exception level 0 (EL0) stage 2 translations use this cache.
The micro TLBs are the first level of caching for the translation table information. The unified main TLB handles misses from the micro TLBs.
When the main TLB performs maintenance operations it flushes both the instruction and data micro TLBs.
Translation Match Process
The ARMv8-A architecture supports multiple mappings of the virtual address space, which are translated differently. The TLB entries store all the required context information to facilitate a match and avoid a TLB flush or a context or virtual machine switch.
Each TLB entry contains a virtual address, block size, physical address, and a set of memory properties that include the memory type and access permissions. Each entry is associated with a particular application space ID (ASID), or is global for all application spaces.
The TLB entry also contains a field to store the virtual memory ID (VMID) for accesses made from the non-secure EL0 and EL1. A memory space identifier in the TLB entry records whether the request occurred at the:
- EL3, if EL3 is in the AArch64 execution state
- Non-secure EL2 exception level
- Secure and non-secure EL0 or EL1 and EL3, when EL3 is in AArch32 execution state
A TLB entry matches a request only when all of the following conditions are met (see the sketch after this list):
- The virtual address matches that of the requested address
- The memory space matches the memory space state of the requests. The memory space can be
one of four values:
- Secure EL3, when EL3 is in the AArch64 execution state
- Non-secure EL2
- Secure EL0 or EL1, and EL3 when EL3 is in the AArch32 execution state
- Non-secure EL0 or EL1
- The ASID matches the current application space ID held in the CONTEXTIDR, TTBR0, or TTBR1 register or the entry is marked global.
- The VMID matches the current VMID held in the VTTBR register.
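The conditions above can be summarized as a single predicate. The structure and field names below are illustrative only, not the hardware's internal encoding:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative TLB entry model; field names are not the hardware encoding. */
typedef enum { SPACE_SEC_EL3, SPACE_NS_EL2, SPACE_SEC_EL01, SPACE_NS_EL01 } mem_space_t;

typedef struct {
    uint64_t    va_base;     /* virtual address of the mapped block       */
    uint64_t    block_size;  /* size of the mapped block in bytes         */
    mem_space_t space;       /* translation regime that created the entry */
    uint16_t    asid;        /* application space ID                      */
    bool        global;      /* entry applies to all ASIDs                */
    uint16_t    vmid;        /* virtual machine ID (non-secure EL0/EL1)   */
} tlb_entry_t;

/* An entry hits when the address, memory space, ASID (or global flag),
 * and VMID all match the current request context. */
bool tlb_match(const tlb_entry_t *e, uint64_t va, mem_space_t space,
               uint16_t cur_asid, uint16_t cur_vmid)
{
    bool addr_ok  = (va >= e->va_base) && (va < e->va_base + e->block_size);
    bool space_ok = (e->space == space);
    bool asid_ok  = e->global || (e->asid == cur_asid);
    bool vmid_ok  = (space != SPACE_NS_EL01) || (e->vmid == cur_vmid);
    return addr_ok && space_ok && asid_ok && vmid_ok;
}
```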
Level 1 Caches
Instruction Cache
- Instruction fetches are sequential
- A two-instruction transparent target instruction cache and 256-entry branch target address cache provides reduced branch latency
- An 8-entry return stack accelerates branch returns
- The read interface to the 1 MB L2 cache is 128-bits wide
A cache line is 64 bytes and only holds one instruction type. Different instruction types cannot be mixed in the same cache line.
Each cache line can hold the following:
- 16 A32 instructions
- 16 32-bit T32 instructions
- 16 A64 instructions
- 32 16-bit T32 instructions
The instruction cache supports single error detection (SED) parity checking.
Data Cache
The data cache is 4-way set associative with a cache line length of 64 bytes. It is organized as a physically indexed and physically tagged cache. The micro TLB for the data cache converts virtual addresses to physical addresses before it executes a cache access.
- Supports 256-bit writes and 128-bit reads to L2 cache
- Utilizes prefetch engine and read buffer
- Supports three outstanding data cache misses
- Provides error checking and correction (ECC) on L1 data and parity checking on control bits
ACE Transactions
Attribute | ACE Transaction | |||||
---|---|---|---|---|---|---|
Memory Type | Shareability | Domain | Load | Store | Load Exclusive | Store Exclusive |
Device | N/A | System | ReadNoSnoop | WriteNoSnoop | ReadNoSnoop and ARLOCKM set to HIGH | WriteNoSnoop and AWLOCKM set to HIGH |
Normal, inner Non-cacheable, outer Non-cacheable | Non-shared | System | ReadNoSnoop | WriteNoSnoop | ReadNoSnoop and ARLOCKM set to HIGH | WriteNoSnoop and AWLOCKM set to HIGH |
Inner-shared | ||||||
Outer-shared | ||||||
Normal, inner Non-cacheable, outer Write-Back or Write-Through | Non-shared | System | ReadNoSnoop | WriteNoSnoop | ReadNoSnoop | ReadNoSnoop |
Inner-shared | System | ReadNoSnoop | WriteNoSnoop | ReadNoSnoop and ARLOCKM set to HIGH | WriteNoSnoop and AWLOCKM set to HIGH | |
Outer-shared | ||||||
Normal, inner Write-Through, outer Write-Back, Write-Through | Non-shared | System | ReadNoSnoop | WriteNoSnoop | ReadNoSnoop | ReadNoSnoop |
Inner-shared | System | ReadNoSnoop | WriteNoSnoop | ReadNoSnoop and ARLOCKM set to HIGH | WriteNoSnoop and AWLOCKM set to HIGH | |
Outer-shared | ||||||
Non-cacheable, or Normal inner Write-Back outer Non-cacheable or Write-Through | Non-shared | System | ReadNoSnoop | WriteNoSnoop | ReadNoSnoop | ReadNoSnoop |
Inner-shared | System | ReadNoSnoop | WriteNoSnoop | ReadNoSnoop with ARLOCKM set to HIGH | WriteNoSnoop with AWLOCKM set to HIGH | 
Outer-shared | ||||||
Normal, inner Write-Back, outer Write-Back | Non-shared | Non-shareable | ReadNoSnoop | WriteNoSnoop | ReadNoSnoop | WriteNoSnoop |
Inner-shared | Inner Shareable | ReadShared | ReadUnique or CleanUnique if required, then a WriteBack when the line is evicted | ReadShared with ARLOCKM set to HIGH | CleanUnique with ARLOCKM set to HIGH if required, then a WriteBack when the line is evicted | |
Outer-shared | Outer Shareable |
Data Prefetching
The prefetcher is enabled by default at reset. You may configure the sequence length that triggers the prefetcher or the number of outstanding requests the prefetcher can make by programming the CPU Auxiliary Control (CPUACTLR) register.
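A hedged sketch of the read-modify-write access pattern for CPUACTLR_EL1 from privileged code follows. The system-register encoding shown (S3_1_C15_C2_0) is the Cortex-A53 implementation-defined CPU Auxiliary Control register, but the prefetch field mask and value are left as placeholders because they must be taken from the Arm Cortex-A53 MPCore Technical Reference Manual:

```c
#include <stdint.h>

/* Placeholder field definitions; the real bit positions and legal values
 * for the prefetch controls are defined in the Arm Cortex-A53 MPCore TRM. */
#define CPUACTLR_PREFETCH_MASK  0x0u   /* placeholder mask  */
#define CPUACTLR_PREFETCH_VAL   0x0u   /* placeholder value */

static inline uint64_t cpuactlr_read(void)
{
    uint64_t v;
    /* CPUACTLR_EL1 is encoded as S3_1_C15_C2_0 on the Cortex-A53. */
    __asm__ volatile("mrs %0, S3_1_C15_C2_0" : "=r"(v));
    return v;
}

static inline void cpuactlr_write(uint64_t v)
{
    __asm__ volatile("msr S3_1_C15_C2_0, %0" :: "r"(v));
    __asm__ volatile("isb");   /* ensure the new setting takes effect */
}

/* Read-modify-write the prefetch control fields. */
void tune_l1_prefetcher(void)
{
    uint64_t v = cpuactlr_read();
    v = (v & ~(uint64_t)CPUACTLR_PREFETCH_MASK) | CPUACTLR_PREFETCH_VAL;
    cpuactlr_write(v);
}
```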
Level 2 Memory System
- 1 MB L2 cache, shared among four processors
- 16-way set associative cache structure
- 64 bytes per line
- Snoop Control Unit (SCU) that provides data coherency and ECC protection
- Interfaces to system through a 128-bit AMBA® 4 ACE bus
Snoop Control Unit
- When the processors are set to SMP mode, the SCU maintains data cache coherency between the processors.
Note: The SCU does not maintain coherency of the instruction caches.
- The SCU reduces latency by using buffers to execute cache-to-cache transfers between CPUs without accessing external memory.
- The SCU can accept up to eight requests from the system.
The SCU communicates with the system-level cache coherency unit (CCU) to maintain coherency between the two modules.
Implementation Details
When the processor writes to any coherent memory location, the SCU ensures that the relevant data is coherent (updated, tagged or invalidated). Similarly, the SCU monitors read operations from a coherent memory location. If the required data is already stored within the other processor’s L1 cache, the data is returned directly to the requesting processor. If the data is not in L1 cache, the SCU issues a read to the L2 cache. If the data is not in the L2 cache memory, the read is finally forwarded to main memory. The primary goal is to maximize overall memory performance and minimize power consumption.
The SCU maintains bidirectional coherency between the L1 data caches belonging to the processors. When one processor performs a cacheable write, if the same location is cached in the other L1 cache, the SCU updates it.
Non‑coherent data passes through as a standard read or write operation.
If multiple CPUs attempt simultaneous access to the L2 cache, the SCU arbitrates among them.
Cryptographic Extensions
Cryptographic functions are an extension of the SIMD support and operate on the vector register file. This extension provides instructions for the acceleration of encryption and decryption to support:
- AES
- SHA1
- SHA2-256
The cryptographic extension also provides multiply instructions that operate on long polynomials.
NEON Multimedia Processing Engine
The NEON™ multimedia processing engine (MPE) provides hardware acceleration for media and signal processing applications. Each CPU includes an ARM NEON MPE that supports SIMD processing.
Single Instruction, Multiple Data (SIMD) Processing
Features of the NEON MPE
The NEON™ processing engine accelerates multimedia and signal processing algorithms such as video encoding and decoding, 2‑D and 3‑D graphics, audio and speech processing, image processing, telephony, and sound synthesis.
The Cortex® -A53 MPCore NEON MPE performs the following types of operations:
- SIMD and scalar single-precision floating-point computations
- Scalar double-precision floating-point computation
- SIMD and scalar half-precision floating-point conversion
- 8‑, 16‑, 32‑, and 64‑bit signed and unsigned integer SIMD computation
- 8‑bit or 16‑bit polynomial computation for single‑bit coefficients
The following operations are available:
- Addition and subtraction
- Multiplication with optional accumulation (MAC)
- Maximum- or minimum-value driven lane selection operations
- Inverse square root approximation
- Comprehensive data-structure load instructions, including register-bank-resident table lookup
Floating Point Unit
Each CPU in the Arm® Cortex-A53 MPCore™ processor includes full support for IEEE-754 floating point vector operations.
The floating-point unit (FPU) can execute half-, single-, and double-precision variants of the following operations:
- Add
- Subtract
- Multiply
- Divide
- Multiply and accumulate (MAC)
- Square root
The FPU also converts between floating-point data formats and integers, including special operations to round towards zero required by high-level languages.
ACE Bus Interface
The ACE bus interface operates in the mpu_ccu_clk domain, which is mpu_clk/2. This bus provides an AXI3 compatibility mode with support for privilege level accesses through the ARPROTM[0] and AWPROTM[0] signals.
The Cortex® -A53 MPCore™ processor does not generate any FIXED bursts and all WRAP bursts fetch a complete cache line starting with the critical word first. A burst does not cross a cache line boundary. The cache linefill fetch length is always 64 bytes. The Cortex® -A53 generates only a subset of all possible AXI transactions on the master interface.
For WriteBack transfers the supported transfers are:
- WRAP 4 128-bit for read transfers (linefills).
- INCR 4 128-bit for write transfers (evictions).
- INCR N (N:1, 2, or 4) 128-bit write transfers (read allocate).
For non-cacheable transactions:
- INCR N (N:1, 2, or 4) 128-bit for write transfers.
- INCR N (N:1, 2, or 4) 128-bit for read transfers.
- INCR 1 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit for read transfers.
- INCR 1 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit for write transfers.
- INCR 1 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit for exclusive write transfers.
- INCR 1 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit for exclusive read transfers.
For Device transactions:
- INCR N (N:1, 2, or 4) 128-bit read transfers.
- INCR N (N:1, 2, or 4) 128-bit write transfers.
- INCR 1 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit read transfers.
- INCR 1 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit write transfers.
- INCR 1 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit exclusive read transfers.
- INCR 1 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit exclusive write transfers.
For translation table walk transactions, the supported transfers are INCR 1 32-bit and 64-bit read transfers.
The following characteristics apply to AXI transactions:
- WRAP bursts are only 128-bit.
- INCR 1 can be any size for read or write.
- INCR bursts of more than one transfer are only 128-bit.
- No transaction is marked as FIXED.
- Write transfers with all, some or no byte strobes HIGH can occur.
Abort Handling
The following list details items you should take into consideration about abort handling.
- All load accesses synchronously abort.
- All STREX, STREXB, STREXH, STREXD, STXR, STXRB, STXRH, STXP, STLXR, STLXRB, STLXRH and STLXP instructions use the synchronous abort mechanism.
- All store accesses to device memory, or normal memory that is inner non-cacheable, inner write-through, outer non-cacheable, or outer write-through use the asynchronous abort mechanism, except for STREX, STREXB, STREXH, STREXD, STXR, STXRB, STXRH, STXP, STLXR, STLXRB, STLXRH, and STLXP.
- All store accesses to normal memory that is both inner cacheable and outer cacheable and any evictions from L1 or L2 cache do not cause an abort in the processor. Instead, an nEXTERRIRQ interrupt is asserted because the access that aborts might not relate directly back to a specific CPU in the cluster.
- L2 linefills triggered by an L1 Instruction fetch assert the nEXTERRIRQ interrupt if the data is received from the interconnect in a dirty state. Instruction data can be marked as dirty as a result of self-modifying code or a line containing a mixture of data and instructions. If an error response is received on any part of the line, the dirty data might be lost.
Cache Protection
The L1 instruction cache provides parity checking with single error detection (SED). Double bit errors are not detected or corrected.
The L1 data cache and L2 cache provide single error correction and double error detection (SECDED). If a single-bit error is detected, the access that caused the error is stalled while the correction takes place. After correction, the access that was stalled continues or is retried.
RAM | Protection Type | Protection Granule | Correction Behavior |
---|---|---|---|
L1 Instruction cache tag | Parity, SED | 31 bits | Both lines in the cache set are invalidated, and then the line requested is refetched from L2 cache or external memory. |
L1 Instruction cache data | Parity, SED | 20 bits | Both lines in the cache set are invalidated, and then the line requested is refetched from L2 cache or external memory. |
TLB | Parity, SED | 52 bits | The entry is invalidated, and a new pagewalk is started to refetch it. |
L1 Data cache tag | Parity, SED | 32 bits | The line is cleaned and invalidated from the L1 cache. SCU duplicate tags are used to get the correct address. The line is refetched from L2 cache or external memory. |
L1 Data cache data | ECC, SECDED | 32 bits | The line is cleaned and invalidated from the L1 cache, with single bit errors corrected as part of the eviction. The line is refetched from L2 cache or external memory. |
L1 data cache dirty bit | Parity, SED with correction by re-loading data | 1 bit | The line is cleaned and invalidated from the L1 cache, with detection of dirty bit corruption through parity checking. Only the dirty bit is protected. The other bits are performance hints, therefore do not cause a functional failure if they are incorrect. Error is corrected by reloading the data. |
SCU L1 duplicate tag | ECC, SECDED | 33 bits | The tag is rewritten with the correct value, and the access is retried. If the error is uncorrectable then the tag is invalidated. |
L2 tag | ECC, SECDED | 33 bits | The tag is rewritten with the correct value, and the access is retried. If the error is uncorrectable then the tag is invalidated. |
L2 data | ECC, SECDED | 64 bits | Data is corrected inline, and the access might stall for an additional cycle or two while the correction takes place. After correction, the line might be evicted from the processor. |
Error Reporting
Detected errors are reported in the CPUMERRSR or L2MERRSR registers and also signaled on the PMUEVENT bus. Detected errors include errors that are successfully corrected, and those that cannot be corrected. If multiple errors occur on the same clock cycle then only one of them is reported.
Errors that cannot be corrected, and therefore might result in data corruption, also cause an abort. Your software can register this error and can either attempt to recover or can restart the system.
When an L1 data or L2 dirty cache line with an error on the data RAMs is evicted from the processor, the write on the master interface still takes place. However, if the error is uncorrectable, then the incorrect data is not written externally.
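A minimal sketch of polling the per-CPU memory error syndrome register from privileged software follows. The S3_1_C15_C2_2 encoding is the Cortex-A53 CPUMERRSR_EL1 register; the Valid and Fatal bit positions used here are assumptions to confirm against the Arm Cortex-A53 TRM:

```c
#include <stdbool.h>
#include <stdint.h>

/* Read the Cortex-A53 CPU Memory Error Syndrome Register (CPUMERRSR_EL1,
 * encoded as S3_1_C15_C2_2). Writing zero clears the recorded error. */
static inline uint64_t cpumerrsr_read(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, S3_1_C15_C2_2" : "=r"(v));
    return v;
}

static inline void cpumerrsr_clear(void)
{
    __asm__ volatile("msr S3_1_C15_C2_2, xzr");
}

/* Poll for a recorded L1/TLB RAM error; bit 31 (Valid) and bit 63 (Fatal)
 * are assumptions for illustration. */
bool check_cpu_mem_error(bool *fatal)
{
    uint64_t syndrome = cpumerrsr_read();
    bool valid = (syndrome >> 31) & 1u;
    if (valid) {
        *fatal = (syndrome >> 63) & 1u;
        cpumerrsr_clear();
    }
    return valid;
}
```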
Generic Interrupt Controller
The Arm® Generic Interrupt Controller (GIC-400) resides within the system complex outside of the Cortex® -A53 processor. The GIC is shared by all of the Cortex® -A53 CPUs.
The GIC has software-configurable settings to detect, manage and distribute interrupts in the SoC.
- Interrupts are enabled or disabled and prioritized through control registers.
- Interrupts can be prioritized and signaled to different processors.
- You can configure interrupts as secure or non-secure by assigning them to group 0 or group 1, respectively (see the sketch after this list).
- Virtualization extensions within the GIC allow you to manage virtualized interrupts.
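The sketch below shows a typical GICv2 (GIC-400) distributor sequence for routing one shared peripheral interrupt. The distributor base address is an assumption for illustration; the register offsets are the architectural GICv2 distributor registers.

```c
#include <stdint.h>

/* Hypothetical GIC-400 distributor base address (check the HPS address map);
 * the offsets are the architectural GICv2 distributor registers. */
#define GICD_BASE          0xFFFC1000u
#define GICD_IGROUPR(n)    (GICD_BASE + 0x080u + 4u * (n))
#define GICD_ISENABLER(n)  (GICD_BASE + 0x100u + 4u * (n))
#define GICD_IPRIORITYR    (GICD_BASE + 0x400u)   /* one byte per interrupt */
#define GICD_ITARGETSR     (GICD_BASE + 0x800u)   /* one byte per interrupt */

static inline void reg32_set(uint32_t addr, uint32_t bits)
{
    *(volatile uint32_t *)(uintptr_t)addr |= bits;
}

/* Route one shared peripheral interrupt (SPI): assign it to group 1
 * (non-secure), set its priority, target CPU interface 0, and enable it. */
void gic_route_spi(unsigned int irq, uint8_t priority)
{
    uint32_t reg = irq / 32u;
    uint32_t bit = irq % 32u;

    reg32_set(GICD_IGROUPR(reg), 1u << bit);                          /* group 1  */
    *(volatile uint8_t *)(uintptr_t)(GICD_IPRIORITYR + irq) = priority;
    *(volatile uint8_t *)(uintptr_t)(GICD_ITARGETSR + irq) = 0x01u;   /* to CPU 0 */
    *(volatile uint32_t *)(uintptr_t)GICD_ISENABLER(reg) = 1u << bit; /* enable   */
}
```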
GIC Block Diagram
Each CPU generates a signal for every private peripheral interrupt ID (PPI ID). There is only one input signal for each shared peripheral interrupt (SPI) ID, shared among the four CPUs. The GIC supports virtual interrupts as well.
The GIC notifies each CPU of an interrupt or virtual interrupt through output signals sent to the Cortex® -A53 MPCore™ .
The configuration and control for the GIC is memory-mapped and accessed through the cache coherency unit (CCU).
GIC Clock
The GIC operates in the mpu_periph_clk domain which is mpu_clk/4.
GIC Reset
The GIC is reset on a cold or warm reset. All interrupt configurations are cleared upon a cold or warm reset.
GIC Interrupt Map for the Stratix 10 SoC HPS
GIC Interrupt Number | Source Block | Interrupt Name | Description |
---|---|---|---|
47 | System Manager | SERR_Global | Global system error |
48 | CCU | interrupt_ccu | CCU combined interrupt |
49 | FPGA | F2S_FPGA_IRQ0 | F2S FPGA Interrupt 0 |
50 | FPGA | F2S_FPGA_IRQ1 | F2S FPGA Interrupt 1 |
51 | FPGA | F2S_FPGA_IRQ2 | F2S FPGA Interrupt 2 |
52 | FPGA | F2S_FPGA_IRQ3 | F2S FPGA Interrupt 3 |
53 | FPGA | F2S_FPGA_IRQ4 | F2S FPGA Interrupt 4 |
54 | FPGA | F2S_FPGA_IRQ5 | F2S FPGA Interrupt 5 |
55 | FPGA | F2S_FPGA_IRQ6 | F2S FPGA Interrupt 6 |
56 | FPGA | F2S_FPGA_IRQ7 | F2S FPGA Interrupt 7 |
57 | FPGA | F2S_FPGA_IRQ8 | F2S FPGA Interrupt 8 |
58 | FPGA | F2S_FPGA_IRQ9 | F2S FPGA Interrupt 9 |
59 | FPGA | F2S_FPGA_IRQ10 | F2S FPGA Interrupt 10 |
60 | FPGA | F2S_FPGA_IRQ11 | F2S FPGA Interrupt 11 |
61 | FPGA | F2S_FPGA_IRQ12 | F2S FPGA Interrupt 12 |
62 | FPGA | F2S_FPGA_IRQ13 | F2S FPGA Interrupt 13 |
63 | FPGA | F2S_FPGA_IRQ14 | F2S FPGA Interrupt 14 |
64 | FPGA | F2S_FPGA_IRQ15 | F2S FPGA Interrupt 15 |
65 | FPGA | F2S_FPGA_IRQ16 | F2S FPGA Interrupt 16 |
66 | FPGA | F2S_FPGA_IRQ17 | F2S FPGA Interrupt 17 |
67 | FPGA | F2S_FPGA_IRQ18 | F2S FPGA Interrupt 18 |
68 | FPGA | F2S_FPGA_IRQ19 | F2S FPGA Interrupt 19 |
69 | FPGA | F2S_FPGA_IRQ20 | F2S FPGA Interrupt 20 |
70 | FPGA | F2S_FPGA_IRQ21 | F2S FPGA Interrupt 21 |
71 | FPGA | F2S_FPGA_IRQ22 | F2S FPGA Interrupt 22 |
72 | FPGA | F2S_FPGA_IRQ23 | F2S FPGA Interrupt 23 |
73 | FPGA | F2S_FPGA_IRQ24 | F2S FPGA Interrupt 24 |
74 | FPGA | F2S_FPGA_IRQ25 | F2S FPGA Interrupt 25 |
75 | FPGA | F2S_FPGA_IRQ26 | F2S FPGA Interrupt 26 |
76 | FPGA | F2S_FPGA_IRQ27 | F2S FPGA Interrupt 27 |
77 | FPGA | F2S_FPGA_IRQ28 | F2S FPGA Interrupt 28 |
78 | FPGA | F2S_FPGA_IRQ29 | F2S FPGA Interrupt 29 |
79 | FPGA | F2S_FPGA_IRQ30 | F2S FPGA Interrupt 30 |
80 | FPGA | F2S_FPGA_IRQ31 | F2S FPGA Interrupt 31 |
81 | FPGA | F2S_FPGA_IRQ32 | F2S FPGA Interrupt 32 |
82 | FPGA | F2S_FPGA_IRQ33 | F2S FPGA Interrupt 33 |
83 | FPGA | F2S_FPGA_IRQ34 | F2S FPGA Interrupt 34 |
84 | FPGA | F2S_FPGA_IRQ35 | F2S FPGA Interrupt 35 |
85 | FPGA | F2S_FPGA_IRQ36 | F2S FPGA Interrupt 36 |
86 | FPGA | F2S_FPGA_IRQ37 | F2S FPGA Interrupt 37 |
87 | FPGA | F2S_FPGA_IRQ38 | F2S FPGA Interrupt 38 |
88 | FPGA | F2S_FPGA_IRQ39 | F2S FPGA Interrupt 39 |
89 | FPGA | F2S_FPGA_IRQ40 | F2S FPGA Interrupt 40 |
90 | FPGA | F2S_FPGA_IRQ41 | F2S FPGA Interrupt 41 |
91 | FPGA | F2S_FPGA_IRQ42 | F2S FPGA Interrupt 42 |
92 | FPGA | F2S_FPGA_IRQ43 | F2S FPGA Interrupt 43 |
93 | FPGA | F2S_FPGA_IRQ44 | F2S FPGA Interrupt 44 |
94 | FPGA | F2S_FPGA_IRQ45 | F2S FPGA Interrupt 45 |
95 | FPGA | F2S_FPGA_IRQ46 | F2S FPGA Interrupt 46 |
96 | FPGA | F2S_FPGA_IRQ47 | F2S FPGA Interrupt 47 |
97 | FPGA | F2S_FPGA_IRQ48 | F2S FPGA Interrupt 48 |
98 | FPGA | F2S_FPGA_IRQ49 | F2S FPGA Interrupt 49 |
99 | FPGA | F2S_FPGA_IRQ50 | F2S FPGA Interrupt 50 |
100 | FPGA | F2S_FPGA_IRQ51 | F2S FPGA Interrupt 51 |
101 | FPGA | F2S_FPGA_IRQ52 | F2S FPGA Interrupt 52 |
102 | FPGA | F2S_FPGA_IRQ53 | F2S FPGA Interrupt 53 |
103 | FPGA | F2S_FPGA_IRQ54 | F2S FPGA Interrupt 54 |
104 | FPGA | F2S_FPGA_IRQ55 | F2S FPGA Interrupt 55 |
105 | FPGA | F2S_FPGA_IRQ56 | F2S FPGA Interrupt 56 |
106 | FPGA | F2S_FPGA_IRQ57 | F2S FPGA Interrupt 57 |
107 | FPGA | F2S_FPGA_IRQ58 | F2S FPGA Interrupt 58 |
108 | FPGA | F2S_FPGA_IRQ59 | F2S FPGA Interrupt 59 |
109 | FPGA | F2S_FPGA_IRQ60 | F2S FPGA Interrupt 60 |
110 | FPGA | F2S_FPGA_IRQ61 | F2S FPGA Interrupt 61 |
111 | FPGA | F2S_FPGA_IRQ62 | F2S FPGA Interrupt 62 |
112 | FPGA | F2S_FPGA_IRQ63 | F2S FPGA Interrupt 63 |
113 | DMA | dma_IRQ0 | DMA Interrupt 0 |
114 | DMA | dma_IRQ1 | DMA Interrupt 1 |
115 | DMA | dma_IRQ2 | DMA Interrupt 2 |
116 | DMA | dma_IRQ3 | DMA Interrupt 3 |
117 | DMA | dma_IRQ4 | DMA Interrupt 4 |
118 | DMA | dma_IRQ5 | DMA Interrupt 5 |
119 | DMA | dma_IRQ6 | DMA Interrupt 6 |
120 | DMA | dma_IRQ7 | DMA Interrupt 7 |
121 | DMA | dma_irq_abort | DMA Abort Interrupt |
122 | EMAC0 | emac0_IRQ | EMAC0 Interrupt |
123 | EMAC1 | emac1_IRQ | EMAC1 Interrupt |
124 | EMAC2 | emac2_IRQ | EMAC2 Interrupt |
125 | USB0 | usb0_IRQ | USB0 Interrupt |
126 | USB1 | usb1_IRQ | USB1 Interrupt |
127 | SDRAM scheduler | HMC_error | Hard Memory Controller Error |
128 | SDMMC | sdmmc_IRQ | SD/MMC Interrupt |
129 | NAND | nand_IRQ | NAND Interrupt |
130 | Reserved | Reserved | - |
131 | SPI0 master | spim0_IRQ | SPI0 Master Interrupt |
132 | SPI1 master | spim1_IRQ | SPI1 Master Interrupt |
133 | SPI0 slave | spis0_IRQ | SPI0 Slave Interrupt |
134 | SPI1 slave | spis1_IRQ | SPI1 Slave Interrupt |
135 | I2C0 | i2c0_IRQ | I2C0 Interrupt |
136 | I2C1 | i2c1_IRQ | I2C1 Interrupt |
137 | I2C2 | i2c2_IRQ | I2C2 Interrupt (I2C2 can be used with EMAC0) |
138 | I2C3 | i2c3_IRQ | I2C3 Interrupt (I2C3 can be used with EMAC1) |
139 | I2C4 | i2c4_IRQ | I2C4 Interrupt (I2C4 can be used with EMAC2) |
140 | UART0 | uart0_IRQ | UART0 Interrupt |
141 | UART1 | uart1_IRQ | UART1 Interrupt |
142 | GPIO0 | gpio0_IRQ | GPIO 0 Interrupt |
143 | GPIO1 | gpio1_IRQ | GPIO 1 Interrupt |
144 | Reserved | - | - |
145 | Timer0 | timer_l4sp_0_IRQ | Timer0 Interrupt |
146 | Timer1 | timer_l4sp_1_IRQ | Timer1 Interrupt |
147 | Timer2 | timer_osc1_0_IRQ | Timer 2 Interrupt |
148 | Timer3 | timer_osc1_1_IRQ | Timer 3 Interrupt |
149 | Watchdog0 | wdog0_IRQ | Watchdog0 Interrupt |
150 | Watchdog1 | wdog1_IRQ | Watchdog1 Interrupt |
151 | Clock Manager | clkmgr_IRQ | Clock Manager Interrupt |
152 | SDRAM MPFE | seq2core | Calibration Interrupt |
153 | CoreSight CPU0 CTI | CTIIRQ[0] | Cortex-A53 MPCore™ Processor CPU 0 Cross Trigger Interface Interrupt |
154 | CoreSight CPU1 CTI | CTIIRQ[1] | Cortex-A53 MPCore™ Processor CPU 1 Cross Trigger Interface Interrupt |
155 | CoreSight CPU2 CTI | CTIIRQ[2] | Cortex-A53 MPCore™ Processor CPU 2 Cross Trigger Interface Interrupt |
156 | CoreSight CPU3 CTI | CTIIRQ[3] | Cortex-A53 MPCore™ Processor CPU 3 Cross Trigger Interface Interrupt |
157 | Watchdog2 | wdog2_IRQ | Watchdog 2 Interrupt |
158 | Watchdog3 | wdog3_IRQ | Watchdog 3 Interrupt |
159 | Cortex®-A53 | nEXTERRIRQ | Cortex-A53 MPCore™ External Error Interrupt |
160 | System MMU | gbl_flt_irpt_s | Global Secure Fault Interrupt |
161 | System MMU | gbl_flt_irpt_ns | Global Non-secure Fault Interrupt |
162 | System MMU | perf_irpt_FPGA_TBU | FPGA TBU Performance Counter Interrupt |
163 | System MMU | perf_irpt_DMA_TBU | DMA TBU Performance Counter Interrupt |
164 | System MMU | perf_irpt_EMAC_TBU | EMAC TBU Performance Counter Interrupt |
165 | System MMU | perf_irpt_IO_TBU | Peripheral I/O Master TBU Performance Counter Interrupt |
167 | Reserved | Reserved | - |
168 | System MMU | comb_irpt_ns | System MMU Combined Non-secure Interrupt |
169 | System MMU | comb_irpt_s | System MMU Combined Secure Interrupt |
170 | System MMU | cxt_irpt_0 | System MMU Non-secure Context Interrupt 0 |
171 | System MMU | cxt_irpt_1 | System MMU Non-secure Context 1 Interrupt |
172 | System MMU | cxt_irpt_2 | System MMU Non-secure Context 2 Interrupt |
173 | System MMU | cxt_irpt_3 | System MMU Non-secure Context 3 Interrupt |
174 | System MMU | cxt_irpt_4 | System MMU Non-secure Context 4 Interrupt |
175 | System MMU | cxt_irpt_5 | System MMU Non-secure Context 5 Interrupt |
176 | System MMU | cxt_irpt_6 | System MMU Non-secure Context 6 Interrupt |
177 | System MMU | cxt_irpt_7 | System MMU Non-secure Context 7 Interrupt |
178 | System MMU | cxt_irpt_8 | System MMU Non-secure Context 8 Interrupt |
179 | System MMU | cxt_irpt_9 | System MMU Non-secure Context 9 Interrupt |
180 | System MMU | cxt_irpt_10 | System MMU Non-secure Context 10 Interrupt |
181 | System MMU | cxt_irpt_11 | System MMU Non-secure Context 11 Interrupt |
182 | System MMU | cxt_irpt_12 | System MMU Non-secure Context 12 Interrupt |
183 | System MMU | cxt_irpt_13 | System MMU Non-secure Context 13 Interrupt |
184 | System MMU | cxt_irpt_14 | System MMU Non-secure Context 14 Interrupt |
185 | System MMU | cxt_irpt_15 | System MMU Non-secure Context 15 Interrupt |
186 | System MMU | cxt_irpt_16 | System MMU Non-secure Context 16 Interrupt |
187 | System MMU | cxt_irpt_17 | System MMU Non-secure Context 17 Interrupt |
188 | System MMU | cxt_irpt_18 | System MMU Non-secure Context 18 Interrupt |
System MMU | cxt_irpt_18 | System MMU Non-secure Context 18 Interrupt |
189 |
System MMU | cxt_irpt_19 | System MMU Non-secure Context 19 Interrupt |
190 |
System MMU | cxt_irpt_20 | System MMU Non-secure Context 20 Interrupt |
191 |
System MMU | cxt_irpt_21 | System MMU Non-secure Context 21 Interrupt |
192 |
System MMU | cxt_irpt_22 | System MMU Non-secure Context 22 Interrupt |
193 |
System MMU | cxt_irpt_23 | System MMU Non-secure Context 23 Interrupt |
194 |
System MMU | cxt_irpt_24 | System MMU Non-secure Context 24 Interrupt |
195 |
System MMU | cxt_irpt_25 | System MMU Non-secure Context 25 Interrupt |
196 |
System MMU | cxt_irpt_26 | System MMU Non-secure Context 26 Interrupt |
197 |
System MMU | cxt_irpt_27 | System MMU Non-secure Context 27 Interrupt |
198 |
System MMU | cxt_irpt_28 | System MMU Non-secure Context 28 Interrupt |
199 |
System MMU | cxt_irpt_29 | System MMU Non-secure Context 29 Interrupt |
200 |
System MMU | cxt_irpt_30 | System MMU Non-secure Context 30 Interrupt |
201 |
System MMU | cxt_irpt_31 | System MMU Non-secure Context 31 Interrupt |
202 |
Cortex® -A53 | nPMUIRQ[0] | Cortex® -A53 Processor CPU 0 Performance Monitor Interrupt |
203 |
Cortex® -A53 | nPMUIRQ[1] | Cortex® -A53 Processor CPU 1 Performance Monitor Interrupt |
204 |
Cortex® -A53 | nPMUIRQ[2] | Cortex® -A53 Processor CPU 2 Performance Monitor Interrupt |
205 |
Cortex® -A53 | nPMUIRQ[3] | Cortex® -A53 Processor CPU 3 Performance Monitor Interrupt |
Generic Timers
The generic timer of each CPU contains a set of timer registers that can capture and generate the following types of events:
- Non-secure physical events
- Secure physical events
- Non-secure EL2 (hypervisor) physical events
- Virtual events
The four timers provided in each CPU are:
- EL1 non-secure physical timer register
- EL1 secure physical timer register
- Virtual timer register
- Non-secure EL2 physical (hypervisor) timer register
You can configure the generic timers as count-up or count-down timers, and they can operate on both physical (real) time and virtual time. You can also program a starting value for each generic timer.
Each timer has a 64-bit comparator that generates a private interrupt when the counter reaches the programmed value. Each interrupt is delivered as a private peripheral interrupt (PPI) with its own PPI ID, listed in the table below.
Timer | PPI ID
---|---
EL1 non-secure physical timer | 30
EL1 secure physical timer | 29
Virtual timer | 27
Non-secure EL2 physical (hypervisor) timer | 26
For more information about the generic timers, refer to the Arm® Cortex®-A53 MPCore Processor Technical Reference Manual and the Arm® Architecture Reference Manual, ARMv8, for the ARMv8-A architecture profile.
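As a minimal illustration of the comparator behavior described above, the sketch below arms the EL1 non-secure physical timer through the architectural CNTP_* system registers. It assumes a bare-metal AArch64 build (GCC inline assembly) running at an exception level that has access to these registers; the PPI still must be enabled at the GIC separately.

```c
#include <stdint.h>

/* Read the 64-bit physical count from the system counter. */
static inline uint64_t read_cntpct(void)
{
    uint64_t cnt;
    __asm__ volatile("mrs %0, cntpct_el0" : "=r"(cnt));
    return cnt;
}

/* Read the counter frequency programmed by firmware (ticks per second). */
static inline uint64_t read_cntfrq(void)
{
    uint64_t frq;
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(frq));
    return frq;
}

/* Arm the EL1 non-secure physical timer (PPI 30) to fire after 'ticks' counts. */
static inline void arm_phys_timer(uint64_t ticks)
{
    uint64_t cval = read_cntpct() + ticks;

    /* Program the 64-bit comparator value. */
    __asm__ volatile("msr cntp_cval_el0, %0" : : "r"(cval));

    /* ENABLE = 1, IMASK = 0: enable the timer and unmask its interrupt. */
    uint64_t ctl = 1;
    __asm__ volatile("msr cntp_ctl_el0, %0" : : "r"(ctl));
}
```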
System Counter
The system counter measures the passing of real time. A 64-bit bus interface carries the system counter value to each CPU, where it serves as the time base for the four generic timers. The system counter operates in the mpu_periph_clk domain, which is mpu_clk/4.
Debug Modules
The Cortex®-A53 MPCore™ provides the following debug support:
- Support for JTAG interface
- Embedded trace interface, which includes program and event trace
- Cross-trigger interface (CTI) that communicates between processors and other HPS debug modules
ARMv8 Debug
Each of the four CPUs in the Arm® Cortex® -A53 MPCore™ supports self-hosted debug and external debug. When you use self-hosted debug, you can use debug instructions to move the CPU into a debug state. When you use external debug, you can configure debug events to trigger the CPU to enter a debug state that is controlled by an external debugger.
Interactive Debugging Features
Each Cortex® -A53 MPCore™ CPU has built-in debugging capabilities, including six hardware breakpoints (two with Context ID comparison capability) and four watchpoints. The interactive debugging features can be controlled by external JTAG tools or by processor-based monitor code.
Performance Monitor Unit
Each Armv8-A CPU has a Performance Monitoring Unit (PMU) that counts events, such as cache misses and executed instructions, over a period of time. The PMU supports 58 events for gathering statistics on the operation of the processor and memory system. You can use up to six counters in the PMU to count and record events in real time. Each PMU counter is 32 bits wide and counts a selected event.
You can access each CPU's PMU counters through the system interface or from an external debugger. The events are also supplied to the Embedded Trace Macrocell (ETM) and can be used for trigger or trace.
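The following sketch shows one way software running on a CPU might start the cycle counter and one event counter using the architectural PMU system registers (PMCR_EL0, PMCNTENSET_EL0, PMEVTYPER0_EL0). It is a minimal example, assuming PMU access is permitted at the current exception level; the event number passed in (for example, 0x03 for L1 data cache refill) follows the Armv8 PMU common event encoding.

```c
#include <stdint.h>

#define PMCR_E     (1u << 0)   /* Enable all counters    */
#define PMCR_P     (1u << 1)   /* Reset event counters   */
#define PMCR_C     (1u << 2)   /* Reset cycle counter    */
#define PMCNTEN_C  (1u << 31)  /* Cycle counter enable   */
#define PMCNTEN_P0 (1u << 0)   /* Event counter 0 enable */

/* Start the cycle counter and event counter 0 counting the given event. */
static inline void pmu_start(uint32_t event)
{
    __asm__ volatile("msr pmevtyper0_el0, %0" : : "r"((uint64_t)event));
    __asm__ volatile("msr pmcntenset_el0, %0"
                     : : "r"((uint64_t)(PMCNTEN_C | PMCNTEN_P0)));
    __asm__ volatile("msr pmcr_el0, %0"
                     : : "r"((uint64_t)(PMCR_E | PMCR_P | PMCR_C)));
    __asm__ volatile("isb");
}

/* Read back the cycle counter and event counter 0. */
static inline uint64_t pmu_read_cycles(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, pmccntr_el0" : "=r"(v));
    return v;
}

static inline uint64_t pmu_read_event0(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, pmevcntr0_el0" : "=r"(v));
    return v;
}
```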
Embedded Trace Macrocell
- Support for:
- 8-byte instruction size
- 1-byte virtual machine ID size
- 4-byte context ID size
- Cycle counting in the instruction trace
- Branch broadcast tracing
- Three exception levels in secure state
- Three exception levels in non-secure state
- Four events in trace
- Return stack support
- Tracing OS Error exception support
- 7-bit trace ID
- 64-bit global timestamp size
- ATB trigger support
- Low-power behavior override
- Stall control support
Embedded Trace Macrocell Reset
Program Trace
Each processor has an independent program trace monitor (PTM) that provides real-time instruction flow trace. The PTM is compatible with a number of third-party debugging tools.
The PTM provides trace data in a highly compressed format. The trace data includes tags for specific points in the program execution flow, called waypoints. Waypoints are specific events or changes in the program flow.
The PTM recognizes and tags the waypoints listed in Table 36.
Type | Additional Waypoint Information
---|---
Indirect branches | Target address and condition code
Direct branches | Condition code
Instruction barrier instructions | —
Exceptions | Location where the exception occurred
Changes in processor instruction set state | —
Changes in processor security state | —
Context ID changes | —
Entry to and return from debug state when Halting debug mode is enabled | —
The PTM optionally provides additional information for waypoints, including the following:
- Processor cycle count between waypoints
- Global timestamp values
- Target addresses for direct branches
Event Trace
Events from each processor can be used as inputs to the PTM. The PTM can use these events as trace and trigger conditions.
Cross Trigger Interface
The Cortex®-A53 processor has a cross trigger interface (CTI) that communicates with the Arm® CoreSight™ module. The CTI enables the debug logic, ETM, and PMU to interact with each other and with other HPS debugging components, including the FPGA fabric. The ETM can export trigger events and perform actions on trigger inputs. A breakpoint in one CPU can also trigger a break in the other CPUs.
Cache Coherency Unit
The Cache Coherency Unit (CCU) resides outside of the Cortex® -A53 MPCore™ processor and maintains data coherency within the SoC system. Masters in the system, including HPS peripheral and user logic in the FPGA, can access coherent memory through the CCU. The FPGA interfaces to the CCU through the FPGA-to-HPS bridge.
The CCU provides I/O coherency. I/O coherency, also called one-way coherency, allows a CCU master to see the coherent memory visible to the Cortex® -A53 processor but does not allow the Cortex® -A53 processor to see memory changes outside of its cache.
The masters that communicate with the CCU can read coherent memory from the L1 and L2 caches, but cannot write directly to the L1 cache. If a master performs a cacheable write to the CCU, the L2 cache updates. Any of the cacheable write locations that reside in the L1 data cache are invalidated because the L2 cache has the latest copy of those addresses.
The CCU communicates with the SCU within the Cortex® -A53 MPCore™ to provide coherency with the SCU.
For more information, refer to the Cache Coherency Unit chapter.
Clock Sources
System Clock Name | Use
---|---
mpu_clk | Main clock for the Arm® Cortex®-A53 MPCore processor. This synchronous clock drives each CPU, including the L1 caches, the L2 cache controller, and the snoop control unit.
mpu_ccu_clk | Synchronous clock for the L2 RAM. The L2 RAM is clocked at ½ of the mpu_clk frequency. The 128-bit Cortex®-A53 MPCore™ ACE bus and the system cache coherency unit (CCU) also operate in the mpu_ccu_clk domain.
mpu_periph_clk | Synchronous clock for the peripherals internal to the Arm® Cortex®-A53 MPCore MPU system complex, including the generic interrupt controller and internal timers. These peripherals are clocked at ¼ of the mpu_clk frequency.
cs_pdbg_clk | Asynchronous clock for debug and performance monitor counters.
Cortex®-A53 MPCore Programming Guide
Enabling Cortex®-A53 MPCore Clocks
After the Cortex®-A53 MPCore™ comes out of reset, the mpuclken bit in the mainpllgrp of the Clock Manager is set to 1 by default, and the processor clock group is enabled. To disable the processor clock group at any time, write a 1 to the mpuclken bit of the enr register in privileged mode.
When the processor comes out of reset, the clock source is the secure internal oscillator. To use a different source, set the mpu bit in the bypassr register in the mainpllgrp group of Clock Manager registers. Next, select the source and frequency by programming the mpuclk register in the mainpllgrp group of Clock Manager registers. A sketch of this sequence is shown below.
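The following minimal sketch outlines the sequence in C. The Clock Manager base address, register offsets, and bit positions used here are placeholders (hypothetical values); take the actual values from the Clock Manager register map in the register definitions.

```c
#include <stdint.h>

/* Placeholder addresses and bit positions: consult the Clock Manager
 * (mainpllgrp) register map for the real values. */
#define CLKMGR_BASE          0xFFD10000u   /* hypothetical base address      */
#define MAINPLL_ENR_OFS      0x00000000u   /* hypothetical: enr register     */
#define MAINPLL_BYPASSR_OFS  0x00000000u   /* hypothetical: bypassr register */
#define MAINPLL_MPUCLK_OFS   0x00000000u   /* hypothetical: mpuclk register  */
#define ENR_MPUCLKEN         (1u << 0)     /* hypothetical bit position      */
#define BYPASSR_MPU          (1u << 0)     /* hypothetical bit position      */

static inline void clkmgr_write(uint32_t ofs, uint32_t val)
{
    *(volatile uint32_t *)(CLKMGR_BASE + ofs) = val;
}

/* Disable the processor clock group (privileged access assumed). */
static void mpu_clock_group_disable(void)
{
    clkmgr_write(MAINPLL_ENR_OFS, ENR_MPUCLKEN);
}

/* Move the MPU clock off the secure internal oscillator:
 * 1. bypass the MPU clock, 2. select the new source and divider. */
static void mpu_clock_select(uint32_t mpuclk_value)
{
    clkmgr_write(MAINPLL_BYPASSR_OFS, BYPASSR_MPU);
    clkmgr_write(MAINPLL_MPUCLK_OFS, mpuclk_value);
}
```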
Enabling and Disabling Cache
You can enable the instruction and data caches in the System Control Register (SCTLR).
If you disable the instruction cache, all instruction fetches are treated as non-cacheable. Instruction cache maintenance operations continue to work while the instruction cache is disabled.
You cannot enable or disable the L1 and L2 data caches separately because they are controlled by the same enable. If you disable the data cache, loads and stores are treated as non-cacheable. Cache maintenance operations continue to affect the data caches while they are disabled.
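A minimal sketch of toggling the two enables follows, using the architectural SCTLR_EL1.I (bit 12) and SCTLR_EL1.C (bit 2) bits. It assumes EL1 execution; the cache invalidate/clean maintenance normally performed around these transitions is intentionally omitted here.

```c
#include <stdint.h>

#define SCTLR_C  (1u << 2)   /* Data and unified cache enable */
#define SCTLR_I  (1u << 12)  /* Instruction cache enable      */

static inline uint64_t read_sctlr_el1(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, sctlr_el1" : "=r"(v));
    return v;
}

static inline void write_sctlr_el1(uint64_t v)
{
    __asm__ volatile("msr sctlr_el1, %0" : : "r"(v));
    __asm__ volatile("isb");
}

/* Enable the instruction cache and the L1/L2 data caches at EL1. */
static inline void caches_enable(void)
{
    write_sctlr_el1(read_sctlr_el1() | SCTLR_I | SCTLR_C);
}

/* Disable both caches; subsequent fetches, loads, and stores are non-cacheable. */
static inline void caches_disable(void)
{
    write_sctlr_el1(read_sctlr_el1() & ~(uint64_t)(SCTLR_I | SCTLR_C));
}
```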
Cortex®-A53 MPCore Address Map
- You can access the complete address map and register definitions for this IP and the entire HPS through the Intel® Stratix® 10 Hard Processor System Programmer's Reference Manual.
- An HTML webhelp version of the Intel® Stratix® 10 Hard Processor System Address Map and Register Definitions is also available.
Cache Coherency Unit
The CCU comprises a coherency interconnect, cache coherency controller (CCC), an I/O coherency bridge (IOCB) and support for distributed virtual memory (DVM).
The Intel® Stratix® 10 Hard Processor System (HPS) cache coherency unit (CCU) ensures consistency of shared data. Dedicated master peripherals in the HPS and those built in FPGA logic access coherent memory through the CCU. Cacheable transactions from the system interconnect route to the CCU.
The CCU provides I/O coherency with the Arm® Cortex® -A53 MPCore™ cache subsystem. I/O coherency, also called one-way coherency, allows HPS peripheral and FPGA masters (I/O masters) to see the same coherent view of system memory as the Cortex® -A53 MPCore™ processor cores, but does not allow the Cortex® -A53 MPCore™ processor cores to be coherent with any caches residing in I/O masters. The CCU also contains error protection logic and logic for optimal performance during coherent accesses. The CCU forwards non-coherent accesses directly to the addressed slave port.
The following master ports interface to the CCU:
- Cortex®-A53 MPCore™ processor
- FPGA-to-HPS bridge
- Translation Control Unit (TCU) located in the SMMU
- HPS peripheral master ports interfacing to the system interconnect:
  - EMAC0/1/2
  - USB0/1
  - DMA
  - SD/MMC
  - NAND
  - Embedded Trace Router (ETR)

The following slave ports interface to the CCU:
- External SDRAM memory
- On-chip RAM
- Generic Interrupt Controller (GIC)
- Peripheral slaves and master CSR slave ports
- SDRAM register group
Cache Coherency Unit Features
- Coherency directory to track the state of the 1 MB L2 cache in the Arm® Cortex® -A53 MPCore™
- Snoop filter support
- Speculative fetch support for lower latency accesses
- Single-bit error correction and double-bit error detection (SECDED) in the coherency directory
- Support for distributed virtual memory (DVM) using the Arm® Advanced Microcontroller Bus Architecture ( AMBA® ) Advanced eXtensible Interface ( AXI® ) Coherency Extensions, also known as the ACE protocol. The CCU sends distributed virtual memory broadcast messages to the Cortex® -A53 MPCore™ and the TCU in the SMMU.
- Quality of service (QoS) support for transaction prioritization using weighted bandwidth allocation
- Flexible address range programming for each master-to-slave connection
- Interconnect debug capability through master and slave bridge status registers
- Interrupt support for CCU transaction and counter events
Cache Coherency Unit Block Diagram
The CCU's coherency interconnect routes master agent transactions to the cache coherency components within the interconnect and ultimately, to the slave agents.
The CCU manages one-way coherency with the Cortex® -A53 MPCore™ . The CCU allows the master agents (masters connected to the CCU) to see the coherent memory of the Cortex® -A53 MPCore™ processor cores but does not allow the processor cores to be coherent with any caches external to the Cortex® -A53 MPCore™ processor.
The "CCU Block Diagram" shows the master agents of the CCU, the CCU components and the slaves that connect to it. The following master agent ports interface to the CCU:
- The Cortex®-A53 MPCore™ port:
  - Connects the Cortex®-A53 MPCore™ subsystem to the CCU
  - Supports memory read and write requests, as well as I/O memory-mapped read and write requests
  - Includes read and write channels and their corresponding response channels
  - Supports channels for snoop requests, snoop responses, and signals used as part of the coherency protocol to indicate response arrival
- The FPGA-to-HPS ACE-lite port connects the FPGA-to-HPS bridge to the CCU and supports I/O coherent requests to the CCU.
- The peripheral master port supports I/O coherent and non-coherent requests to the CCU from masters connected to the level 3 (L3) interconnect.
- The TCU port provides a page table walk interface to transfer I/O coherent requests to the CCU. This interface includes a DVM interface to send translation look-aside buffer (TLB) control information between the Cortex® -A53 MPCore™ and the system MMU.
- The external SDRAM port sends read and write transactions from the CCU to external memory through the L3 SDRAM interconnect.
- The SDRAM register port is a dedicated interface to the L3 SDRAM scheduler, L3 SDRAM adapter, and hard memory controller registers.
- The RAM port is a dedicated interface to the on-chip RAM.
- The GIC port is a dedicated interface to the general interrupt controller (GIC).
- The peripheral slave I/O port sends memory-mapped read and write requests to slave peripherals connected to the L3 interconnect.
The coherency bridge accepts requests from the ACE, ACE-lite + DVM and ACE-lite buses of the master agent ports. The coherency bridge sends these requests to the cache coherency controller.
The CCU directory tracks the state of the 1 MB L2 cache in the Arm® Cortex® -A53 MPCore™ .
The bridges control address range and QoS, and track the transmitting logic and FIFO status. You can control and view these features through registers in the CCU.
Routers within the CCU coherency interconnect send transactions to the appropriate coherency components within the CCU or to the appropriate slave port bridge where they are de-packetized and converted to the appropriate slave agent bus protocol.
Cacheable accesses from the Cortex® -A53 MPCore™ processor route directly to the CCU where the coherency directory is updated. The CCU forwards non-cacheable accesses directly to the slave.
Master agents with ACE-lite and ACE-lite + DVM bus interfaces send transactions to the I/O coherency bridge (IOCB). The IOCB sends coherent requests to the cache coherency controller (CCC) where a directory lookup determines if the address resides within a cache line of the MPU L2 cache. The coherency directory tracks the state of the 1 MB L2 cache.
The distributed virtual memory (DVM) controller supports the AMBA® ACE DVM protocol. The DVM controller broadcasts and synchronizes control packets for TLB invalidations, cache invalidations and similar requests.
Cache Coherency Unit Connectivity
Slave Agents \ Master Agents | Cortex®-A53 MPCore™ | FPGA-to-HPS Bridge | Peripherals (EMACs, USB, DMA, NAND, SDMMC, ETR) | Translation Control Unit (TCU)
---|---|---|---|---
External SDRAM memory | X | X | X | X
On-chip RAM | X | X | X | X
Peripheral slaves | X | X | X | X
SDRAM registers | X | X | X | 
Generic Interrupt Controller | X | X | | 
Cache Coherency Unit System Integration
The coherency interconnect in the CCU accepts both coherent and non-coherent transactions from masters in the system. The coherency interconnect routes non-coherent transactions to the appropriate agent target. Coherent transactions are initially routed to either the CCC or the IOCB in the CCU.
All accesses from the Cortex® -A53 MPCore™ are routed through the CCU so the coherency directory can be updated. TCU and FPGA-to-HPS bridge accesses and peripheral master accesses coming from the L3 interconnect are routed to the CCU if they are cacheable. Non-cacheable accesses route directly to the slave.
For more information about TBUs and distributed virtual memory support, refer to the "Distributed Virtual Memory Controller" section.
The CCU interfaces with the L3 interconnect and the SDRAM L3 interconnect. The SDRAM L3 interconnect provides a 64-bit register bus interface to the CCU for accessing the L3 SDRAM adapter, L3 SDRAM scheduler and hard memory controller registers. The CCU accesses external memory through a 128-bit interface to the SDRAM L3 interconnect.
Cache Coherency Unit Functional Description
Bridges
Each bridge has a set of corresponding registers that you can configure. All of the registers for a bridge configuration begin with a specific bridge register prefix.
Bridge | Bridge Register Prefix | Bridge Description |
---|---|---|
Cortex® -A53 MPCore™ bridge | bridge_cpu0_mprt_0_37 | Bridges the Cortex® -A53 MPCore™ processor to the coherency interconnect |
FPGA-to-HPS bridge | bridge_fpga1acel_mprt_4_118 | Bridges the FPGA-to-HPS interface to the coherency interconnect |
TCU bridge | bridge_tcu_mprt_3_70 | Bridges the TCU to the coherency interconnect |
Peripheral master bridge | bridge_iom_mprt_5_63 | Bridges the master peripherals in the L3 interconnect to the coherency interconnect. |
SDRAM registers bridge | bridge_ddrreg_sprt_8_118 | Bridges the coherency interconnect to the SDRAM register interface |
GIC bridge | bridge_gic_sprt_10_100 | Bridges the coherency interconnect to the generic interrupt controller (GIC) |
Peripheral slave bridge | bridge_ios_sprt_12_63 | Bridges the coherency interconnect to the peripheral slaves in the L3 interconnect. |
SDRAM bridge | bridge_mem0_sprt_13_118 | Bridges the coherency interconnect to the external SDRAM |
On-chip RAM bridge | bridge_ram_sprt_14_80 | Bridges the coherency interconnect to the on-chip RAM |
Bridge Registers
Bridge register names take one of the following forms, where <prefix> is the bridge register prefix listed in the table above and <reg> is a register name from the table below:
- <prefix>_<reg>_<suffix>
- <prefix>_<reg>
Register <reg> Name | Descriptive Name | Description |
---|---|---|
btus | Bridge TX Upsizer Status | These read-only registers track the status of a bridge transmitter upsizer and downsizer logic. |
txid | TX Bridge ID | This register holds a unique 8-bit identifier for the transmitting portion of a bridge. The txid identifier is the same value as the rxid for a bridge. |
btrl | Streaming TX Rate Limiter | This register exists for each host interface of the transmit bridge for QoS. Configure this register to control the rate of traffic injection from the host into the coherency interconnect. |
brs | Bridge Receive FIFO Status | This register tracks the status of the bridge's receive FIFO from the coherency interconnect. |
brus | Bridge RX Upsizer Status | These read-only registers track the status of a bridge receiver upsizer and downsizer logic. |
rxid | RX Bridge ID | This register holds a unique 8-bit identifier for the receiver portion of a bridge. The rxid identifier is the same value as the txid for a bridge. |
am_sts, as_sts | Status Flags Register | The am_sts register shows the status of the master bridge reads and writes. The as_sts register shows the status of the slave bridge reads and writes. |
am_bridge_id, as_bridge_id | Bridge ID Register | The am_bridge_id and as_bridge_id registers list the unique identifier assigned to the master and slave bridges, respectively. |
am_nocver | Interconnect Version ID Register | This read-only register lists the version of the coherency interconnect. |
am_err, as_err | Status and Error Register | The am_err and as_err registers record the first error event in its corresponding master and slave bridge, respectively. |
am_intm, as_intm | Interrupt Mask Register | You can configure the am_intm and as_intm registers to mask errors recorded in the am_err and as_err registers, respectively. |
p_n, where n is a number from 0 to 3 | QoS Profile Data | This register configures the weight of the bridge QoS. |
am_adbase | Base Address Register | This register specifies a base address for a slave address range that a master can access. Use this register in conjunction with the am_admask register to configure the range. When a master initiates a transaction, an address match occurs when it satisfies the equation: AxADDR & AM_ADMASK[i] == AM_ADBASE[i]
am_admask | Address Mask Register | This register specifies a mask value for a slave address range that a master can access. Use this register in conjunction with the am_adbase register to configure the range. When a master initiates a transaction, an address match occurs when it satisfies the equation: AxADDR & AM_ADMASK[i] == AM_ADBASE[i]
SYSCOREQ_reg | Coherency Connect Request Register | This register connects a master agent to the CCU system. |
SYSCOACK_reg | Coherency Connect Request Status Register | This read-only register indicates whether a master agent is connected to the CCU system. |
Cache Coherency Controller
The coherency directory acts as a snoop filter and allows the CCC to locally determine the state of the cache line without sending snoops to the L2 cache.
Coherency Directory
The cache coherency unit uses a directory-based coherency protocol. The CCU has a memory structure that tracks the state of the L2 cache lines.
The coherency directory stores cache line addresses and state information about each address. The directory does not store cache line data. It is not a cache. The coherency directory only contains address and state information that other master agents snoop when making coherent accesses. The directory acts as a snoop-filter and assists the cache coherency controller in locally determining the state of a cache line without sending snoops to the L2 cache.
The directory-based protocol provides lower latency accesses, reduced network bandwidth, reduced snoop traffic for the Cortex® -A53 MPCore™ , and higher peak bandwidth of the system.
When the Cortex® -A53 MPCore™ replaces a cache line, it sends an evict request to the coherency directory for any clean lines it is dropping. The directory no longer tracks those addresses after the eviction request completes.
ECC Protection
The directory RAM provides 8 bits of ECC. You can program registers to directly access the directory RAM, including the ECC check bits. The hardware supports multiple ways to test the ECC logic within the system, including taking an existing directory entry and flipping one or more bits before writing the entry back into the array.
You can disable ECC detection and correction through the ECC Disable Register (agent_ccc0_ccc_ecc_disable) at offset 0x30028 in the CCU.
Speculative Fetch
Speculative fetching can reduce the latency of the request but may expend memory bandwidth with unnecessary memory reads. It is recommended that you enable speculative fetch for latency sensitive requests. Disable this feature for requests that are less latency sensitive.
Speculative fetching for the Cortex® -A53 MPCore™ processor is enabled when the HPS is released from reset. To disable speculative fetch, clear the corresponding bit in the Speculative Fetch register (agent_ccc0_ccc_spec_fetch_0).
I/O Coherency Bridge
The FPGA-to-HPS bridge, the TCU, and the peripheral masters send both non-coherent and I/O coherent traffic to the IOCB. If a master issues a WriteUnique or WriteLineUnique ACE protocol request and that address corresponds to a cache line, the IOCB notifies the Cortex®-A53 MPCore™ processor to invalidate that data. The IOCB prefetches coherent permissions for requests from the coherency directory so that it can execute these requests in parallel with non-coherent requests and maintain high bandwidth.
Distributed Virtual Memory Controller
DVM protocol broadcasts and synchronizes control packets for TLB invalidations, instruction cache invalidations, and similar requests.
- When the SMMU sends a DVM message, the message broadcasts to the Cortex® -A53 MPCore™ in the form of a snoop request. The TCU within the SMMU broadcasts snoops, gathers responses and replies to the Cortex® -A53 MPCore™ .
- The coherency interconnect also performs DVM synchronization tasks, which include sending synchronization snoops, gathering completion requests from the TCU in the SMMU, and eventually signaling back that the request has completed.
As part of the SMMU, TBUs sit between the master peripherals and the L3 interconnect. The FPGA-to-HPS interface also passes through a TBU before interfacing with the CCU.
Each TBU contains a micro translation look-aside buffer (TLB) that holds cached page table walk results from the translation control unit (TCU). For every virtual memory transaction that a master initiates, its TBU compares the virtual address against the translations stored in its buffer to see if a physical translation exists. If a translation does not exist, the TCU performs a page table walk. This SMMU integration allows the master peripheral's driver to pass virtual addresses directly to the master peripheral without having to perform virtual to physical address translations through the operating system.
For more information about distributed virtual memory support and the SMMU, refer to the System Memory Management Unit chapter.
Cache Coherency Unit Traffic Management
Quality of Service
In a weighted allocation policy, the CCU divides the resource bandwidth among all contending flows based on a pre-programmed set of weights.
You can set a higher weight for more important masters by programming the QoS Profile Data register (*p_n) for that master.
For example, if master_0 has a weight set to X for a slave access and master_1 has a weight set to Y, master_0 receives X/(X+Y)% of the total available bandwidth at the slave. This calculation assumes that all other masters that can access the CCU are idle.
The coherency interconnect uses dynamic weight adjustment algorithms that are fully distributed and provide full end-to-end weighted fairness.
The CCU uses round-robin arbitration when masters that share the same QoS priority and weight are simultaneously accessing the same slave.
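The small helper below simply generalizes the two-master example above: under weighted allocation, each contending master's share of a slave's bandwidth is its programmed weight divided by the sum of the weights of all masters contending for that slave. It is a plain arithmetic sketch, not a register-programming sequence.

```c
/* Fraction of a slave's bandwidth granted to master i under weighted
 * allocation: weight_i / (sum of weights of all contending masters). */
static double qos_share(const unsigned *weights, unsigned n, unsigned i)
{
    unsigned long long total = 0;
    for (unsigned k = 0; k < n; k++)
        total += weights[k];
    return total ? (double)weights[i] / (double)total : 0.0;
}

/* Example: weights {3, 1} reproduce the X/(X+Y) case; master 0 receives 75%
 * of the available bandwidth at the slave when only these two contend. */
```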
Transmit Rate Limiters
The rate limiter is a token-based, flow-control mechanism that prevents a packet from being sent into the network unless enough time has passed since the last packet. This feature provides fair bandwidth sharing among agents. You can program the Streaming TX Rate Limiter register (btrl) to control the rate of traffic injection from an agent into the coherency interconnect.
You configure the rate limiter by selecting a maximum transmission rate and adding a maximum token size for that transmission rate in the Streaming TX Rate Limiter register (btrl) register.
The token count increases over time at the specified rate and saturates when it reaches its maximum; in this example, the maximum is 3. When a packet is sent on this interface, the token count decrements. This mechanism ensures that the packet transmission rate does not exceed the rate limit except within a small window defined by the token count.
The token count only decrements by one for command transfers. For data transfers, each data beat decrements the token count.
Rate Limiter Configuration
Bits | Name | Description |
---|---|---|
31:21 | Reserved | Reserved |
20 | Rate Limit Logic Enable | Setting this bit enables the rate limiter logic that arbitrates master transfers. |
19:16 | Token Bucket Size | Program this field to indicate the maximum number of tokens that may accumulate at an interface when rate limiters are enabled. |
15:0 | Rate Limit Value (N) | Program this field to indicate the peak rate limit for traffic from the host interface to the coherency interconnect. If the value N represents the rate limit value, then the rate equation is: rate = N / 2^16
The rate limit value (N) is a 16-bit adder where the overflow bit is the token arrival bit.
For example, if you want to specify a rate of 1 token every 5 cycles (20%), program N to 13107 (decimal), or 0x3333. When added together 5 times, the value is approximately 2^16, so one packet can be sent every 5 cycles. A small helper that evaluates this equation is shown below.
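The following sketch computes the Rate Limit Value N for a desired injection rate and packs it with the enable and token bucket size fields from the table above. The field positions follow that table; the address of the btrl register itself is not shown and must come from the register definitions.

```c
#include <stdint.h>

/* rate = N / 2^16, so N = rate * 2^16 (rounded, clamped to 16 bits). */
static uint16_t rate_limit_value(double tokens_per_cycle)
{
    double n = tokens_per_cycle * 65536.0 + 0.5;
    if (n > 65535.0)
        n = 65535.0;
    return (uint16_t)n;
}

/* Assemble a btrl value: bit 20 = enable, bits 19:16 = token bucket size,
 * bits 15:0 = rate limit value, per the Rate Limiter Configuration table. */
static uint32_t btrl_value(unsigned bucket_size, double tokens_per_cycle)
{
    return (1u << 20) |
           ((bucket_size & 0xFu) << 16) |
           rate_limit_value(tokens_per_cycle);
}

/* Example: btrl_value(3, 1.0 / 5.0) enables the limiter with a bucket of 3
 * tokens and N = 0x3333, that is, one token every 5 cycles (20%). */
```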
Cache Coherency Unit Interrupts
- Bridge Interrupt Mask register (*am_intm*): Each bridge has a corresponding Bridge Interrupt Mask register that controls interrupt triggers for read and write channel events and capture counter overflows.
- CCC Interrupt Mask register (agent_ccc0_ccc_interrupt_mask) at offset 0x30190: This register controls event counter overflow, single-bit ECC error and multiple bit ECC error interrupts.
Cache Coherency Unit Clocks
System Clock | Synchronous/Asynchronous | Description |
---|---|---|
mpu_ccu_clk | Synchronous | Main clock for CCU. Fixed at 1/2× mpu_clk |
mpu_periph_clk | Synchronous | Clock for CCU interface to GIC. Fixed at 1/4×mpu_clk |
Cache Coherency Unit Reset
- bridge_cpu0_mprt_0_37_am_adbase_mem_ddrreg_sprt_ddrregspace0_0
- bridge_cpu0_mprt_0_37_am_adbase_mem_ios_sprt_iospace0a_0
- bridge_cpu0_mprt_0_37_am_adbase_mem_ios_sprt_iospace0b_0
- bridge_cpu0_mprt_0_37_am_adbase_mem_ios_sprt_iospace1a_0
- bridge_cpu0_mprt_0_37_am_adbase_mem_ios_sprt_iospace1c_0
- bridge_cpu0_mprt_0_37_am_adbase_mem_ios_sprt_iospace1d_0
- bridge_cpu0_mprt_0_37_am_adbase_mem_ios_sprt_iospace1e_0
- bridge_iom_mprt_5_63_am_adbase_mem_ios_sprt_iospace0a_0
- bridge_iom_mprt_5_63_am_adbase_mem_ios_sprt_iospace1a_0
- bridge_iom_mprt_5_63_am_adbase_mem_ios_sprt_iospace1b_0
- bridge_iom_mprt_5_63_am_adbase_mem_ios_sprt_iospace1c_0
- bridge_iom_mprt_5_63_am_adbase_mem_ios_sprt_iospace1d_0
- bridge_iom_mprt_5_63_am_adbase_mem_ios_sprt_iospace1e_0
Cache Coherency Unit Transactions
The CCU handles transactions from the FPGA-to-HPS interface, TCU, and peripheral masters in the L3 interconnect as follows:
- Coherent read: The IOCB sends the read to the coherency directory in the CCC to perform a lookup and issue a snoop to the Cortex®-A53 MPCore™ processor if required.
  - If the access is a cache hit, data is routed from the cache.
  - If the access is a cache miss, data is routed from the appropriate slave agent after cache operations have completed.
- Coherent write: The IOCB sends the write to the coherency directory in the CCC to perform a lookup and issue a snoop.
  - If the access is a cache hit, the cache is updated with the new data and the coherency directory continues to track the cache line.
  - If the access is a cache miss, then the new data is written to the appropriate slave agent.
- Non-coherent transactions are handled differently depending on the master agent issuing the transaction.
  - If the FPGA or TCU sends a non-coherent access to the CCU, the IOCB routes the access directly to the slave agent.
  - If an HPS peripheral master issues a non-cacheable memory access to on-chip RAM or SDRAM, then the L3 interconnect routes the access to the IOCB of the CCU. In turn, the CCU routes the access directly to the corresponding memory.
  - If an HPS peripheral master issues a non-cacheable memory access to a peripheral slave agent, then the L3 interconnect routes the access directly to the slave, bypassing the CCU.
Some key points to remember about CCU transactions:
- A master agent issues a read or write address to access a slave. This address is compared against the address ranges programmed in the Address Mask Register (*am_admask*) and Base Address Register (*am_adbase*) to identify the targeted slave device. A slave device can have multiple address ranges assigned to it, each from a different master. Address ranges can be non-continuous.
- You can program address ranges to be disabled, read-only, or write-only. During address decode, the CCU compares the transaction ARPROT or AWPROT with the access privilege programmed for an address range. A failed access check results in a decode error response for the transaction.
- Each address range can also be associated with hash functions that are used in the route lookup process.
- Master agents have no pre-defined priority. A master's L3 interconnect QoS level determines the associated coherency interconnect QoS priority for the L3 masters and slaves, as well as the SDRAM memory interface. The Cortex®-A53 MPCore™ and FPGA-to-HPS interface priorities are configured in the System Manager and FPGA, respectively. You can configure the coherency interconnect QoS weights through the QoS Profile Data (*p_n*) registers.
- FIXED burst transactions are split into multiple single-beat increment (INCR) transactions.
- The CCU only accepts 16-, 32-, or 64-byte WRAP transactions. All other cache line sizes generate a fatal error interrupt.
- Master and slave ports queue outstanding requests. The table below shows the maximum number of outstanding requests each agent supports.
Table 43. Maximum Outstanding Request Support

Agent | Outstanding Reads | Outstanding Writes
---|---|---
Cortex®-A53 MPCore™ processor | 33 | 21
FPGA-to-HPS Interface | 8 | 8
TCU | 16 | 1
Peripheral masters | 16 | 16
External SDRAM Memory | 32 | 32
On-chip RAM | 2 | 2
GIC | 1 | 1
Peripheral slaves | 16 | 16
SDRAM register group | 2 | 2
Certain errors or stalls can occur when unsupported accesses occur:
- An unknown address or access privilege violation on the AR or AW channels causes a decode error. This error stalls the command channels until the decode error (DECERR) response can be issued on the R or B channel, respectively.
- Changing the QoS level while commands are outstanding can momentarily stall a channel if the change reorders the command to a slave over the network.
Command Mapping
ARSNOOP[3:0] | ARDOMAIN[1:0] | ARBAR[1:0] | ACE/ ACE-Lite Transaction Type | Target |
---|---|---|---|---|
4’b0000 | 2'b00, 2'b11 | 2'bX0 | ReadNoSnoop | Slave |
4’b0000 | 2'b01, 2'b10 | 2'bX0 | ReadOnce | CCC
4’b0001 | 2'b01, 2'b10 | 2'bX0 | ReadShared | CCC
4’b0010 | 2'b01, 2'b10 | 2'bX0 | ReadClean | CCC |
4’b0011 | 2'b01, 2'b10 | 2'bX0 | ReadNotSharedDirty | CCC |
4’b0111 | 2'b01, 2'b10 | 2'bX0 | ReadUnique | CCC |
4’b1011 | 2'b01, 2'b10 | 2'bX0 | CleanUnique | CCC |
4’b1100 | 2'b01, 2'b10 | 2'bX0 | MakeUnique | CCC |
4’b1000 | 2'b00, 2'b01, 2'b10 | 2'bX0 | CleanShared | CCC |
4’b1001 | 2'b00, 2'b01, 2'b10 | 2'bX0 | CleanInvalid | CCC |
4’b1101 | 2'b00, 2'b01, 2'b10 | 2'bX0 | MakeInvalid | CCC |
4’b0000 | 2'b01, 2'b10 | 2'b01 | Coherent Memory Bar | 
4’b0000 | 2'b01, 2'b10 | 2'b11 | Coherent Sync Bar | local bridge |
4’b0000 | 2'b00, 2'b11 | 2'bX1 | Non-coherent Bar | local bridge |
4’b1110 | 2'b01, 2'b10 | 2'bX0 | DVM Complete | DVM |
4’b1111 | 2'b01, 2'b10 | 2'bX0 | DVM Message | DVM |
AWSNOOP[2:0] | AWDOMAIN[1:0] | AWBAR[1:0] | Transaction Type | Target |
---|---|---|---|---|
3’b000 | 2'b00, 2'b11 | 2'bX0 | WriteNoSnoop | Slave |
3’b000 | 2'b01, 2'b10 | 2'bX0 | WriteUnique | CCC
3’b001 | 2'b01, 2'b10 | 2'bX0 | WriteLineUnique | CCC
3’b010 | 2'b01, 2'b10 | 2'bX0 | WriteClean | CCC |
3’b010 | 2'b00 | 2'bX0 | Non-Share WriteClean | CCC |
3’b011 | 2'b01, 2'b10 | 2'bX0 | WriteBack | CCC |
3’b011 | 2'b00 | 2'bX0 | Non-Share WriteBack | CCC |
3’b100 | 2'b01, 2'b10 | 2'bX0 | Evict | CCC |
3’b101 | 2'b01, 2'b10 | 2'bX0 | WriteEvict | CCC |
3’b101 | 2'b00 | 2'bX0 | Non-Share WriteEvict | CCC |
3’b000 | 2'b01, 2'b10 | 2'b01 | Coherent Memory Bar | 
3’b000 | 2'b01, 2'b10 | 2'b11 | Coherent Sync Bar | local bridge |
3’b000 | 2'b00, 2'b11 | 2'bX1 | Non-coherent Bar | local bridge |
Programming Guidelines
Enabling Interrupts
You can enable the ECC error or event counter overflow interrupts in the CCC by programming the CCC Interrupt Mask register (agent_ccc0_ccc_interrupt_mask) at offset 0x30190. You can track the interrupt status by reading the CCC Interrupt Status register (agent_ccc0_ccc_interrupt_err) at offset 0x30198.
You can enable read, write or counter overflow error interrupts in a specific bridge by programming the bridge's Interrupt Mask register (*am_intm*). You can track error status by reading the bridge's Status and Error register (*am_err*).
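A minimal sketch of the CCC interrupt enable and status accesses follows. The 0x30190 and 0x30198 offsets come from the text above; the CCU base address and the assumption that a set bit masks (rather than enables) an interrupt are placeholders to confirm against the register definitions.

```c
#include <stdint.h>

#define CCU_BASE               0xF7000000u  /* hypothetical: see the HPS address map */
#define CCC_INTERRUPT_MASK_OFS 0x30190u     /* agent_ccc0_ccc_interrupt_mask         */
#define CCC_INTERRUPT_ERR_OFS  0x30198u     /* agent_ccc0_ccc_interrupt_err          */

static inline uint32_t ccu_read(uint32_t ofs)
{
    return *(volatile uint32_t *)(CCU_BASE + ofs);
}

static inline void ccu_write(uint32_t ofs, uint32_t val)
{
    *(volatile uint32_t *)(CCU_BASE + ofs) = val;
}

/* Unmask the requested CCC interrupt sources (ECC error, counter overflow),
 * then return the current interrupt status. Bit positions within the mask
 * and status registers are placeholders. */
static uint32_t ccc_enable_and_read_interrupts(uint32_t unmask_bits)
{
    /* Assumption: a set mask bit blocks the interrupt, so clear it to enable. */
    ccu_write(CCC_INTERRUPT_MASK_OFS,
              ccu_read(CCC_INTERRUPT_MASK_OFS) & ~unmask_bits);
    return ccu_read(CCC_INTERRUPT_ERR_OFS);
}
```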
Disabling the FPGA-to-HPS Interface to CCU
Intel® recommends that you properly disable the FPGA-to-HPS interface to the CCU before reconfiguring the FPGA. Before reconfiguration, software must disable the interface using the CCC Active Agent Vector (agent_ccc0_ccc_active_vector_0) register. Your software must poll the CCC Disable Status (agent_ccc0_ccc_agent_disable_status) register to check for pending snoop requests prior to disabling the interface.
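The sketch below outlines one possible shape of this sequence. The register names come from the text above, but the CCU base address, register offsets, agent bit position, and the polarity of the status bit are all placeholders; verify the exact disable/poll semantics against the CCU register definitions before relying on this.

```c
#include <stdint.h>

#define CCU_BASE                      0xF7000000u /* hypothetical base address              */
#define CCC_ACTIVE_VECTOR_0_OFS       0x0u        /* hypothetical offset of                 */
                                                  /* agent_ccc0_ccc_active_vector_0         */
#define CCC_AGENT_DISABLE_STATUS_OFS  0x0u        /* hypothetical offset of                 */
                                                  /* agent_ccc0_ccc_agent_disable_status    */
#define FPGA2HPS_AGENT_BIT            (1u << 0)   /* hypothetical bit for the FPGA agent    */

static inline volatile uint32_t *ccu_reg(uint32_t ofs)
{
    return (volatile uint32_t *)(CCU_BASE + ofs);
}

/* Remove the FPGA-to-HPS agent from the CCC active agent vector, then poll
 * the disable status until no snoop requests remain pending for that agent.
 * Only after this completes should the FPGA be reconfigured. */
static void ccu_disable_fpga2hps_agent(void)
{
    *ccu_reg(CCC_ACTIVE_VECTOR_0_OFS) &= ~FPGA2HPS_AGENT_BIT;

    while (*ccu_reg(CCC_AGENT_DISABLE_STATUS_OFS) & FPGA2HPS_AGENT_BIT) {
        /* Busy-wait; the status bit position and polarity are assumptions. */
    }
}
```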
Specifying Address Ranges for Slave Devices
A transaction address matches a programmed slave address range when it satisfies the following equation, where i represents a register bit:
AxAddress & AM_ADMASK[i] == AM_ADBASE[i]
You can program address ranges as disabled, read-only, or write-only. During address decode, the CCU compares ARPROT or AWPROT signals with the access privilege programmed for an address range. A failed access check results in a decode error response for the transaction.
- Program the following fields in a bridge's *am_adbase* register:

Table 46. *am_adbase* Register Field Settings

*am_adbase* Register Bitfield | Configuration Description
---|---
BASE_ADDRESS | The base address value must be a factor of the address mask value. The base address register bitfields must not have a 1 where a corresponding mask bit is 0. Note: To prevent access errors, ensure that the *am_adbase* base address lies within the slave's valid address range.
DI | Set this bit if you are configuring this address range to be disabled.
R_Wn | Set this bit to make this address range readable; clear this bit to make it writable.
I | Set this bit if this address range holds instructions.
NS | Set this bit to make the address range non-secure; clear this bit to make the address range secure.
P | Set this bit to indicate if this range is only available through a privileged access.

- Program the corresponding *am_admask* register.

Bits [2:0] of *am_adbase* and *am_admask* act as a value and mask for checking against the AxPROT of an incoming command. The CCU allows a command access to a range if:

AxPROT & *am_admask*[2:0] == *am_adbase*[2:0] & *am_admask*[2:0]

If the above check fails, then the CCU denies the command access to the range and returns a decode error response. For any access, you can selectively disable an address range or designate the access as read-only or write-only access using *am_adbase*[4:3] and *am_admask*[3]. The table below details the encodings; a sketch of these checks follows the table.

Table 47. Address Range Access. Note: An X in this table denotes a "don't care."

*am_adbase*[4] DI | *am_admask*[3] VALID | *am_adbase*[3] R_Wn | Access
---|---|---|---
1 | X | X | Range disabled
0 | 1 | 1 | Read only
0 | 1 | 0 | Write only
0 | 0 | X | Read/write
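The following plain C sketch restates the three checks above: the address match, the AxPROT check on bits [2:0], and the access-mode decode from Table 47. It models only the compare logic, not the register layout of the BASE_ADDRESS field or the register addresses themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address decode: a transaction matches range i when
 * AxADDR & AM_ADMASK[i] == AM_ADBASE[i]. */
static bool range_match(uint64_t axaddr, uint64_t adbase, uint64_t admask)
{
    return (axaddr & admask) == adbase;
}

/* AxPROT check: access is allowed when
 * AxPROT & am_admask[2:0] == am_adbase[2:0] & am_admask[2:0]. */
static bool prot_allowed(uint32_t axprot, uint32_t adbase, uint32_t admask)
{
    uint32_t m = admask & 0x7u;
    return (axprot & m) == ((adbase & 0x7u) & m);
}

/* Access-mode decode from am_adbase[4:3] and am_admask[3], per Table 47. */
enum range_access {
    RANGE_DISABLED,
    RANGE_READ_ONLY,
    RANGE_WRITE_ONLY,
    RANGE_READ_WRITE
};

static enum range_access range_access_mode(uint32_t adbase, uint32_t admask)
{
    bool di    = (adbase >> 4) & 1u;  /* am_adbase[4]: range disabled     */
    bool valid = (admask >> 3) & 1u;  /* am_admask[3]: R/W qualifier valid */
    bool r_wn  = (adbase >> 3) & 1u;  /* am_adbase[3]: 1 = read, 0 = write */

    if (di)
        return RANGE_DISABLED;
    if (!valid)
        return RANGE_READ_WRITE;
    return r_wn ? RANGE_READ_ONLY : RANGE_WRITE_ONLY;
}
```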
Accessing and Testing the Coherency Directory RAM
The CCC Indirect Access Trigger register (agent_ccc0_ccc_indirect_access_trig) contains the following fields:
- CMD: Indicates which kind of indirect access to perform.
- WAY: Must always be clear because there is only one bank of RAM in the coherency directory.
- INDEX: Specifies the entry to access within the RAM.
The RAM width is 133 bits, with an ECC width of 8 bits. The ECC bits occupy the most significant bits of the CCC Indirect RAM Content (agent_ccc0_ccc_indirect_ram_cont_*) registers, as {agent_ccc0_ccc_indirect_ram_cont_2[4:0], agent_ccc0_ccc_indirect_ram_cont_1[127:125]}.
Indirect access supports four operations.
- Read raw data: Use this command when you want to read coherency directory RAM data without ECC correction.
  - In the agent_ccc0_ccc_indirect_access_trig register, clear the cmd bits and specify the RAM index value you want to read in the index field.
  - Read the returned data from the CCC Indirect RAM Content (agent_ccc0_ccc_indirect_ram_cont_*) registers.
- Write raw data: You can use this command to write data to the coherency directory. You can include an ECC value in this data. This command assumes the ECC logic is disabled. ECC logic can be enabled and disabled in the CCC ECC Disable (agent_ccc0_ccc_ecc_disable) register.
  - Program the agent_ccc0_ccc_indirect_ram_cont_* registers with the data you want to write to the coherency directory. You can include ECC bits in this value.
  - In the agent_ccc0_ccc_indirect_access_trig register, set the cmd bits to 0x1 and specify the RAM index value you want to write in the index field. When triggered, the content register value is written into the directory RAM.
- Write data with generated ECC to the coherency directory RAM: You can use this command to write data without calculating the ECC bits. This command assumes the ECC logic is enabled.
  - Write data to the agent_ccc0_ccc_indirect_ram_cont_* registers.
  - In the agent_ccc0_ccc_indirect_access_trig register, set the cmd bits to 0x2 and specify the RAM index value you want to write in the index field. When triggered, the content register value is written into the directory RAM with a corresponding ECC value.
- Read-modify-write: This command performs a specific kind of read-modify-write operation on a directory entry. The CCC reads the content of the directory, XORs that content with the data in the agent_ccc0_ccc_indirect_ram_cont_* registers, and writes the combined value into the same directory entry. This command can be used to introduce single or double bit errors into the directory to test error detection and handling. The agent_ccc0_ccc_indirect_ram_cont_* registers are not modified during this operation, so they can be used to introduce errors into multiple lines.
  - Write data to the agent_ccc0_ccc_indirect_ram_cont_* registers.
  - In the agent_ccc0_ccc_indirect_access_trig register, set the cmd bits to 0x3 and specify the RAM index value you want to read in the index field.
You can issue indirect access commands during normal operation, but the write commands can have side-effects that break coherency functionality. The read raw command is not disruptive, and the read-modify-write can be performed atomically so single-bit errors can be introduced while maintaining functionality.
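A hedged sketch of one error-injection flow follows: it uses the read-modify-write (XOR) command to flip one bit in a directory entry and then reads the entry back raw. The command encodings match the list above; the CCU base address, the trigger and content register offsets, and the field packing within the trigger register are placeholders to take from the register definitions.

```c
#include <stdint.h>

#define CCU_BASE              0xF7000000u /* hypothetical: see the HPS address map          */
#define CCC_INDIRECT_TRIG_OFS 0x0u        /* hypothetical: agent_ccc0_ccc_indirect_access_trig */
#define CCC_INDIRECT_CONT0_OFS 0x0u       /* hypothetical: agent_ccc0_ccc_indirect_ram_cont_0  */

/* Indirect access commands, per the list above. */
enum ccc_indirect_cmd {
    CCC_CMD_READ_RAW  = 0x0,
    CCC_CMD_WRITE_RAW = 0x1,
    CCC_CMD_WRITE_ECC = 0x2,
    CCC_CMD_RMW_XOR   = 0x3,
};

static inline void ccu_write(uint32_t ofs, uint32_t val)
{
    *(volatile uint32_t *)(CCU_BASE + ofs) = val;
}

static inline uint32_t ccu_read(uint32_t ofs)
{
    return *(volatile uint32_t *)(CCU_BASE + ofs);
}

/* Hypothetical packing of the cmd and index fields; the way field is always 0. */
static inline uint32_t trig_value(enum ccc_indirect_cmd cmd, uint32_t index)
{
    return ((uint32_t)cmd << 28) | (index & 0xFFFFu);  /* placeholder layout */
}

/* Flip one bit of directory entry 'index' and read the entry back raw. */
static uint32_t ccc_flip_directory_bit(uint32_t index, unsigned bit)
{
    ccu_write(CCC_INDIRECT_CONT0_OFS, 1u << bit);               /* XOR pattern  */
    ccu_write(CCC_INDIRECT_TRIG_OFS, trig_value(CCC_CMD_RMW_XOR, index));

    ccu_write(CCC_INDIRECT_TRIG_OFS, trig_value(CCC_CMD_READ_RAW, index));
    return ccu_read(CCC_INDIRECT_CONT0_OFS);                    /* raw read-back */
}
```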
Secure and Non-secure Transactions
When you configure the system interconnect firewall to permit secure (S) transactions, only secure transactions traverse the firewall. When you configure the system interconnect to permit non-secure (NS) transactions, both secure (S) and non-secure (NS) transactions traverse the firewall.
However, the CCU behaves differently with respect to the filtering functions it provides. If you configure the CCU to permit S transactions, only S transactions traverse the CCU (similar to the system interconnect firewall). However, if you configure the CCU to permit NS transactions, only NS transactions pass through to the slave; the CCU blocks S transactions in this case. You can configure CCU filtering to allow both S and NS transactions to traverse the CCU, similar to the system interconnect firewall, by programming the following values in the NS bit of the *am_adbase* and *am_admask* registers:
*am_adbase*.ns | *am_admask*.ns | Outcome |
---|---|---|
0 (secure) | 1 (enabled) | Secure transactions pass; non-secure transactions generate an error |
0 (secure) | 0 (disabled) | Secure and non-secure transactions pass |
Cache Coherency Unit Address Map and Register Definitions
- You can access the complete address map and register definitions for this IP and the entire HPS through the Intel® Stratix® 10 Hard Processor System Programmer's Reference Manual.
- An HTML webhelp version of the Intel® Stratix® 10 Hard Processor System Address Map and Register Definitions is also available.
System Memory Management Unit
The system memory management unit (SMMU) provides memory management services to system bus masters. The SMMU translates input addresses to output addresses based on address mapping and memory attribute information in the SMMU registers and translation tables. The SMMU also provides caching attributes for physical pages. A single translation control unit (TCU) manages distributed translation buffer units (TBUs) and performs page table walks (PTWs) on translation misses.
The SMMU conforms to the Arm® SMMU v2 Specification.
Description | Revision Number |
---|---|
Arm® CoreLink® MMU-500 | r2p4 |
System Memory Management Unit Features
- Central TCU that supports five distributed TBUs for the following masters:
- FPGA
- DMA
- EMAC0-2, collectively
- USB0-1, NAND, SD/MMC, ETR, collectively
- Secure Device Manager (SDM)
- Integrates caches for storing page table entries and intermediate table walk data
- 512-entry macro TLB page table entry cache in the TCU
- 128-entry micro TLB for table walk data in the FPGA TBU and 32-entry micro TLB for all other distributed TBUs
- Single-bit error detection and invalidation on error detection for caches
- Communicates with the MMU of Arm® Cortex® -A53 MPCore™
- System-wide address translation
- Address virtualization
- Support for 32 contexts
- Allows two stages of translation or combined (stage 1 and stage 2) translation
- Secure or non-secure translation capability in stage 1
- Support for modifying attributes from stage 1 to stage 2 translation
- Capable of multiple levels of address lookup
- Allows bypassing or disabling stages
- Supports up to 49-bit virtual addresses and up to 48-bit physical and intermediate physical addresses
- Provides programmable Quality of Service (QoS) to support page table walk arbitration
- Provides fault handling, logging and interrupts for translation errors
- Supports debug
System MMU Block Diagram
At the memory system level, the system MMU controls the following functions when performing an address translation:
- TLB operation
- Security state determination
- Context determination
- Memory access permissions and determination of memory attributes
- Memory attribute checks
System Memory Management Unit Interfaces
- AXI Programming Interface: The Cortex® -A53 MPCore™ configures the SMMU through this interface.
- ACE-Lite Interface: The TCU uses this interface for page table walk memory requests to the system interconnect.
- DVM Interface: The Cortex® -A53 MPCore™ uses this interface to send TLB control information to the SMMU TLBs.
- Interrupt Interface: The TCU sends context and system monitor interrupts to the generic interrupt controller (GIC) through this interface.
Each TBU contains the following interfaces:
- ACE-Lite Slave Interface: Creates a connection between the I/O device and the SMMU
- ACE-Lite Master Interface: Creates a connection between the SMMU and the system interconnect
- Event interface: Generates performance event signals
System Integration
The TBUs interface to the following masters:
- FPGA
- DMA
- EMAC0-2, collectively
- USB0-1, NAND controller, SD/MMC controller, Arm® Embedded Trace Router (ETR), collectively
- Secure Device Manager (SDM)
Each of the TLBs within the TBUs caches frequently used address ranges. Because there are multiple TBUs, the frequently cached addresses in each TLB are localized to the masters connected to that TBU. The TCU performs the page table walks on address misses.
The Cortex® -A53 MPCore™ has its own main and micro translation lookaside buffers (TLBs) for address translation but communicates with the SMMU so that its translation tables remain coherent. For more information about the Cortex® -A53 MPCore™ MMU, refer to the Cortex® -A53 MPCore™ chapter.
System Memory Management Unit Functional Description
For every transaction, the SMMU performs the following functions:
- Observes the security state of the transaction that originates the request.
- Maps the incoming transaction to one of the 32 contexts using the incoming stream ID.
- Caches frequently used address ranges using the TLB in that master's TBU.
- Performs a memory page table walk automatically on a TLB address lookup miss.
- Applies memory attributes and translates the incoming address. This step is explained in the following Translation Stages section.
- Applies required fault handling for every transaction.
Translation Stages
- In stage 1 translations, the virtual address (VA) input is translated to a physical address (PA) or intermediate physical address (IPA) output. Both secure and non-secure translation contexts use stage 1 translations. Typically, an OS defines translation tables in memory for the stage 1 translations of a given security state. The OS also configures the SMMU for the stage 1 translations before enabling the SMMU to accept transactions.
  An example of a stage 1 translation could be a guest OS that translates addresses on a system that supports multiple OSs. In this case, the translation from virtual address to physical address is really a translation from virtual address to intermediate physical address that is managed, along with other OS IPAs, by a virtual machine manager.
- In stage 2 translations, an IPA input is translated to a PA output. Only non-secure translation contexts can use stage 2 translations. An example of stage 2 translation could be a hypervisor translating a particular guest OS IPA to a PA.
- Stage 1 and stage 2 translations may be combined so that a VA input is translated to an IPA output, and then the IPA input is translated to a PA output. The translation control unit (TCU) of the SMMU performs translation table walks for each stage of translation. An example of a combined translation could be:
  - A non-secure operating system defines the stage 1 translations for application-level and operating-system-level operation. It does this assignment assuming it is mapping from the VAs used by the processors to the PAs in the physical memory system. However, it actually maps from VAs to IPAs.
  - The hypervisor defines the stage 2 address translations that map the IPAs to PAs. It does this as part of its virtualization of one or more non-secure guest operating systems.

Each stage of translation can require multiple translation table lookups or levels of address lookup. The SMMU can also modify memory attributes from stage 1 to stage 2 translation. You can also program the SMMU to disable or bypass a stage of translation and modify the memory attributes of that disabled or bypassed stage.
Exception Levels
- EL0 has the lowest software execution privilege, and execution in EL0 is called unprivileged execution. This execution level may be used for application software.
- EL1 provides support for operating systems.
- EL2 provides support for processor virtualization or hypervisor mode.
- EL3 provides support for the secure monitor.
Security State
- Secure state:
- The processor can access both the secure memory address space and the non-secure memory address space.
- When executing at EL3, the processor can access all the system control resources.
- Non-secure state:
- The processor can access only the non-secure memory address space.
- The processor cannot access the secure system control resources.
Depending on the security state, only certain exception levels are allowed.
Exception Level | Non-secure State | Secure State |
---|---|---|
EL0 | Yes | Yes |
EL1 | Yes | Yes |
EL2 | Yes | No |
EL3 | No | Yes |
Translation Regimes
When EL3 is using AArch64, the non-secure EL1 and EL0 translation regime comprises two stages of translation. All other supported translation regimes comprise only a single stage of translation.
Translation Buffer Unit
The FPGA TBU caches page table walk results for FPGA-issued accesses to the FPGA-to-HPS bridge. Details of the FPGA TBU configuration are shown in the table below.
Parameter | FPGA TBU | Peripheral Master TBUs |
---|---|---|
AXI data bus width | 128 bits | 64 bits |
Write buffer depth | 8 entries | 8 entries |
TLB depth | 128 entries | 32 entries |
TBU queue depth | 8 entries | 8 entries |
The Cortex® -A53 MPCore™ has its own TBU configuration. Details on this TBU can be found in the Cortex® -A53 MPCore™ chapter.
Micro Translation Lookaside Buffer
The micro TLB in the TBU caches the page table walk (PTW) results returned by the TCU. The TBU compares the PTW results of incoming transactions with the entries in the micro TLB before performing a TCU PTW.
Translation Control Unit
The TCU cache consists of macro TLB, prefetch buffers, IPA to PA support and PTW caches.
The prefetch buffer fetches pages up to 16 KB in size. The prefetch buffer is a single four-way associative cache that you can enable or disable depending on the context.
Macro Translation Lookaside Buffer
Security State Determination
- A transaction is either secure or non-secure depending on the value of the APROT[1] signal.
- The stream has an assigned SSD security state that determines whether secure or non-secure software controls the stream.
Each transaction is classified through a security state determination (SSD) as either SSD secure or SSD non-secure. The current bus transaction provides an SSD_index that points to a bit in the smmu_ssd_reg_* registers. For a given transaction, the device is either SSD secure or SSD non-secure. This bit determines the SSD security state.
For an SSD secure transaction, the APROT[1] signal can indicate whether the transaction is secure or non-secure, and this information is generally passed downstream. However, for an SSD non-secure transaction, the SMMU forces the downstream APROT[1] signal to indicate a non-secure transaction. For each SSD security state, set the SMMU_SCR0.CLIENTPD bit field if you want all transactions to bypass the translation process of the SMMU.
Stream ID
Each transaction is also classified by a 10-bit stream ID. The stream ID represents a set or stream of transactions from a particular master device. All transactions in a stream are subject to the same translation process. For example, the DMA controller may have multiple independent threads of execution that each form a different stream and can be subject to different translations. Alternatively, the peripheral masters on the system interconnect may share a single stream ID. Transactions from these devices can only be translated as a single entity. The TCU matches the stream ID against a set of stream match registers, SMMU_SMRx. The SSD security state determines the set of registers that are used. The secure software can partition the set into a non-secure set for use by SSD non-secure transactions and a secure set for use by SSD secure transactions. The stream matching process results in the following possible outcomes:
- No matches: If no matches are found, you can select whether transactions bypass the SMMU.
- Multiple match: If multiple SMMU_SMRn matches are found, the SMMU faults the transactions. The fault detection for these transactions is imprecise.
- Single match: If only a single match is found, the corresponding SMMU_S2CRn for the SMMU_SMRn that matched is used to determine the required additional processing steps.
For each SMMU_SMRn, there is a corresponding SMMU_S2CRn that is used when only a single SMMU_SMRn matches. The SMMU_S2CRn.TYPE bit field determines one of the following results:
- Fault: All transactions generate a fault. A client device receives a bus abort if SMMU_sCR0.GFRE == 1; otherwise the transaction acts as read-as-zero/write-ignored (RAZ/WI).
- Bypass: Transactions bypass the SMMU.
- Translate: Transactions are mapped to a context bank for additional processing. The SMMU_S2CRn.CBNDX bit field specifies the context bank to be used by the SMMU.
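The following bare-metal C sketch shows one way such a stream-to-context-bank mapping could be programmed, assuming the Arm SMMUv2 (MMU-500) global register layout. The SMMU base address, register index, stream ID, and context bank number are placeholder values, not taken from this manual.

```c
#include <stdint.h>

#define SMMU_BASE        0xFA000000u             /* placeholder: use the real HPS SMMU base */
#define SMMU_SMR(n)      (SMMU_BASE + 0x800u + 4u * (n))   /* SMMUv2 SMMU_SMRn   */
#define SMMU_S2CR(n)     (SMMU_BASE + 0xC00u + 4u * (n))   /* SMMUv2 SMMU_S2CRn  */

#define SMR_VALID        (1u << 31)              /* SMMU_SMRn.VALID */
#define S2CR_TYPE_TRANS  (0u << 16)              /* SMMU_S2CRn.TYPE = translate */

static inline void reg_wr(uintptr_t a, uint32_t v) { *(volatile uint32_t *)a = v; }

/* Map one 10-bit stream ID to a context bank so its transactions are translated. */
static void smmu_map_stream(unsigned n, uint32_t stream_id, uint32_t cbndx)
{
    /* Mask bits left at zero: every stream ID bit participates in the comparison. */
    reg_wr(SMMU_SMR(n),  SMR_VALID | (stream_id & 0x3FFu));
    /* Single match: route the stream to context bank 'cbndx' for translation. */
    reg_wr(SMMU_S2CR(n), S2CR_TYPE_TRANS | (cbndx & 0xFFu));
}
```

In SMMUv2, setting bits in the SMMU_SMRn mask field makes the corresponding stream ID bits "don't care", so one entry can match a group of streams.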
The second stage boot loader configures the stream ID for the SDM-to-HPS TBU interface. The FPGA-to-HPS interface provides its stream ID. You can specify the FPGA-to-HPS stream ID value in Intel® Quartus® Prime.
You can configure the stream ID for each HPS peripheral master through registers in the System Manager. The table below lists the peripheral masters, the corresponding System Manager register used to configure the stream ID and the specific bitfields that represent the stream ID. During a master access the stream ID source is provided as a part of the AxUSER[12:3] signals.
Master | System Manager Register | Register Bitfields Corresponding to stream ID[9:0] |
---|---|---|
EMAC0 | emac0_ace | awsid[29:20], arsid[17:8] |
EMAC1 | emac1_ace | awsid[29:20], arsid[17:8] |
EMAC2 | emac2_ace | awsid[29:20], arsid[17:8] |
USB0 | usb0_l3master | hauser22_13[25:16] |
USB1 | usb1_l3master | hauser22_13[25:16] |
DMA | dma_l3master | aruser[25:16], awuser[9:0] |
NAND | nand_axuser | aruser[25:16], awuser[9:0] |
ETR | etr_l3master | aruser[25:16], awuser[9:0] |
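As an illustration, the sketch below programs a stream ID for EMAC0 through the emac0_ace System Manager register, using the bitfield positions from the table above. The System Manager base address and the register offset are assumptions for illustration only, and the write should be performed while EMAC0 is idle.

```c
#include <stdint.h>

#define SYSMGR_BASE            0xFFD12000u                 /* assumed System Manager base; verify for your device */
#define SYSMGR_EMAC0_ACE       (SYSMGR_BASE + 0x00u)       /* placeholder offset for emac0_ace */

#define EMAC0_ACE_AWSID_SHIFT  20u                         /* awsid[29:20]: write stream ID */
#define EMAC0_ACE_ARSID_SHIFT  8u                          /* arsid[17:8]:  read stream ID  */
#define SID_MASK               0x3FFu                      /* 10-bit stream ID */

static void emac0_set_stream_id(uint32_t sid)
{
    volatile uint32_t *ace = (volatile uint32_t *)SYSMGR_EMAC0_ACE;
    uint32_t v = *ace;
    v &= ~((SID_MASK << EMAC0_ACE_AWSID_SHIFT) | (SID_MASK << EMAC0_ACE_ARSID_SHIFT));
    v |= (sid & SID_MASK) << EMAC0_ACE_AWSID_SHIFT;        /* stream ID for write transactions */
    v |= (sid & SID_MASK) << EMAC0_ACE_ARSID_SHIFT;        /* stream ID for read transactions  */
    *ace = v;
}
```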
Quality of Service Arbitration
The TCU generates page table walks for all the TBUs. If there are multiple outstanding transactions in the TCU, the TBU with the highest quality of service (QoS) is given priority. For individual prefetch accesses, the SMMU uses the QoS value of the hit transaction. For transactions with the same QoS value, the SMMU translates the transactions in the order they occur. The QoS for each TBU is programmed in the System MMU TBU Quality of Service 0 (SMMU_TBUQOS0) register at offset 0x2100.
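A sketch of raising the QoS value of one TBU through SMMU_TBUQOS0 is shown below. Only the 0x2100 offset comes from this section; the SMMU base address and the per-TBU field packing are assumptions for illustration and must be checked against the register definitions.

```c
#include <stdint.h>

#define SMMU_BASE     0xFA000000u                /* placeholder SMMU base address */
#define SMMU_TBUQOS0  (SMMU_BASE + 0x2100u)      /* System MMU TBU Quality of Service 0 */

/* Assumed layout for illustration only: one 4-bit QoS value per TBU,
 * packed from bit 0 upward. Consult the register definitions for the
 * real field positions. */
static void tbu_set_qos(unsigned tbu_index, uint32_t qos)
{
    volatile uint32_t *r = (volatile uint32_t *)SMMU_TBUQOS0;
    uint32_t shift = 4u * tbu_index;
    uint32_t v = *r;
    v &= ~(0xFu << shift);
    v |= (qos & 0xFu) << shift;
    *r = v;
}
```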
System Memory Management Unit Interrupts
Interrupt Type | GIC Interrupt Name(s) | Description |
---|---|---|
Global Fault Interrupt | gbl_flt_irpt_s, gbl_flt_irpt_ns | The SMMU asserts the global fault interrupt when a fault is identified in the translation process before a context is mapped. The SMMU provides both a secure and non-secure global fault interrupt signal to the generic interrupt controller (GIC). |
Performance Monitoring Interrupt | perf_irpt_FPGA_TBU, perf_irpt_DMA_TBU, perf_irpt_EMAC_TBU, perf_irpt_IO_TBU | The SMMU asserts this interrupt when a performance counter overflows. |
Combined Interrupt | comb_irpt_ns, comb_irpt_s | The non-secure combined interrupt is the logical OR of gbl_flt_irpt_ns, perf_irpt_<tbu_name>, and cxt_irpt_<number>. The secure combined interrupt is the logical OR of gbl_flt_irpt_s, perf_irpt_<tbu_name>, and cxt_irpt_<number>. |
Context Interrupt | cxt_irpt_0 through cxt_irpt_31 | The SMMU asserts one of these interrupts when a context fault is detected. |
System Monitor Interrupt | sys_mon_0 through sys_mon_11 | Each TBU has a system monitor interrupt that it can assert when it detects a fault. |
System Memory Management Unit Reset
The SMMU resets on a power-on, cold, or warm reset. On any one of these resets:
- TCU caches are cleared
- TLB entries are invalidated
- System configuration registers return to their reset state, which may be undefined
Note: You must reconfigure the SMMU for each transaction client after reset.
System Memory Management Unit Clocks
- l3_main_free_clk is the clock source for:
  - The EMAC 0/1/2 TBU
  - The TBU that services USB 0/1, NAND, SD/MMC, and ETR
- l4_main_clk is the clock source for the DMA TBU
System Memory Management Unit Configuration
- The 256-byte HPS-to-SDM mailbox that must be protected starts at address 0xFFA30000 and ends at 0xFFA300FF.
- Enable the SMMU to translate from virtual to physical address (stage 1 translation).
- Configure the page tables for your TBU so that it issues a context fault if a master attempts to access the HPS-to-SDM mailbox range.
There are two ways you can communicate a page table context fault:
- Use interrupts that route through the generic interrupt controller (GIC). Set the CFIE bit of the TBU's context bank system control register (SMMU_CB*_SCTLR) to enable interrupt reporting of a context fault. Program your software to sample the corresponding context interrupt, cxt_irpt_*. Note that the CFIE bit clears on reset. The SMMU contains 32 context banks and 32 corresponding interrupts in the GIC.
- Generate a slave error on the AXI bus as the response sent back to the master. Set the CFRE bit of the context bank system control register (SMMU_CB*_SCTLR) to enable an abort bus error when a context fault occurs. Note that CFRE bit clears on reset.
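A minimal sketch of enabling both reporting mechanisms for one context bank is shown below, assuming the Arm SMMUv2 context bank layout (SCTLR at offset 0x0 of the bank, CFRE at bit 5, CFIE at bit 6). The context bank base address and stride are placeholders, not values from this manual.

```c
#include <stdint.h>

/* Placeholder addresses: take the real context bank base and stride from the
 * device address map and register definitions. */
#define SMMU_CB_BASE(n)   (0xFA400000u + 0x1000u * (n))   /* assumed: one 4 KB page per context bank */
#define SMMU_CB_SCTLR(n)  (SMMU_CB_BASE(n) + 0x0u)        /* SMMU_CB*_SCTLR */

#define SCTLR_CFRE        (1u << 5)   /* return an AXI slave error on a context fault */
#define SCTLR_CFIE        (1u << 6)   /* assert cxt_irpt_* on a context fault */

static void smmu_enable_context_fault_reporting(unsigned cb)
{
    volatile uint32_t *sctlr = (volatile uint32_t *)SMMU_CB_SCTLR(cb);
    /* Both bits clear on reset, so re-enable them after every SMMU reset. */
    *sctlr |= SCTLR_CFIE | SCTLR_CFRE;
}
```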
System Memory Management Unit Address Map and Register Definitions
- You can access the complete address map and register definitions for this IP and the entire HPS through the Intel® Stratix® 10 Hard Processor System Programmer's Reference Manual.
- An HTML webhelp version of the Intel® Stratix® 10 Hard Processor System Address Map and Register Definitions is also available.
System Interconnect
The components of the hard processor system (HPS) communicate with one another, and with other portions of the SoC device, through the system interconnect. The system interconnect consists of the following blocks:
- The main level 3 (L3) interconnect
- The SDRAM L3 interconnect
- The level 4 (L4) buses
The system interconnect is a highly efficient packet-switched network that supports high-throughput traffic. The system interconnect is the main communication bus for the MPU and all hard IP cores in the SoC device.
The system interconnect supports the following features:
- Configurable Arm® TrustZone® -compliant firewall and security support
- Multi-tiered bus structure to separate high bandwidth masters from lower bandwidth peripherals and control and status ports
- Access to an SDRAM hard memory controller in the FPGA fabric
- Programmable quality-of-service (QoS) optimization
- On-chip debugging and tracing capabilities
The system interconnect is based on the Arteris® FlexNoC™ network-on-chip (NoC) interconnect technology.
About the System Interconnect
The system interconnect has the following characteristics:
- Arm® TrustZone®-compliant security firewalls
  - For each peripheral, implements secure or nonsecure access
  - Optionally configures individual transactions as secure or nonsecure at the initiating master
  - For certain peripherals, optionally implements two levels of access: privileged or user
- Three tiers of connectivity:
  - The main level 3 (L3) interconnect—Provides high-bandwidth routing between masters and slaves in the HPS.
  - The SDRAM L3 interconnect—Provides access to a hard memory controller in the FPGA fabric. A multiport front end (MPFE) scheduler enables multiple masters in the SoC device to share the external SDRAM.
  - The level 4 (L4) buses—Independent buses handling:
    - Data traffic for low- to mid-level bandwidth slave peripherals
    - Accesses to peripheral control and status registers throughout the address map
- Quality of service (QoS) with three programmable levels of service on a per-master basis.
- Byte oriented address handling.
- Data bus width up to 128 bits.
System Interconnect Block Diagram and System Integration
- Connection points interface the NoC to masters and slaves of other HPS components
- Datapath switches transport data across the network, from initiator connection points to target connection points
- Service network allows you to update master and slave peripheral security features and access NoC registers
- L3 interconnect: moves high-bandwidth data between masters and slaves in the HPS.
- L4 buses: lower performance than the L3 interconnect. These buses connect mid-to-low performance peripherals.
The interconnect is also connected to the Cache Coherency Unit (CCU). The CCU provides additional routing between the MPU, FPGA-to-HPS bridge, L3 interconnect, and SDRAM L3 interconnect.
In addition to providing routing connectivity and arbitration between masters and slaves in the HPS, the NoC features firewall security, QoS mechanisms, and observation probe points throughout the interconnect.
System Interconnect High-Level View
Connectivity
Stratix 10 HPS Master-to-Slave Connectivity Matrix
The following table shows the connectivity of all the master and slave interfaces in the system interconnect.
Slaves | Masters | ||||
---|---|---|---|---|---|
DAP | CCU Master 2 | DMAC 3 | EMAC 0/1/2 | Peripheral Master 4 | |
CCU Slaves 5 | ● | ● | ● | ● | |
TCU | ● | ● | |||
L4 Main Bus Slaves | ● | ● | ● | ||
L4 MP Bus Slaves | ● | ● | |||
L4 AHB Bus Slaves | ● | ● | |||
L4 SP Bus Slaves | ● | ● | ● | ||
L4 SYS Bus Slaves | ● | ● | ● | ||
Secure/Non-Secure Timestamp System Counters | ● | ● | ● | ||
L4 ECC Bus Slaves | ● | ● | |||
DAP | ● | ● | ● | ||
STM | ● | ● | |||
Lightweight HPS-to-FPGA Bridge | ● | ● | ● | ● | ● |
HPS-to-FPGA Bridge | ● | ● | ● | ● | ● |
Service Network | ● | ● | |||
HPS-to-SDM – Peripheral Access (QSPI, NAND, SDMMC) | ● | ● | ● | ||
HPS-to-SDM – Mailbox Access | ● | ● |
- TBU for EMAC 0/1/2
- TBU for USB 0/1, NAND, SD/MMC, and ETR
- TBU for DMAC
Peripherals Connections
System Connections

HPS-to-FPGA Bridge

SDRAM Connections
The three FPGA-to-SDRAM ports connect to the SDRAM Scheduler, which gives FPGA masters the option of direct, non-coherent access to the SDRAM. All other masters have coherent access to the SDRAM through the CCU, including the MPU, FPGA-to-HPS bridge, and HPS peripheral DMA masters.
System Interconnect Architecture
Each system interconnect packet carries a transaction between a master and a slave. The interconnect provides interface widths up to 128 bits, connecting to the L4 slave buses and to HPS and FPGA masters and slaves.
The system interconnect provides low-latency connectivity to the following interfaces:
- HPS-to-FPGA bridge
- Lightweight HPS-to-FPGA bridge
- FPGA-to-HPS bridge
- Three FPGA-to-SDRAM ports
SDRAM L3 Interconnect Architecture
The SDRAM L3 interconnect is part of the system interconnect, and includes these components:
- SDRAM scheduler
- SDRAM adapter
The SDRAM L3 interconnect operates in a clock domain that is 1/2 the speed of the external SDRAM interface clock frequency.
Stratix 10 HPS Secure Firewalls
You can use the system interconnect firewalls to enforce security policies for slave and memory regions in the system interconnect.
The firewalls support the following features:
- For each peripheral, can implement secure or nonsecure access
- Can optionally configure individual transactions as secure or nonsecure at the initiating master
- For certain peripherals, can implement two levels of access: privileged or user
You can program the security configuration registers (SCRs) to set security policies and define which transactions the firewall allows. Transactions that the firewall blocks result in a bus error.
The HPS has the following firewalls:
- Peripheral
- System
- HPS-to-FPGA
- Lightweight HPS-to-FPGA
- Debug access port
- TCU
- SDRAM (includes DDR and DDR L3 firewalls)
About the Rate Adapter
About the SDRAM L3 Interconnect
The SDRAM L3 Interconnect contains an SDRAM scheduler and an SDRAM adapter. The SDRAM scheduler functions as a multi-port front end (MPFE) for multiple HPS masters. The SDRAM adapter is responsible for connecting the HPS to the SDRAM hard memory controller in the FPGA portion of the device. The SDRAM L3 interconnect is part of the system interconnect.
Features of the Stratix 10 HPS SDRAM L3 Interconnect
- Connectivity to the SDRAM
hard memory controller supporting:
- DDR4-SDRAM
- DDR3-SDRAM
- LPDDR3-SDRAM
- Integrated SDRAM scheduler, functioning as a multi-port front end (MPFE)
- Configurable
external
SDRAM
interface
data widths
- 16-bit, with or without 8-bit error-correcting code (ECC)
- 32-bit, with or without 8-bit ECC
- 64-bit, with or without 8-bit ECC
- High-performance ports
- CCU port supporting coherent accesses for MPU L2 Cache master, HPS peripheral DMA masters, and FPGA masters through the FPGA-to-HPS Bridge
- Three 32-, 64-, or 128-bit FPGA ports
- Per-port firewall security support
- 8-bit Single Error Correction, Double Error Detection (SECDED) ECC
SDRAM L3 Interconnect Block Diagram and System Integration
The SDRAM adapter is responsible for bridging the SDRAM scheduler to the hard memory controller (in the FPGA portion of the device). The adapter is also responsible for ECC generation and checking.
The ECC register interface provides control to perform memory and ECC logic diagnostics.
The SDRAM scheduler is a multi-port front end (MPFE) responsible for arbitrating collisions and optimizing performance in traffic to the SDRAM controller in the FPGA portion of the device.
The SDRAM L3 interconnect exposes three ARM Advanced Microcontroller Bus Architecture (AMBA®) Advanced eXtensible Interface (AXI™) ports to the FPGA fabric, allowing soft logic masters to access the SDRAM controller through the same scheduler unit as the MPU system complex and other masters within the HPS. The MPU has access to the SDRAM adapter's control interface to the hard memory controller.
The SDRAM L3 interconnect has a dedicated connection to the hard memory controller in the FPGA portion of the device. This connection allows the hard memory controller to become operational before the rest of the FPGA has been configured.
About the SDRAM Scheduler
The SDRAM scheduler supports the following masters:
- The CCU
- The FPGA-to-SDRAM bridges
The SDRAM scheduler arbitrates among transactions initiated by the masters, and determines the order of operations. The scheduler arbitrates among the masters, ensuring optimal interconnect performance based on configurable quality-of-service settings.
You can configure the SDRAM scheduler through the registers.
About Arbitration and Quality of Service
Arbitration and QoS logic work together to enable optimal performance in your system. For example, by setting QoS parameters, you can prevent one master from using up the interconnect's bandwidth at the expense of other masters.
The system interconnect supports QoS optimization through programmable QoS generators. The QoS generators are located on interconnect initiators, which correspond to master interfaces. The initiators insert packets into the interconnect, with each packet carrying a transaction between a master and a slave. Each QoS generator creates control signals that prioritize the handling of individual transactions to meet performance requirements.
Arbitration and QoS in the HPS system interconnect are based on the following concepts:
- Priority—Each packet has a priority value. The arbitration logic generally gives resources to packets with higher priorities.
- Urgency—Each master has an urgency value. When it initiates a packet, it assigns a priority equal to its urgency.
- Pressure—Each data path has a pressure value. If the pressure is raised, packets on that path are treated as if their priority was also raised.
- Hurry—Each master has a hurry value. If the hurry is raised, all packets from that master are treated as if their priority was also raised.
Proper QoS settings depend on your performance requirements for each component and peripheral, and for system performance as a whole. Intel recommends that you become familiar with QoS optimization techniques before you try to change the QoS settings in the HPS system interconnect.
About the Service Network
Through the service network, you can perform these tasks:
- Access internal interconnect registers
- Update master and slave peripheral security features
About the Observation Network
The observation network connects probes to the observer, which is a port in the CoreSight™ trace funnel. Through the observation network, you can perform these tasks:
- Enable error logging
- Selectively trace transactions in the system interconnect
- Collect HPS transaction statistics and profiling data
The observation network consists of probes in key locations in the interconnect, plus connections to observers. The observation network works across multiple clock domains, and implements its own clock crossing and pipelining where needed.
The observation network sends probe data to the CoreSight subsystem through the AMBA® Trace Bus (ATB) interface. Software can enable probes and retrieve probe data through the interconnect observation registers.
Functional Description of the Stratix 10 HPS System Interconnect
The system interconnect, in conjunction with the system MMU (SMMU), provides access to a 132-GB address space.
Address spaces are divided into one or more regions.
The following figure shows the relationships between the HPS address spaces. The figure is not to scale.
The table below shows the HPS address spaces and the masters that access those address spaces.
Name | Size | Type (Physical/Virtual) | Masters |
---|---|---|---|
MPU view of the HPS/MPU address map | 132 GB | P/V | MPU and FPGA-to-HPS bridge |
L3 NoC view of the HPS/MPU address map | 4 GB 6 | P | All L3 masters |
132 GB | V | All L3 masters, with SMMU enabled | |
FPGA Slaves region of the HPS/MPU address map | 4 GB | P | All masters accessing the HPS-to-FPGA bridge |
Lightweight FPGA Slave region of the HPS/MPU address map | 2 MB | P | All masters accessing the lightweight HPS-to-FPGA bridge |
FPGA to SDRAM Interface view of the DDR address map | 128 GB | P | FPGA masters accessing HPS SDRAM through the FPGA-to-SDRAM interfaces |
Stratix 10 System Interconnect Address Spaces
Each address space uses some or all of a 132-GB address range. Depending on the configuration, different address spaces are visible in different regions for each master.
There are several address spaces that overlap each other, giving masters access to common space such as shared memory or CSRs. Within a given address map, the space is contiguous and non-overlapping. Because no peripheral mappings overlap, it is not necessary to segment the space.
HPS-to-FPGA Bridge Address Spaces
FPGA Slave Address Space
The FPGA slave address space provides access to soft components implemented in the FPGA core, through the HPS-to-FPGA bridge. The soft logic in the FPGA performs address decoding.
The L3 and MPU regions provide windows of 4 GB into the FPGA slave address space.
The lower 1.5 GB is accessible from 0x00_8000_0000 to 0x00_E000_0000 in the HPS system memory map.
The full 4 GB space is accessible starting at 0x20_0000_0000 in the HPS system memory map. Therefore, the lower 1.5 GB is mapped to two separate addresses in the HPS address space.

Lightweight FPGA Slave Address Map
The lightweight FPGA slave address space provides access to soft components implemented in the FPGA core through the lightweight HPS-to-FPGA bridge. The soft logic in the FPGA performs address decoding.
A portion of the peripheral region provides a window of 2 MB into the FPGA slave address space. The base address of the lightweight FPGA slaves window is mapped to address 0x0 in the FPGA slave address space.
Stratix 10 HPS L3 Address Space
The L3 address space is 132 GB with the SMMU enabled. This address space applies to all L3 masters.
All L3 address space configurations have the following characteristics:
- The peripheral region matches the peripheral region in the MPU address space, except that MPU private registers (SCU and L2) and the GIC are inaccessible.
- The FPGA slaves region is the same as the FPGA slaves region in the MPU address space.
- The DDR Memory region is the same as the memory region in the MPU address space
The L3 address space configurations contain the regions shown in the following figure:

- lws2f: Lightweight HPS-to-FPGA slaves region
- h2f_per: Peripherals region
Internal MPU registers (SCU and L2) are not accessible to L3 masters.
Cache coherent memory accesses have the same view of memory as the MPU.
SDRAM Window Regions
The L3 address map includes two SDRAM window regions, a 2-GB window and a 124-GB window. These windows provide access to all but 2 GB of the 128-GB SDRAM address space.
HPS-to-FPGA Slaves Region
The HPS-to-FPGA slaves region provides access to 4 GB of slaves in the FPGA fabric through the HPS-to-FPGA bridge.
Lightweight HPS-to-FPGA Slaves Region
The lightweight HPS-to-FPGA slaves region provides access to slaves in the FPGA fabric through the lightweight HPS-to-FPGA bridge.
Peripherals Region
The peripherals region includes slaves connected to the L3 interconnect and L4 buses.
On-Chip RAM Region
The on-chip RAM region provides access to on-chip RAM. Although the on-chip RAM region is 1 MB, the physical on-chip RAM is only 256 KB.
Stratix 10 MPU Address Space
The MPU address space is 132 GB, and applies to addresses generated by the MPU. MPU private registers (SCU and L2) and the GIC are visible only to the MPU. The MPU address map covers the entire HPS address map.
The MPU address space contains the following regions:
- The boot region, starting at 0xFFE0_0000 in RAM
- The FPGA slaves window region, including the HPS-to-FPGA and lightweight HPS-to-FPGA regions
- The peripheral region
The FPGA-to-HPS bridge sees the same address space as the MPU, except for private registers (SCU and L2) and the GIC, which are visible only to the MPU.
HPS-to-FPGA Slaves Region
The HPS-to-FPGA slaves region provides access to slaves in the FPGA fabric through the HPS-to-FPGA bridge.
Lightweight HPS-to-FPGA Slaves Region
The lightweight FPGA slaves provide access to slaves in the FPGA fabric through the lightweight HPS-to-FPGA bridge.
Peripherals Region
The peripheral region addresses 144 MB at the top of the first 4 GB address space. The peripheral region includes all slaves connected to the L3 Interconnect, L4 buses, and MPU registers (SCU and L2). The on-chip RAM is mapped into the peripheral region.
This region provides access to internally-decoded MPU registers (SCU and L2).
Generic Interrupt Controller Region
The GIC region provides access to the GIC control and status registers.
SCU and L2 Registers Region
The SCU and L2 registers region provides access to internally-decoded MPU registers (SCU and L2).
Stratix 10 HPS SDRAM Address Space
The SDRAM address space is 128 GB. It is accessed through the FPGA-to-SDRAM interface from the FPGA. Note that the FPGA-to-SDRAM interface provides the only address map that can access the entire 128 GB memory range without gaps.
There are cacheable and non-cacheable views into the SDRAM space. Both views are managed by the CCU.
Secure Transaction Protection
The system interconnect provides two levels of secure transaction protection:
- Security firewalls—Enforce secure read and write transactions.
- Privilege filter—Leverages the firewall mechanism and provides additional security by filtering the privilege level of L4 slave transactions. The privilege filter applies to writes only.
All slaves on the SoC are placed behind a security firewall. A subset of slaves are also placed behind a privilege filter. Transactions to these slaves must pass both a security firewall and the privilege filter to reach the slave.
Stratix 10 HPS System Interconnect Master Properties
The system interconnect connects to slave interfaces through the main L3 interconnect and SDRAM L3 interconnect.
Master | Interface Width | Clock | Security | SCR7 Access | Privilege | Issuance (Read/Write/Total) |
---|---|---|---|---|---|---|
AXI-AP | 32 | l4_mp_clk | TBD | TBD | TBD | 1/1/1 |
CCU_IOS | 64 | l3_main_free_clk | TBD | TBD | TBD | 32/32 |
DMA_TBU | 64 | l4_main_clk | TBD | TBD | TBD | 8/8/8 |
EMACx | 32 | l4_mp_clk | TBD | TBD | TBD | 16/16/32 |
EMAC_TBU | 64 | l3_main_free_clk | TBD | TBD | TBD | 32/32/64 |
ETR | 32 | cs_at_clk | TBD | TBD | TBD | 32/1/32 |
NAND | 32 | l4_mp_clk | TBD | TBD | TBD | 8/1/9 |
SD/MMC | 32 | l4_mp_clk | TBD | TBD | TBD | 2/2/4 |
USB | 32 | l4_mp_clk | TBD | TBD | TBD | 2/2/4 |
IO_TBU | 64 | l3_main_free_clk | TBD | TBD | TBD | 8/2/10 |
SDM_TBU | 64 | l3_main_free_clk | TBD | TBD | TBD | 1/1/1 |
Stratix 10 HPS Master Caching and Buffering Overrides
Some of the peripheral masters connected to the system interconnect do not have the ability to drive the caching and buffering signals of their interfaces. The system manager provides registers so that you can enable cacheable and bufferable transactions for these masters.
AHB Bus Signals | AHB Masters - preTBU | ||
---|---|---|---|
SDMMC | USB0 | USB1 | |
HPROT[3:0] | Sys Mgr8 | Sys Mgr | Sys Mgr |
HAUSER[0]: Allocate | Sys Mgr | Sys Mgr | Sys Mgr |
HAUSER[1]: Secure | Sys Mgr | Sys Mgr | Sys Mgr |
HAUSER[5:2]: Snoop | 3'b000 | 3'b000 | 3'b000 |
HAUSER[7:6]: Domain | Sys Mgr | Sys Mgr | Sys Mgr |
HAUSER[9:8]: Bar | 2'b00 | 2'b00 | 2'b00 |
*_ns | A*PROT[1] | A*PROT[1] | A*PROT[1] |
HAUSER[22:13]: xsid | Sys Mgr | Sys Mgr | Sys Mgr |
HAUSER[12:10]: USER ID | 3'b010 | 3'b011 | 3'b100 |
AXI Bus Signals | AXI Masters - preTBU | ||||||
---|---|---|---|---|---|---|---|
EMAC0 | EMAC1 | EMAC2 | NAND | DMAC | ETR | SDM2HPS BE | |
AxSNOOP | 3'b000 | 3'b000 | 3'b000 | 3'b000 | 3'b000 | 3'b000 | 3'b000 |
AxDOMAIN | Sys Mgr | Sys Mgr | Sys Mgr | Sys Mgr | Sys Mgr | Sys Mgr | Sec Mgr |
AxBAR | 2'b00 | 2'b00 | 2'b00 | 2'b00 | 2'b00 | 2'b00 | 2'b00 |
ARCACHE | Sys Mgr | Sys Mgr | Sys Mgr | Sys Mgr | DMAC | ETR | Sec Mgr |
AWCACHE | Sys Mgr | Sys Mgr | Sys Mgr | Sys Mgr | DMAC | ETR | Sec Mgr |
AxPROT | Sys Mgr | Sys Mgr | Sys Mgr | Sys Mgr | DMAC | ETR | SDM |
*_ns | A*PROT[1] | A*PROT[1] | A*PROT[1] | A*PROT[1] | A*PROT[1] | A*PROT[1] | A*PROT[1] |
AxUSER[12:3]: xsid | Sys Mgr | Sys Mgr | Sys Mgr | Sys Mgr | Sys Mgr[9:4], A*ID[3:0] | Sys Mgr | Sec Mgr |
AxUSER[2:0]: USER ID | 3'b000 | 3'b001 | 3'b010 | 3'b000 | 3'b000 | 3'b101 | USER[2:0] |
AXI Bus Signals | AXI Masters - postTBU | |||
---|---|---|---|---|
EMAC_TBU | DMA_TBU | IO_TBU | SDM_TBU | |
AxSNOOP | TBU | TBU | TBU | TBU |
AxDOMAIN | TBU | TBU | TBU | TBU |
AxBAR | TBU | TBU | TBU | TBU |
ARCACHE | TBU | TBU | TBU | TBU |
AWCACHE | TBU | TBU | TBU | TBU |
AxPROT | TBU | TBU | TBU | TBU |
*_ns | n/a | n/a | n/a | n/a |
xsid | n/a | n/a | n/a | n/a |
AxUSER[7:0]: USER ID | 5'b10100, USER[2:0] | 5'b10000, USER[2:0] | 5'b11100, USER[2:0] | 5'b00110, USER[2:0] |
AXI Bus Signals | AXI Masters - noTBU | ||||
---|---|---|---|---|---|
SDM2HPS LL | FPGA2SDRAM0 | FPGA2SDRAM1 | FPGA2SDRAM2 | AXI-AP | |
AxSNOOP | 3'b000 | FPGA | FPGA | FPGA | AXI-AP |
AxDOMAIN | Sec Mgr | FPGA | FPGA | FPGA | AXI-AP |
AxBAR | 2'b00 | FPGA | FPGA | FPGA | AXI-AP |
ARCACHE | Sec Mgr | FPGA | FPGA | FPGA | AXI-AP |
AWCACHE | Sec Mgr | FPGA | FPGA | FPGA | AXI-AP |
AxPROT | SDM | FPGA | FPGA | FPGA | AXI-AP |
*_ns | n/a | n/a | n/a | n/a | n/a |
xsid | n/a | n/a | n/a | n/a | n/a |
AxUSER[7:0]: USER ID | 5'b00100, USER[2:0] | 8'b11100000 | 8'b11100001 | 8'b11100010 | 8'b01100000 |
At reset time, some of the masters in the tables above do not provide their own cache and buffering signals. For these masters, at reset time, the system manager drives the cache and buffering signals low. In other words, these masters do not support cacheable or bufferable accesses until you enable them after reset. There is no synchronization between the system manager and the system interconnect, so avoid changing these settings when any of the masters are active.
Stratix 10 HPS Cacheable Transfer Routing
Masters on the L3 system interconnect can initiate coherent transactions in the interconnect's slave address range. For example, as a system designer you can connect an SDRAM interface in the SoC-to-FPGA address range, and ensure coherent access for all masters.
To initiate a coherent transaction, set A*DOMAIN to 2'b01 (inner shareable) or 2'b10 (outer shareable). When it sees any transaction marked shareable, the interconnect logic routes it to the CCU, regardless of the transaction address.
Stratix 10 HPS System Interconnect Slave Properties
The system interconnect connects to various slave interfaces through the main L3 interconnect, the SDRAM L3 interconnect, and the L4 peripheral buses.
System Interconnect Clocks
Clock Domains
All clocks within a domain are synchronous with each other.
Main Clock Domain
The main domain is the largest synchronous domain in the interconnect, containing most of the datapath. The main domain generally consists of a single free-running clock and divided clocks with enables. Resets in the main domain depend on clock groups. Each clock group in the table below uses a single reset. Paths crossing different groups also cross asynchronous reset domains.
Group | Clock | Clock Divider | Reset | Usage |
---|---|---|---|---|
main | l3_main_free_clk | - | l3_rst_n | clocks most of the interconnect datapath |
l4_main_clk | 1 | DMAC and SPI | ||
l4_mp_clk | 2 | EMAC, SDMMC, NAND, USB, ECC | ||
l4_sp_clk | 4 | L4_SP bus | ||
l4_sys_clk | 4 | L4_SYS bus | ||
syscfg | l4_sys_clk | 4 | syscfg_rst_n | L4_SHR and L4_SEC busses |
dbg | cs_at_clk | 1 | dbg_rst_n | CoreSight |
cs_pdbg_clk | 2 | CoreSight |
Lightweight HPS-to-FPGA Clock Domain
The lightweight HPS-to-FPGA domain is used only by the lightweight HPS-to-FPGA bridge. The clock is sourced from the FPGA, and is asynchronous to all other clocks.
Group | Clock | Enables | Nominal Ratio | Reset | Usage |
---|---|---|---|---|---|
— | lws2f_clk | — | — | lws2fgpa_bridge_rst_n | — |
HPS-to-FPGA Clock Domain
The HPS-to-FPGA domain is used purely by the HPS-to-FPGA bridge. The FPGA drives the HPS-to-FPGA clock, which is asynchronous to all other clocks.
Group | Clock | Enables | Nominal Ratio | Reset | Usage |
---|---|---|---|---|---|
— | soc2fpga_clk | — | — | soc2fgpa_bridge_rst_n | — |
Stratix 10 HPS System Interconnect Resets
The diagram below shows the reset domains of the system interconnect along with all the idle handshake signals that control the state of each domain. The driver of each idle handshake signal is also indicated in brackets.
The majority of the system interconnect (most masters, slaves, datapaths, and routers) is reset by l3_rst_n. Almost all transactions in the system interconnect are routed through the l3_rst_n domain. For full functionality of the system interconnect, l3_rst_n must be out of reset.
Functional Description of the Rate Adapters
The rate adapter module, noc_mpu_m0_L4_MP_rate_ad_main_RateAdapter, is positioned between datapaths clocked by l3_main_free_clk and datapaths clocked by the divided-down clocks l4_mp_clk, l4_sp_clk, and l4_sys_clk. At these bandwidth discontinuities, the rate adapter ensures efficient use of interconnect data pathways.
Functional Description of the Firewalls
Security
System Interconnect Firewalls and Slave Security
Changing the security state of a slave requires a secure write to the appropriate SCR.
Firewalls check the secure bit of a transaction against the secure state of the slave. A transaction that passes the firewall proceeds normally to the slave. A transaction that fails the firewall results in an error response with data set to 0. Transactions that fail the firewall are never presented to the slave interface.
The SCRs, implemented in the system interconnect, control the security state of each slave. The SCR is an internal target on the system interconnect, accessed through the service network. You can configure the slave security state on a per-master basis. This means that the SCR associated with each slave contains multiple secure state bits, one for each master allowed to access it.
Firewalls work in the following order:
- Based on the transaction's destination slave, fetch the entire slave SCR.
- Based on the transaction's originating master, read the master-specific secure bit in the SCR.
- Compare the secure bit with the transaction's secure attribute to determine if the transaction should pass the firewall.
The table below shows how the secure state of a slave is used with the transaction security bit to determine if a transaction passes or fails.
Transaction Security Bit | Slave Security State (SCR) | Result |
---|---|---|
0–Non-Secure | 0–Secure | Fail: simulate successful response |
1–Secure | 0–Secure | Pass: transaction sent to target |
0–Non-Secure | 1–Non-Secure | Pass: transaction sent to target |
1–Secure | 1–Non-Secure | Pass: transaction sent to target |
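The decision in the table reduces to a single check, modeled here in illustrative C. This is a software model of the firewall rule only, not driver or hardware code.

```c
#include <stdbool.h>

/* Illustrative model of the firewall check in the table above.
 * secure_txn:      transaction security bit (true = secure).
 * slave_nonsecure: master-specific security state bit from the slave's SCR
 *                  (true = slave treated as non-secure for this master). */
static bool firewall_passes(bool secure_txn, bool slave_nonsecure)
{
    /* The only blocked case is a non-secure transaction targeting a secure slave. */
    return secure_txn || slave_nonsecure;
}
```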
Stratix 10 HPS Slave Security
The system interconnect enforces security through the slave settings. The slave settings are controlled by the NoC Security Control Register (SCR) in the service network.
Firewalls protect certain L3 and L4 slaves. Each of these slaves has its own security check and programmable security settings. After reset, every slave of the system interconnect is in a secure state. This feature is called "boot secure". Only secure masters can access secure slaves.
The NoC implements five firewalls to check the security state of each slave, as listed in the following table. At reset time, all firewalls default to the secure state.
Name | Function |
---|---|
l4_per_fw | |
l4_sys_fw | |
Lightweight HPS-to-FPGA Firewall | Controls access through the lightweight HPS-to-FPGA bridge |
soc2fpga_fw | |
TCU Firewall | Controls access to the TCU. The system interconnect interfaces to the TCU through a 64-bit AXI bus. |
DAP Firewall | Controls access to the CoreSight APB DAP |
Peripherals Firewall | Filter access to slave peripherals (SPs) on the L4 peripheral buses |
System Firewall | Filter access to system peripherals |
HPS-to-FPGA Firewall | Filter access to FPGA through the HPS-to-FPGA bridge. |
DDR and DDR L3 Firewalls | Filter access to DDR SDRAM |
In addition to the firewalls listed above, the following slaves are protected by firewalls implemented outside the system interconnect:
Slave Name | Comment |
---|---|
DDR Scheduler and HMC Configuration Register | Firewall in SDRAM interconnect |
Cache Coherency Unit Register Bus (Regbus) | Only accessible by Privileged & Secure Transaction |
On-chip RAM Module - 256KB | Firewall in CCU |
At reset, the privilege filters are configured to allow certain L4 slaves to receive only secure transactions. Software must either configure bridge secure at startup, or reconfigure the privilege filters to accept nonsecure transactions.
To change the security state, you must perform a secure write to the appropriate SCR register of a secure slave. A nonsecure access to the SCR register of a secure slave triggers a bus error.
The following slaves are not protected by firewalls:
Slave Name | Comment |
---|---|
GIC | The GIC implements its own security extensions |
STM | STM implements its own master security through master IDs |
L4_GENTS (Generic TimeStamp) | Fixed Secure/Non-Secure by interconnect, no configuration required. |
Stratix 10 HPS Master Security
All masters on the system interconnect are expected to drive the Secure bit attribute for every transaction.
Master | Secure bit | Secure State | Non Secure State | Source |
---|---|---|---|---|
AXI-AP | A*PROT[1] | 0 | 1 | Driven by AXI-AP |
CCU_IOS | A*PROT[1] | 0 | 1 | Driven by CCU (transported from MPU and FPGA2SOC) |
DMAC | A*PROT[1] | 0 | 1 | Driven by DMAC |
EMACx | A*PROT[1] | 0 | 1 | Driven by Sys Mgr |
EMAC_TBU | A*PROT[1] | 0 | 1 | Driven by TBU (transported from EMAC or page table attribute) |
ETR | A*PROT[1] | 0 | 1 | Driven by ETR |
ETR_TBU | A*PROT[1] | 0 | 1 | Driven by TBU (transported from ETR or page table attribute) |
NAND | A*PROT[1] | 0 | 1 | Driven by Sys Mgr |
SD/MMC | HA*USER[1] | 0 | 1 | Driven by Sys Mgr |
USB | HA*USER[1] | 0 | 1 | Driven by Sys Mgr |
IO_TBU | A*PROT[1] | 0 | 1 | Driven by TBU (transported or page table attribute) |
SDM_TBU | A*PROT[1] | 0 | 1 |
Driven by TBU (transported from page table attribute) |
Accesses to secure slaves by nonsecure masters result in a bus error.
Functional Description of the SDRAM L3 Interconnect
The SDRAM L3 interconnect consists of two main blocks, serving the following two main functions:
- The SDRAM scheduler provides multi-port scheduling between the SDRAM L3 interconnect masters and the hard memory controller in the FPGA portion of the SoC. The SDRAM L3 interconnect is mastered by the MPU, the main L3 interconnect, and FPGA-to-SDRAM ports.
- The SDRAM adapter provides connectivity between the SDRAM L3 interconnect masters and the hard memory controller.
The SDRAM L3 interconnect also includes firewalls that can protect regions of SDRAM from unauthorized access.
The hard memory controller is physically located in the FPGA portion of the device, and therefore it is in a separate power domain from the HPS. The HPS cannot use the SDRAM L3 interconnect until the FPGA portion is powered up and the FPGA I/O bitstream is configured.
Functional Description of the Stratix 10 HPS SDRAM Scheduler
The SDRAM scheduler manages transactions to the memory access regions in the SDRAM. These memory regions are defined by the SDRAM L3 firewalls. The second-stage bootloader is expected to program the scheduler with the correct timings to implement optimal access patterns to the hard memory controller.
The SDRAM scheduler has the following features:
- Input connections:
  - One 128-bit connection from the CCU
  - Up to three 128/64/32-bit connections from the FPGA
- A single 256-bit connection to the SDRAM L3 adapter
- Capable of issuing transactions at the memory device line rate
- Traffic consists of the aggregate of all inputs
Monitors for Mutual Exclusion
The process for a mutually-exclusive access is as follows:
- A master attempts to lock a memory location by performing an exclusive read from that address.
- The master attempts to complete the exclusive operation by performing an exclusive write to the same address location.
- The exclusive write access is signaled as:
  - Failed if another master has written to that location between the read and write accesses. In this case the address location is not updated.
  - Successful otherwise.
To support mutually-exclusive accesses, the memory must be configured as normal memory, shareable, or non-cacheable.
Exclusive Access Support
To ensure mutually exclusive access to shared data, use the exclusive access support built into the SDRAM scheduler. The AXI buses that interface to the scheduler provide ARLOCK[0] and AWLOCK[0] signals. The scheduler uses these signals to arbitrate for exclusive access to a memory location. The SDRAM scheduler contains six monitors. The following exclusive-capable masters can use any of the monitors:
- CPU 0
- CPU 1
- CPU 2
- CPU 3
- FPGA-to-HPS bridge
- FPGA-to-SDRAM0 port
- FPGA-to-SDRAM1 port
- FPGA-to-SDRAM2 port
Each master can lock only one memory location at a time.
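On the Cortex-A53 masters, this read/write pair is generated by the AArch64 exclusive load/store instructions. For example, a C11 compare-and-swap on a shareable lock word typically compiles to an LDXR/STXR pair on this core, so the scheduler's exclusive monitors arbitrate the final store. The sketch below is illustrative only; the location of the lock word is up to the system designer.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Try to take a simple lock word located in shareable SDRAM that is visible to
 * the other exclusive-capable masters. On AArch64 (without LSE atomics) the
 * compare-exchange compiles to an exclusive load/store (LDXR/STXR) pair, so the
 * SDRAM scheduler's exclusive monitors decide whether the store succeeds. */
static bool try_lock(_Atomic uint32_t *lock_word)
{
    uint32_t expected = 0u;
    return atomic_compare_exchange_strong(lock_word, &expected, 1u);
}

static void unlock(_Atomic uint32_t *lock_word)
{
    atomic_store(lock_word, 0u);
}
```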
Arbitration and Quality of Service in the SDRAM Scheduler
Each master on the SDRAM scheduler has software-programmable QoS signals. These signals are propagated to the scheduler and used as arbitration criteria for access to SDRAM.
For information about programming quality of service for the FPGA-to-SDRAM masters, refer to "Functional Description of the QoS Generators" and "Configuring the Quality of Service Logic".
Functional Description of the SDRAM Adapter
The SDRAM adapter provides the following functionality:
- Connects the OCP master to the hard memory controller
- ECC generation, detection, and correction
- Operates at memory half rate
- Matches interface frequency of the single port memory controller in the FPGA
- Connectivity to the MPU, main L3 interconnect, and FPGA undergo clock crossing
ECC
SDRAM Adapter Interrupt Support
The SDRAM adapter supports the following three interrupts:
- The status interrupt occurs when:
  - Calibration is complete.
  - The ECC is unable to schedule an auto-correction write-back to memory. This occurs only when the auto-write-back FIFO buffer is full.
- The ECC read-back interrupt occurs when the ECC hardware detects a single-bit error in the read data. When this happens, hardware corrects the data and returns it to the interconnect.
- The double-bit or fatal error interrupt occurs when any of the following three errors happens:
  - The ECC hardware detects a double-bit error in the read data, which cannot be corrected.
  - The ECC hardware detects a single-bit error in the address field. This means that the adapter is returning data that is free from errors, but is not the requested data. When this happens, the adapter returns a data error along with the data.
  - Any of the DDR4 devices has triggered its ALERT pin, because a parity check failed on the address or command or a write data CRC check failed. The adapter cannot gracefully recover because the SDRAM devices do not indicate which failure occurred.
SDRAM Adapter Clocks
SDRAM L3 Firewalls
The SDRAM L3 firewalls define memory access regions in the SDRAM. Each SDRAM L3 interconnect master has its own memory access regions, independent of the other masters. The following block diagram shows the connectivity of the SDRAM L3 firewalls:
The firewalls define whether each memory access region is protected or unprotected relative to its master. You can configure the size of each memory region between 64 KB and 128 GB, on 64 KB boundaries. The following table lists the number of available memory access regions for each master.
SDRAM Master | Number of Memory Access Regions |
---|---|
MPU | 8 |
I/O coherent masters | 7 |
FPGA-to-SDRAM port 0 | 4 |
FPGA-to-SDRAM port 1 | 4 |
FPGA-to-SDRAM port 2 | 4 |
The SDRAM L3 interconnect regulates access to the hard memory controller with the firewalls, which support secure regions in the SDRAM address space. Accesses to the SDRAM pass through the firewalls and then through the scheduler.
SDRAM L3 Interconnect Resets
The reset signal l3_rst_n resets the system interconnect and the SDRAM L3 interconnect, but not the hard memory controller.
When you instantiate the HPS component, Platform Designer automatically connects the hard memory controller's reset signal to the SDRAM L3 interconnect.
Soft logic in the FPGA must support the global_reset_n signal correctly. Refer to the Instantiating the HPS Component chapter for information about global_reset_n.
To optionally preserve the contents of the SDRAM on reset, refer to "Reset Handshaking" in the Reset Manager chapter of the Stratix 10 Hard Processor System Technical Reference Manual.
Functional Description of the Arbitration Logic
The system interconnect contains arbitration nodes at each point where multiple packets might demand the same resource. Each arriving packet has a priority. When multiple packets of different priorities arrive simultaneously at a node, the arbitration logic grants the resource to the highest-priority packet.
If there are simultaneous packets with the same priority, the system interconnect uses a simple round-robin (least recently used) algorithm to select the packet.
Each packet's priority is determined by urgency, pressure, and hurry, as described in "QoS Mechanisms".
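As an illustrative software model only (not a description of the hardware implementation), the effective priority seen by an arbitration node can be thought of as the packet's urgency raised to at least the pressure of its datapath and the hurry of its master:

```c
#include <stdint.h>

/* Illustrative model: effective priority of a packet at an arbitration node. */
static uint32_t effective_priority(uint32_t urgency,   /* assigned by the master when the packet is issued */
                                   uint32_t pressure,  /* raised on a congested datapath                    */
                                   uint32_t hurry)     /* raised for all packets of a given master          */
{
    uint32_t p = urgency;
    if (pressure > p) p = pressure;
    if (hurry > p)    p = hurry;
    return p;   /* ties are then broken round-robin (least recently used) */
}
```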
Functional Description of the Observation Network
Stratix 10 HPS Interconnect Probes
The system interconnect includes the probes shown in the following table.
Probe Name | Location (Connection Point Name) | Packet Tracing/Statistic | Transaction Profiling | Probe ID |
---|---|---|---|---|
CCU | ccu_ios_p1Resp | Yes | No | 3 |
SOC2FPGA | soc2fgpa_probe_linkResp | Yes | No | 2 |
lwsoc2fgpa_probe_linkResp | Yes | No | ||
EMAC | emac_probe_linkResp | Yes | No | 1 |
emac_tbu_m | No | Yes |
The probe ID is available on the ATB trace bus.
Stratix 10 Packet Tracing, Profiling, and Statistics
Tracing can be fine-tuned through a filter or a combination of filters. Packets matching the filter criteria are sent to the observer, to be forwarded to the ATB bus. Trace alarms can also be raised when packets match the filter criteria. Trace alarms are software-readable registers on the system interconnect.
Statistics collection is done by setting up counters that track the number of packets matching certain criteria going through a link. Alarms can also be raised when a statistics count reaches a certain level.
The following table shows how each packet probe is configured.
Probe | nFilter | Filter On Enabled Bytes | Payload Tracing | nStatistics Counter | wStatistics Counter | Statistics Counter Alarm | Cross Trigger |
---|---|---|---|---|---|---|---|
CCU | 2 | FALSE | FALSE | 4 | 16 | TRUE | CoreSight |
SOC2FPGA | 2 | FALSE | FALSE | 4 | 16 | TRUE | CoreSight |
EMAC | 2 | FALSE | FALSE | 4 | 16 | TRUE | CoreSight |
Packet Filtering
Filters can perform the following tasks:
- Select which packets the observation network routes to CoreSight
- Trigger a trace alarm when a packet meets specified criteria
Statistics Collection
Stratix 10 EMAC Transaction Profiling
The EMAC0 transaction probe is configured as shown in the following table.
Width of counters | 10 bits |
Available delay thresholds | 64, 128, 256, 512 |
Available pending transaction count thresholds | 2, 4, 8 |
Number of comparators | 3 |
Profiling Transaction Latency
In latency mode (also called delay mode), one of the four delay threshold values can be chosen for each comparator. The threshold values represent the number of clock cycles that a transaction takes from the time the request is issued to the time the response is returned.
Profiling Pending Stratix 10 EMAC Transactions
In pending transaction mode, three transaction count threshold values are available for each comparator. The threshold values represent the number of requests pending on the EMACs.
Packet Alarms
The following types of alarms are available:
- Trace alarms—You can examine trace alarms through the system interconnect registers.
- Statistics alarms
Error Logging
Configuring the System Interconnect
Configuring the Rate Adapter
You can configure the rate adapter using the L4_MP_rate_ad_main_RateAdapter_Rate register. The default setting of L4_MP_rate_ad_main_RateAdapter_Rate is 0x100.
Configuring the SDRAM Scheduler
Stratix 10 HPS FPGA Port Configuration
You can enable or disable each FPGA-to-SDRAM (F2SDRAM) port and configure its data width to 32, 64, or 128 bits.
Memory Timing Configuration
The following lists the handoff information used to control the SDRAM scheduler:
- The scheduler is aware of the SDRAM timing so that it can guide traffic into the hard memory controller.
- This information is not used to control the subsystem that sets up memory timings in the hard memory controller hardware.
Configuring the Hard Memory Controller
SDRAM Adapter Memory Mapped Registers
Hard Memory Controller Memory Mapped Registers
Peripheral Region Address Map
Identifier | Slave Description(s) | Base Address(es) | Size(s) | Privilege/Security | Bus |
---|---|---|---|---|---|
FPGASLAVES | FPGA Slaves via |