The Intel® Stratix® 10 device family introduces the Intel® Hyperflex™ core architecture.
The Intel® Stratix® 10 LAB contains Intel® Hyperflex™ registers and other features designed to facilitate retiming. Intel® Hyperflex™ registers are available in the ALMs and in the carry chains. As shown in the Stratix 10 ALM Connection Details figure, the Intel® Hyperflex™ registers are located on the synchronous clear and clock enable inputs, where they can increase or reduce the effective path delay. All Intel® Hyperflex™ registers can be enabled, and the Intel® Quartus® Prime software controls them during retiming.
The following sections describe the LAB and ALM for Intel® Stratix® 10 devices.
The LABs are configurable logic blocks that consist of a group of logic resources. Each LAB contains dedicated logic for driving control signals to its ALMs. The MLAB is a superset of the LAB and includes all the LAB features. There are a total of 10 ALMs in each LAB, as shown in the LAB and MLAB Structure for Intel® Stratix® 10 Devices figure.
Each MLAB supports a maximum of 640 bits of simple dual-port SRAM. You can configure each ALM in an MLAB as a 32 (depth) x 2 (width) memory block, resulting in a configuration of 32 (depth) x 20 (width) simple dual-port SRAM block.
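The capacity figures above follow from simple arithmetic, which the short sketch below works through. The constant and function names are illustrative assumptions and do not come from any Intel tool. Only the numbers (10 ALMs, 32 words deep, 2 bits wide per ALM) are taken from the text.

```python
# Figures taken from the text; all names here are illustrative only.
ALMS_PER_MLAB = 10
ALM_DEPTH = 32   # words per ALM memory block
ALM_WIDTH = 2    # bits per word in one ALM


def mlab_geometry(alms=ALMS_PER_MLAB, depth=ALM_DEPTH, width=ALM_WIDTH):
    """Return (depth, total_width, total_bits) with all ALMs placed side by side."""
    total_width = alms * width
    return depth, total_width, depth * total_width


depth, width, bits = mlab_geometry()
print(f"{depth} x {width} = {bits} bits")
```

Ten ALMs of 32 x 2 bits each give the 32 x 20 block, and 32 x 20 gives the 640-bit maximum.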
Each LAB can drive out 40 ALM outputs. Two groups of 20 ALM outputs can drive the adjacent LABs directly through direct-link interconnects.
The direct link connection feature minimizes the use of row and column interconnects, providing higher performance and flexibility.
The local interconnect drives the ALM inputs. ALM outputs, as well as column and row interconnects, drive the local interconnect. Neighboring LABs, MLABs, M20K blocks, or digital signal processing (DSP) blocks from the left or right can also drive the LAB's local interconnect using the direct link connection.
There is a dedicated carry chain path between the ALMs. Intel® Stratix® 10 devices include an enhanced interconnect structure in LABs for routing carry chains for efficient arithmetic functions. These ALM-to-ALM connections bypass the local interconnect.
The Intel® Hyperflex™ registers are added to the carry chain to enable flexible retiming across a chain of LABs. The Intel® Quartus® Prime Compiler automatically takes advantage of these resources to improve utilization and performance.
Each LAB supports a single clock to drive the ALM registers in the LAB. The LAB supports two unique clock enable signals, as well as additional clear signals, for the ALM registers.
In addition, each LAB control block drives clock signals for the Hyper-Registers. There is a single clock for the Hyper-Registers on the local interconnect, and additional clocks for the Hyper-Registers located at the ALM inputs.
The LAB row clocks [5..0] and LAB local interconnects generate the LAB-wide control signals. A low-skew clock network distributes global signals to the row clocks [5..0]. The MultiTrack interconnect consists of continuous, performance-optimized routing lines of different lengths and speeds used for routing efficiency. The Intel® Quartus® Prime Compiler automatically routes critical design paths on faster interconnects to improve design performance and optimize the use of device resources.
LAB-wide signals control the logic for the ALM register's clear signal. The ALM register directly supports both a synchronous and an asynchronous clear. Each LAB supports up to two synchronous clear signals and two asynchronous clear signals, provided that the total number of clear signals is no greater than three.
Intel® Stratix® 10 devices provide a device-wide reset pin (DEV_CLRn) that resets all the registers in the device. You can enable the DEV_CLRn pin in the Intel® Quartus® Prime software before compilation. The device-wide reset signal overrides all other control signals.
The following sections cover the ALM resources, ALM output, and ALM operating modes.
Each ALM contains a variety of LUT-based resources that can be divided between two combinational adaptive LUTs (ALUTs), a two-bit full adder, and four registers.
With up to eight inputs for the two combinational ALUTs, one ALM can implement various combinations of two functions. This adaptability allows an ALM to be completely backward-compatible with four-input LUT architectures. One ALM can also implement a subset of eight-input functions.
One ALM contains four programmable registers. Each register has the following ports:
- Data in
- Data out
- Clock enable
- Synchronous clear
- Asynchronous clear
Global signals, general-purpose I/O (GPIO) pins, or any internal logic can drive the clock, clock enable, and asynchronous or synchronous clear control signals of an ALM register. The clock enable signal has priority over the synchronous reset signal.
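The priority between clock enable and synchronous clear can be illustrated with a small behavioral model. This is a minimal sketch and not vendor code: the class and method names are hypothetical, and the asynchronous clear is modeled simply as an immediate reset that happens outside any clock edge.

```python
class AlmRegisterModel:
    """Behavioral sketch of one ALM register (hypothetical model)."""

    def __init__(self):
        self.q = 0

    def async_clear(self):
        # Asynchronous clear takes effect immediately, without a clock edge.
        self.q = 0

    def clock_edge(self, d, ena=1, sclr=0):
        # Clock enable has priority: while it is low the register holds its
        # value, even if synchronous clear is asserted on the same edge.
        if not ena:
            return self.q
        self.q = 0 if sclr else d
        return self.q


reg = AlmRegisterModel()
reg.clock_edge(1)                  # loads 1
reg.clock_edge(0, ena=0, sclr=1)   # holds 1: enable is low, so sclr is ignored
reg.clock_edge(1, ena=1, sclr=1)   # clears to 0: enable is high, sclr wins over data
```

In this model, a synchronous clear only takes effect on an edge where the clock enable is also asserted.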
For combinational functions, the registers are bypassed and the output of the look-up table (LUT) and adders drives directly to the outputs of an ALM.
The general routing outputs in each ALM drive the local, row, and column routing resources. Four ALM outputs can drive column, row, or direct link routing connections.
The LUT, adder, or register output can drive the ALM outputs. Both the LUT or adder and the ALM register can drive out of the ALM simultaneously.
Register packing improves device utilization by allowing unrelated register and combinational logic to be packed into a single ALM. Another mechanism to improve fitting is to allow the register output to feed back into the LUT of the same ALM so that the register is packed with its own fan-out LUT. The ALM can also drive out registered and unregistered versions of the LUT or adder output.
The following figure shows the Intel® Stratix® 10 ALM connectivity. The Intel® Quartus® Prime Resource Property Editor shows a simplified view of the ALM connections, because the Intel® Quartus® Prime software routes some connections internally.
The Intel® Stratix® 10 ALM operates in any of the following modes:
- Normal mode
- Extended LUT mode
- Arithmetic mode
Normal mode allows two functions to be implemented in one Intel® Stratix® 10 ALM, or a single function of up to six inputs.
Up to eight data inputs from the LAB local interconnect are inputs to the combinational logic.
The ALM can support certain combinations of completely independent functions and various combinations of functions that have common inputs. The Intel® Quartus® Prime Compiler automatically selects the inputs to the LUT. ALMs in normal mode support register packing.
The following figure shows a combination of different input connections for the LUT mode. In your design, the Intel® Quartus® Prime software may assign different input names during compilation.
Combinations of functions with fewer inputs than those shown are also supported. For example, combinations of functions with the following numbers of inputs are supported:
- 4 and 3
- 3 and 3
- 3 and 2
- 5 and 2
For the packing of two 5-input functions into one ALM, the functions must have at least two common inputs. The common inputs are dataa and datab. The combination of a 4-input function with a 5-input function requires one common input (either dataa or datab).
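The shared-input requirements above are consistent with a simple input budget: the two functions together may use at most the eight data inputs available to the ALM. The sketch below checks only that budget. It is a simplified model under stated assumptions, not the fitter's actual legality check: the function name `can_share_alm` is hypothetical, each packed function is assumed to have at most five inputs, and the physical requirement that shared signals land on dataa and datab is not modeled.

```python
MAX_ALM_DATA_INPUTS = 8     # data inputs from the LAB local interconnect
MAX_INPUTS_WHEN_PACKED = 5  # assumption: largest function in a two-function pack


def can_share_alm(inputs_a, inputs_b):
    """Return True if two functions fit the eight-input budget of one ALM."""
    a, b = set(inputs_a), set(inputs_b)
    if len(a) > MAX_INPUTS_WHEN_PACKED or len(b) > MAX_INPUTS_WHEN_PACKED:
        return False
    # Shared signals are counted once, so common inputs free up budget.
    return len(a | b) <= MAX_ALM_DATA_INPUTS


f5 = ["a", "b", "c", "d", "e"]
print(can_share_alm(f5, ["a", "b", "x", "y", "z"]))  # two common inputs
print(can_share_alm(f5, ["a", "w", "x", "y", "z"]))  # only one common input
print(can_share_alm(f5, ["a", "p", "q", "r"]))       # 5 and 4, one common input
```

Two 5-input functions with two common signals use 8 distinct inputs and fit. With only one common signal they need 9 and do not. A 4-input and a 5-input function need one common signal for the same reason.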
In a sparsely used device, functions that could be placed in one ALM may be implemented in separate ALMs by the Intel® Quartus® Prime software to achieve the best possible performance. As a device begins to fill up, the Intel® Quartus® Prime software automatically uses the full potential of the Intel® Stratix® 10 ALM. The Intel® Quartus® Prime Compiler automatically searches for functions using common inputs or completely independent functions to be placed in one ALM to make efficient use of device resources. In addition, you can manually control resource use by setting location assignments.
You can implement any three- to six-input function using the following inputs:
- dataa and datab—whereby dataa and datab are shared across both LUTs to provide flexibility to implement a different function in each LUT.
Both dataa and datab inputs support the register packing feature. If you enable the register packing feature, both dataa and datab inputs or either one of the inputs bypass the LUT and directly feed into the register, depending on the packed register mode used. For Intel® Stratix® 10 devices, the following types of packed register modes are supported:
- 5-input LUT with 1 packed register path
- Two 3-input LUTs with 2 packed register paths
The 3-input LUT with 2 packed register paths is illustrated in the 3-Input LUT Mode Function in Normal Mode figure. For Intel® Stratix® 10 devices, the 6-input LUT mode does not support the register packing feature.
Certain 8-input functions can be implemented in a single ALM using all the LUT inputs.
In the 8-input extended LUT mode, the packed register mode is supported, provided that the packed register shares a dataa or datab input with the 8-input LUT.
The ALM in arithmetic mode uses two sets of two 4-input LUTs along with two dedicated full adders. The dedicated adders allow the LUTs to perform pre-adder logic. Therefore, each adder can add the output of two 4-input functions.
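The structure described above (two adders, each fed by two 4-input LUTs) can be sketched as a small bit-level model. This is an illustration under assumptions, not a description of the silicon: all names are hypothetical, each LUT is given its own four input bits (input sharing among LUTs is not modeled), and the two adders are chained with a plain ripple carry.

```python
def lut4(mask, a, b, c, d):
    """Evaluate a 4-input LUT; `mask` is a 16-bit truth table."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (mask >> index) & 1


def full_adder(x, y, cin):
    """Return (sum, carry_out) for three input bits."""
    total = x + y + cin
    return total & 1, total >> 1


def alm_arithmetic(masks, lut_inputs, cin=0):
    """Model one ALM in arithmetic mode.

    masks:      four 16-bit truth tables, two per adder
    lut_inputs: four (a, b, c, d) bit tuples, one per LUT
    Returns (sum0, sum1, carry_out).
    """
    f = [lut4(m, *ins) for m, ins in zip(masks, lut_inputs)]
    s0, c0 = full_adder(f[0], f[1], cin)   # first adder: LUT0 + LUT1
    s1, c1 = full_adder(f[2], f[3], c0)    # second adder: LUT2 + LUT3
    return s0, s1, c1


PASS_A = 0xAAAA  # truth table whose output equals input `a`

# Add x = 3 (bits 1,1) and y = 1 (bits 1,0) with pass-through LUTs.
result = alm_arithmetic(
    [PASS_A] * 4,
    [(1, 0, 0, 0), (1, 0, 0, 0), (1, 0, 0, 0), (0, 0, 0, 0)],
)
print(result)
```

With pass-through truth tables the ALM acts as a plain 2-bit adder. Replacing the masks with other truth tables models the pre-adder logic that the LUTs perform before the addition.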
Arithmetic mode also offers clock enable, counter enable, synchronous up and down control, add and subtract control, and synchronous clear.
The clear and clock enable options are LAB-wide signals that affect all registers in the LAB. You can individually disable or enable these signals for each register. The Intel® Quartus® Prime software automatically places any registers that are not used by the counter into other LABs.
The carry chain provides a fast carry function between the dedicated adders in the arithmetic mode.
The 2-bit carry select feature in Intel® Stratix® 10 devices splits the propagation delay of carry chains within the ALM. Carry chains can begin in either the first ALM or the sixth ALM in a LAB. The final carry-out signal is routed to an ALM, where it is fed to local, row, or column interconnects.
To avoid routing congestion in one small area of the device when a high fan-in arithmetic function is implemented, the LAB can support carry chains that only use the bottom half of the LAB before connecting to the next LAB. You can use the available top half of the ALMs in the LAB to implement narrower fan-in functions in the normal mode. Carry chains that use the bottom five ALMs in the first LAB carry into the bottom half of the ALMs in the next LAB within the column. The behavior is the same for both the LAB and the MLAB columns.
The Intel® Quartus® Prime Compiler creates carry chains longer than 20 ALUTs (10 ALMs in arithmetic mode) by linking LABs together. For enhanced fitting, a long carry chain runs vertically, allowing fast horizontal connections to the TriMatrix memory and DSP blocks.
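As a rough guide to how far a carry chain extends, the sketch below estimates the number of LABs a chain of a given bit width spans. It is a back-of-the-envelope model with hypothetical names. It assumes two adder bits per ALM, a chain that starts at the first usable ALM of a LAB, and no extra ALM for routing out the final carry. The actual placement is decided by the Compiler.

```python
import math

BITS_PER_ALM = 2    # two dedicated full adders per ALM
ALMS_PER_LAB = 10


def labs_for_carry_chain(width_bits, half_lab=False):
    """Estimate how many vertically linked LABs a carry chain spans.

    half_lab=True models chains restricted to five ALMs per LAB.
    """
    alms_per_lab = ALMS_PER_LAB // 2 if half_lab else ALMS_PER_LAB
    bits_per_lab = alms_per_lab * BITS_PER_ALM
    return math.ceil(width_bits / bits_per_lab)


print(labs_for_carry_chain(64))                 # full LABs, 20 bits each
print(labs_for_carry_chain(64, half_lab=True))  # half LABs, 10 bits each
```

Under these assumptions a 64-bit adder spans four full LABs, or seven LABs when only half of each LAB is used for the chain.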