Optimizing power consumption in systems that use 28-nm FPGAs presents novel challenges for the system design team. One source of these challenges is the growing importance of power-management strategies. Another is the growing influence that chip-level decisions have on power consumption in other parts of the system. Design of the FPGA and the rest of the system must go hand in hand to minimize system power.
Understanding Use Profiles
For many years, IC power consumption was relatively straightforward. In each successive generation power decreased, and dynamic power always dominated, so there was no particular need to be concerned about static power, duty cycles, or sleep modes. But by the 28-nm FPGA generation, static power has become a major portion of FPGA power consumption. Active management of FPGA power is necessary to minimize the FPGA's contribution to system power consumption, and this undertaking influences many other aspects of the system design.
Figure 1. Total Power Breakdown Across Various High-End FPGA Customer Designs
This fact has a very important implication: to manage power effectively, make performance-power trade-offs, and determine sleep-mode tactics, the system design team must have an accurate understanding of the activity profiles for individual system tasks. That knowledge in turn requires an accurate understanding of how the end user will use the system. Which functions must have high duty cycles, which must have high performance, and which may be suspended for long periods? In a technology in which higher performance implies higher static power, but in which clocks may be gated and threshold voltages selected, these questions become vital.
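The interplay between duty cycle and static power can be sketched with a simple model. The example below is illustrative only: the task names and all milliwatt figures are made up, and it assumes that clock gating removes all dynamic power while a task is idle and that leakage is always present.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical per-task numbers; real values come from measurement or a power estimator."""
    name: str
    dynamic_mw: float  # dynamic power while the task is running
    static_mw: float   # leakage of the logic that implements the task
    duty_cycle: float  # fraction of time the task is running, 0.0 to 1.0


def average_power_mw(task: TaskProfile) -> float:
    """Average power, assuming clocks are gated when idle and leakage never stops."""
    return task.static_mw + task.dynamic_mw * task.duty_cycle


def static_share(task: TaskProfile) -> float:
    """Fraction of the task's average power that is static."""
    return task.static_mw / average_power_mw(task)


# Assumed example tasks: one nearly always active, one used in short bursts.
tasks = [
    TaskProfile("always_on_filter", dynamic_mw=400.0, static_mw=150.0, duty_cycle=0.9),
    TaskProfile("burst_crypto", dynamic_mw=900.0, static_mw=300.0, duty_cycle=0.05),
]

for t in tasks:
    print(f"{t.name}: {average_power_mw(t):.0f} mW average, {static_share(t):.0%} static")
```

With these assumed numbers, the low-duty-cycle, high-performance task spends most of its average power on leakage, which is why the use profile matters when choosing between clock gating and low-leakage implementation.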
Tracking System Implications
Given the design effort and level of integration involved, it is easy to focus power-optimization efforts on the system FPGA. But other parts of the system also consume power, which is not news to anyone. What is perhaps novel is that decisions made in the FPGA design can have a significant impact on power outside the FPGA.
One case in point is integration. Functions integrated into the FPGA will reduce dynamic power by reducing chip crossings for the signals that enter and leave them. And in particular with hard intellectual property (IP) embedded in 28-nm FPGAs, the core power consumed by these functions may be significantly less than it would be for older-generation external ICs. So increasing integration, and consequently the power consumption of the FPGA, may in fact decrease system power consumption.
Use of external memory, and in particular DRAM, can become a power issue as well. The very high throughput achievable in 28-nm FPGAs can create very high-bandwidth flows between the FPGA and external DRAM. The large number of different tasks executing concurrently in the FPGA may reduce the efficiency of DRAM access, further increasing memory power consumption.
The power regulators themselves raise another important question. As FPGA designs use increasingly dynamic power management, the loads on regulators can vary over time in complex and not necessarily predictable ways. This variation may influence regulator design, and it may mean that during at least some of its operating time a particular supply regulator will be operating outside its zone of best efficiency. To learn more about board-level optimizations, see the Reducing Power Consumption and Increasing Bandwidth on 28-nm FPGAs (PDF) white paper.
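The effect of a varying load on regulator efficiency can be illustrated with a small calculation. The efficiency curve, rail voltage, and load profiles below are hypothetical values chosen for illustration, not data for any real regulator. The sketch interpolates linearly between curve points.

```python
from bisect import bisect_right

# Hypothetical efficiency curve: (load current in A, efficiency as a fraction).
EFFICIENCY_CURVE = [(0.1, 0.60), (0.5, 0.80), (2.0, 0.90), (5.0, 0.85)]


def efficiency_at(load_a, curve=EFFICIENCY_CURVE):
    """Linearly interpolate efficiency; clamp to the end points outside the curve."""
    loads = [point[0] for point in curve]
    if load_a <= loads[0]:
        return curve[0][1]
    if load_a >= loads[-1]:
        return curve[-1][1]
    i = bisect_right(loads, load_a)
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (load_a - x0) / (x1 - x0)


def profile_efficiency(profile, rail_v):
    """Output energy divided by input energy for a list of (load in A, duration in s)."""
    output_j = 0.0
    input_j = 0.0
    for load_a, seconds in profile:
        out_w = rail_v * load_a
        output_j += out_w * seconds
        input_j += out_w / efficiency_at(load_a) * seconds
    return output_j / input_j


# Assumed load profiles on a 0.9 V rail: one steady, one bursty with long idle periods.
steady = [(2.0, 10.0)]
bursty = [(5.0, 1.0), (0.1, 9.0)]
print(f"steady: {profile_efficiency(steady, 0.9):.3f}")
print(f"bursty: {profile_efficiency(bursty, 0.9):.3f}")
```

Under these assumptions, the bursty profile spends most of its time at a load where the regulator is inefficient, so its overall efficiency falls below the steady case even though the regulator itself is unchanged.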
With an idea of system performance requirements, use profiles, and the sensitivity of system power to chip-level decisions, system designers using 28-nm FPGAs sometimes have a unique degree of freedom: a choice of process technology. At 28 nm there are several process variants available with quite different cost, static-power, dynamic-power, and speed characteristics. One FPGA vendor, Altera, passes this choice through to system designers by offering families built in different technology variants. To learn more, see the section “Tailored Power at 28 nm” in the Meeting the Low Power Imperative at 28 nm (PDF) white paper.
For Altera users, choosing an FPGA family selects a process technology, operating voltage range, and a set of hard IP libraries tailored to cover a specific portion of the performance-power-cost space. This choice provides a framework in which further dynamic and static power optimizations can be made.
FPGA users have unique flexibility in partitioning tasks between hardware and software. Given that software execution often consumes many times more power than hardware implementation of the same task, and given that in many cases the majority of the energy consumption of a task is concentrated in a few small kernels, hardware acceleration can be very effective at reducing dynamic power consumption.
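The benefit of offloading a few hot kernels can be estimated with an Amdahl-style calculation. The sketch below assumes that a fraction of a task's software energy is spent in kernels that can move to hardware, and that hardware executes them at a fixed energy-efficiency gain. The function name and all numbers are hypothetical.

```python
def energy_after_offload(total_energy_mj, kernel_fraction, hw_gain):
    """Estimate task energy after moving the hot kernels to hardware.

    total_energy_mj: energy of the all-software implementation
    kernel_fraction: share of that energy spent in the offloaded kernels (0.0 to 1.0)
    hw_gain: how many times less energy hardware needs for the same kernels
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0.0 and 1.0")
    if hw_gain <= 0.0:
        raise ValueError("hw_gain must be positive")
    software_part = total_energy_mj * (1.0 - kernel_fraction)
    hardware_part = total_energy_mj * kernel_fraction / hw_gain
    return software_part + hardware_part


# Assumed example: 80% of a 100 mJ task sits in kernels that hardware runs 10x more efficiently.
before_mj = 100.0
after_mj = energy_after_offload(before_mj, kernel_fraction=0.8, hw_gain=10.0)
print(f"{before_mj:.0f} mJ in software, about {after_mj:.0f} mJ after offload")
```

The remaining software portion bounds the saving: in this model, even an arbitrarily efficient accelerator cannot reduce the task below the energy of the code left in software.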
Embedded hard CPU cores offer another alternative, with the potential of substantially lower energy consumption per task than with soft CPU cores, while still permitting the use of accelerators. Similarly, tasks can employ power-efficient hard digital signal processing (DSP) blocks in place of programmable fabric to slash power on numerical accelerations. For an example of how to use signal-processing structures, see the Implementing FIR Filters and FFTs with 28-nm Variable-Precision DSP Architecture (PDF) white paper.
Another important area for dynamic power optimization at the architectural level is the use of the memory hierarchy. There are several considerations here. Since interconnect makes a significant contribution to FPGA power, locality and frequency of reference can both be important factors in deciding whether to use distributed RAM, configurable block RAM, or external memory. For example, the ability to avoid high-bandwidth traffic to an external DRAM may have such high rewards in terms of system power consumption that it justifies rethinking algorithms, implementation strategies, or FPGA selection.
FPGAs in the 28-nm generation provide finer-grained opportunities to save power as well. Most devices allow extensive clock gating. Altera FPGAs also allow threshold-voltage adjustment with moderate granularity. For details about programmable power technology, see the Programmable Power Technology section in the Reducing Power Consumption and Increasing Bandwidth on 28-nm FPGAs (PDF) white paper. Combining these techniques with accurate knowledge of use profiles, designers can independently manage dynamic and static power at the block level. With this ability in mind, system architects may want to isolate high-performance tasks from high-duty-cycle tasks, so that the high-performance circuitry can be used in bursts and clock-gated, while the high-duty-cycle circuitry can be implemented with low-leakage circuitry.
I/O power, while not as significant a portion of total power in FPGAs as in ASICs, also deserves attention. The high configurability of general-purpose I/O pins provides significant leverage for trimming dynamic power. For details about I/O power, see the I/O Innovations Enabling Lower Power section in the Reducing Power Consumption and Increasing Bandwidth on 28-nm FPGAs (PDF) white paper.
Figure 2. Factors Impacting General-Purpose I/O Power
Table 1. Main Factors Impacting General-Purpose I/O Power
|Main Factors Impacting I/O Power||Mitigation Techniques|
|Termination resistors (on-chip series termination (RS OCT) and on-chip parallel termination (RT OCT))||Dynamic on-chip termination (DOCT)|
|Output buffer drive strength||Programmable drive strength|
|Output buffer slew rate||Programmable slew rate|
|I/O standard (single ended, voltage referenced, or differential)||Support for multiple I/O standards|
|Voltage supply||Support for various voltage rails|
|Capacitive load (charging/discharging)||Interface dependent|
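Two of the factors in Table 1, voltage supply and capacitive load, can be related through the standard switching-power estimate for a capacitive load, P = C × V² × f. The sketch below applies it per pin. The load, voltages, and toggle rate are assumed values, and the model ignores termination current and buffer internal power.

```python
def io_switching_power_mw(load_pf, vccio_v, toggle_mhz, pins=1):
    """Estimate capacitive switching power of general-purpose outputs in mW.

    load_pf: capacitive load per pin in picofarads
    vccio_v: I/O supply voltage in volts
    toggle_mhz: full charge/discharge cycles per microsecond on each pin
    pins: number of pins switching at that rate
    """
    load_f = load_pf * 1e-12
    freq_hz = toggle_mhz * 1e6
    power_w = pins * load_f * vccio_v ** 2 * freq_hz
    return power_w * 1e3


# Assumed example: the same 10 pF load toggling at 100 MHz on two different I/O rails.
p_2v5 = io_switching_power_mw(load_pf=10.0, vccio_v=2.5, toggle_mhz=100.0)
p_1v8 = io_switching_power_mw(load_pf=10.0, vccio_v=1.8, toggle_mhz=100.0)
print(f"2.5 V rail: {p_2v5:.2f} mW per pin")
print(f"1.8 V rail: {p_1v8:.2f} mW per pin")
```

Because the voltage term is squared, moving an interface to a lower-voltage I/O standard reduces this component of power more than a proportional reduction in load or toggle rate would.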
In transceiver-based I/Os, some devices provide flexibility in not only configuration but also process technology and transceiver design, along with implementation of several higher levels of the protocol stack in hard IP. This allows designers to select an FPGA to cover a range of transceiver performance-power points, to tune the device to the specific environment, and to use dynamic power-management techniques. For details about how to use power-efficient transceivers, see the High-Bandwidth, Power-Efficient Transceivers section in the Reducing Power Consumption and Increasing Bandwidth on 28-nm FPGAs (PDF) white paper.
Finer-grained decisions can also add up to significant changes in system power simply by reducing the power consumed by the FPGA. Timing optimizations can increase slack on the critical nets in a block until the entire block can operate in high-threshold mode, for example, substantially reducing static power. Configuration and use of block RAM can have a significant influence on dynamic power. There are many such examples. For examples of how to use RAM blocks to optimize power, see the RAM Block Power Optimization section in the Reducing Power Consumption and Increasing Bandwidth on 28-nm FPGAs (PDF) white paper, and watch the Lower Power and Boost System Bandwidth on 28-nm FPGAs video.
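The link between timing slack and high-threshold operation can be illustrated with a simple check. This is a sketch under stated assumptions, not the algorithm used by any vendor tool: it assumes that high-threshold mode slows every path in a block by a fixed factor (a hypothetical 1.2 here), and that the block qualifies only if every path still meets the clock period.

```python
def fits_high_threshold(path_delays_ns, clock_period_ns, slowdown=1.2):
    """Return True if every path still meets timing after the assumed slowdown."""
    return all(delay * slowdown <= clock_period_ns for delay in path_delays_ns)


# Assumed path delays for one block with a 5.0 ns clock period.
before_optimization = [3.0, 3.9, 4.4]  # critical path leaves too little slack
after_optimization = [3.0, 3.9, 4.1]   # timing work shortened the critical path

print(fits_high_threshold(before_optimization, clock_period_ns=5.0))
print(fits_high_threshold(after_optimization, clock_period_ns=5.0))
```

In this model a single tight path keeps the whole block out of the low-leakage mode, which is why a modest timing improvement on the critical nets can translate into a substantial static-power saving.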
By starting with a clear notion of use profiles, system designers can determine performance requirements and duty cycles for individual system tasks. Using this information and an understanding of how FPGA implementation decisions impact system-level power consumption, system designers can manipulate algorithms, make architectural choices, and pursue detailed implementation strategies to minimize system power subject to performance and cost requirements and available design-team resources.