By Ron Wilson, Editor-in-Chief, Altera Corporation
Perhaps no semiconductor process has generated more controversy—before a single product has been shipped—than the 20 nm node. There was argument over whether the node would have to wait for production-ready EUV lithography. It did not: double-patterning, though expensive and restrictive on layout, has met the needs of the finest-resolution mask layers.
There were battles over whether the node would require finFET transistors. Intel, IBM, and UMC say yes; Samsung, TSMC, and GLOBALFOUNDRIES say no. TSMC has since equivocated a bit, pulling forward plans for a 16 nm finFET half-node. Perhaps most notoriously, NVIDIA CEO Jen-Hsun Huang publicly questioned the economic viability of the whole 20 nm node, saying that its cost per transistor might never drop below that of 28 nm.
Figure 1. As it matures, the cost of 20 nm technology may never cross over the cost of 28 nm technology.
Note: Numbers based on public data from NVIDIA: www.extremetech.com/computing/123529-nvidia-deeply-unhappy-with-tsmc-claims-22nm-essentially-worthless.
Yet despite the debates, TSMC has rolled out its 20 nm reference flow. Chip designs are under way. Test silicon is on customers’ benches. It is time to ask what the 20 nm generation of systems-on-chip (SoCs) will mean to system vendors. Will this node be just another stepping stone on the path of Moore’s Law? Will it present profound new challenges for SoC users? Are there hidden risks? To find out, we spoke with engineers working on 20 nm silicon and surveyed recent conference papers.
A Uniquely Challenging Process
The 20 nm node is arguably the most difficult ever attempted for production, and a mere description of the technical challenges would fill a small book. But from the system designer’s perspective—using the SoC, not creating it—everything reduces to five key points: cost, density, speed, power, and 2.5D. System designers’ experiences will largely be determined by how chip designers manage the interplay of these five factors.
Cost is paramount. NVIDIA’s Huang may well have been right: with its greatly increased costs, 20 nm may always be more expensive than 28 nm for the same number of transistors. For SoCs with significant amounts of non-scaling circuitry, such as RF or other analog transistors, monolithic passive components, or electrostatic discharge protection structures, the gap will be larger than for dense logic-only SoCs. Quite simply, for an SoC to migrate to 20 nm, there will have to be some benefit—integration, performance, energy efficiency, or IP access—not available at 28 nm. Otherwise there will be no way to justify the added cost.
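The crossover argument behind Figure 1 is simple amortization arithmetic. A minimal sketch of it follows; every number is hypothetical, chosen only to show the structure of the argument, not actual foundry pricing or yield:

```python
# Illustrative cost-per-transistor comparison between two nodes.
# All inputs are assumptions for the example, NOT published figures.

def cost_per_transistor(wafer_cost, good_dies_per_wafer, transistors_per_die):
    """Wafer cost amortized over the good transistors it yields."""
    return wafer_cost / (good_dies_per_wafer * transistors_per_die)

# 28 nm: mature node, assumed high yield and moderate wafer cost.
c28 = cost_per_transistor(wafer_cost=4_000, good_dies_per_wafer=500,
                          transistors_per_die=1e9)

# 20 nm, same die area: density doubles the transistors per die, but
# double-patterning raises wafer cost sharply and early yield is lower.
c20 = cost_per_transistor(wafer_cost=7_600, good_dies_per_wafer=450,
                          transistors_per_die=2e9)

print(c20 > c28)  # with these assumptions, 20 nm never gets cheaper
```

With these particular assumptions the 2x density gain is fully eaten by wafer cost and yield, which is exactly the scenario Huang described; different inputs move the crossover point, but the structure of the tradeoff stays the same.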
Figure 2. A simple example of splitting a pattern too fine to be resolved into two separate, lower-resolution patterns. There are many process steps required to actually use double-patterning in practice.
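Conceptually, the split shown in Figure 2 is a graph-coloring problem: any two features spaced closer than the single-exposure limit conflict, and conflicting features must land on different masks. A simplified sketch (real decomposition tools also handle stitching and many additional rules) assigns features to two masks by two-coloring the conflict graph:

```python
from collections import deque

def assign_masks(features, conflicts):
    """Two-color a conflict graph: features joined by a conflict edge
    (spacing below the single-exposure limit) must go on different masks.
    Returns {feature: 0 or 1}, or None if the layout cannot be split
    across two masks (an odd cycle of conflicts)."""
    neighbors = {f: [] for f in features}
    for a, b in conflicts:
        neighbors[a].append(b)
        neighbors[b].append(a)
    mask = {}
    for start in features:
        if start in mask:
            continue
        mask[start] = 0
        queue = deque([start])
        while queue:
            f = queue.popleft()
            for g in neighbors[f]:
                if g not in mask:
                    mask[g] = 1 - mask[f]   # opposite mask from its neighbor
                    queue.append(g)
                elif mask[g] == mask[f]:
                    return None  # odd cycle: needs a layout change (or a third mask)
    return mask

# A chain of three tightly spaced lines decomposes into alternating masks;
# a triangle of mutual conflicts does not.
print(assign_masks("abc", [("a", "b"), ("b", "c")]))
print(assign_masks("abc", [("a", "b"), ("b", "c"), ("a", "c")]))  # None
```

The odd-cycle failure case is why the text calls double-patterning "restrictive on layout": some otherwise legal 28 nm layouts simply have no two-mask decomposition and must be redrawn.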
That statement brings us to the question of density: the one area in which 20 nm resembles earlier process transitions. Except for a possible loss of packing efficiency due to all the pattern-dependent design rules, 20 nm will provide about twice the number of transistors per mm² that 28 nm does. Chip architects will use this increased transistor budget in several ways.
The obvious way is integration. If you can pack two 28 nm SoCs into one 20 nm die, the resulting savings in inter-chip delays, I/O power, and board-level costs usually justify the greater cost per transistor. But less obviously, architects can also spend transistors to buy performance or energy efficiency.
One example is simple: if an SoC is DRAM-limited in a major operating mode, sometimes simply enlarging the on-chip RAM can substantially reduce DRAM accesses, giving a big boost in performance and a big reduction in I/O power consumption. But a more typical use of transistors will be to create parallelism. In heavily threaded, data-parallel, or readily pipelined applications, adding processors can be more effective than increasing clock frequency. This truth has already driven the migration from single-core to multicore SoCs, and at 20 nm it will drive the march from multicore to many-core.
Figure 3. With a total of ten major processors, this Cavium basestation-on-a-chip design illustrates the trend toward parallelism over raw speed.
Perhaps surprisingly, throwing transistors at the problem can also work for the performance of analog circuits. For example, FPGA vendor Altera has said it will take its maximum chip-to-chip transceiver speed from 28 Gbps in its 28 nm generation to 40 Gbps in 20 nm FPGAs. Part of this gain will, of course, come from higher transistor fT and reduced parasitics. But much will come from faster, far more complex digital-equalization circuits, Altera engineers say. In many other cases, too, the enlarged transistor budget will let designers digitally enhance the performance of the relatively poor analog signal paths that 20 nm can offer.
Adding transistors can also—though it may sound like a paradox—reduce power consumption. One example is the increasingly complex power-management strategies being contemplated by designers for 20 nm. Using elaborate state machines and control circuits, designers are getting increasingly fine-grained with their clock- and power-gating strategies. It is now common to suppress clocks on cycles when the data entering a register cannot change. And power-gating, originally used only at the block level when an entire subsystem was idle, is now being employed for increasingly short periods on increasingly fine-grained structures. The finer granularity increases the overhead in transistors, but as long as there is a net energy saving, many designers will make the tradeoff.
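Whether finer gating granularity pays off is ultimately a break-even calculation. A toy model (all numbers hypothetical) compares the leakage a gated block saves while idle against the gating logic's own leakage and the energy of each on/off transition:

```python
def net_saving(idle_fraction, block_leakage_mw, overhead_leakage_mw,
               switch_energy_uj, switches_per_s):
    """Net power saved (mW) by power-gating a block.
    Gating saves the block's leakage while it is idle, but pays for the
    control logic's own leakage plus the energy of every transition.
    (switch_energy_uj * switches_per_s gives uW; /1000 converts to mW.)"""
    saved = idle_fraction * block_leakage_mw
    overhead = overhead_leakage_mw + switch_energy_uj * switches_per_s / 1000.0
    return saved - overhead

# Coarse-grained: a whole subsystem idle 40% of the time -- a clear win.
print(net_saving(0.40, 50.0, 1.0, 0.5, 100))
# Fine-grained: a small block, briefly idle, with relatively heavier
# control overhead -- the "saving" can go negative.
print(net_saving(0.10, 2.0, 0.3, 0.05, 1000))
```

This is why the text says designers make the tradeoff only "as long as there is a net energy saving": pushed too fine, the overhead of the gating machinery overtakes the leakage it suppresses.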
A more overt example is ARM’s big.LITTLE architecture. This approach adds an entire second CPU—a Cortex™-A7—beside the main Cortex-A15. When a task needs high performance, the system starts up the A15. When the system does not face performance-hungry tasks, it shuts the A15 down and runs noncritical tasks on the much lower-power A7. The result is a huge net energy saving without sacrificing peak performance.
The Fine-Print Takeaway
The ability to spend transistors to buy performance is absolutely vital to 20 nm SoCs for one simple reason: at the block level, 20 nm chips will not be much faster than their 28 nm equivalents. This is not immediately obvious from the publicity. TSMC, for example, claims that its 20 nm technology “…can provide 30 percent higher speed…than its 28 nm technology.” That is not the doubling we used to expect between process generations, but it is not trivial. Yet to achieve that speed on an entire block, rather than on a few critical paths, might require lavish use of low-Vt transistors with very significant leakage current, raising the prospect of local-heating problems. Even without the thermal issues, the design might never close timing across all the many process, voltage, and temperature corners that 20 nm presents. Some engineers have suggested that, taking power and variations into consideration, blocks simply ported to 20 nm may gain no speed at all.
Power is another question that becomes more complex at 20 nm. Dynamic power—the CV²f kind—should, in principle, be lower for 20 nm circuits, assuming smaller features lead to reduced parasitic capacitance, operating voltages stay about the same, and frequencies are similar to those at 28 nm. But even though dynamic power per transistor goes down, static power—due to leakage currents—will go up for planar processes. On paper, finFETs should allow much lower sub-threshold leakage current than planar transistors at the same Vt, reducing the largest single component of leakage. So with a finFET process, a designer could either use Vt and Vcc similar to those at 28 nm and get both higher performance and lower static power, or she could use a lower Vt, allowing a lower Vcc, which would benefit both static and dynamic power. The optimum choice would depend on the circuits and on the end-system use cases.
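The tradeoff described above falls out of the two power terms, P = CV²f + Ileak·V. A rough sketch (illustrative coefficients, not process data) shows why a lower Vcc, if a lower Vt can hold frequency, helps the dynamic term quadratically and the static term linearly:

```python
def total_power(c_eff_nf, vdd, freq_mhz, i_leak_ma):
    """Total power in mW: dynamic C*V^2*f plus static I_leak*V.
    The switching-activity factor is folded into the effective
    capacitance c_eff_nf. All inputs are illustrative."""
    dynamic = c_eff_nf * 1e-9 * vdd**2 * freq_mhz * 1e6 * 1e3  # mW
    static = i_leak_ma * vdd                                   # mW
    return dynamic + static

# Baseline operating point (hypothetical 20 nm block).
base = total_power(c_eff_nf=1.0, vdd=1.0, freq_mhz=500, i_leak_ma=50)
# Lower Vcc at the same frequency, assuming a lower Vt (with finFET
# leakage held in check) keeps the timing closed.
low = total_power(c_eff_nf=1.0, vdd=0.85, freq_mhz=500, i_leak_ma=50)
print(base, low)  # both the V^2 and the I*V term shrink
```

On a planar process the same move would not be free: the lower Vt would push i_leak_ma up sharply, which is exactly the tradeoff the paragraph describes.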
With or without finFETs, power presents another issue. The sum of static plus dynamic power is unlikely to be half what it was at 28 nm. But density is going up by a factor of two. Arithmetic says that power density—and hence local heating—will limit both layout and clock frequencies in some 20 nm blocks.
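That arithmetic is worth making explicit. If a ported block's total power falls to, say, 0.7x of its 28 nm value while its footprint halves, the watts per mm² rise by 1.4x (both scaling factors here are assumptions for the example, not measured figures):

```python
# Power-density scaling for a block ported from 28 nm to 20 nm.
power_scale = 0.7    # assumed: total power drops, but nowhere near half
area_scale = 0.5     # density doubles, so the block occupies half the area
density_scale = power_scale / area_scale
print(density_scale)  # watts per mm^2 go UP even as total power goes down
```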
Finally, there is 2.5D. There is nothing about 20 nm processes that makes them inherently better for fabricating the through-silicon vias (TSVs) that are necessary for 2.5D assemblies. It is more a coincidence of timing that foundries are developing their production TSV technology for their 20 nm process nodes. The result is that we will probably see the large-scale use of TSVs to connect active circuits from multiple dies onto passive silicon substrates in the 20 nm generation.
The promise of this technology is great. A 2.5D assembly can almost transparently increase the resources available to an area- or pad-limited die. It can multiply the bandwidth to DRAM by replacing DDR3 with in-package wide-word I/O. And it can integrate, within a single small footprint, technologies that cannot be fabricated onto a single die. But the challenges are significant as well, from both technical and business perspectives.
The System Designer’s View
What does all this mean for system designers? First, it means that not all SoC product lines will automatically migrate to 20 nm. The early adopters will be only those devices that can translate the doubled transistor count into a substantial edge in system performance, power consumption, or cost. Early examples will include multicore server CPUs, CPU/GPU combination chips, high-end FPGAs, and certain ASIC SoCs—probably starting in the mobile market.
Second, those chips that do appear will likely make heavy use of multiprocessing. This trend may be invisible to a system design team that is using a complete reference design from the chip vendor. Or, as we have discussed in another article, it may be very much an issue if the design team will be involved in writing application code, routing interrupts, managing DRAM traffic, or modeling real-time behavior.
More obvious to the system designer, these chips will be heavily power-managed. Chip designers will use their entire arsenal, including dynamic voltage-frequency scaling, dynamic power gating, and adaptive voltage adjustment, to fight back against the power consumption and variations of the process. All of these techniques matter to the system designer. In particular, they can complicate power-network design, and the first two can introduce variable, non-deterministic latencies into the analysis of real-time behavior.
In summary, 20 nm will continue the Moore’s Law trend in integration, but at a cost. The advent of 2.5D packaging will extend and accelerate both integration and cost escalation, bringing a partial solution to DRAM-bus power and bandwidth problems, and drawing more kinds of ICs inside a single package. But this node will also accelerate the shift from raw speed toward architectural complexity as a means of increasing system performance. And it will be the most heavily power-managed node so far.