By Ron Wilson, Editor-in-Chief, Altera Corporation
In 1983, Altera officially came into existence. The year is within the memory of many people today: the first flight of the space shuttle Challenger, the assassination of Philippine leader Benigno Aquino Jr., the US invasion of Grenada. In contrast, in the racing time-scale of technology, 30 years can seem imponderably long. TCP/IP became the official protocol of the ARPANET; there was, as yet, no Internet. IBM introduced the PC-XT, its first personal computer with a built-in hard disk. The GNU project was announced.
In electronic design, microprocessors had come to dominate many embedded applications, displacing minicomputers at the high end and handfuls of small-scale digital chips at the low end. The typical digital system comprised a microprocessor or microcontroller surrounded by interface circuits that connected the chip’s bus to external memory chips and interfaces. Small-scale logic chips and medium-scale functional-block ICs were still used, but increasingly either to create the interfaces that surrounded the microprocessor, to bridge between buses, or for performance-critical applications that the microprocessors couldn’t handle.
Also in this period, a new implementation alternative was spreading through the design community. Gate arrays—partially prefabricated arrays of logic gates, configured by a series of customer-defined metal layers during the final production steps—allowed designers to pack thousands of gates of logic and memory into one chip. The design process was unfamiliar, relying on workstation-based EDA tools for schematic capture, picking library elements, and simulation, and the front-end charges were too great for many projects. But for larger, better-funded design teams, gate arrays offered a valuable alternative that could slash the chip count, boost the performance, and reduce the power consumption of a design all at once.
Design styles at the time were remarkably diverse, at least by today’s standards. Engineers trained in digital logic generally designed using formal constructs: Boolean algebra and expression minimization to define combinatorial logic, and state machines to describe sequential logic. These engineers might use either manual techniques or a growing list of computer software tools to capture and analyze their designs. But engineers coming from different backgrounds—especially analog engineers—often used a more intuitive approach. These designers preferred schematics for design capture, and tended to design by starting on the left margin and following input signals through to outputs, adding in gates, flipflops, medium-scale devices, and microprocessors as they went. Gate-array users tended toward the schematic approach also, simply because it was an approximate description of the circuits they were creating on the chip. These two contrasting design styles would play a role in the evolution of yet another innovation that was gradually maturing in the industry of 1983.
In 1978, that other alternative had appeared to relatively little notice. A chip-design team at Monolithic Memories had created a device they called a PAL (for “programmable array logic”, the more obvious term “programmable logic array” having already been taken). The chip was a one-time-programmable digital device designed to implement the minimized standard form of Boolean expressions: the sum-of-products format. A PAL contained a number of macrocells, each of which contained a switch array that could connect any of the chip’s inputs and outputs into each of up to eight product terms. The macrocell then summed the product terms in a wide OR gate and provided a configurable flipflop to register the output of the OR gate, with some multiplexers for selecting clocks, bypassing the register, and so forth. By 1983, PALs had proliferated into an impressive range of sizes, speeds, and configurations, giving designers a perhaps too-rich feast.
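The sum-of-products structure described above can be sketched in a few lines of Python. This is only an illustrative model: the class name, the dictionary representation of a product term, and the signal names are assumptions made for the example, not a description of any real device or vendor tool.

```python
from typing import Dict, List

# A product term maps a signal name to the level it requires:
# True selects the signal itself, False selects its complement.
ProductTerm = Dict[str, bool]

MAX_PRODUCT_TERMS = 8  # "up to eight product terms", per the text above


class Macrocell:
    """Hypothetical model of one PAL macrocell (not a real device's API)."""

    def __init__(self, terms: List[ProductTerm], registered: bool = False):
        if len(terms) > MAX_PRODUCT_TERMS:
            raise ValueError("expression needs more product terms than one macrocell has")
        self.terms = terms
        self.registered = registered
        self.q = False  # state of the configurable flipflop

    def sum_of_products(self, signals: Dict[str, bool]) -> bool:
        # AND within each product term, then one wide OR across the terms.
        return any(
            all(signals[name] == level for name, level in term.items())
            for term in self.terms
        )

    def clock(self, signals: Dict[str, bool]) -> None:
        # On a clock edge the flipflop captures the OR-gate output.
        self.q = self.sum_of_products(signals)

    def output(self, signals: Dict[str, bool]) -> bool:
        # The bypass multiplexer chooses registered or combinatorial output.
        return self.q if self.registered else self.sum_of_products(signals)


# Example: XOR in sum-of-products form, (a AND NOT b) OR (NOT a AND b).
xor_cell = Macrocell([{"a": True, "b": False}, {"a": False, "b": True}])
```

An expression that minimizes to more than eight product terms does not fit this model of a single macrocell, which is why the constructor rejects it.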
These programmable devices offered digital designers an interesting proposition: better logic density than small-scale gates and flipflops, better flexibility than purpose-built off-the-shelf medium-scale devices like counters, registers, and decoders, and a more familiar design flow and no front-end expense compared to gate arrays. The vendors provided software tools for minicomputers and mainframes that could translate Boolean expressions, state machines, and some schematics into switch maps for the chips.
This was the environment in 1984 when Altera offered its first product. That chip, the EP300 (Figure 1), was a programmable-logic chip, but it differed from PALs in four significant ways. First, the EP300 was reprogrammable, a seemingly minor convenience that would prove to be a major factor in the industry. A quartz window in the package allowed users to shine a UV lamp on the die, erasing the EPROM cells that held the device configuration, so the chip could be programmed again.
Second, the EP300 was a CMOS device, at a time when most PALs, small-scale, and medium-scale logic were still built in power-hungry bipolar processes. Third, the Altera chips were universal—that is, users could program the EP300 to mimic the configuration of nearly any of the myriad existing PAL types. And finally, Altera provided a design tool for the EP300 that ran on the IBM PC, rather than on an engineering workstation or a minicomputer. “That was a real novelty,” recalls Altera senior vice president Don Faria. “The PC was so new, and so exciting to engineers, that sometimes people would watch our demo just to get a look at the PC-XT.”
There was another difference, but it concerned Altera, not just the EP300. Altera was fabless. In a time when most semiconductor companies depended on proprietary processes in their own fabs for a competitive advantage, Altera designed the EP300 as nearly as possible to a generic EPROM process. Thus the chip could be built in any of a number of other companies’ fabs, with minimal adjustment to the design. This was harder than it sounds: in those days litho masks were literally photographs of knife-cut Rubylith, and a layout could not be retargeted by just changing a parameter in a design file. So creating a design that combined competitive performance with portability was a non-trivial challenge.
The new devices were an excellent match for system designers’ needs, Faria remembers. The 20-pin package contained eight macrocells. Each macrocell (Figure 2) included a register that received the OR of eight product terms. So each macrocell could decode a 16-bit address bus with ease. One example in an early databook shows a derivative device, the EP310, connected to a microprocessor address bus and generating the chip-select signals for a RAM, an EPROM, and five serial I/O transceivers. The device could also do complex translations, such as decoding binary numbers to drive the segments of a seven-segment display. Or one chip could implement a quite complex state machine with up to eight binary state variables.
By implementing interfaces and state machines in the EP300, designers could try out new logic expressions, reconfigure designs, remap address buses, and likewise experiment without accumulating a wastebasket full of accusatory used PALs. More important financially—but maybe less so emotionally—they could design a single board to fill a variety of part numbers in finished goods, and even reconfigure units in the field. These advantages led to rapid adoption of the EP300 and its derivatives, and naturally led designers to want to put more and more functions into the programmable device. That trend, in turn, led to pressure for larger devices.
The response from Altera was to extend the family to chips with more macrocells, culminating in the EP1800, a 48-macrocell device in a 68-pin leaded chip carrier. Already by this point, the need for logic capacity had over-stressed the PAL architecture, and the architecture had begun to evolve. “We learned that the standard PAL architecture didn’t scale well,” Faria explains.
In the conventional PAL, each of the product terms in each macrocell has access to all of the input signals, all of the macrocell outputs, and all of these signals’ complements. Because both the number of product terms and the number of signals feeding them rise with the macrocell count, this fully populated interconnect matrix grows roughly as the square of the number of macrocells, quickly becoming intractable. “So we began to partition the interconnect,” Faria says.
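A back-of-the-envelope count shows how quickly a fully populated matrix grows. The sketch below assumes eight product terms per macrocell, as in the EP300; the dedicated-input counts (10 and 16) and the function name are hypothetical values chosen only for illustration.

```python
def crosspoint_count(macrocells: int, dedicated_inputs: int,
                     terms_per_macrocell: int = 8) -> int:
    """Programmable switches in a fully populated product-term matrix."""
    # Every macrocell contributes its own set of product-term rows.
    product_terms = macrocells * terms_per_macrocell
    # Every input and every macrocell output is available to each row...
    signals = dedicated_inputs + macrocells
    # ...in both true and complemented form.
    columns = 2 * signals
    return product_terms * columns


# Hypothetical 8-macrocell device with 10 dedicated inputs.
small = crosspoint_count(macrocells=8, dedicated_inputs=10)    # 64 rows x 36 columns
# Hypothetical unpartitioned 48-macrocell device with 16 dedicated inputs.
large = crosspoint_count(macrocells=48, dedicated_inputs=16)   # 384 rows x 128 columns

print(small, large, round(large / small, 1))
```

Under these assumptions, six times as many macrocells need more than twenty times as many switches, which is the pressure that partitioning relieves.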
This trend is clearly visible in the EP1800, which is, in effect, four separate EP300-style devices in one package (Figure 3). The four are connected by an internal bus, rather than all trying to drive the same product-term matrix. This organization made the device manufacturable, but designers had to think explicitly about which macrocell was going to implement a particular logic expression, and which other macrocells would need access to the result. Getting between the quadrants of the device cost an additional delay.
A New Architecture, A New Player
At about this time, 1985, an external event occurred that would influence Altera architectures: recent start-up Xilinx announced its first FPGA. The FPGA had a fundamentally different architecture from the EPLD, and those differences would trigger a number of innovations within Altera.
Instead of being a collection of macrocells, the FPGA was an attempt to mimic the structure of a gate array. The first chips were arrays of logic cells—relatively simple logic elements (LEs), each comprising a three-input look-up table (LUT) to generate logic functions, a single configurable flipflop, and multiplexers for steering signals and selecting clocks. Instead of fixed interconnect, the logic cells connected through switch boxes into a multi-layered heap of metal segments of various lengths and orientations: programmable interconnect. Thus as in gate arrays, the FPGA user could determine the function of a small cluster of gates (the logic cell) and how that cluster connected to other clusters on the chip, gradually building up a circuit by connecting logic cells.
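To make the look-up table idea concrete, here is a minimal Python sketch of a logic element as the text describes it: a three-input LUT followed by an optional flipflop. The class and method names are assumptions for illustration, and the steering and clock-select multiplexers are reduced to a single "registered" flag.

```python
class LogicElement:
    """Hypothetical model of an early FPGA logic cell: 3-input LUT plus flipflop."""

    def __init__(self, truth_table: int, registered: bool = False):
        # Three inputs give 2**3 = 8 table entries, stored here as 8 bits.
        if not 0 <= truth_table < 256:
            raise ValueError("a 3-input LUT holds exactly 8 bits")
        self.truth_table = truth_table
        self.registered = registered
        self.q = 0  # flipflop state

    def lut(self, a: int, b: int, c: int) -> int:
        # The inputs form an index; the stored bit at that index is the result.
        index = (c << 2) | (b << 1) | a
        return (self.truth_table >> index) & 1

    def clock(self, a: int, b: int, c: int) -> None:
        # On a clock edge the flipflop captures the LUT output.
        self.q = self.lut(a, b, c)

    def output(self, a: int, b: int, c: int) -> int:
        return self.q if self.registered else self.lut(a, b, c)


# Majority-of-three: the table is 1 at indices 3, 5, 6 and 7.
MAJORITY = 0b11101000
majority_cell = LogicElement(MAJORITY)
```

Any function of three inputs is just a different 8-bit constant, which is what made the cell feel like a small, configurable cluster of gates.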
Three aspects of this device caught the attention of logic designers. First, because there was a large number of logic cells in an FPGA compared to the number of macrocells in a similar-sized EPLD, FPGAs gave the often-incorrect appearance of greater capacity. They did possess more flipflops than the equivalent EPLD, but not comparable density for combinatorial logic. The second attractive aspect was the similarity to the then-fashionable gate arrays, a resemblance that caught the imaginations of designers who had been studying or working with the new technology. Third, the FPGAs seemed far more intuitive to designers who were used to thinking of their designs in terms of schematics rather than Boolean equations. It was clear how nets of logic gates and flipflops ought to map into an FPGA.
This latter point was coupled to a growing trend. It was becoming apparent that the formal design style based on Boolean expressions—not unlike the PALs designed to implement it—was not going to scale well. Boolean algebra could exactly and concisely express the designer’s intent for a few hundred gates. But at a thousand gates, the chance of typographic error and the difficulty of understanding what the expressions actually meant were both becoming unacceptably large.
By 1985, Altera’s PC-based design suite offered four different means of design capture: Boolean expressions, state-machine maps, net lists, and schematics. The software took designs in any of these forms, reduced them to Boolean expressions, minimized the expressions, mapped the expressions into the circuitry of the target device, and offered a simulator so designers could see how the chip would function. The tools also gave the user the option of direct control over the individual EPROM cells that configured the chip, in recognition of the fact that there were cases where a skilled human could optimize better than could a piece of software.
Altera’s response to the FPGA was to increase the logic capacity of its next generation of devices. But the Altera designers did this without copying the FPGA architecture or attempting to simply enlarge the EPLD. Instead, they extended the concept of the partitioned PAL into a new kind of architecture: the complex PLD (CPLD).
The CPLD addressed two categories of problems that had quickly become apparent with the FPGA at then-current process densities (around 1 µm). One of these issues was fan-in. The LUT in the FPGA logic cells had only three inputs. So FPGAs needed to cascade many stages of LEs to implement a high-fan-in function such as an address decoder (Figure 4). The many stages often led to excessive delay, and consumed a large proportion of interconnect and logic cells inefficiently.
Figure 4. Some functions, such as this eight-bit address decoder, can be done in two logic stages with the wide inputs of a PAL, but require multiple stages with the much narrower inputs of FPGA logic cells.
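The cost of narrow fan-in can be estimated with a short calculation. The sketch below counts how many levels and how many LUTs a simple reduction tree needs to combine a given number of signals into one output, for example the wide AND at the heart of an address decoder. The function name and the greedy grouping are assumptions for illustration; real mapping tools of the period may have produced different results.

```python
from typing import Tuple


def lut_tree_cost(fan_in: int, lut_inputs: int = 3) -> Tuple[int, int]:
    """Return (levels, luts) for a reduction tree built from narrow LUTs."""
    if fan_in < 1 or lut_inputs < 2:
        raise ValueError("need at least one signal and LUTs with two or more inputs")
    signals, levels, luts = fan_in, 0, 0
    while signals > 1:
        full_groups, leftover = divmod(signals, lut_inputs)
        # A leftover group of two or more signals still needs its own LUT;
        # a single leftover signal simply passes on to the next level.
        luts += full_groups + (1 if leftover > 1 else 0)
        signals = full_groups + (1 if leftover else 0)
        levels += 1
    return levels, luts


print(lut_tree_cost(8))    # (2, 4): two levels and four LUTs for 8 inputs
print(lut_tree_cost(16))   # (3, 8): three levels and eight LUTs for 16 inputs
```

Every level adds a logic-cell delay plus a trip through the programmable interconnect, whereas the same wide AND fits in a single product term of a PAL macrocell.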
The second issue was timing predictability. In a PAL, the delay for any logic expression is the same, no matter how complex it is or where it is placed, as long as it fits in one macrocell. If an expression requires multiple macrocells in a PAL or a CPLD, bridging between cells simply adds another fixed delay to the timing. So you could practically use your fingers to calculate the path delay on a logic function.
This simplicity did not translate to FPGAs. In the early devices, resources were at a premium and every pass through the programmable interconnect added significant delay. So timing could vary dramatically depending on how the tools mapped your design into the logic cells and interconnect segments. As the device filled up, and the tools had to work harder to find open routing paths and free logic cells, the problem became much worse. In a 90%-utilized FPGA (a situation not recommended by the manufacturer, if you read the small print) a seemingly insignificant change to the logic could cause major, and far-reaching, changes to the layout and the timing.
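The contrast between the two timing models can be put in a few lines of Python. All delay values below are invented for illustration and are not taken from any data sheet; the function names are likewise hypothetical.

```python
from typing import List

MACROCELL_DELAY_NS = 25.0   # assumed fixed delay through one macrocell
LOGIC_CELL_DELAY_NS = 10.0  # assumed fixed delay through one FPGA logic cell


def pld_path_delay(macrocells_in_path: int) -> float:
    # PAL/CPLD style: each macrocell on the path adds the same fixed delay,
    # regardless of the expression it implements or where it sits on the chip.
    return macrocells_in_path * MACROCELL_DELAY_NS


def fpga_path_delay(routing_delays_ns: List[float]) -> float:
    # FPGA style: one entry per logic cell on the path, giving the delay of
    # the interconnect segments that the tools happened to pick to reach it.
    return sum(LOGIC_CELL_DELAY_NS + routing for routing in routing_delays_ns)


# The same three-cell path under two hypothetical placements.
good_placement = fpga_path_delay([2.0, 3.0, 2.0])    # short, direct routes
poor_placement = fpga_path_delay([2.0, 14.0, 9.0])   # detours in a crowded device

print(pld_path_delay(2), good_placement, poor_placement)
```

The first function depends only on a count that the designer can read off the logic; the second depends on routing choices that are not known until the tools have finished place and route.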
Altera’s first CPLDs, 1988’s MAX 5000 devices, sought a path to higher logic density without giving up the deterministic timing of PALs. The architecture achieved its goal by optimizing logic granularity for the process characteristics.
A comparison might illustrate this point. In a conventional PAL, every product term has access to every variable in the chip. The partitioning granularity is thus the entire device. In the first FPGAs, each LUT had access to three signals from itself or other cells. Altera architects reasoned that the PAL’s giant granularity was too large for scalability, but the early FPGA’s extremely fine granularity led to unpredictable timing. The best answer for 1988’s process technology must lie somewhere in between.
The MAX 5000 architecture (Figure 5) sought to find that sweet spot. Each MAX chip contained one or more logic array blocks (LABs). Each LAB, in turn, was essentially a large (16- or 32-macrocell) PAL without the I/O pads. The LAB’s macrocells each had four product terms. But each LAB included a pool of undedicated product terms that could be attached to a macrocell to expand its reach. Each LAB was connected to the on-chip programmable interconnect array (PIA), which could route signals between LABs with a fixed additional delay. And all the LABs shared a programmable I/O routing matrix that connected the logic to the pad ring.
The MAX CPLDs proved a successful compromise. Timing, while not as simple as in a small PAL, was still deterministic and was constrained within a fairly narrow window. Mapping a design onto the device was not trivial, but it was handled automatically by the design software, with the option of manual editing.
Demands for increased logic density continued through the late 1980s and early 1990s. With increased gate counts, application developers began to think of CPLDs, with their superior fan-in and determinism, as the best choice for logic-rich functions like decoders and state machines. Designers chose FPGAs, with their growing advantage in the number of flipflops, for register-intensive functions.
The intelligence of the design tools increased, reducing the need for manual editing just as the devices became gradually harder for a human to comprehend. And in the late 1980s, an entirely new design style, based on automatic logic synthesis from the Verilog hardware-description language (HDL), began to appear as well. Despite the name, HDLs like Verilog did not describe the hardware at all. Rather, they described the function of the hardware in terms of a new formalism called register-transfer logic.
An unintended consequence of the use of Verilog (and its Ada-derived competitor, VHDL) was that synthesis created register-rich, synchronous designs. These designs intuitively made more sense in register-rich FPGA architectures than they did in combinatorially oriented CPLDs.
Intuition didn’t necessarily track reality, and early attempts to perform logic synthesis for FPGAs were undermined by the limitations of the FPGA structure. Nonetheless, designers began thinking about logic design in terms of textual languages like Verilog, and they began thinking of FPGAs not just as interface elements, but as a way to implement complete functional blocks of their design. As Altera’s second decade approached, the growing logic density of semiconductor processes and, just as important, the increasing richness of the processes’ interconnect stacks began to open new possibilities. The time was right for another revolution at Altera.