The ability to detect and act upon single-event upsets (SEUs) while an FPGA is operating has become more important than ever. All Stratix® FPGAs feature dedicated cyclic redundancy check (CRC) hard intellectual property (IP) circuits that detect configuration RAM (CRAM) bit flips and indicate an error on a dedicated CRC_ERROR pin.
Since the 130-nm process generation (Stratix FPGAs), Altera has included background error detection circuitry in all FPGAs, using a hard CRC checker to continually verify the CRAM contents during device operation. The CRC is guaranteed to detect multi-bit errors. The benefit of integrating CRC circuitry on-chip in hard gates is that the circuitry is robust and not susceptible to soft errors. In addition, the CRC circuit is a self-contained block and is enabled simply by checking a box in the Quartus® II compilation options.
Through process and design techniques, Altera has improved FIT/Mb with every technology generation. We've also provided enhanced mitigation solutions for soft errors at various levels: silicon, IP, and tools.
One example of such enhancement is the increased functionality and sophistication of the CRC circuits, as shown in Table 1.
Table 1. CRC Enhancements in Stratix Series FPGAs
Stratix Series FPGA Family (1): … and Stratix GX | … and Stratix II GX | Stratix IV E and Stratix IV GX
(1) Select the appropriate Stratix series family link for complete details on CRC features in a specific Stratix FPGA family.
Configuration Error Checking
All Altera® Stratix series FPGAs compute the CRC during configuration and store it in registers. Dedicated circuitry then checks this stored value against an automatically computed CRC. The CRC_ERROR pin reports a failure when configuration RAM data is changed unintentionally, which makes it easy to trigger reconfiguration. CRC checking is controlled through the Quartus II design software.
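The store-then-compare idea described above can be sketched in a few lines of Python. This is only an illustrative model, not the device's circuit: the source does not state the CRC polynomial or width, so `zlib.crc32` is used as a stand-in, and the `ConfigChecker` class and its `crc_error` method are hypothetical names introduced for this example.

```python
import zlib


def compute_crc(cram: bytes) -> int:
    """Stand-in CRC: zlib's CRC-32, not the device's actual polynomial."""
    return zlib.crc32(cram) & 0xFFFFFFFF


class ConfigChecker:
    """Toy model of configuration-time CRC storage plus a background check."""

    def __init__(self, bitstream: bytes):
        self.cram = bytearray(bitstream)          # models configuration RAM
        self.stored_crc = compute_crc(bitstream)  # computed once, at configuration

    def crc_error(self) -> bool:
        """True when the recomputed CRC no longer matches the stored one."""
        return compute_crc(bytes(self.cram)) != self.stored_crc


checker = ConfigChecker(bytes(range(64)))
clean = checker.crc_error()   # no upset yet
checker.cram[10] ^= 0x04      # model an SEU: flip one CRAM bit
upset = checker.crc_error()   # mismatch, so the error flag would be raised
```

In this model, `crc_error()` plays the role of the CRC_ERROR pin: a system could poll it (or, in hardware, watch the pin) and trigger reconfiguration when it goes true.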
Ever since dedicated CRC background configuration checker circuitry was introduced in the first generation of Stratix FPGAs, Altera has continuously enhanced the capability:
- Instead of a single CRC value for the entire device, Stratix series FPGAs (Stratix III FPGAs and later) store a CRC value for each configuration frame, thereby allowing faster SEU detection.
- The CRC error detection engine in Stratix series FPGAs (Stratix III FPGAs and later) provides the location of the SEU for both single-bit and adjacent multiple-bit errors.
- The CRC configuration circuit in Stratix series FPGAs (Stratix III FPGAs and later) allows for various types of error injection to simulate SEU events and test a mitigation strategy.
- The CRC detection/correction circuit in Stratix V FPGAs has increased error detection coverage (99.99999998%) and can correct single and double-adjacent multi-bit errors.
- The CRC detection time in Stratix V FPGAs has been improved by ~7X for equivalent densities compared to the Stratix IV family with an enhanced CRC scheme.
- Fault injection in Stratix V FPGAs is enhanced so that the user can inject multiple faults in the FPGA.
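To make the per-frame CRC, error location, and fault injection ideas in the list above more concrete, here is a minimal Python sketch. It is a software model under stated assumptions, not the hardware algorithm: `FRAME_BYTES`, `split_frames`, `inject_fault`, and `locate_upsets` are hypothetical names, `zlib.crc32` stands in for the unspecified device CRC, and the bit position is found by brute-force trial rather than by whatever method the hard CRC engine uses. It only locates single-bit upsets within a frame.

```python
import zlib

FRAME_BYTES = 16  # hypothetical frame size, chosen only for the example


def split_frames(image: bytes) -> list:
    """Cut a configuration image into fixed-size frames."""
    return [bytearray(image[i:i + FRAME_BYTES])
            for i in range(0, len(image), FRAME_BYTES)]


def inject_fault(frames, frame_idx: int, bit_idx: int) -> None:
    """Simulate an SEU by flipping one bit of one frame."""
    frames[frame_idx][bit_idx // 8] ^= 1 << (bit_idx % 8)


def locate_upsets(frames, stored_crcs):
    """Return (frame, bit) for each failing frame.

    bit is None when no single-bit flip explains the mismatch.
    """
    found = []
    for idx, frame in enumerate(frames):
        if zlib.crc32(bytes(frame)) == stored_crcs[idx]:
            continue  # this frame is intact
        bit = None
        for candidate in range(len(frame) * 8):
            trial = bytearray(frame)
            trial[candidate // 8] ^= 1 << (candidate % 8)
            if zlib.crc32(bytes(trial)) == stored_crcs[idx]:
                bit = candidate
                break
        found.append((idx, bit))
    return found


frames = split_frames(bytes(range(64)))
stored = [zlib.crc32(bytes(f)) for f in frames]  # one CRC per frame
inject_fault(frames, frame_idx=2, bit_idx=37)
report = locate_upsets(frames, stored)
```

Because each frame carries its own CRC, only the failing frame has to be searched, which is one way to see why per-frame values allow faster detection than a single device-wide CRC.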
Configuration Error Classification
Since the majority of configuration errors have no effect on the functionality of an FPGA, the ability to ignore these “don't care” soft errors provides a step increase in the actual mean time between failures (MTBF) from SEUs, resulting in improved system uptime. Using the location data provided by the enhanced CRC circuit in Stratix series FPGAs (Stratix III FPGAs and later), and a small amount of logic to check the error location against the criticality map, an SEU can be determined to be “care” or “don't care”.
In the event of a “don't care” configuration error, you can decide to ignore the SEU and continue running. The criticality map is automatically generated by the Quartus II software development tool, and is accessed via a user-defined interface, such as the active serial configuration memory. The criticality processor logic is integrated in Quartus II software as an IP megafunction.
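The classification step can be illustrated with a small Python sketch. The real criticality map format and the megafunction interface are not described here, so everything below is an assumption made for illustration: the map is modeled as a set of `(frame, bit)` tuples, and `classify_seu`, `respond`, and `effective_mtbf` are hypothetical helpers. The MTBF estimate additionally assumes upsets are spread uniformly over the configuration bits.

```python
def classify_seu(location, critical_bits) -> str:
    """Look the reported (frame, bit) location up in the criticality map."""
    return "care" if location in critical_bits else "don't care"


def respond(location, critical_bits) -> str:
    """Pick a system response for one reported SEU."""
    if classify_seu(location, critical_bits) == "care":
        return "reconfigure"
    return "continue running"


def effective_mtbf(raw_mtbf_hours: float, critical_fraction: float) -> float:
    """MTBF when only a fraction of upsets matter (uniform-upset assumption)."""
    return raw_mtbf_hours / critical_fraction


# Hypothetical map: only these configuration bits affect the design.
critical_bits = {(2, 37), (2, 38), (5, 0)}

action_a = respond((2, 37), critical_bits)  # hits a used bit
action_b = respond((4, 12), critical_bits)  # lands in unused configuration
```

With illustrative numbers, if only a quarter of the configuration bits were critical, `effective_mtbf(1000.0, 0.25)` would give 4000.0 hours, which is the kind of step increase the text refers to.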
Configuration Error Correction
To minimize the impact of SEUs, Stratix V FPGAs can correct CRAM bit flips without system downtime. There are two types of automatic SEU correction.
- Internal scrubbing: With this option, the CRAM CRC detection/correction runs in the background and does not require any user design or external components. When enabled through Quartus II software, the CRC circuit, implemented in hard logic, can detect multi-bit errors and correct single or double-adjacent errors in a frame in a fraction of a millisecond while the rest of the FPGA is in operation.
- Dynamic .pof reload: Correct CRAM bit flips by reloading CRAM images from the external .pof file. Frame-by-frame correction can happen in the background, or you can initiate it. This option may be considered for correcting multiple CRAM bit flips (more than double adjacent errors) per frame.
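The two correction paths above can be modeled together as a single background pass. The following Python sketch is a simplified software analogy, not the device's mechanism: `frame_crc`, `try_single_bit_fix`, and `scrub` are hypothetical names, `zlib.crc32` stands in for the device CRC, the in-place repair handles only single-bit errors (the text says the hard logic also handles double-adjacent errors), and `golden` is an in-memory stand-in for the external .pof image.

```python
import zlib


def frame_crc(frame) -> int:
    return zlib.crc32(bytes(frame))


def try_single_bit_fix(frame: bytearray, good_crc: int) -> bool:
    """Flip each bit in turn; keep the flip that restores the stored CRC."""
    for bit in range(len(frame) * 8):
        frame[bit // 8] ^= 1 << (bit % 8)
        if frame_crc(frame) == good_crc:
            return True
        frame[bit // 8] ^= 1 << (bit % 8)  # undo and try the next bit
    return False


def scrub(frames, stored_crcs, golden_frames):
    """One background pass. Returns a log of (frame index, action taken)."""
    log = []
    for idx, frame in enumerate(frames):
        if frame_crc(frame) == stored_crcs[idx]:
            continue  # frame is intact
        if try_single_bit_fix(frame, stored_crcs[idx]):
            log.append((idx, "corrected in place"))
        else:
            frame[:] = golden_frames[idx]  # fall back to the external image
            log.append((idx, "reloaded from image"))
    return log


golden = [bytes(range(i, i + 16)) for i in range(0, 64, 16)]
frames = [bytearray(f) for f in golden]
stored = [frame_crc(f) for f in golden]

frames[1][3] ^= 0x01  # single-bit upset in frame 1
frames[3][0] ^= 0x80  # two scattered upsets in frame 3
frames[3][9] ^= 0x10
log = scrub(frames, stored, golden)
```

The design point the sketch tries to capture is the ordering: the cheap in-place repair is attempted first, and the frame reload is reserved for upsets that the in-place scheme cannot fix.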
On-Chip Memory Error Checking
In addition to configuration memory checking, Stratix series FPGAs (Stratix III FPGAs and later) can check the integrity of on-chip memory. The new M20K embedded memory block offers a hard error correction code (ECC) circuit that can be used in pipelined or non-pipelined mode or bypassed altogether. The ECC used for M20K memory can detect up to three-bit errors and correct up to two-bit errors. Enhanced multi-bit coverage, along with physical interleaving of bits in a word, provides mitigation for multi-bit upsets in Stratix V FPGAs. The ninth memory bit, together with an automatically generated ECC megafunction, can provide SEU mitigation for memory structures built using memory logic array blocks (MLABs). The MegaWizard® Plug-In Manager within Quartus II software makes configuration of the ECC functionality simple.
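As a rough illustration of how check bits stored alongside data allow errors to be detected and corrected, here is a textbook extended Hamming code in Python. This is deliberately weaker than the M20K code described above: it corrects one bit and detects two, and it is not the code used by the M20K hard ECC or the ECC megafunction. The function names `hamming_encode` and `hamming_decode` are hypothetical.

```python
def hamming_encode(data_bits):
    """Extended Hamming encode. Index 0 holds overall parity, 1..n the code."""
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:  # number of check bits needed
        r += 1
    n = m + r
    code = [0] * (n + 1)
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):      # not a power of two: data position
            code[pos] = next(it)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity         # makes this parity group even
    code[0] = sum(code) % 2      # makes the whole word even
    return code


def hamming_decode(code):
    """Return (status, data_bits); corrects one flipped bit, flags two."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos      # XOR of set positions is 0 for a valid word
    overall = sum(code) % 2
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        fixed[syndrome] ^= 1     # syndrome 0 means the overall parity bit
        status = "corrected"
    else:
        status = "uncorrectable"
    data = [fixed[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data


word = [1, 0, 1, 1, 0, 0, 1, 0]
code = hamming_encode(word)      # 8 data bits plus 5 check bits

one_flip = list(code)
one_flip[6] ^= 1                 # single-bit upset
two_flips = list(code)
two_flips[3] ^= 1                # double-bit upset
two_flips[10] ^= 1
```

Decoding `one_flip` recovers the original word, while decoding `two_flips` reports an uncorrectable error. Physical interleaving, as mentioned in the text, helps a code like this by making it more likely that a multi-bit upset lands in different words.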