from Altera
Many modern digital signal processing (DSP) systems employ floating-point functionality to achieve the high degree of numeric precision and dynamic range that most applications require. Radar, sonar, bioscience and molecular science, financial modeling, advanced wireless antenna processing, medical imaging, image analytics and synthesis, and precision control are just some of the applications that create a demand for floating-point capabilities in FPGAs. Furthermore, as FPGAs continue to grow in size and capability, they are becoming the highest-performance platform available for any type of floating-point-based algorithm or computation. In a recent National Science Foundation benchmark, a Stratix® IV FPGA delivered 171 giga floating-point operations per second (GFLOPs) and was the clear overall leader in GFLOPs per watt.
Table 1 shows Altera's market-leading floating-point performance.
Table 1. Floating-Point Performance

| Matrix Multiply | Dimension | Vector Size | Logic Usage: Logic Elements (LEs) | Logic Usage: DSP | Logic Usage: M9K | Logic Usage: M144K | Logic Usage: Memory (Bits) | fMAX (MHz) | Latency (Cycles) | GFLOPs |
|---|---|---|---|---|---|---|---|---|---|---|
| Single precision | 8x8 * 8x8 | 8 | 1,346 | 32 | 26 | - | 14,986 | 420 | 209 | 6.30 |
| Single precision | 16x16 * 16x16 | 8 | 1,434 | 32 | 27 | - | 55,562 | 421 | 611 | 6.32 |
| Single precision | 32x32 * 32x32 | 16 | 2,520 | 64 | 76 | - | 339,718 | 419 | 2,172 | 13.00 |
| Single precision | 64x64 * 64x64 | 32 | 4,728 | 128 | 80 | 16 | 2,382,318 | 388 | 8,353 | 24.45 |
| Double precision | 8x8 * 8x8 | 8 | 3,610 | 112 | 34 | - | 29,762 | 303 | 213 | 4.54 |
| Double precision | 16x16 * 16x16 | 8 | 3,693 | 112 | 39 | - | 110,756 | 314 | 615 | 4.71 |
| Double precision | 32x32 * 32x32 | 16 | 7,526 | 224 | 109 | - | 679,302 | 299 | 2,178 | 9.27 |
| Double precision | 64x64 * 64x64 | 32 | 14,335 | 448 | 77 | 32 | 4,765,120 | 284 | 8,359 | 17.91 |
| Single precision complex | 8x8 * 8x8 | 8 | 3,998 | 128 | 59 | - | 22,666 | 413 | 220 | 12.80 |
| Single precision complex | 16x16 * 16x16 | 8 | 4,137 | 128 | 64 | - | 79,139 | 404 | 624 | 12.52 |
| Single precision complex | 32x32 * 32x32 | 16 | 8,002 | 256 | 146 | - | 420,519 | 397 | 2,181 | 24.99 |
| Single precision complex | 64x64 * 64x64 | 32 | 15,627 | 512 | 216 | 16 | 2,674,289 | 360 | 8,362 | 45.68 |

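
The GFLOPs column in Table 1 can be related to the vector size and fMAX columns. The sketch below is an inference from the published numbers, not a formula stated by Altera: it assumes fMAX is given in MHz and that a real-valued dot-product datapath of vector size V performs V multiplications and V - 1 additions per clock cycle. Under those assumptions, the single- and double-precision rows agree with the table to within roughly 0.02 GFLOPs. The complex rows are not covered.

```python
# Rows copied from Table 1: (label, vector size, fMAX in MHz, published GFLOPs).
REAL_ROWS = [
    ("single 8x8", 8, 420, 6.30),
    ("single 16x16", 8, 421, 6.32),
    ("single 32x32", 16, 419, 13.00),
    ("single 64x64", 32, 388, 24.45),
    ("double 8x8", 8, 303, 4.54),
    ("double 16x16", 8, 314, 4.71),
    ("double 32x32", 16, 299, 9.27),
    ("double 64x64", 32, 284, 17.91),
]


def estimated_gflops(vector_size: int, fmax_mhz: float) -> float:
    """Estimate GFLOPs assuming V multiplies and V - 1 adds per cycle (an assumption)."""
    ops_per_cycle = 2 * vector_size - 1
    return ops_per_cycle * fmax_mhz / 1000.0  # MHz * ops/cycle -> GFLOPs


def max_abs_error(rows) -> float:
    """Largest gap between the estimate and the published GFLOPs value."""
    return max(abs(estimated_gflops(v, f) - published) for _, v, f, published in rows)


for label, v, f, published in REAL_ROWS:
    print(f"{label}: estimated {estimated_gflops(v, f):.2f}, published {published:.2f}")
```

The small residual differences are consistent with the fMAX values in the table being rounded to whole megahertz.
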
Table 2 shows example numbers of the fast Fourier transform (FFT) MegaCore® function usage in the Stratix IV GX EP4SGX530 device.
Table 2. FFT MegaCore Function (Second-Generation Fused Datapath): 14 Floating-Point FFT Cores, 1,024 Points

| Features | Usage | Maximum Available Resources | Usage (%) |
|---|---|---|---|
| Logic utilization | 300,000 | 424,960 | 70 |
| ALUT | 224,000 | 424,960 | 53 |
| Registers | 210,000 | 424,960 | 49 |
| M9K | 1,280 | 1,280 | 100 |
| M144K | 64 | 64 | 100 |
| DSP block (18-bit) | 896 | 1,024 | 88 |
| fMAX | > 300 MHz | - | - |
| Transform time per core | 3 µs (normalized: 0.22 µs) | - | - |

Altera provides the largest library of IEEE 754-compliant floating-point megafunctions, and they can all be used in any Altera® device family. Key megafunctions include:
- Addition and subtraction (altfp_add_sub)
- Multiplication (altfp_mult)
- Division (altfp_div)
- Square root (altfp_sqrt)
- Compare (altfp_compare)
- Logarithm (altfp_log)
- Exponential (altfp_exp)
- Inverse (altfp_inv)
- Inverse square root (altfp_inv_sqrt)
- Matrix multiplier (altfp_matrix_mult)
- Matrix inversion (altfp_matrix_invert)
- Sine (alt_sine)
- Cosine (alt_cosine)
- FFT (MegaCore)
Features
Floating-point performance typically reflects a balance between the frequency at which the operators run and the pipeline latency of the operator hardware. Together, these determine the GFLOPs performance metric. When designing for maximum GFLOPs in an FPGA, the total number of operators that can be placed in the device is also vital. You can therefore parameterize the Altera floating-point megafunctions in many different ways to fine-tune GFLOPs performance (or other key metrics such as power and area) to meet application-specific requirements. The configurable features include:
- Single and double-precision selection
- Single extended configurable precision
- Operator latency versus area tradeoff
- Reduced functionality
- Optional denormalized number support
- Reduced rounding accuracy
- Optional indefinite support
- Support for dedicated multiplier circuitry (multiplier only)
- Optional add or subtract-only mode (adder or subtractor only)
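
To make the interplay of operator count, clock frequency, and pipeline latency concrete, here is a minimal sketch. It assumes fully pipelined operators that each accept one new input per clock cycle; the function names and all numbers are hypothetical and do not come from any Altera tool or data sheet.

```python
def peak_gflops(num_operators: int, fmax_mhz: float) -> float:
    """Peak rate if every operator retires one result per clock cycle."""
    return num_operators * fmax_mhz / 1000.0


def effective_gflops(num_operators: int, fmax_mhz: float,
                     latency_cycles: int, results_per_operator: int) -> float:
    """Sustained rate for a finite batch, including the pipeline fill time."""
    # The first result appears after latency_cycles; each later result
    # follows one cycle after the previous one.
    total_cycles = latency_cycles + results_per_operator - 1
    total_ops = num_operators * results_per_operator
    seconds = total_cycles / (fmax_mhz * 1e6)
    return total_ops / seconds / 1e9


# Hypothetical design: 100 operators at 300 MHz with a 10-cycle latency.
print(peak_gflops(100, 300.0))
print(effective_gflops(100, 300.0, 10, 1))          # one result each: latency dominates
print(effective_gflops(100, 300.0, 10, 1_000_000))  # long stream: approaches the peak
```

In this model, a longer pipeline only costs throughput on short batches, which is why trading latency for a higher clock frequency or a smaller operator can raise sustained GFLOPs on streaming workloads.
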
Typical Performance
Table 3 lists the resources available for floating-point operators in selected Stratix V and Arria V FPGAs. For performance details on individual operators, please refer to the user guide for each megafunction.
Table 3. Floating-Point FPGA Resources

| Device | LEs | 27 x 27 Multipliers | Single-Precision Multipliers | Double-Precision Multipliers | Memory (Mbits) |
|---|---|---|---|---|---|
| 5SGSD4 | 330K | 1,020 | 1,020 | 255 | 18 |
| 5SGSD5 | 462K | 1,498 | 1,498 | 374 | 40 |
| 5SGSD6 | 583K | 1,775 | 1,775 | 443 | 45 |
| 5SGSD8 | 695K | 1,963 | 1,963 | 490 | 50 |
| 5AGXB7 | 503K | 1,139 | 1,139 | 285 | 24 |
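
The multiplier columns in Table 3 follow a simple pattern: every 27 x 27 multiplier counts as one single-precision multiplier, and the double-precision count is about one quarter of that. One plausible reading, which the source does not state, is that a 24-bit single-precision mantissa fits in one 27 x 27 multiplier, while a 53-bit double-precision mantissa product is assembled from four 27 x 27 partial products. The sketch below checks that reading against the table; the helper name is hypothetical.

```python
# Rows copied from Table 3:
# device -> (27x27 multipliers, single-precision, double-precision).
DEVICES = {
    "5SGSD4": (1020, 1020, 255),
    "5SGSD5": (1498, 1498, 374),
    "5SGSD6": (1775, 1775, 443),
    "5SGSD8": (1963, 1963, 490),
    "5AGXB7": (1139, 1139, 285),
}


def double_precision_estimate(multipliers_27x27: int) -> int:
    """Assume four 27x27 multipliers per double-precision multiply (an assumption)."""
    return multipliers_27x27 // 4


for name, (m27, single, double) in DEVICES.items():
    print(f"{name}: estimated {double_precision_estimate(m27)}, published {double}")
```

The estimate matches the published double-precision counts to within one multiplier for every device listed.
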
Related Links
- Floating-Point Megafunctions (PDF) user guide
- Double-Precision Floating-Point Math (PDF) white paper
- Nios® II Floating-Point Custom Instructions (PDF) tutorial
- High-Performance Computing System Solutions
- Nios II Floating-Point Training (online training)
