The Floating Point Hardware (FPH1)1 provides substantial performance improvement over floating-point software emulation by providing custom instruction implementations of single-precision add, sub, multiply, and divide operations. Software implements all other floating-point operations (including double-precision operations) by emulating FPH1 functionality.
The Floating Point Hardware 2 (FPH2)2 achieves even higher levels of performance by providing lower cycle count implementations of add, sub, multiply, and divide operations, and by providing custom instruction implementations of additional floating-point operations.
- Nios® II Gen2 (or Classic) Processor Reference Handbook
- Nios II Gen2 (or Classic) Software Developer's Handbook
- Nios II Custom Instruction User Guide
- Using Nios II Floating-Point Custom Instructions Tutorial
- Supports FPH1 operations (add, sub, multiply, divide) and adds support for square root, comparisons, integer conversions, minimum, maximum, negate, and absolute
- Single-precision floating-point values are stored in the Nios II general purpose registers
- VHDL only
- Qsys support only
- Single-precision only
- Optimized for FPGAs with 4-input LEs and 18-bit multipliers
- GCC and Nios II SBT (Software Build Tools) software support
- IEEE 754-2008 compliant except for:
- Simplified rounding
- Simplified NaN handling
- No exceptions
- No status flags
- Subnormal supported on a subset of operations
- Binary-compatibility with FPH1
- FPH1 implements Round-To-Nearest rounding. Because FPH2 implements different rounding, results might be subtly different between the two generations
- Approximately 2500 4-input LEs
- Nine 9-bit multipliers
- Three M9K memories or larger
The floating point hardware 2 component is benchmarked in the following hardware configuration:
- Cyclone V 5CGXFC7D6F31C6 device
- Nios II processor
- On-chip memory
- Pipeline bridge
- JTAG UART
- Floating point hardware 2
This hardware design achieves FMAX=125 MHz.
To instantiate the FPH2 component in your system, in Qsys, locate the Floating Point Hardware 2 component in the Project area of the Component Library. The FPH2 component is located under the “Embedded Processors” group in the Component Library.
The floating point hardware 2 component editor, shown in the figure below, allows you to selectively enable any of several groups of floating point custom instructions. By default, all instructions are enabled.
In most cases, you should leave all floating point custom instructions enabled. However, for the MAX 10 device family in certain configurations, you might need to disable the Roots group.
MAX 10 devices cannot support the FPH2 square root instruction in the following configurations:
- Dual configuration mode
- Compressed configuration mode
- External RAM initialization disabled
The square root instruction uses a lookup table, requiring initialization that the MAX 10 cannot support in these configurations. Turn off the Roots option if you are targeting a MAX 10 device in one of these configurations.
When you disable one of the floating point instruction groups, software must implement the functions in that group (in this case, square root) if they are required. The BSP generator automatically creates this support. Refer to "Nios II SBT" for details.
The example in the figure above targets a MAX 10 device. Note the warning message, reminding you that there could be an issue with RAM initialization for the square root function.
After connecting the FPH2 to the Nios II, generate your system in Qsys as you normally would. Then use the Quartus® Prime software to compile the generated RTL, or use an RTL simulator, like Modelsim™, to perform simulations.
|FRAC||Fraction||Specifies the fractional portion (right of the binary point) of the mantissa. The integer value (left of the binary point) is always assumed to be 1 for normal values so it is omitted. This omitted value is called the hidden bit. The mantissa ranges from ≥1.0 to <2.0.|
|EXP||Biased Exponent||Contains the exponent biased by the value 127. The biased exponent value 0x0 is reserved for zero and subnormal values. The biased exponent value 0xff is reserved for infinity and NaN. The biased exponent ranges from 1 to 0xfe for normal numbers (-126 to 127 when the bias is subtracted out).|
|S||Sign||Specifies the sign. 1 = negative, 0 = positive. Normal values, zero, infinity, and subnormals are all signed. NaN has no sign, so the S field is ignored.|
- Zero (EXP=0, FRAC=0)
- Subnormal (EXP=0, FRAC≠0)
- Infinity (EXP=255, FRAC=0)
- NaN (EXP=255, FRAC≠0)
Unit in the last place (ULP) represents the value 2-23, which is approximately 1.192093e-07. The ULP is the distance between the closest straddling floating-point numbers a and b (a ≤ x ≤ b, a ≠ b), assuming that the exponent range is not upper-bounded. The IEEE Round-to-Nearest modes produce results with a maximum error of one-half ULP. The other IEEE rounding modes (Round-to-Zero, Round-to-Positive-Infinity, and Round-to-Negative-Infinity) produce results with a maximum error of one ULP.
The table below shows how single-precision values are encoded across the 32-bit range from 0x0000_0000 to 0xffff_ffff. Single-precision floating point numbers have the following characteristics:
- Precision (ρ) = 24 bits (23 bits in FRAC plus one hidden bit)
- Radix (β) = 2
- emin = -126
- emax = 127
The most-significant bit of FRAC is 0 for signaling NaNs (sNaN) and 1 for quiet NaNs (qNaN).
|Hexadecimal Value||Name||S||EXP||FRAC||Value (Decimal)|
|0x0000_0001||min pos subnormal||0||0x00||0x00_0001||1.40129846e–45 (βemin-ρ+1 = 2-126-24+1 = 2-149)|
|0x007f_ffff||max pos subnormal||0||0x00||0x7f_ffff||1.1754942e-38|
|0x0080_0000||min pos normal||0||0x01||0x00_0000||1.17549435e–38 (βemin = 2-126)|
|0x7f7f_ffff||max pos normal||0||0xfe||0x7f_ffff||3.40282347e+38 ((β – β1- ρ) * 2emax = (2 – 21-24) * 2127 = (2 – 223) * 2127|
|0x7f80_0001||min sNaN (pos sign)||0||0xff||0x00_0001|
|0x7fdf_ffff||max sNaN (pos sign)||0||0xff||0x3f_ffff|
|0x7fe0_0000||min qNaN (pos sign)||0||0xff||0x40_0000|
|0x7fff_ffff||max qNaN (pos sign)||0||0xff||0x7f_ffff|
|0x8000_0001||max neg subnormal||1||0x00||0x00_0001||-1.40129846e–45|
|0x807f_ffff||min neg subnormal||1||0x00||0x7f_ffff||-1.1754942e-38|
|0x8080_0000||max neg normal||1||0x01||0x00_0000||-1.17549435e–38|
|0xff7f_ffff||min neg normal||1||0xfe||0x7f_ffff||-3.40282347e+38|
|0xff80_0001||max sNaN (neg sign)||1||0xff||0x00_0001|
|0xffdf_ffff||min sNaN (neg sign)||1||0xff||0x3f_ffff|
|0xffe0_0000||max qNaN (neg sign)||1||0xff||0x40_0000|
|0xffff_ffff||min qNaN (neg sign)||1||0xff||0x7f_ffff|
When the exact result of a floating-point operation cannot be exactly represented as a floating-point value, it must be rounded.
The IEEE 754-2008 standard defines the default rounding mode to be “Round-to-Nearest RoundTiesToEven”. In the IEEE 754-1985 standard, this is called “Round-to-Nearest-Even”. Both standards also define additional rounding modes called “Round-to-Zero”, “Round-to-Negative-Infinity”, and “Round-to-Positive-Infinity”. The IEEE 754-2008 standard introduced a new optional rounding mode called “Round-to-Nearest RoundTiesAway”.
The FPH2 operations either support Nearest Rounding (RoundTiesAway), Truncation Rounding, or Faithful Rounding. The type of rounding is a function of the operation and is specified in Table 4-2. Because the software emulation library (used when FPH operations aren’t provided) and FPH1 implement Round-to-Nearest RoundTiesToEven, there can be differences in the results between FPH2 and these other solutions.
Nearest Rounding corresponds to the IEEE 754-2008 “Round-to-Nearest RoundTiesAway” rounding mode. Nearest Rounding rounds the result to the nearest single-precision number. When the result is halfway between two single-precision numbers, the rounding chooses the upper number (larger values for positive results, smaller value for negative results).
Nearest Rounding has a maximum error of one-half ULP. Errors are not evenly distributed, because nearest rounding chooses the upper number more often than the lower number when results are randomly distributed.
Truncation Rounding corresponds to the IEEE 754-2008 “Round-To-Zero” rounding mode. Truncation Rounding rounds results to the lower nearest single-precision number.
Truncation Rounding has a maximum error of one ULP. Errors are not evenly distributed.
Faithful Rounding rounds results to either the upper or lower nearest single-precision numbers. Therefore, Faithful Rounding produces one of two possible values. The choice between the two is not defined.
Faithful Rounding has a maximum error of one ULP. Errors are not guaranteed to be evenly distributed.
|Unrounded Value||Nearest Rounding||Truncation Rounding||Faithful Rounding|
|3.34||3.3||3.3||3.3 or 3.4|
|6.45||6.5||6.4||6.4 or 6.5|
|2.00||2.0||2.0||2.0 or 2.1|
|8.99||9.0||8.9||8.9 or 9.0|
|-1.24||-1.2||-1.2||-1.2 or -1.3|
|-3.78||-3.8||-3.7||-3.7 or -3.8|
The table below lists the results of some IEEE 754 special cases. The x represents a normal value. The FPH2 are compliant for all of these cases.
Results are assumed to be correctly signed so signs are omitted when they are not important. When the sign is relevant, signs are shown with extra parenthesis around the value such as (+∞). The value x in the table represents any non-NaN value.
Comparisons ignore the sign of zero for equality. For example, (-0) == (+0) and (+0) ≤ (-0). Comparisons that don’t include equality, like > and <, don’t consider -0 to be less than +0. Comparisons return false if either or both inputs are NaN. The min and max operations return the non-NaN input if one of their inputs is NaN and the other is non-NaN. Other operations that produce floating-point results return NaN if any or all of their inputs are NaN.
|fdivs||0/0=NaN||∞/∞=NaN||0/∞=0, ∞/0=∞||NaN/x=NaN, x/NaN=NaN, NaN/NaN=NaN|
|fsubs||(+∞)-(+∞)=NaN||(-∞)-(-∞)=NaN||(-0)-(-0)=+0||NaN-x=NaN, x-NaN=NaN, NaN-NaN=NaN|
|fadds||(+∞)+(-∞)=NaN||(-∞)+(+∞)=NaN||(+0)+(-0)=+0, (-0)+(+0)=+0||NaN+x=NaN, x+NaN=NaN, NaN+NaN=NaN|
|fmuls||0*∞=NaN||∞*0=NaN||NaN*x=NaN, x*NaN=NaN, NaN*NaN=NaN|
|sqrt(x) =NaN, x<-0||sqrt(NAN) =NaN|
|fixsi & round||int(>231-1)= 0x7fffffff, int(+∞)=0x7fffffff||int(<-231)= x80000000, int(-∞)=0x80000000||int(NaN)=undefined|
|fmins||min((+0),(-0))=(-0)||min((-0),(+0))=(-0)||min(NaN,n)=x, min(x,NaN)=x, min(NaN,NaN)=NaN, min(+∞,x)=x, min(-∞,x)=-∞|
|fmaxs||max((+0),(-0))=(+0)||max((-0),(+0))=(+0)||max(NaN,x)=x, max(x,NaN)=x, max(NaN,NaN)=NaN, max(+∞,x)=+∞, max(-∞,x)=x|
|fcmplts (<)||(+∞)<(+∞)=0||(-∞)<(-∞)=0||(-0)<(+0)=0, (+0)<(-0)=0||NaN<x=0, x<NaN=0, NaN<NaN=0|
|fcmples (≤)||(+∞)≤(+∞)=1||(-∞)≤(-∞)=1||(+0)≤(-0)=1, (-0)≤(+0)=1||NaN≤x=0, x≤NaN=0, NaN≤NaN=0|
|fcmpgts (>)||(+∞)>(+∞)=0||(-∞)>(-∞)=0||(-0)>(+0)=0, (+0)>(-0)=0||NaN>x=0, x>NaN=0, NaN>NaN=0|
|fcmpges (≥)||(+∞)≥(+∞)=1||(-∞)≥(-∞)=1||(-0)≥(+0)=1, (+0)≥(-0)=1||NaN≥x=0, x≥NaN=0, NaN≥NaN=0|
|fcmpeqs (=)||(+∞)=(+∞)=1||(-∞)=(-∞)=1||(-0)=(+0)=1||(NaN==x)=0, (x==NaN)=0, (NaN==NaN)=0|
|fcmpnes (≠)||(+∞)≠ (+∞)=0||(-∞)≠ (-∞)=0||(-0)≠(+0)=0||NaN≠x=0, x≠NaN=0, NaN≠NaN=0|
The FPH2 are implemented with one combinatorial custom instruction and one multi-cycle custom instruction. The combinatorial custom instruction implements the comparison, minimum, maximum, negate, and absolute operations. The multi-cycle custom instruction implements the add, subtract, multiply, divide, square root, and conversion operations.
- No traps/exceptions.
- No status flags.
- Remainder and conversions between binary and decimal operations are not supported. These are provided by the software emulation library.
- No support for round-to-nearest-even mode. Nearest Rounding, Truncation Rounding, or Faithful Rounding is used, depending on the operator.
- Subnormals are not supported by the add, subtract, multiply, divide, and square root operations. Subnormal inputs are treated as signed zero and subnormal outputs are never created (result is signed zero instead). This treatment of subnormal values called flush-to-zero.3
- Subnormals cannot be created by the integer2float conversion operation. This behavior is IEEE 754 compliant.
- No distinction between signaling and quiet NaNs as input operands. Any result that produces a NaN may produce either a signaling or quiet NaN.
- A NaN result with one or more NaN input operands is not guaranteed to return any of the input NaN values; the NaN result can be a different NaN than the input NaNs.
|IEEE 754 Exception||FPH2 Result|
|Division by zero||Signed infinity|
- Operation 4—Provides the name of the floating-point operation. The names match the names of the corresponding GCC floating point hardware command-line options except for “round”, which has no GCC support.
- N—Provides the 8-bit fixed custom instruction N value for the operation. FPH2 component uses fixed N values that occupy the top 32 Nios II custom instruction N values (224 to 255). The FPH1 also use fixed N values (252 to 255) and the FPH2 assign the same operations to those N values to maintain compatibility.
- Cycle 5—Specifies the number of cycles it takes to execute the instruction. A combinatorial custom instruction takes 1 cycle. A multi-cycle custom instruction always requires at least 2 cycles. An N-cycle custom instruction has N-2 register stages inside the custom instruction because the Nios II registers the result from the custom instruction and also allows another cycle for g wire delays in the source operand bypass multiplexers. The Cycle column does not include the extra cycles (maximum of 2) required because the Nios II/f processor stalls the instruction following the multi-cycle custom instruction if that instruction uses the result within 2 cycles. These extra cycles are required because multi-cycle instructions are late-result instructions.
- Result—Describes the computation performed by the operation.
- Subnormal—Describes how the operation treats subnormal inputs and subnormal outputs.
- Rounding 6—Describes how the FPH2 component rounds the result. The possible choices are Nearest, Truncation, Faithful, and none.
- GCC Inference—Shows the C code from which GCC infers the custom instruction operation.
|floatis||250||4||int_to_float(a)||Does not apply||Does not apply||Casting|
|reserved||234 to 247||Undefined||undefined|
|fmins||233||1||(a<b) ? a : b||supported||None||fminf()7|
|fmaxs||232||1||(a<b) ? b : a||supported||None||fmaxf()7|
|fcmplts||231||1||(a<b) ? 1 : 0||supported||None||a<b|
|fcmples||230||1||(a≤b) ? 1 : 0||supported||None||a<=b|
|fcmpgts||229||1||(a>b) ? 1 : 0||supported||None||a>b|
|fcmpges||228||1||(a≥b) ? 1 : 0||supported||None||a>=b|
|fcmpeqs||227||1||(a=b) ? 1 : 0||supported||None||a==b|
|fcmpnes||226||1||(a≠b) ? 1 : 0||supported||None||a!=b|
The FPH2 component provides functions for conversion between signed integer types (C short, int and long types) and 32-bit single-precision floating point types (C float type). The Nios II GCC compiler infers these hardware functions when compiled code converts data between these types, for example in C casting.
The FPH2 component does not provide functions for conversion between unsigned integer types and floating point. When converting between unsigned integer types and float types, the compiler implements software emulation. Therefore conversion to and from unsigned integers is much slower than conversion to and from signed integers.
float f; unsigned int s = (unsigned int)f; // Software emulation
float f; unsigned int s = (unsigned int)(int)f; // FPH2
The round operation performs Nearest Rounding (tie-rounds-away) when converting a float to a signed integer. For example, round converts 4.8 to 5 and -1.5 to -2. Software can invoke the round operation by calling the custom instruction directly, or by using the #define provided in system.h, which replaces the Newlib lroundf() function.
The -mcustom-<operation> command-line option instructs GCC to call custom instructions instead of emulating the specified operation. The syntax of the -mcustom-<operation> is as follows:
N custom instruction value, an unsigned decimal. For a complete list of the operations and their N values, refer to the table in "Operations".
The following command-line options should be passed to GCC to instruct it to use all operations provided by the FPH2 that can be inferred by GCC. For more information, refer to "Inference".
-mcustom-fabss=224 -mcustom-fnegs=225 -mcustom-fcmpnes=226 -mcustom-fcmpeqs=227 -mcustom-fcmpges=228 -mcustom-fcmpgts=229 -mcustom-fcmples=230 -mcustom-fcmplts=231 -mcustom-fmins=232 -mcustom-fmaxs=233 -mcustom-round=248 -mcustom-fixsi=249 -mcustom-floatis=250 -mcustom-fmuls=252 -mcustom-fadds=253 -mcustom-fsubs=254 -mcustom-fdivs=255
GCC supports pragmas located in source code files to override the -mcustom command-line options. The pragmas affect the entire source file.
#pragma GCC target(“custom-<operation>=N”)The following pragma tells GCC to use the software emulation instead of the custom instruction to implement the specified floating-point operation:
#pragma GCC targer(“no-custom-<operation>”)
If you specify the -mcustom-fpu-cfg option on the GCC linker command line, it chooses a precompiled Newlib library with floating-point support. The precompiled libraries only use operations (add, subtract, multiply, and divide) supported by FPH1.
There are options provided by GCC and are not only provided by Nios II GCC. However, these options have Nios II specific behaviors in some cases.
“Do not set ERRNO after calling math functions that are executed with a single instruction, e.g., sqrt. A program that relies on IEEE exceptions for math error handling may want to use this flag for speed while maintaining IEEE arithmetic compatibility.”
If you specify -fno-math-errno on the GCC command line, the compiler maps calls to sqrtf() directly to the fsqrts custom instruction. Otherwise, by default GCC adds several instructions after the fsqrts custom instruction to check for a NaN result, indicating an attempt to take the square root of a negative number. If fsqrts returns NaN, the code calls the Newlib sqrtf() function to set the C errno variable.
Typically, this overhead is undesirable. Altera recommends that you enable -fno-math-errno to eliminate the overhead of calling sqrtf().
If you use the Nios II SBT, the generated makefiles set -fno-math-errno by default. You can override this behavior by setting -fmath-errno in the CPPFLAGS make variable.
The -ffinite-math-only option also eliminates the overhead of checking for NaN result for square root. However, this option also has other effects. Refer to "-ffinite-math-only" for details about this option.
“Treat floating-point constants as single-precision constants instead of implicitly converting them to double-precision constants.”
For FPH2, the Nios II SBT omits -fsingle-precision-constant from the makefile GCC command line by default. This behavior contrasts with SBT support for FPH1, which sets this option with -mcustom-fpu-cfg. The SBT does not use -fsingle-precision-constant for FPH2 because it can cause problems for double-precision code.
You can enable -fsingle-precision-constant if you are sure it will not cause problems for your code. In general, it is better to cast floating-point constants to the float type, or use the 'f' suffix (for example 3.14f), because these approaches are localized and independent of compiler options.
“Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards. When used at link-time, it may include libraries or startup files that change the default FPU control word or other similar optimizations.”
The -funsafe-math-optimizations option is not required, because FPH2 does not implement transcendental functions (sin(), cos(), tan(), atan(), exp(), and log()).
This option would be required if the floating point hardware implemented the transcendental functions. GCC requires this option to ensure that application code does not inadvertently use hardware accelerators that might be problematic.
“Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.”
Programmers are recommended to experiment with this option to determine how it affects their code.
The -ffinite-math-only option also eliminates the GCC overhead created on calls to sqrtf() like –fno-math-errno.
“Compile code assuming that floating-point operations cannot generate user-visible traps. These traps include division by zero, overflow, underflow, inexact result and invalid operation. This option implies -fno-signaling-nans. Setting this option may allow faster code if one relies on “non-stop” IEEE arithmetic, for example.”
Programmers are recommended to experiment with this option to determine how it affects their code.
“Disable transformations and optimizations that assume default floating point rounding behavior. This is round-to-zero for all floating point to integer conversions, and round-to-nearest for all other arithmetic truncations. This option should be specified for programs that change the FP rounding mode dynamically, or that may be executed with a non-default rounding mode. This option disables constant folding of floating point expressions at compile-time (which may be affected by rounding mode) and arithmetic transformations that are unsafe in the presence of sign-dependent rounding modes.”
Programmers are recommended to experiment with this option to determine how it affects their code.
The Nios II SBT include the Newlib library (C and math) in precompiled and source versions. However, the precompiled Newlib libraries are not recommended for FPH2.
You should compile Newlib from source code with individual –mcustom-<operation> options, selected to match your hardware configuration. This allows Newlib to incorporate the benefits of all FPH2 operations that can be inferred by GCC. If you use the Nios II software build tools, the BSP generator takes care of this for you.
The Newlib isgreater(), isgreaterequal(), isless(), islessequal(), and islessgreater macros defined in math.h use the normal comparison operators (such as. < and >=), so these macros automatically use the FPH2 comparison operations.
The Newlib fmaxf() and fminf() functions return the maximum or minimum numeric value of their arguments. NaN arguments are treated as missing data: if one argument is a NaN and the other numeric, then the functions return the numeric value. The FPH2 fmaxs/fmins operations match this behavior.
Nios II GCC cannot reliably replace calls to the following Newlib floating-point functions with the equivalent custom instruction, even though it has –mcustom-<operation> command-line options and pragma support for them:
Instead, these custom instructions must be invoked directly using the __builtin_custom_* facility of GCC. system.h provides the required #define macros to invoke the custom instructions directly. The Nios II Software Build Tools automatically include this header file in your C source files. For information about built-in functions, refer to the Nios II Custom Instruction User Guide.
The Software Build Tools (SBT) are tools used to create Altera HAL-based Board Support Packages (BSP) and application and library makefiles for embedded software running on a Nios II. These tools come in command-line and Eclipse GUI-based forms.
For more information about the SBT, refer to one of the Nios II Software Developer's Handbooks.
- Examines the system you created in Qsys, and determines the correct GCC flags for your floating point hardware.
- Creates makefile rules to pass the -mcustom-<operation> options to GCC, so it knows to use the available FPH2 operations instead of the software emulation code to implement the specified floating-point operations.
- Creates makefile rules to pass the -fno-math-errno option to GCC, to eliminate the overhead of detecting NaN results and setting the errno variable for calls to sqrtf().
- Adds #define macro declarations to system.h for the Newlib math library routines that GCC does not reliably replace with custom instructions. For more information, refer to "C Macros for float(), fmins(), and fmaxs()".
- Creates makefile rules to generate a correct version of Newlib. Uses the GCC flags determined from your hardware system.
|May 2016||2016.05.03||Enhanced Qsys component editor and software build tools, allowing selective implementation of floating point functions.|
|May 2015||2015.05.22||Initial release.|