Floating-Point IP Cores User Guide
About Floating-Point IP Cores
You can customize the IP cores by configuring various parameters to accommodate your needs.
List of Floating-Point IP Cores
IP Core Name  Function Overview 

Operator Functions  
ALTFP_ADD_SUB  Adder/Subtractor 
ALTFP_DIV  Divider 
ALTFP_MULT  Multiplier 
ALTFP_SQRT  Square Root 
Algebraic and Transcendental Functions  
ALTFP_EXP  Exponential 
ALTFP_INV  Inverse 
ALTFP_INV_SQRT  Inverse Square Root 
ALTFP_LOG  Natural Logarithm 
Trigonometric Functions  
ALTFP_ATAN  Arctangent 
ALTFP_SINCOS  Trigonometric Sine/Cosine 
Other Functions  
ALTFP_ABS  Absolute value 
ALTFP_COMPARE  Comparator 
ALTFP_CONVERT  Converter 
ALTERA_FP_ACC_CUSTOM  An Application Specific Accumulator 
ALTERA_FP_FUNCTIONS  A collection of floating-point functions. This IP core replaces all other floating-point IP cores listed in this table for Arria 10 devices. 
Complex Functions  
ALTFP_MATRIX_INV  Matrix Inverse 
ALTFP_MATRIX_MULT  Matrix Multiplier 
Installing and Licensing IP Cores
The Quartus^{®} Prime software installs IP cores in the following locations by default:
Location  Software  Platform 

<drive>:\intelFPGA_pro\quartus\ip\altera  Quartus^{®} Prime Pro Edition  Windows 
<drive>:\intelFPGA\quartus\ip\altera  Quartus^{®} Prime Standard Edition  Windows 
<home directory>:/intelFPGA_pro/quartus/ip/altera  Quartus^{®} Prime Pro Edition  Linux 
<home directory>:/intelFPGA/quartus/ip/altera  Quartus^{®} Prime Standard Edition  Linux 
Design Flow
If you are an expert user, and choose to configure the IP core directly through parameterized instantiation in your design, refer to the port and parameter details. The details of these ports and parameters are hidden in the parameter editor.
IP Catalog and Parameter Editor
 Filter IP Catalog to Show IP for active device family or Show IP for all device families. If you have no project open, select the Device Family in IP Catalog.
 Type in the Search field to locate any full or partial IP core name in IP Catalog.
 Right-click an IP core name in IP Catalog to display details about supported devices, to open the IP core's installation folder, and for links to IP documentation.
 Click Search for Partner IP to access partner IP information on the web.
The parameter editor prompts you to specify an IP variation name, optional ports, and output file generation options. The parameter editor generates a top-level Quartus^{®} Prime IP file (.ip) for an IP variation in Quartus^{®} Prime Pro Edition projects.
The parameter editor generates a top-level Quartus IP file (.qip) for an IP variation in Quartus^{®} Prime Standard Edition projects. These files represent the IP variation in the project, and store parameterization information.
The Parameter Editor
 Use the Presets window to apply preset parameter values for specific applications (for select cores).
 Use the Details window to view port and parameter descriptions, and click links to documentation.
 Click Generate > Generate Testbench System to generate a testbench system (for select cores).
 Click Generate > Generate Example Design to generate an example design (for select cores).
 Click Validate System Integrity to validate a system's generic components against companion files. (Qsys Pro systems only)
 Click Sync All System Infos to validate a system's generic components against companion files. (Qsys Pro systems only)
The IP Catalog is also available in Qsys and Qsys Pro (View > IP Catalog). The Qsys IP Catalog includes exclusive system interconnect, video and image processing, and other system-level IP that are not available in the Quartus^{®} Prime IP Catalog. Refer to Creating a System with Qsys Pro or Creating a System with Qsys for information on use of IP in Qsys Pro and Qsys, respectively.
Generating IP Cores (Quartus Prime Pro Edition)
Follow these steps to locate, instantiate, and customize an IP variation in the parameter editor:
 Click Tools > IP Catalog. To display details about device support, installation location, versions, and links to documentation, right-click any IP component name in the IP Catalog.
 To locate a specific type of component, type some or all of the component’s name in the IP Catalog search box. For example, type memory to locate memory IP components, or axi to locate IP components with AXI in the IP name. Apply filters to the IP Catalog display from the right-click menu.
 To launch the parameter editor, double-click any component. Specify a top-level name for your custom IP variation. The parameter editor saves the IP variation settings in a file named <your_ip>.ip. Click OK. Do not include spaces in IP variation names or paths.

 Set the parameter values in the parameter editor and view the block diagram for the component. The Parameterization Messages tab at the bottom displays any errors in IP parameters:
 Optionally select preset parameter values if provided for your IP core. Presets specify initial parameter values for specific applications.
 Specify parameters defining the IP core functionality, port configurations, and device-specific features.
 Specify options for processing the IP core files in other EDA tools.
Note: Refer to your IP core user guide for information about specific IP core parameters.
 Click Generate HDL. The Generation dialog box appears.
 Specify output file generation options, and then click Generate. The synthesis and/or simulation files generate according to your specifications.
 To generate a simulation testbench, click Generate > Generate Testbench System. Specify testbench generation options, and then click Generate.
 To generate an HDL instantiation template that you can copy and paste into your text editor, click Generate > Show Instantiation Template.
 Click Finish. Click Yes if prompted to add files representing the IP variation to your project.

 After generating and instantiating your IP variation, make appropriate pin assignments to connect ports.
Note: Some IP cores generate different HDL implementations according to the IP core parameters. The underlying RTL of these IP cores contains a unique hash code that prevents module name collisions between different variations of the IP core. This unique code remains consistent, given the same IP settings and software version during IP generation. This unique code can change if you edit the IP core's parameters or upgrade the IP core version. To avoid dependency on these unique codes in your simulation environment, refer to Generating a Combined Simulator Setup Script.
IP Core Generation Output (Quartus Prime Pro Edition)
File Name  Description 

<my_ip>.ip  Top-level IP variation file that contains the parameterization of an IP core in your project. If the IP variation is part of a Qsys Pro system, the parameter editor also generates a .qsys file. 
<my_ip>.cmp  The VHDL Component Declaration (.cmp) file is a text file that contains local generic and port definitions that you use in VHDL design files. 
<my_ip>_generation.rpt  IP or Qsys generation log file. A summary of the messages during IP generation. 
<my_ip>.qgsimc (Qsys Pro systems only) 
Simulation caching file that compares the .qsys and .ip files with the current parameterization of the Qsys Pro system and IP core. This comparison determines if Qsys Pro can skip regeneration of the HDL. 
<my_ip>.qgsynth (Qsys Pro systems only) 
Synthesis caching file that compares the .qsys and .ip files with the current parameterization of the Qsys Pro system and IP core. This comparison determines if Qsys Pro can skip regeneration of the HDL. 
<my_ip>.qip 
Contains all information to integrate and compile the IP component. 
<my_ip>.csv  Contains information about the upgrade status of the IP component. 
<my_ip>.bsf 
A symbol representation of the IP variation for use in Block Diagram Files (.bdf). 
<my_ip>.spd 
Required input file for ip-make-simscript to generate simulation scripts for supported simulators. The .spd file contains a list of files you generate for simulation, along with information about memories that you initialize. 
<my_ip>.ppf  The Pin Planner File (.ppf) stores the port and node assignments for IP components you create for use with the Pin Planner. 
<my_ip>_bb.v  Use the Verilog black-box (_bb.v) file as an empty module declaration for use as a black box. 
<my_ip>.sip  Contains information you require for NativeLink simulation of IP components. Add the .sip file to your Quartus^{®} Prime Standard Edition project to enable NativeLink for supported devices. The Quartus^{®} Prime Pro Edition software does not support NativeLink simulation. 
<my_ip>_inst.v or _inst.vhd  HDL example instantiation template. Copy and paste the contents of this file into your HDL file to instantiate the IP variation. 
<my_ip>.regmap  If the IP contains register information, the Quartus^{®} Prime software generates the .regmap file. The .regmap file describes the register map information of master and slave interfaces. This file complements the .sopcinfo file by providing more detailed register information about the system. This file enables register display views and user-customizable statistics in System Console. 
<my_ip>.svd 
Allows HPS System Debug tools to view the register maps of peripherals that connect to HPS within a Qsys Pro system. During synthesis, the Quartus^{®} Prime software stores the .svd files for slave interface visible to the System Console masters in the .sof file in the debug session. System Console reads this section, which Qsys Pro queries for register map information. For system slaves, Qsys Pro accesses the registers by name. 
<my_ip>.v <my_ip>.vhd  HDL files that instantiate each submodule or child IP core for synthesis or simulation. 
mentor/ 
Contains a ModelSim^{®} script msim_setup.tcl to set up and run a simulation. 
aldec/ 
Contains a Riviera-PRO script rivierapro_setup.tcl to set up and run a simulation. 
/synopsys/vcs /synopsys/vcsmx 
Contains a shell script vcs_setup.sh to set up and run a VCS^{®} simulation. Contains a shell script vcsmx_setup.sh and synopsys_sim.setup file to set up and run a VCS MX^{®} simulation. 
/cadence 
Contains a shell script ncsim_setup.sh and other setup files to set up and run an NCSIM simulation. 
/submodules  Contains HDL files for the IP core submodule. 
<IP submodule>/  For each generated IP submodule directory Qsys Pro generates /synth and /sim subdirectories. 
Generating IP Cores (Quartus Prime Standard Edition)
 In the IP Catalog (Tools > IP Catalog), locate and double-click the name of the IP core to customize. The parameter editor appears.
 Specify a top-level name and output HDL file type for your IP variation. This name identifies the IP core variation files in your project. Click OK. Do not include spaces in IP variation names or paths.
 Specify the parameters and options for your IP variation in the parameter editor. Refer to your IP core user guide for information about specific IP core parameters.

 Click Finish or Generate (depending on the parameter editor version). The parameter editor generates the files for your IP variation according to your specifications. Click Exit if prompted when generation is complete. The parameter editor adds the top-level .qip file to the current project automatically.
Note: For devices released prior to Arria^{®} 10 devices, the generated .qip and .sip files must be added to your project to represent IP and Qsys systems. To manually add an IP variation generated with legacy parameter editor to a project, click Project > Add/Remove Files in Project and add the IP variation .qip file.
Upgrading IP Cores
Icons in the Upgrade IP Components dialog box indicate when IP upgrade is required, optional, or unsupported for an IP variation in the project. Upgrade IP variations that require upgrade before compilation in the current version of the Quartus^{®} Prime software.
IP Core Status  Description 

IP Upgraded 
Indicates that your IP variation uses the latest version of the IP core. 
IP Upgrade Optional 
Indicates that upgrade is optional for this IP variation in the current version of the Quartus^{®} Prime software. Optionally, upgrade this IP variation to take advantage of the latest development of this IP core. Retain previous IP core characteristics by declining to upgrade. Refer to the Description for details about IP core version differences. If you do not upgrade the IP, the IP variation synthesis and simulation files remain unchanged, and you cannot modify parameters until upgrading. 
IP Upgrade Required 
Indicates that you must upgrade the IP variation before compiling in the current version of the Quartus^{®} Prime software. Refer to the Description for details about IP core version differences. 
IP Upgrade Unsupported 
Indicates that Quartus^{®} Prime software does not support upgrade of the IP variation due to incompatibility in the current software version. The Quartus^{®} Prime software prompts you to replace the unsupported IP core with equivalent IP core from the IP Catalog. Refer to the Description for details about IP core version differences and links to Release Notes. 
IP End of Life 
Indicates that Altera designates the IP core as end-of-life status. You may or may not be able to edit the IP core in the parameter editor. Support for this IP core discontinues in future releases of the Quartus^{®} Prime software. 
IP Upgrade Mismatch Warning 
Provides warning of noncritical IP core differences in migrating IP to another device family. 
Follow these steps to upgrade IP cores:

 In the latest version of the Quartus^{®} Prime software, open the Quartus^{®} Prime project containing an outdated IP core variation. The Upgrade IP Components dialog box automatically displays the status of IP cores in your project, along with instructions for upgrading each core. To access this dialog box manually, click Project > Upgrade IP Components.
 To upgrade one or more IP cores that support automatic upgrade, ensure that you turn on the Auto Upgrade option for the IP core(s), and click Perform Automatic Upgrade. The Status and Version columns update when upgrade is complete. Example designs provided with any Altera FPGA IP core regenerate automatically whenever you upgrade an IP core.

 To manually upgrade an individual IP core, select the IP core and click Upgrade in Editor (or simply double-click the IP core name). The parameter editor opens, allowing you to adjust parameters and regenerate the latest version of the IP core.
Figure 5. Upgrading IP Cores
Note: IP cores older than Quartus^{®} Prime software version 12.0 do not support upgrade. Altera verifies that the current version of the Quartus^{®} Prime software compiles the previous two versions of each IP core. The Altera FPGA IP Core Release Notes reports any verification exceptions for Altera IP cores. Altera does not verify compilation for IP cores older than the previous two releases.
Migrating IP Cores to a Different Device
 To display the IP cores that require migration, click Project > Upgrade IP Components. The Description field provides migration instructions and version differences.
 To migrate one or more IP cores that support automatic upgrade, ensure that the Auto Upgrade option is turned on for the IP core(s), and click Perform Automatic Upgrade. The Status and Version columns update when upgrade is complete.
 To migrate an IP core that does not support automatic upgrade, double-click the IP core name, and click OK. The parameter editor appears. If the parameter editor specifies a Currently selected device family, turn off Match project/default, and then select the new target device family.
 Click Generate HDL, and confirm the Synthesis and Simulation file options. Verilog HDL is the default output file format. If you specify VHDL as the output format, select VHDL to retain the original output format.
 Click Finish to complete migration of the IP core. Click OK if the software prompts you to overwrite IP core files. The Device Family column displays the new target device name when migration is complete.

 To ensure correctness, review the latest parameters in the parameter editor or generated HDL.
Figure 6. IP Core Device Migration
Note: IP migration may change ports, parameters, or functionality of the IP variation. These changes may require you to modify your design or to reparameterize your IP variant. During migration, the IP variation's HDL generates into a library that is different from the original output location of the IP core. Update any assignments that reference outdated locations. If a symbol in a supporting Block Design File schematic represents your upgraded IP core, replace the symbol with the newly generated <my_ip>.bsf. Migration of some IP cores requires installed support for the original and migration device families.
Floating-Point IP Cores General Features
 Support for floating-point formats.
 Input support for not-a-number (NaN), infinity, zero, and normal numbers.
 Optional asynchronous input ports including asynchronous clear (aclr) and clock enable (clk_en).
 Support for round-to-nearest-even rounding mode.
 Compute results of any mathematical operations in compliance with the IEEE-754 standard, with a maximum error of 1 unit in the last place (u.l.p.). This assumption applies to all floating-point IP cores except the complex matrix multiplication and inverse operations (for example, ALTFP_MATRIX_MULT and ALTFP_MATRIX_INV), where a slight increase in errors is observed due to the accumulation of errors during the mathematical operation.
Altera floating-point IP cores do not support denormal number inputs. If the input is a denormal value, the IP core forces the value to zero and treats the value as a zero before going through any operation.
IEEE-754 Standard for Floating-Point Arithmetic
 Floating-point numbers
 Special values (zero, infinity, denormal numbers, and NaN bit combinations)
 Single-precision, double-precision, and single-extended precision formats for floating-point numbers
Floating-Point Formats
For a normal floating-point number, a leading 1 is always implied. For example, binary 1.0011 or decimal 1.1875 is stored as 0011 in the mantissa field. This format saves the mantissa field from using an extra bit to represent the leading 1. However, the leading bit for a denormal number can be either 0 or 1. For zero, infinity, and NaN, the mantissa field does not have an implied leading 1 nor any explicit leading bit.
Single-Precision Format
 The MSB holds the sign bit.
 The next 8 bits hold the exponent bits.
 23 LSBs hold the mantissa.
The total width of a floating-point number in the single-precision format is 32 bits. The bias for the single-precision format is 127.
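As a minimal sketch of the field layout above, the following Python splits a 32-bit word into its sign, exponent, and mantissa fields and rebuilds the value of a normal number using the implied leading 1 and the bias of 127. The function names `split_single` and `normal_value` are hypothetical and are not part of any IP core interface.

```python
import struct


def split_single(bits: int):
    """Split a 32-bit word into (sign, exponent, mantissa) fields."""
    sign = (bits >> 31) & 0x1        # MSB
    exponent = (bits >> 23) & 0xFF   # next 8 bits
    mantissa = bits & 0x7FFFFF       # 23 LSBs
    return sign, exponent, mantissa


def normal_value(sign: int, exponent: int, mantissa: int) -> float:
    """Value of a normal number: implied leading 1 and a bias of 127."""
    significand = 1.0 + mantissa / 2**23
    return (-1.0) ** sign * significand * 2.0 ** (exponent - 127)


# 1.1875 is binary 1.0011, so the stored mantissa begins with 0011.
bits = struct.unpack(">I", struct.pack(">f", 1.1875))[0]
sign, exponent, mantissa = split_single(bits)
print(f"{bits:08x} sign={sign} exponent={exponent} mantissa={mantissa:023b}")
```

The sketch handles normal numbers only; zero, denormal, infinity, and NaN encodings need the special-case rules described later in this guide.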
Double-Precision Format
 The MSB holds the sign bit.
 The next 11 bits hold the exponent bits.
 52 LSBs hold the mantissa.
The total width of a floating-point number in the double-precision format is 64 bits. The bias for the double-precision format is 1023.
Single-Extended Precision Format
 The MSB holds the sign bit.
 The exponent and mantissa fields do not have fixed widths.
 The exponent field width must be a minimum of 11 bits and must be less than the width of the mantissa field.
 The width of the mantissa field must be a minimum of 31 bits.
The sum of the widths of the sign bit, exponent field, and mantissa field must be a minimum of 43 bits and a maximum of 64 bits. The bias for the single-extended precision format is unspecified in the IEEE-754 standard. In these IP cores, a bias of 2^{(WIDTH_EXP - 1)} - 1 is assumed for the single-extended precision format.
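The width rules and the bias formula above can be checked with a short sketch. The helper names and the `width_man` argument are hypothetical, chosen only for illustration; only WIDTH_EXP appears in the text above.

```python
def exponent_bias(width_exp: int) -> int:
    """Bias assumed for an exponent field of width_exp bits."""
    return 2 ** (width_exp - 1) - 1


def is_valid_single_extended(width_exp: int, width_man: int) -> bool:
    """Check the single-extended width rules listed above."""
    total = 1 + width_exp + width_man  # sign bit + exponent + mantissa
    return (
        width_exp >= 11
        and width_exp < width_man
        and width_man >= 31
        and 43 <= total <= 64
    )


print(exponent_bias(8), exponent_bias(11))
print(is_valid_single_extended(11, 31), is_valid_single_extended(11, 53))
```

The same formula reproduces the single-precision bias (8 exponent bits) and the double-precision bias (11 exponent bits).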
Special Case Numbers
Meaning  Sign Field  Exponent Field  Mantissa Field 

Zero  Don’t care  All 0’s  All 0’s 
Positive Denormalized  0  All 0’s  Nonzero 
Negative Denormalized  1  All 0’s  Nonzero 
Positive Infinity  0  All 1’s  All 0’s 
Negative Infinity  1  All 1’s  All 0’s 
Not-a-Number (NaN)  Don’t care  All 1’s  Nonzero 
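The table can be read as a decision procedure on the exponent and mantissa fields. The sketch below applies it to single-precision bit patterns. The function names are hypothetical, and `flush_denormal` only illustrates the documented behavior of treating denormal inputs as zero; keeping the sign bit when flushing is an assumption, since the text does not say what happens to it.

```python
def classify_single(bits: int) -> str:
    """Classify a 32-bit pattern using the special-case table."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:                      # exponent field all 0's
        if mantissa == 0:
            return "zero"
        return "negative denormalized" if sign else "positive denormalized"
    if exponent == 0xFF:                   # exponent field all 1's
        if mantissa == 0:
            return "negative infinity" if sign else "positive infinity"
        return "nan"
    return "normal"


def flush_denormal(bits: int) -> int:
    """Replace a denormal input by zero (sign bit kept: an assumption)."""
    if ((bits >> 23) & 0xFF) == 0:
        return bits & 0x80000000
    return bits


for pattern in (0x00000000, 0x00000001, 0x7F800000, 0x7FC00000, 0x3F800000):
    print(f"{pattern:08x}: {classify_single(pattern)}")
```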
Rounding
 round-to-nearest-even
 round-toward-zero
 round-toward-positive-infinity
 round-toward-negative-infinity
Altera floating-point IP cores support only the most commonly used rounding mode, which is the round-to-nearest-even mode (TO_NEAREST). With round-to-nearest-even, the IP core rounds the result to the nearest floating-point number. If the result is exactly halfway between two floating-point numbers, the IP core rounds the result so that the LSB becomes a zero, which is even.
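To illustrate the tie-breaking rule, the sketch below drops the low bits of an unsigned integer mantissa and rounds to nearest, resolving exact halfway cases toward the value whose LSB is 0. It is a software model for illustration only, not the IP core's implementation; the function name is hypothetical, and a carry out of the mantissa (which would require an exponent adjustment) is left to the caller.

```python
def round_to_nearest_even(value: int, drop_bits: int) -> int:
    """Drop the low drop_bits bits of a non-negative integer,
    rounding to nearest with ties going to the even result."""
    if drop_bits <= 0:
        return value
    kept = value >> drop_bits                     # bits that survive
    remainder = value & ((1 << drop_bits) - 1)    # bits being discarded
    half = 1 << (drop_bits - 1)                   # exact halfway point
    if remainder > half or (remainder == half and (kept & 1)):
        kept += 1                                 # round up
    return kept


# 0b1011 is 5.5 in units of the kept LSB: the tie goes to 6 (even).
# 0b1001 is 4.5: the tie goes to 4 (even).
print(round_to_nearest_even(0b1011, 1), round_to_nearest_even(0b1001, 1))
```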
Non-IEEE-754 Standard Format
The fixed-point data type is similar to the conventional integer data type, except that the fixed-point data carries a predetermined number of fractional bits. If the width of the fraction is 0, the data becomes a normal signed integer.
The notation for fixed-point format numbers in this user guide is Qm.f, where Q designates that the number is in Q format notation, m is the number of bits used to indicate the integer portion of the number, and f is the number of bits used to indicate the fractional portion of the number.
For example, Q4.12 describes a number with 4 integer bits and 12 fractional bits in a 16-bit word.
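A short sketch of Q4.12 conversion follows. It assumes a two's complement word in which the sign bit counts as one of the m integer bits; that convention and the helper names `to_fixed` and `from_fixed` are assumptions made for illustration, not definitions taken from this guide.

```python
def to_fixed(value: float, frac_bits: int, total_bits: int) -> int:
    """Encode value as a two's complement fixed-point word."""
    scaled = round(value * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    if not lo <= scaled <= hi:
        raise OverflowError("value does not fit in the fixed-point word")
    return scaled & ((1 << total_bits) - 1)


def from_fixed(word: int, frac_bits: int, total_bits: int) -> float:
    """Decode a two's complement fixed-point word back to a float."""
    if word & (1 << (total_bits - 1)):   # sign bit set
        word -= 1 << total_bits
    return word / (1 << frac_bits)


# Q4.12: 12 fractional bits in a 16-bit word.
print(hex(to_fixed(1.5, 12, 16)), from_fixed(0xE800, 12, 16))
```

With `frac_bits` set to 0, the same functions reduce to ordinary signed integer encoding, matching the statement that a zero-width fraction yields a normal signed integer.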
The following figures show the difference between the signed-integer format and the fixed-point format for a 32-bit number.
Floating-Point IP Cores Output Latency
For specific details about latency options, refer to the Output Latency section of your selected IP core in this user guide.
Floating-Point IP Cores Design Example Files
Simulate the designs in the ModelSim^{®}-Altera software to generate a waveform display of the device behavior. You must be familiar with the ModelSim-Altera software before trying out the design examples.
Floating-Point IP Cores  Design Files 

ALTFP_ADD_SUB 

ALTFP_DIV 

ALTFP_MULT 

ALTFP_SQRT 

ALTFP_EXP 

ALTFP_INV 

ALTFP_INV_SQRT 

ALTFP_LOG 

ALTFP_ATAN  Not Available 
ALTFP_SINCOS  Not Available 
ALTFP_ABS 

ALTFP_COMPARE 

ALTFP_CONVERT 

ALTERA_FP_ACC_CUSTOM  Not Available 
ALTERA_FP_FUNCTIONS  Not Available 
ALTERA_FP_MATRIX_INV 

ALTERA_FP_MATRIX_MULT  Not Available 
VHDL Component Declaration
The VHDL component declaration is located in the <Quartus^{®} Prime installation directory>\libraries\vhdl\altera_mf\altera_mf_components.vhd file.
VHDL LIBRARY-USE Declaration
LIBRARY altera_mf;
USE altera_mf.altera_mf_components.all;
ALTERA_FP_MATRIX_INV IP Core
This IP core enables you to perform a matrix inversion operation using a combination of Cholesky decomposition, triangular matrix inversion, and matrix multiplication.
ALTERA_FP_MATRIX_INV Features
 Inversion of a matrix.
 Support for floating-point format in single precision.
 Support for VHDL and Verilog HDL languages.
 Support for matrix sizes of 4 × 4, 6 × 6, 8 × 8, 16 × 16, 32 × 32, and 64 × 64.
 Use of control signal, load.
 Use of handshaking signals: busy, outvalid, and done.
ALTERA_FP_MATRIX_INV Output Latency
ALTERA_FP_MATRIX_INV Resource Utilization and Performance
Logic usage columns: Adaptive Logic Modules (ALMs), DSP Usage (18 x 18 DSPs), M9K, M144K, and Memory (Bits).
Precision  Matrix Size  Blocks  Adaptive Logic Modules (ALMs)  DSP Usage (18 x 18 DSPs)  M9K  M144K  Memory (Bits)  Latency  Throughput (kb/s)  Giga Floating-Point Operations per Second (GFLOPS)  f_{MAX} (MHz) 

Single  4 × 4  2  21159  222  139  —  19919  Pending  Pending  Pending  221 
Single  6 × 6  2  59827  574  90  —  15759  Pending  Pending  Pending  170 
Single  8 × 8  2  5,538  63  49  —  53,736  2,501  3,987  15.26  332 
Single  16 × 16  4  8,865  95  80  —  138,051  11,057  855  30.93  329 
Single  32 × 32  8  15,655  159  193  —  699,164  52,625  165  55.12  290 
Single  64 × 64  16  29,940  287  386  22  4,770,369  281,505  25  83.16  218 
ALTERA_FP_MATRIX_INV Functional Description
 Cholesky decomposition function.
The Cholesky decomposition function generates a lower triangular matrix.
 Triangular matrix inversion function.
The triangular matrix inversion process then generates the inverse of the lower triangular using backward substitution.
 Matrix multiplication function.
The matrix multiplier multiplies the transpose of the inverse triangular matrix with the inverse triangular matrix.
In linear algebra, the Cholesky decomposition states that every positive definite matrix A can be decomposed as
A = L × L^{T}
where L is a lower triangular matrix, and L^{T} denotes the transpose of L.
The property of invertible matrices states that (X × Y)^{-1} = Y^{-1} × X^{-1}, and the property of the transpose states that (X^{T})^{-1} = (X^{-1})^{T}. Combining these two properties, the following equation represents a derivation of a matrix inversion using the Cholesky decomposition method:
A^{-1} = (L × L^{T})^{-1}
= (L^{T})^{-1} × L^{-1}
= (L^{-1})^{T} × L^{-1}
where a Cholesky decomposition function is needed to obtain L, a triangular matrix inversion is needed to obtain L^{-1}, and a matrix multiplication is needed for (L^{-1})^{T} × L^{-1}.
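The last line of the derivation can be checked numerically on a small example. The sketch below uses exact rational arithmetic on a hand-picked 3 × 3 positive definite matrix with its known lower triangular factor and that factor's inverse. The matrices and helper functions are illustrative assumptions and are unrelated to the design example data in this guide.

```python
from fractions import Fraction


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def transpose(x):
    return [list(row) for row in zip(*x)]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


# A is positive definite, L is its lower triangular Cholesky factor.
A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
L = [[2, 0, 0], [6, 1, 0], [-8, 5, 3]]
L_inv = [
    [Fraction(1, 2), 0, 0],
    [-3, 1, 0],
    [Fraction(19, 3), Fraction(-5, 3), Fraction(1, 3)],
]

# Inverse of A obtained as (L^-1)^T x L^-1.
A_inv = matmul(transpose(L_inv), L_inv)
print(matmul(A, A_inv) == identity(3))
```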
Cholesky Decomposition Function
The other memory block is the processing matrix block which consists of multiple column memories to enable an entire row to be read at once. During the loading of the input memory, the FPC datapath preprocesses the input elements to generate the first column of the resulting triangular matrix. The top element of the first column, l_{00}, is the square root of the input matrix value a_{00}. The rest of the first column, l_{i0}, is the input value a_{i0} divided by l_{00}. This preprocessing step introduces latency into the load, during which the INIT_BUSY signal is asserted. The CALCULATE signal initiates and starts processing after the INIT_BUSY signal is deasserted.
This figure shows the top-level architecture of the Cholesky decomposition function, where the monolithic input memory and the column-wise processing memory, also known as the vector matrix, are shown. The gray block is the FPC datapath section.
Although the Cholesky decomposition algorithm only operates on the lower triangular matrix, the core requires the entire matrix to be loaded, during which the processing or vector memory is initialized.
The FPC datapath is split into two sections. The first section, also known as the vector section, takes the inner product of two vectors and subtracts it from the input matrix element, a_{ij}. The second section, also known as the root section, calculates square roots and performs division by the square root. The first element is loaded into both inputs of the root section and the outcome is its own square root. The first element continues to stay latched in the left input field of the root section while all the other elements of the first column are loaded into the right input field. The resulting output is the value of the respective column element divided by the value of the first element of the Cholesky decomposition matrix.
During processing, two rows from the processing matrix are loaded. For the first element in each new column, both rows have the same index; hence contain the same values. The first row is latched into the input register of the vector section. For the rest of the column, the row index is increased, and a new a_{ij} element and triangular matrix vector, L_{j} is loaded. The first result out of the vector section is latched onto the left register of the root section. All results from the column, including the first result, are loaded into the right register of the root section. The root section generates the square root of the first vector result, while for the other results coming from the vector section, the number is divided by the square root of the first result.
All calculated values are written to another memory block for further processing. The first column values are output singly during preprocessing, while the values of other columns are burst out during processing.
There are only minor differences between the architectures for real and complex matrices. For the complex matrix, both the input and processing memory blocks contain complex values. Similarly, all values going into the vector section are complex numbers. The complex conjugate of the latched register is obtained by simply inverting the sign bit. As for the root section, the structure is simplified by the nature of the positive definite matrix. The diagonal value, which is the first value at the top of each column in the decomposition, is always a real number so that the result from the inverse square root calculation is always a real number. The complex multiplier in the root section is therefore a real scalar, so only two real multipliers are required.
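The column-by-column computation described in this section can be sketched in software for the real-valued case. In the sketch, the inner-product subtraction plays the role of the vector section, and the square root and division play the role of the root section. This is a plain software model under those assumptions, not the RTL datapath, and the function name is hypothetical.

```python
import math


def cholesky_lower(a):
    """Return the lower triangular L with L x L^T = a (real, positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Vector section: subtract the inner product from a[j][j].
        diag = a[j][j] - sum(l[j][k] * l[j][k] for k in range(j))
        if diag <= 0.0:
            raise ValueError("matrix is not positive definite")
        # Root section: the first result of the column is square-rooted.
        l[j][j] = math.sqrt(diag)
        for i in range(j + 1, n):
            # Remaining results of the column are divided by that root.
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
for row in cholesky_lower(A):
    print(row)
```

For the first column (j = 0) the inner products are empty, so the code reduces to the preprocessing step: the top element is the square root of a_{00} and the rest of the column is a_{i0} divided by that root.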
Triangular Matrix Inversion
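% X is an n-by-n matrix initialized to zeros
for j = n:-1:1
    X(j, j) = 1/L(j, j);                          % diagonal element of the inverse
    for k = j+1:n
        for i = j+1:n
            X(k, j) = X(k, j) + X(k, i)*L(i, j);  % accumulate the inner product
        end
    end
    for k = j+1:n
        X(k, j) = -X(j, j)*X(k, j);               % negate and scale by the diagonal
    end
end
The pseudocode is converted into an RTL file. The result, L^{-1}, is stored in the input matrix storage in the Cholesky decomposition function.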
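A direct Python translation of the loop structure is sketched below for checking the algorithm in software. Note that the accumulated sum is negated before it is scaled by the diagonal element, which is what the condition X × L = I requires. The function name and the test matrix are illustrative assumptions.

```python
def invert_lower_triangular(l):
    """Invert a lower triangular matrix column by column, last column first."""
    n = len(l)
    x = [[0.0] * n for _ in range(n)]
    for j in range(n - 1, -1, -1):
        x[j][j] = 1.0 / l[j][j]
        # Accumulate sum over i of x[k][i] * l[i][j]; columns i > j are already final.
        for k in range(j + 1, n):
            for i in range(j + 1, n):
                x[k][j] += x[k][i] * l[i][j]
        # Negate and scale by the diagonal element of the inverse.
        for k in range(j + 1, n):
            x[k][j] = -x[j][j] * x[k][j]
    return x


L = [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
for row in invert_lower_triangular(L):
    print(row)
```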
Matrix Multiplication
Matrix Inversion Operation
The following sequence describes the matrix inversion operation:
 The operation begins when the enable signal is asserted and the reset signal is deasserted.
 The load signal is asserted to load data from the loaddata[] port for the input matrix. As long as the load signal is high, data is loaded continuously for the input matrix.
 The busy signal is asserted and the done signal is deasserted for a few clock cycles after the datain[] signal is asserted.
 The outvalid signal is asserted multiple times to signify the availability of valid data on the dataout[] port. The number of times this signal is asserted equals the number of rows found in the output matrix.
 The busy and done signals are asserted when the last row of the output matrix has been burst out. This assertion signifies the end of the matrix inversion operation on the first set of data.
ALTERA_FP_MATRIX_INV Design Example: Matrix Inverse of Single-Precision Format Numbers
ALTERA_FP_MATRIX_INV Design Example: Understanding the Simulation Results
This design example implements a floating-point matrix inversion to calculate the inverse value of matrices in single-precision formats. The optional input ports (enable and reset) are enabled.
Time  Event 

0 ns – 10 ns  Start sequence:

19.86 ns – 340 ns  Matrix input data load:

27.5 ns  Processing stage:

12527.5 – 12922.5 ns  Output stage:

Sample Matrix Data
The following two sets of results are computed:
 PC-based results: results obtained from running the simulation in MATLAB.
 FPGA-based results: results obtained from running the simulation in ModelSim.
This table lists the input and output data values presented in IEEE-754 floating-point format.
Matrix  Data 

Input Matrix
40c89c6c 40b16187 40e21dfb 40847306 40c00d1d 40bbf0c4 40be4fc1 40953a30
40b16187 41244acb 410e61b9 40defe3a 40f8e982 40eff916 410e0ff4 41121d78
40e21dfb 410e61b9 41217d87 40d7f5f4 40fd78fa 410618c0 41060327 40ff4517
40847306 40defe3a 40d7f5f4 40b10427 40b6be88 40bbff4a 40d12685 40ca69f9
40c00d1d 40f8e982 40fd78fa 40b6be88 41146829 40ee188a 40fa2d80 40cf065c
40bbf0c4 40eff916 410618c0 40bbff4a 40ee188a 40ecbddf 40e3aa3a 40d60773
40be4fc1 410e0ff4 41060327 40d12685 40fa2d80 40e3aa3a 4111ed09 40ecd83c
40953a30 41121d78 40ff4517 40ca69f9 40cf065c 40d60773 40ecd83c 410847da
PC-based Output Matrix
42148e03 42f5794f 421b33f4 430e0587 41ff0d66 c2f579a3 c2df1c28 c2f945bc
42f5794f 43d60be5 430944db 43f2dd63 42da2dd0 c3d1dd59 c3bff960 c3d98c47
421b33f4 430944db 424b067c 43204d17 421907da c3107054 c2fc035b c30d24b3
430e0587 43f2dd63 43204d17 440cc66b 43002bbb c3f4e779 c3dcd667 c3f7e3f3
41ff0d66 42da2dd0 421907da 43002bbb 41f5048b c2e44480 c2c91e6d c2df60c9
c2f579a3 c3d1dd59 c3107054 c3f4e779 c2e44480 43d89b61 43c003b9 43d685d3
c2df1c28 c3bff960 c2fc035b c3dcd667 c2c91e6d 43c003b9 43ae19b0 43c37f99
c2f945bc c3d98c47 c30d24b3 c3f7e3f3 c2df60c9 43d685d3 43c37f99 43ddb1bc
FPGA-based Output Matrix
42148d06 42f5773e 421b32c4 430e0484 41ff0bb7 c2f577f4 c2df1a71 c2f943b1
42f5773e 43d609cf 430943a0 43f2db4a 42da2c09 c3d1db95 c3bff79e c3d98a34
421b32c4 430943a0 424b0515 43204be2 421906da c3106f53 c2fc014f c30d237c
430e0484 43f2db4a 43204be2 440cc563 43002adf c3f4e5c0 c3dcd4a7 c3f7e1df
41ff0bb7 42da2c09 421906da 43002adf 41f50322 c2e44314 c2c91cf5 c2df5f08
c2f577f4 c3d1db95 c3106f53 c3f4e5c0 c2e44314 43d899f3 43c00242 43d68414
c2df1a71 c3bff79e c2fc014f c3dcd4a7 c2c91cf5 43c00242 43ae1837 43c37dda
c2f943b1 c3d98a34 c30d237c c3f7e1df c2df5f08 43d68414 43c37dda 43ddafad
The differences between corresponding elements of the PC-based and FPGA-based output matrices are as follows:
Result differences (in decimal)
253 529 304 259 431 431 439 523
529 534 315 537 455 452 450 531
304 315 359 309 256 257 524 311
259 537 309 264 220 441 448 532
431 455 256 220 361 364 376 449
431 452 257 441 364 366 375 447
439 450 524 448 376 375 377 447
523 531 311 532 449 447 447 527
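The decimal differences above can be reproduced by treating each 32-bit word as an integer and subtracting, which counts the number of representable single-precision values between the two results. The following is a minimal sketch of that check for the first eight words of each output matrix; the helper name `ulp_distance` is illustrative and not part of any tool named in this guide.

```python
def ulp_distance(word_a: str, word_b: str) -> int:
    """Distance, in units in the last place, between two float32 bit patterns (hex)."""
    a, b = int(word_a, 16), int(word_b, 16)
    sign_mask = 0x80000000
    magnitude_mask = 0x7FFFFFFF
    # This sketch only handles pairs with the same sign bit. For such pairs the
    # magnitude bits are ordered like the values, so their integer difference
    # counts the representable values between the two words.
    assert (a & sign_mask) == (b & sign_mask), "same-sign pairs only"
    return abs((a & magnitude_mask) - (b & magnitude_mask))


# First row of the PC-based and FPGA-based output matrices from the table above.
pc_row = "42148e03 42f5794f 421b33f4 430e0587 41ff0d66 c2f579a3 c2df1c28 c2f945bc".split()
fpga_row = "42148d06 42f5773e 421b32c4 430e0484 41ff0bb7 c2f577f4 c2df1a71 c2f943b1".split()

first_row_diffs = [ulp_distance(p, f) for p, f in zip(pc_row, fpga_row)]
print(first_row_diffs)  # [253, 529, 304, 259, 431, 431, 439, 523]
```

The printed list matches the first row of the result differences table.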
The differences between the two output matrices are due to the following reasons:
 Method of processing: MATLAB uses sequential processing while ModelSim uses parallel processing.
 Method of conversion: MATLAB first computes in double-precision format and only then converts the result into single-precision format. During this conversion, some units in the last place (ulp) are expected to be lost.
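To illustrate why the order of operations alone can change a single-precision result, the sketch below adds the same three values with two different groupings. It is a generic demonstration of floating-point non-associativity, not a model of the IP core or of the MATLAB reference; the helper `f32` and the chosen values are assumptions made for the example.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


a, b, c = f32(1.0e8), f32(-1.0e8), f32(1.0)

# Every intermediate result is rounded to single precision, as a
# single-precision datapath would do.
left_to_right = f32(f32(a + b) + c)  # (a + b) + c
regrouped = f32(a + f32(b + c))      # a + (b + c): c is lost when b + c is rounded

print(left_to_right, regrouped)  # 1.0 0.0
```

Near 1.0e8 the spacing between adjacent single-precision values is 8, so adding 1.0 to -1.0e8 rounds back to -1.0e8 and the small term disappears.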
ALTERA_FP_MATRIX_INV Signals
Port Name  Required  Description 

sysclk  Yes  The clock input to the ALTERA_FP_MATRIX_INV IP core. This is the main system clock. All operations occur on the rising edge. 
enable  No  Optional port. Allow calculation to take place when asserted. When deasserted, no operation will take place and the outputs are unchanged. 
reset  No  Optional port. The core resets asynchronously when the reset signal is asserted. 
load  Yes  When asserted, loads the LOADDATA bus into the memory. 
loaddata  Yes  Single-precision 32-bit matrix input value. Matrices load row by row. 
Port Name  Required  Description 

ready  Yes  When asserted, the core preprocesses the input data. The calculate signal cannot be asserted until the ready signal is low. 
outdata  Yes  Single-precision 32-bit matrix result value. The matrix result value is written out row by row. 
outvalid  Yes  When asserted, valid output data is available. An entire row of the result matrix is written out as a burst. There is a gap between row outputs, the length of which depends on the parameters. 
done  Yes  When asserted, the last output has been written. A new matrix multiply can be started with calculate. done will follow ready by some fixed amount, depending on the parameters. 
ALTERA_FP_MATRIX_INV Parameters
Parameter Name  Type  Required  Description 

BLOCKS  Integer  No  The number of memory blocks for the double-buffered storage of matrix multiplication. The allowable range is from 2 to 16. 
DIMENSION  Integer  Yes  The number of rows in the matrix. As the matrix is square, this is also the number of columns in the matrix. The supported dimensions are 4 x 4, 6 x 6, 8 x 8, 16 x 16, 32 x 32, and 64 x 64. The maximum supported input dimension is 64 x 64. This parameter also acts as the VECTORSIZE when calling the ALTERA_FP_MATRIX_MULT IP core internally. 
WIDTH_EXP  Integer  Yes  Specifies the precision of the exponent. The bias of the exponent is always set to 2^{(WIDTH_EXP - 1)} - 1 (that is, 127 for single-precision format). WIDTH_EXP must be 8 for single-precision format and must be less than WIDTH_MAN. The available value for WIDTH_EXP is 8. 
WIDTH_MAN  Integer  Yes  Specifies the precision of the mantissa. WIDTH_MAN must be 23 when WIDTH_EXP is 8. Otherwise, WIDTH_MAN must be a minimum of 31. WIDTH_MAN must be greater than WIDTH_EXP. The sum of WIDTH_EXP and WIDTH_MAN must be less than 64. Currently, the only available value for WIDTH_MAN is 23, for single precision. 
ALTERA_FP_MATRIX_MULT IP Core
This IP core performs floating-point multiplication between two matrices.
ALTERA_FP_MATRIX_MULT Features
 Multiplication of two matrices.
 Support for floating-point formats in single and double precision.
 Support for configurable performance and resource usage.
 Avalon streaming interfaces and full Qsys compliance.
ALTERA_FP_MATRIX_MULT Output Latency
ALTERA_FP_MATRIX_MULT Resource Utilization and Performance
Family  Data Format  Matrix A Size  Matrix B Size  Vector Size  Memory Blocks  ALMs  M20Ks  DSP Blocks  FMax (MHz)  Latency (cycles) ^{1} 

Arria 10 (10AX066H2F34I2LP)  Single  8x8  8x8  8  4  979  12  8  409  131 
Arria 10 (10AX066H2F34I2LP)  Single  16x16  16x16  8  4  1052  12  8  408  595 
Arria 10 (10AX066H2F34I2LP)  Single  32x32  32x32  16  8  1579  25  16  373  2155 
Arria 10 (10AX066H2F34I2LP)  Single  64x64  64x64  32  16  2677  49  32  379  8339 
Stratix V (5SGXEA7K2F40C2)  Single  8x8  8x8  8  4  2637  14  8  404  125 
Stratix V (5SGXEA7K2F40C2)  Single  16x16  16x16  8  4  2868  15  8  367  588 
Stratix V (5SGXEA7K2F40C2)  Single  32x32  32x32  16  8  5427  27  16  356  2146 
Stratix V (5SGXEA7K2F40C2)  Single  64x64  64x64  32  16  10311  51  32  348  8328 
ALTERA_FP_MATRIX_MULT Functional Description
The following figure shows the equation:
Matrices A and B can be loaded when the ready signals on their respective interfaces are asserted. Once the input matrices are loaded, the core starts computing the output. The valid signal on the output interface is asserted to indicate valid output data. Input data may be loaded whenever the ready signal is asserted, even while previously loaded data is still being computed.
The ALTERA_FP_MATRIX_MULT IP core consists of the following components:
 Memory blocks for the matrix A storage
 Memory blocks for the matrix B storage
 Dot product
 Accumulator
Figure 19. TopLevel View of the ALTERA_FP_MATRIX_MULT IP Core. This figure shows the toplevel view of the ALTERA_FP_MATRIX_MULT IP core.
The following lists the key features of the architecture:
 Matrix A and B storage are double buffered to allow processing to happen in parallel with data loading.
 Where the number of columns of A (A_COLUMNS) and rows of B (same as A_COLUMNS) are greater than the size of the dot product (VECTOR_SIZE), the rows of A and columns of B are divided into sub rows and sub columns respectively, each containing VECTOR_SIZE elements. In this case, A_COLUMNS/VECTOR_SIZE iterations are needed to compute a full dot product corresponding to a single output element.
 Matrix B memory has sufficient bandwidth so that all the data needed for the dot product can be loaded at once.
 Matrix A memory is allocated with less bandwidth. The bandwidth of the matrix A is a parameter (NUM_BLOCKS) that you can control. A sub row of matrix A is loaded into local registers over a number of cycles before an iteration of the dot product. Once a sub row of Matrix A has been loaded into local registers, all partial dot products involving that sub row are computed before another sub row is loaded.
 For Arria 10 devices, which include hardened single-precision floating-point DSP blocks, those blocks are used for single-precision floating-point arithmetic.
The matrix multiply architecture is not optimized for sparse matrices and constant matrices.
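The sub-row and sub-column iteration described in the list above can be sketched in software as a blocked dot product. This is only an illustration of the arithmetic, assuming A_COLUMNS is a multiple of VECTOR_SIZE; the function name `blocked_dot` is hypothetical, and the sketch says nothing about the core's memory layout or timing.

```python
def blocked_dot(a_row, b_col, vector_size):
    """Compute one output element by accumulating partial dot products.

    a_row: one row of matrix A (A_COLUMNS elements)
    b_col: one column of matrix B (A_COLUMNS elements)
    vector_size: elements handled per iteration (VECTOR_SIZE)
    """
    assert len(a_row) == len(b_col)
    assert len(a_row) % vector_size == 0, "A_COLUMNS must be a multiple of VECTOR_SIZE"
    accumulator = 0.0
    iterations = 0
    for start in range(0, len(a_row), vector_size):
        sub_row = a_row[start:start + vector_size]
        sub_col = b_col[start:start + vector_size]
        # One pass of the dot-product unit over a sub row and a sub column.
        accumulator += sum(x * y for x, y in zip(sub_row, sub_col))
        iterations += 1
    return accumulator, iterations


a_row = [float(i) for i in range(16)]  # A_COLUMNS = 16
b_col = [1.0] * 16
result, iterations = blocked_dot(a_row, b_col, vector_size=8)
print(result, iterations)  # 120.0 2
```

With 16 columns and a vector size of 8, the full dot product takes A_COLUMNS/VECTOR_SIZE = 2 iterations.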
ALTERA_FP_MATRIX_MULT Signals
These tables list the signals for the ALTERA_FP_MATRIX_MULT IP core.
Port Name  Required  Description 

clk  Yes  The clock input port for the IP core. 
reset_n  No  Asynchronous active low reset port. 
a_data  Yes  Matrix A input data. 
a_valid  Yes  Matrix A Avalon streaming valid signal. When this signal is asserted, data on a_data is valid. 
b_data  Yes  Matrix B input data. 
b_valid  Yes  Matrix B Avalon streaming valid signal. When this signal is asserted, data on b_data is valid. 
c_ready  Yes  Matrix C Avalon streaming ready signal. Ready latency is 0. 
Port Name  Required  Description 

a_ready  Yes  Matrix A Avalon streaming ready signal. Ready latency is 0. 
b_ready  Yes  Matrix B Avalon streaming ready signal. Ready latency is 0. 
c_data  Yes  Matrix C output data. 
c_valid  Yes  Matrix C Avalon streaming valid signal. When this signal is asserted, data on c_data is valid. 
ALTERA_FP_MATRIX_MULT Parameters
Parameter  Value  Description 

Format  Single (32 bit) or Double (64 bit)  The format of the input data. 
Rows in Matrix A  2 to 256  Number of rows in matrix A. 
Columns in Matrix A  8 to 256, in integer multiples of Vector Size (and of Memory Blocks).  Number of columns in matrix A. This is also the number of rows in matrix B. 
Rows in Matrix B  8 to 256  Number of rows in matrix B. 
Columns in Matrix B  2 to 256  Number of columns in matrix B. 
Vector Size  8, 16, 32, 64, 96, or 128  The size of the dot product that can be computed in parallel. Where the number of columns of matrix A and rows of matrix B is greater than Vector Size, a number of iterations are required to compute a full dot product. Vector Size also controls the matrix B memory configuration. Increasing Vector Size increases the matrix B memory bandwidth and the number of memory blocks used. 
Memory Blocks  Vector Size must be an integer multiple of Memory Blocks. The number of memory blocks must be smaller than the vector size. The number of memory blocks must be greater than or equal to the vector size divided by the number of columns of matrix B.  Controls the memory configuration of the matrix A storage. Increasing this number increases the memory bandwidth and the number of memory blocks used. 
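The constraints in this table interact, so it can help to check a candidate configuration before opening the parameter editor. The sketch below encodes the rules as read from the table, assuming the size ranges are 2 to 256 and 8 to 256; the function `check_matrix_mult_params` is hypothetical and is not part of the Quartus Prime software.

```python
ALLOWED_VECTOR_SIZES = (8, 16, 32, 64, 96, 128)


def check_matrix_mult_params(a_rows, a_cols, b_cols, vector_size, memory_blocks):
    """Return a list of violated constraints (empty if none are found)."""
    problems = []
    if not 2 <= a_rows <= 256:
        problems.append("Rows in Matrix A must be 2 to 256")
    if not 8 <= a_cols <= 256:
        problems.append("Columns in Matrix A must be 8 to 256")
    if not 2 <= b_cols <= 256:
        problems.append("Columns in Matrix B must be 2 to 256")
    if vector_size not in ALLOWED_VECTOR_SIZES:
        problems.append("Vector Size must be 8, 16, 32, 64, 96, or 128")
    if a_cols % vector_size != 0:
        problems.append("Columns in Matrix A must be a multiple of Vector Size")
    if memory_blocks <= 0 or vector_size % memory_blocks != 0:
        problems.append("Vector Size must be a multiple of Memory Blocks")
    if memory_blocks >= vector_size:
        problems.append("Memory Blocks must be smaller than Vector Size")
    # Memory Blocks >= Vector Size / Columns in Matrix B, written without division.
    if memory_blocks * b_cols < vector_size:
        problems.append("Memory Blocks must be at least Vector Size / Columns in Matrix B")
    return problems


# Two configurations taken from the resource utilization table.
print(check_matrix_mult_params(8, 8, 8, vector_size=8, memory_blocks=4))      # []
print(check_matrix_mult_params(64, 64, 64, vector_size=32, memory_blocks=16))  # []
```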
ALTERA_FP_ACC_CUSTOM IP Core
This IP core performs floatingpoint accumulation and allows you to restrict the range of inputs and maximum accumulated value to save resources. The core uses device latency models to generate RTL to meet a target FMax at the cost of latency.
ALTERA_FP_ACC_CUSTOM Features
 Supports frequency driven cores.
 Supports VHDL RTL generation.
 Supports customization of the required range of the input and output values.
ALTERA_FP_ACC_CUSTOM Output Latency
ALTERA_FP_ACC_CUSTOM Resource Utilization and Performance
Device Family  Input Data: Floating Point Format  Input Data: maxMSBX  Accumulator Size: MSBA  Accumulator Size: LSBA  Target Frequency (MHz)  Latency  ALMs  DSP Blocks  Logic Registers: Primary  Logic Registers: Secondary  M10K  M20K  f_{MAX} 

Arria V (5AGXFB3H4F40C5)  Double  24  40  -52  270  15  866  0  1,166  106  0  —  265 
Cyclone V (5CGXFC7D6F31C7)  Double  24  40  -52  230  15  830  0  1,102  32  0  —  198 
Stratix V (5SGXEA7K2F40C2)  Double  24  40  -52  400  15  968  0  1,655  27  —  0  426 
Arria V (5AGXFB3H4F40C5)  Single  12  20  -26  270  12  337  0  588  52  0  —  309 
Cyclone V (5CGXFC7D6F31C7)  Single  12  20  -26  230  12  383  0  494  28  0  —  225 
Stratix V (5SGXEA7K2F40C2)  Single  12  20  -26  400  13  475  0  903  20  —  0  450 
ALTERA_FP_ACC_CUSTOM Signals
Port Name  Required  Description 

clk  Yes  All input signals, unless otherwise explicitly stated, must be synchronous to this clock. 
areset  Yes  Asynchronous active-high reset. Deassert this signal synchronously to the input clock to avoid metastability issues. 
en  No  Global enable signal. This port is optional. 
x  Yes  Data input port. 
n  Yes  Boolean port which signals the beginning of a new data set to be accumulated. This should go high together with the first element in the new data set and should go low the next cycle. The data sets may be of variable length and a new data set may be started at any time. The accumulation result for an input will be available after the reported latency. 
Port Name  Required  Description 

r  Yes  The running value of the accumulation. 
xo  Yes  The overflow flag for port x. The signal goes high when the exponent of the input x is larger than maxMSBX. The signal remains high for the entire data set. This flag invalidates port r. You should consider increasing maxMSBX. This flag also indicates infinity and NaN. 
xu  Yes  The underflow flag for port x. The signal goes high when the exponent of the input x is smaller than LSBA. The signal remains high for the entire data set. This flag does not invalidate port r. You should consider lowering LSBA. 
ao  Yes  The overflow flag for Accumulator. The signal goes high when the exponent of the accumulated value is larger than MSBA. The signal remains high for the entire data set. This flag invalidates port r. You should consider increasing MSBA. 
ALTERA_FP_ACC_CUSTOM Parameters
Category  Parameter  Values  Description 

Input Data  Floating point format  single, double  Choose the floating-point format of the input data values. The output data values of the accumulator are in the same format. The default is single. 
maxMSBX  —  The maximum weight of the MSB of an input. For example, when adding probabilities in the 0 to 1 range, set this weight to ceil(log_{2}(1))=0. The xo output signal goes high when the MSB of an input value has a weight larger than maxMSBX. The result of the accumulation is then invalid. If you are unsure about the range of the inputs, then set the maxMSBX parameter to MSBA, at the possible expense of increased resource usage. The default value is 12. 

Accumulator Size  MSBA  —  The weight of the MSB of the accumulator. For example, in a financial simulation, if the value of a stock cannot exceed 100,000 dollars, use a value of ceil(log_{2}(100000))=17. In a circuit simulation where the circuit adds numbers in the 0 to 1 range, for one year, at 400 MHz, use a value of ceil(log_{2}(365 x 60 x 60 x 24 x 400 x 10^{6}))=54. The ao output signal goes high when the MSB of the accumulated value has a weight larger than MSBA. The result of the accumulation is then invalid. Altera recommends adding a few guard bits to avoid possible accumulator overflow. A few guard bits have little impact on the accumulator size. The default value is 20. 
LSBA  —  The weight of the LSB of the accumulator and the accuracy of the accumulator. Because an N-term accumulation can invalidate the log_{2}(N) LSBs of the accumulator, you must consider the length of the accumulation and the range of the inputs when setting this parameter. For example, if a 2^{-30} accuracy is required over an accumulation of 1024 numbers, then set the LSBA to: (-30 - log_{2}(1024)) = -40. Any input 2^{e}×1.F, where F is the mantissa and e is less than the LSBA, will be shifted out of the accumulator. The xu output signal goes high to indicate this situation. The default value is -26. 

Required Performance  Target frequency  Any positive integer value.  Choose the frequency in MHz at which this core is expected to run. This together with the target device family will determine the amount of pipelining in the core. The default value is 200 MHz. 
Optional  Generate an enable port  —  Choose if the accumulator should have an enable signal. This parameter is disabled by default. 
Report  —  —  Reports the latency of the device, which is the number of cycles it takes for an accumulation to propagate through the block from input to output. 
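The worked examples in the maxMSBX, MSBA, and LSBA descriptions can be reproduced with a few lines of arithmetic. The sketch below is only a calculator for those examples, assuming the LSBA example targets an accuracy of 2^-30 (a negative weight); the helper name `msb_weight` is illustrative and is not an IP core parameter.

```python
import math


def msb_weight(max_magnitude: float) -> int:
    """Smallest integer w such that max_magnitude <= 2**w."""
    return math.ceil(math.log2(max_magnitude))


# maxMSBX: inputs are probabilities in the 0 to 1 range.
max_msbx = msb_weight(1)

# MSBA: a stock value that cannot exceed 100,000 dollars.
msba_stock = msb_weight(100_000)

# MSBA: adding numbers in the 0 to 1 range for one year at 400 MHz.
additions_per_year = 365 * 24 * 60 * 60 * 400 * 10**6
msba_year = msb_weight(additions_per_year)

# LSBA: 2**-30 accuracy over 1024 terms loses log2(1024) = 10 LSBs.
lsba = -30 - int(math.log2(1024))

print(max_msbx, msba_stock, msba_year, lsba)  # 0 17 54 -40
```

In practice a few guard bits would be added on top of the computed MSBA, as the parameter description recommends.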
ALTFP_ADD_SUB IP Core
This IP core allows you to perform floating-point addition or subtraction between two inputs dynamically.
ALTFP_ADD_SUB Features
 Dynamically configurable adder and subtractor functions.
 Optional exception handling output ports such as zero, overflow, underflow, and nan.
 Optimization of speed and area.
ALTFP_ADD_SUB Output Latency
ALTFP_ADD_SUB Truth Table
DATAA[]  DATAB[]  SIGN BIT  RESULT[]  Overflow  Underflow  Zero  NaN 

Normal  Normal  0  Zero  0  0  1  0 
Normal  Normal  0/1  Normal  0  0  0  0 
Normal  Normal  0/1  Denormal  0  1  1  0 
Normal  Normal  0/1  Infinity  1  0  0  0 
Normal  Denormal  0/1  Normal  0  0  0  0 
Normal  Zero  0/1  Normal  0  0  0  0 
Normal  Infinity  0/1  Infinity  1  0  0  0 
Normal  NaN  X  NaN  0  0  0  1 
Denormal  Normal  0/1  Normal  0  0  0  0 
Denormal  Denormal  0/1  Normal  0  0  0  0 
Denormal  Zero  0/1  Zero  0  0  1  0 
Denormal  Infinity  0/1  Infinity  1  0  0  0 
Denormal  NaN  X  NaN  0  0  0  1 
Zero  Normal  0/1  Normal  0  0  0  0 
Zero  Denormal  0/1  Zero  0  0  1  0 
Zero  Zero  0/1  Zero  0  0  1  0 
Zero  Infinity  0/1  Infinity  1  0  0  0 
Zero  NaN  X  NaN  0  0  0  1 
Infinity  Normal  0/1  Infinity  1  0  0  0 
Infinity  Denormal  0/1  Infinity  1  0  0  0 
Infinity  Zero  0/1  Infinity  1  0  0  0 
Infinity  Infinity  0/1  Infinity  1  0  0  0 
Infinity  NaN  X  NaN  0  0  0  1 
NaN  Normal  X  NaN  0  0  0  1 
NaN  Denormal  X  NaN  0  0  0  1 
NaN  Zero  X  NaN  0  0  0  1 
NaN  Infinity  X  NaN  0  0  0  1 
NaN  NaN  X  NaN  0  0  0  1 
ALTFP_ADD_SUB Resource Utilization and Performance
Device Family  Precision  Optimization  Output latency  Adaptive Look-Up Tables (ALUTs)  Dedicated Logic Registers (DLRs)  Adaptive Logic Modules (ALMs)  f_{MAX} (MHz) 

Stratix IV  single  speed  7  594  376  385  228 
Stratix IV  single  speed  14  674  686  498  495 
Stratix IV  single  area  7  576  345  375  227 
Stratix IV  single  area  14  596  603  421  484 
Stratix IV  double  speed  7  1,198  687  824  187 
Stratix IV  double  speed  14  997  1,607  1,080  398 
Stratix IV  double  area  7  1,106  630  762  189 
Stratix IV  double  area  14  904  1,518  1,013  265 
ALTFP_ADD_SUB Design Example: Addition of Double-Precision Format Numbers
ALTFP_ADD_SUB Design Example: Understanding the Simulation Results
This design example implements a floating-point adder for the addition of double-precision format numbers. All the optional input ports (clk_en and aclr) and optional output ports (overflow, underflow, zero, and nan) are enabled.
In this example, the output latency of the adder is set to 7 clock cycles. Every addition result appears at the result[] port 7 clock cycles after the input values are captured on the dataa[] and datab[] ports.
The following lists the inputs and corresponding outputs obtained from the simulation waveform.
Time  Event 

0 ns, startup  dataa[] value: 0000 0000 0000 0000h. datab[] value: 7FF0 0000 0000 0000h. Output value: All values seen on the output port before the 7th clock cycle are merely due to the behavior of the system during startup and should be disregarded. 
4250 ns  Output value: 7FF0 0000 0000 0000h. Exception handling ports: overflow asserts. The addition of zero at the input port dataa[] and an infinity value at the input port datab[] results in an infinity value. 
40,511 ns  dataa[] value: 0000 0000 0000 0000h. datab[] value: 0000 0000 1000 0123h. This is the addition of a zero and a denormal value. 
43,750 ns  Output value: 0000 0000 0000 0000h. Exception handling ports: zero remains asserted. Denormal inputs are not supported and are forced to zero before addition takes place. This results in a zero. 
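The input categories used in this example (zero, denormal, infinity) follow directly from the exponent and mantissa fields of each word. The sketch below classifies the double-precision input words from the table; it is a generic IEEE 754 field decoder written for illustration, and the function name `classify` is an assumption rather than anything provided by the IP core.

```python
def classify(word_hex: str, width_exp: int = 11, width_man: int = 52) -> str:
    """Classify a floating-point word given as hex digits (spaces allowed)."""
    word = int(word_hex.replace(" ", ""), 16)
    exponent = (word >> width_man) & ((1 << width_exp) - 1)
    mantissa = word & ((1 << width_man) - 1)
    if exponent == 0:
        # All-zero exponent: zero when the mantissa is zero, denormal otherwise.
        return "Zero" if mantissa == 0 else "Denormal"
    if exponent == (1 << width_exp) - 1:
        # All-one exponent: infinity when the mantissa is zero, NaN otherwise.
        return "Infinity" if mantissa == 0 else "NaN"
    return "Normal"


print(classify("0000 0000 0000 0000"))  # Zero
print(classify("7FF0 0000 0000 0000"))  # Infinity
print(classify("0000 0000 1000 0123"))  # Denormal
```

Passing `width_exp=8, width_man=23` applies the same decoding to single-precision words.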
ALTFP_ADD_SUB Signals
Port Name  Required  Description 

aclr  No  Asynchronous clear input for the floating-point adder or subtractor. The source is asynchronously reset when the aclr signal is asserted high. 
add_sub  No  Optional input port to enable dynamic switching between the adder and subtractor functions. The add_sub port must be used when the DIRECTION parameter is set to VARIABLE. When the add_sub port is high, result[] = dataa[] + datab[]; otherwise, result[] = dataa[] - datab[]. 
clk_en  No  Clock enable to the floating-point adder or subtractor. This port allows addition or subtraction to occur when asserted high. When asserted low, no operations occur and the outputs are unchanged. 
clock  Yes  Clock input to the IP core. 
dataa[]  Yes  Data input to the floating-point adder or subtractor. The MSB is the sign bit, the next MSBs are the exponent, and the LSBs are the mantissa bits. The size of this port is the total width of the sign bit, the exponent bits, and the mantissa bits. 
datab[]  Yes  Data input to the floating-point adder or subtractor. This port is configured in the same way as dataa[]. 
Port Name  Required  Description 

nan  Yes  NaN exception output. Asserted when an illegal addition or subtraction occurs, such as infinity minus infinity. When an invalid addition or subtraction occurs, a NaN value is output to the result[] port. Any addition or subtraction involving NaN values also produces a NaN value. 
overflow  Yes  Overflow exception port. Asserted when the result of the addition or subtraction, after rounding, exceeds or reaches infinity. Infinity is defined as a number in which the exponent exceeds 2^{WIDTH_EXP} - 1. 
result[]  Yes  Floating-point output result. Like the input values, the MSB is the sign, the next MSBs are the exponent, and the LSBs are the mantissa. The size of this port is the total width of the sign bit, exponent bits, and mantissa bits. 
underflow  Yes  Underflow port for the adder or subtractor. Asserted when the result of the addition or subtraction, after rounding, is zero and the inputs are not equal. The underflow port is also asserted when the result is a denormalized number. 
zero  No  Zero port for the adder or subtractor. Asserted when the result[] port is zero. 
ALTFP_ADD_SUB Parameters
Parameter Name  Type  Required  Description 

DIRECTION  String  Yes  Specifies addition or subtraction operations. Values are ADD, SUB, or VARIABLE. If this parameter is not specified, the default is ADD. When the value is VARIABLE, the add_sub port determines whether the operation is addition or subtraction. The add_sub port must be connected if the DIRECTION parameter is set to VARIABLE. If the value is ADD or SUB, the add_sub port is ignored. 
PIPELINE  Integer  No  Specifies the latency in clock cycles used in the ALTFP_ADD_SUB IP core. The PIPELINE parameter supports values of 7 through 14. If this parameter is not specified, the default value is 11. In general, a higher pipeline value produces better f_{MAX} performance. 
ROUNDING  String  Yes  Specifies the rounding mode. The default value is TO_NEAREST. Other rounding modes are currently not supported. 
OPTIMIZE  String  No  Defines the design preference, whether the design is optimized for speed (faster f_{MAX}), or optimized for area (lower resource count). Values are SPEED and AREA. If this parameter is not specified, the default is SPEED. 
WIDTH_EXP  Integer  No  Specifies the precision of the exponent. The bias of the exponent is always set to 2^{(WIDTH_EXP - 1)} - 1 (that is, 127 for single-precision format and 1023 for double-precision format). The WIDTH_EXP parameter must be 8 for the single-precision mode and 11 for the double-precision mode, or a minimum of 11 for the single-extended precision mode. The WIDTH_EXP parameter must be less than the WIDTH_MAN parameter. The sum of the WIDTH_EXP and WIDTH_MAN parameters must be less than 64. If this parameter is not specified, the default is 8. 
WIDTH_MAN  Integer  No  Specifies the precision of the mantissa. The WIDTH_MAN parameter must be 23 (to comply with the IEEE 754 standard for the single-precision mode) when the WIDTH_EXP parameter is 8. Otherwise, the WIDTH_MAN parameter must have a value that is greater than or equal to 31. The WIDTH_MAN parameter must be greater than the WIDTH_EXP parameter. The sum of the WIDTH_EXP and WIDTH_MAN parameters must be less than 64. If this parameter is not specified, the default is 23. 
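The WIDTH_EXP and WIDTH_MAN rules can be checked mechanically. The sketch below reads the bias as 2^(WIDTH_EXP - 1) - 1, which agrees with the stated values of 127 and 1023, and encodes the width constraints from the two rows above; the function names are hypothetical and the check is not part of the IP core or the parameter editor.

```python
def exponent_bias(width_exp: int) -> int:
    """Exponent bias, read as 2**(WIDTH_EXP - 1) - 1."""
    return 2 ** (width_exp - 1) - 1


def check_widths(width_exp: int, width_man: int) -> list:
    """Return the width rules that a (WIDTH_EXP, WIDTH_MAN) pair violates."""
    problems = []
    if width_exp == 8:
        if width_man != 23:
            problems.append("WIDTH_MAN must be 23 when WIDTH_EXP is 8")
    else:
        if width_exp < 11:
            problems.append("WIDTH_EXP must be 8 or at least 11")
        if width_man < 31:
            problems.append("WIDTH_MAN must be at least 31 when WIDTH_EXP is not 8")
    if width_exp >= width_man:
        problems.append("WIDTH_EXP must be less than WIDTH_MAN")
    if width_exp + width_man >= 64:
        problems.append("WIDTH_EXP + WIDTH_MAN must be less than 64")
    return problems


print(exponent_bias(8), exponent_bias(11))  # 127 1023
print(check_widths(8, 23))   # [] (single precision)
print(check_widths(11, 52))  # [] (double precision)
print(check_widths(11, 53))  # sum of widths reaches 64
```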
ALTFP_DIV IP Core
This IP core performs floating-point division.
ALTFP_DIV Features
 Division functions.
 Optional exception handling output ports such as zero, division_by_zero, overflow, underflow, and nan.
 Optimization of speed and area.
 Low latency option.
ALTFP_DIV Output Latency
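Precision  Mantissa Width  Latency (in clock cycles) 

Single  23  6, 14, 33 
Double  52  10, 24, 61 
Single Extended  31 – 32  8, 18, 41 
Single Extended  33 – 34  8, 18, 43 
Single Extended  35 – 36  8, 18, 45 
Single Extended  37 – 38  8, 18, 47 
Single Extended  39 – 40  8, 18, 49 
Single Extended  41  10, 24, 41 
Single Extended  42  10, 24, 51 
Single Extended  43 – 44  10, 24, 53 
Single Extended  45 – 46  10, 24, 55 
Single Extended  47 – 48  10, 24, 57 
Single Extended  49 – 50  10, 24, 59 
Single Extended  51 – 52  10, 24, 61 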
ALTFP_DIV Truth Table
DATAA[]  DATAB[]  SIGN BIT  RESULT[]  Overflow  Underflow  Zero  Division-by-zero  NaN 

Normal  Normal  0/1  Normal  0  0  0  0  0 
Normal  Normal  0/1  Denormal  0  0  1  0  0 
Normal  Normal  0/1  Infinity  1  0  0  0  0 
Normal  Normal  0/1  Zero  0  1  1  0  0 
Normal  Denormal  0/1  Infinity  0  0  0  1  0 
Normal  Zero  0/1  Infinity  0  0  0  1  0 
Normal  Infinity  0/1  Zero  0  0  1  0  0 
Normal  NaN  X  NaN  0  0  0  0  1 
Denormal  Normal  0/1  Zero  0  0  1  0  0 
Denormal  Denormal  0/1  NaN  0  0  0  0  1 
Denormal  Zero  0/1  NaN  0  0  0  0  1 
Denormal  Infinity  0/1  Zero  0  0  1  0  0 
Denormal  NaN  X  NaN  0  0  0  0  1 
Zero  Normal  0/1  Zero  0  0  1  0  0 
Zero  Denormal  0/1  NaN  0  0  0  0  1 
Zero  Zero  0/1  NaN  0  0  0  0  1 
Zero  Infinity  0/1  Zero  0  0  1  0  0 
Zero  NaN  X  NaN  0  0  0  0  1 
Infinity  Normal  0/1  Infinity  0  0  0  0  0 
Infinity  Denormal  0/1  Infinity  0  0  0  0  0 
Infinity  Zero  0/1  Infinity  0  0  0  0  0 
Infinity  Infinity  0/1  NaN  0  0  0  0  1 
Infinity  NaN  X  NaN  0  0  0  0  1 
NaN  Normal  X  NaN  0  0  0  0  1 
NaN  Denormal  X  NaN  0  0  0  1  1 
NaN  Zero  X  NaN  0  0  0  1  1 
NaN  Infinity  X  NaN  0  0  0  0  1 
NaN  NaN  X  NaN  0  0  0  0  1 
ALTFP_DIV Resource Utilization and Performance
Device Family  Precision  Optimization  Output Latency  Adaptive Look-Up Tables (ALUTs)  Dedicated Logic Registers (DLRs)  Adaptive Logic Modules (ALMs)  18-bit DSP  f_{MAX} (MHz) 

Stratix IV  Single  Speed  33  3,593  3,351  2,500  —  313 
Stratix IV  Single  Area  33  1,646  2,074  1,441  —  308 
Stratix IV  Double  Speed  61  13,867  13,143  10,196  —  292 
Stratix IV  Double  Area  61  5,125  7,360  4,842  —  267 
Low Latency Option 
Stratix IV  Single  —  6  207  304  212  16  154 
Stratix IV  Single  —  14  253  638  385  16  358 
Stratix IV  Double  —  10  714  1,077  779  44  151 
Stratix IV  Double  —  24  765  2,488  1,397  44  238 
ALTFP_DIV Design Example: Division of Single-Precision
ALTFP_DIV Design Example: Understanding the Simulation Results
This design example implements a floating-point divider for the division of single-precision numbers with a low latency option. The output latency is 6, hence every division generates the output result 6 clock cycles later.
Time  Event 

0 ns, startup  dataa[] value: 0000 0000h. datab[] value: 0000 0000h. Output value: An undefined value is seen on the result[] port, which is ignored. All values seen on the output port before the 6th clock cycle are merely due to the behavior of the system during startup and should be disregarded. 
17600 ns  Output value: 7FC0 0000h. Exception handling ports: nan asserts. The division of zero by zero results in a NaN. 
2000 ns  dataa[] value: 2D0B 496Ah. datab[] value: 3A5A FC26h. Both inputs hold normal values. 
20800 ns  Output result: 321F 6EC6h. Exception output ports: nan deasserts. The division of two normal values results in a normal value. 
11000 ns  dataa[] value: 046E 78BCh. datab[] value: 6798 698Bh. Both inputs hold normal values. 
27200 ns  Output value: 0h. Exception handling ports: underflow and zero assert. The division of the two normal values results in a denormal value. As denormal values are not supported, the result is zero and the underflow port asserts. The zero port is also asserted to indicate that the result is zero. 
2600 ns  dataa[] value: 0D72 54A8h. datab[] value: 0070 0000h. The input port dataa[] holds a normal value while the input port datab[] holds a denormal value. 
36800 ns  Output value: 7F80 0000h. Exception handling ports: division_by_zero asserts. Denormal numbers are forced to zero; therefore, dividing a normal value by the resulting zero gives an infinity value. 
ALTFP_DIV Signals
Port Name  Required  Description 

aclr  No  Asynchronous clear input for the floating-point divider. The source is asynchronously reset when the aclr signal is asserted high. 
clock  Yes  Clock input to the IP core. 
clk_en  No  Clock enable to the floating-point divider. This port enables division. This signal is active high. When this signal is low, no division takes place and the outputs remain the same. 
dataa[]  Yes  Numerator data input. The MSB is the sign bit, the next MSBs are the exponent, and the LSBs are the mantissa. The size of this port is the total width of the sign bit, exponent bits and mantissa bits. 
datab[]  Yes  Denominator data input. The MSB is the sign bit, the next MSBs are the exponent, and the LSBs are the mantissa. The size of this port is the total width of the sign bit, exponent bits and mantissa bits. 
Port Name  Required  Description 

result[]  Yes  Divider output port. The division result (after rounding). As with the input values, the MSB is the sign, the next MSBs are the exponent, and the LSBs are the mantissa. The size of this port is the total width of the sign bit, exponent bits, and mantissa bits. 
overflow  No  Overflow port for the divider. Asserted when the result of the division (after rounding) exceeds or reaches infinity. Infinity is defined as a number in which the exponent exceeds 2^{WIDTH_EXP} - 1. 
underflow  No  Underflow port for the divider. Asserted when the result of the division (after rounding) is zero even though neither of the inputs to the divider is zero, or when the result is a denormalized number. 
zero  No  Zero port for the divider. Asserted when the value of result[] is zero. 
division_by_zero  No  Division-by-zero output port for the divider. Asserted when the value of datab[] is a zero. 
nan  No  NaN port. Asserted when an invalid division occurs, such as infinity divided by infinity or zero divided by zero. A NaN value appears as output at the result[] port. Any division involving a NaN value causes the nan output port to be asserted. 
ALTFP_DIV Parameters
Parameter Name  Type  Required  Description 

WIDTH_EXP  Integer  Yes  Specifies the precision of the exponent. If this parameter is not specified, the default is 8. The bias of the exponent is always set to 2^{(WIDTH_EXP - 1)} - 1, that is, 127 for single precision and 1023 for double precision. The value of WIDTH_EXP must be 8 for single precision, 11 for double precision, and a minimum of 11 for single extended precision. The value of WIDTH_EXP must be less than the value of WIDTH_MAN, and the sum of WIDTH_EXP and WIDTH_MAN must be less than 64. 
WIDTH_MAN  Integer  Yes  Specifies the precision of the mantissa. If this parameter is not specified, the default is 23. When WIDTH_EXP is 8 and the floating-point format is the single-precision format, the WIDTH_MAN value must be 23. Otherwise, the value of WIDTH_MAN must be a minimum of 31. The value of WIDTH_MAN must be greater than the value of WIDTH_EXP, and the sum of WIDTH_EXP and WIDTH_MAN must be less than 64. 
ROUNDING  String  Yes  Specifies the rounding mode. The default value is TO_NEAREST. The floatingpoint divider does not support other rounding modes. 
OPTIMIZE  String  No  Specifies whether to optimize for area or for speed. Values are AREA and SPEED. A value of AREA optimizes the design using less total logic utilization or resources. A value of SPEED optimizes the design for better performance. If this parameter is not specified, the default value is SPEED. 
PIPELINE  Integer  No  Specifies the number of clock cycles needed to produce the result. For the single-precision format, the latency options are 33, 14, or 6. For the double-precision format, the latency options are 61, 24, or 10. For the single-extended precision format, the value ranges from a minimum of 41 to a maximum of 61. For the low-latency option, the latency is determined from the mantissa width. For a mantissa width of 31 to 40 bits, the value is 8 or 18. For a mantissa width of 41 bits or more, the value is 10 or 24. 
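The low-latency PIPELINE choices depend only on the mantissa width, so they can be expressed as a small lookup. The sketch below encodes the low-latency values stated in the PIPELINE description and in the output latency table; the function name `low_latency_options` is hypothetical, and the full-latency values are deliberately left to the table.

```python
def low_latency_options(width_man: int) -> tuple:
    """Return the low-latency PIPELINE values for a given mantissa width."""
    if width_man == 23:          # single precision
        return (6, 14)
    if 31 <= width_man <= 40:    # single-extended precision, narrower mantissa
        return (8, 18)
    if 41 <= width_man <= 52:    # single-extended (41 or more) and double precision
        return (10, 24)
    raise ValueError(f"unsupported mantissa width: {width_man}")


print(low_latency_options(23))  # (6, 14)
print(low_latency_options(36))  # (8, 18)
print(low_latency_options(52))  # (10, 24)
```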
ALTFP_MULT IP Core
This IP core performs floating-point multiplication.
ALTFP_MULT IP Core Features
 Multiplication functions.
 Optional exception handling output ports such as zero, overflow, underflow, and nan.
 Optional dedicated multiplier circuitry in Cyclone and Stratix series devices.
ALTFP_MULT Output Latency
Precision  Mantissa Width  Latency (in clock cycles) 

Single  23  5, 6, 10, 11 
Double  52  5, 6, 10, 11 
Single-Extended  31–52  5, 6, 10, 11 
ALTFP_MULT Truth Table
DATAA[]  DATAB[]  RESULT[]  Overflow  Underflow  Zero  NaN 

Normal  Normal  Normal  0  0  0  0 
Normal  Normal  Denormal  0  1  1  0 
Normal  Normal  Infinity  1  0  0  0 
Normal  Normal  Zero  0  1  1  0 
Normal  Denormal  Zero  0  0  1  0 
Normal  Zero  Zero  0  0  1  0 
Normal  Infinity  Infinity  1  0  0  0 
Normal  NaN  NaN  0  0  0  1 
Denormal  Normal  Zero  0  0  1  0 
Denormal  Denormal  Zero  0  0  1  0 
Denormal  Zero  Zero  0  0  1  0 
Denormal  Infinity  NaN  0  0  0  1 
Denormal  NaN  NaN  0  0  0  1 
Zero  Normal  Zero  0  0  1  0 
Zero  Denormal  Zero  0  0  1  0 
Zero  Zero  Zero  0  0  1  0 
Zero  Infinity  NaN  0  0  0  1 
Zero  NaN  NaN  0  0  0  1 
Infinity  Normal  Infinity  1  0  0  0 
Infinity  Denormal  NaN  0  0  0  1 
Infinity  Zero  NaN  0  0  0  1 
Infinity  Infinity  Infinity  1  0  0  0 
Infinity  NaN  NaN  0  0  0  1 
NaN  Normal  NaN  0  0  0  1 
NaN  Denormal  NaN  0  0  0  1 
NaN  Zero  NaN  0  0  0  1 
NaN  Infinity  NaN  0  0  0  1 
NaN  NaN  NaN  0  0  0  1 
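The rows of the truth table that involve a special input (denormal, zero, infinity, or NaN) follow a simple priority order. The sketch below reproduces those rows in software, assuming that denormal inputs behave as zero, which is what the table rows imply; `mult_result_class` is a hypothetical helper, and Normal x Normal cases are left undetermined because they depend on the operand values.

```python
def mult_result_class(a: str, b: str):
    """Result class for a x b, or None when both inputs are Normal."""
    # Assumption: denormal inputs are treated as zero, as the table rows suggest.
    a = "Zero" if a == "Denormal" else a
    b = "Zero" if b == "Denormal" else b
    classes = {a, b}
    if "NaN" in classes:
        return "NaN"
    if classes == {"Zero", "Infinity"}:
        return "NaN"       # zero times infinity is an invalid operation
    if "Infinity" in classes:
        return "Infinity"
    if "Zero" in classes:
        return "Zero"
    return None            # Normal x Normal: Normal, Denormal, Infinity, or Zero


print(mult_result_class("Normal", "Denormal"))     # Zero
print(mult_result_class("Denormal", "Infinity"))   # NaN
print(mult_result_class("Infinity", "Infinity"))   # Infinity
```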
ALTFP_MULT Resource Utilization and Performance
Device Family  Precision  Output Latency  Adaptive Look-Up Tables (ALUTs)  Dedicated Logic Registers (DLRs)  Adaptive Logic Modules (ALMs)  18-bit DSP  f_{MAX} (MHz) 

Stratix IV  Single  5  138  148  100  4  274 
Stratix IV  Single  11  185  301  190  4  445 
Stratix IV  Double  5  306  367  272  10  255 
Stratix IV  Double  11  419  523  348  10  395 
ALTFP_MULT Design Example: Multiplication of Double-Precision Format Numbers
ALTFP_MULT Design Example: Understanding the Simulation Waveform
This design example implements a floating-point multiplier for the multiplication of double-precision format numbers. All the optional input ports (clk_en and aclr) and output ports (overflow, underflow, zero, and nan) are enabled.
In this example, the latency is set to 6 clock cycles. Therefore, every multiplication result appears at the result port 6 clock cycles later.
Time  Event 

0 ns, startup  dataa[] value: 0000 0000 0000 0000h. datab[] value: 4037 742C 3C9E ECC0h. Output value: All values seen on the output port before the 6th clock cycle are merely due to the behavior of the system during startup and should be disregarded.
110 ns  Output value: 0000 0000 0000 0000h. Exception handling ports: zero asserts. The multiplication of zero at the input port dataa[] and a nonzero value at the input port datab[] results in a zero.
600 ns  dataa[] value: 7FF0 0000 0000 0000h. datab[] value: 4037 742C 3C9E ECC0h. This is the multiplication of an infinity value and a normal value.
710 ns  Output value: 7FF0 0000 0000 0000h. Exception handling ports: overflow asserts. The multiplication of an infinity value and a normal value results in infinity. All multiplications with an infinity value result in infinity, except when infinity is multiplied by zero.
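The hexadecimal values in this table are IEEE 754 double-precision bit patterns, so the two products can be checked in software. The following Python sketch is only an illustration using the standard struct module; the helper names are hypothetical, and it models the arithmetic result only, not the exception ports or the pipeline.

```python
import struct

def bits_to_double(word):
    """Interpret a 64-bit integer as an IEEE 754 double-precision value."""
    return struct.unpack(">d", struct.pack(">Q", word))[0]

def double_to_bits(value):
    """Return the 64-bit pattern of a double-precision value."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]

datab = bits_to_double(0x4037742C3C9EECC0)   # the normal operand
zero = bits_to_double(0x0000000000000000)
inf = bits_to_double(0x7FF0000000000000)

zero_product = double_to_bits(zero * datab)
inf_product = double_to_bits(inf * datab)
print(f"{zero_product:016X}")   # 0000000000000000
print(f"{inf_product:016X}")    # 7FF0000000000000
```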
Parameters
Parameter Name  Type  Required  Description 

WIDTH_EXP  Integer  No  Specifies the width of the exponent. If this parameter is not specified, the default is 8. The bias of the exponent is always 2^{(WIDTH_EXP - 1)} - 1 (that is, 127 for the single-precision format and 1023 for the double-precision format). WIDTH_EXP must be 8 for the single-precision format or a minimum of 11 for the double-precision format and the single-extended precision format. WIDTH_EXP must be less than WIDTH_MAN. The sum of WIDTH_EXP and WIDTH_MAN must be less than 64.
WIDTH_MAN  Integer  No  Specifies the width of the mantissa. If this parameter is not specified, the default is 23. When WIDTH_EXP is 8 and the floating-point format is single-precision, the WIDTH_MAN value must be 23; otherwise, the value of WIDTH_MAN must be a minimum of 31. The WIDTH_MAN value must always be greater than the WIDTH_EXP value. The sum of WIDTH_EXP and WIDTH_MAN must be less than 64.
DEDICATED_MULTIPLIER_CIRCUITRY  String  No  Specifies whether to use dedicated multiplier circuitry. Values are AUTO, YES, or NO. If this parameter is not specified, the default is AUTO. If a device does not have dedicated multiplier circuitry, the DEDICATED_MULTIPLIER_CIRCUITRY parameter has no effect and defaults to NO.
PIPELINE  Integer  No  Specifies the number of clock cycles needed to produce the multiplied result. Values are 5, 6, 10, and 11. If this parameter is not specified, the default is 5. 
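The width rules in this table can be checked before instantiating the IP core. The following Python sketch is one reading of those rules, not a vendor tool: it assumes that WIDTH_EXP = 8 always means single precision, and the function name check_widths is hypothetical.

```python
def check_widths(width_exp=8, width_man=23):
    """Return the exponent bias if the widths satisfy the documented rules,
    otherwise raise ValueError. Defaults match the documented defaults."""
    if width_exp == 8:
        # Assumption: WIDTH_EXP = 8 is reserved for single precision.
        if width_man != 23:
            raise ValueError("WIDTH_EXP = 8 requires WIDTH_MAN = 23")
    elif width_exp < 11 or width_man < 31:
        raise ValueError("double and single-extended formats need "
                         "WIDTH_EXP >= 11 and WIDTH_MAN >= 31")
    if width_exp >= width_man:
        raise ValueError("WIDTH_EXP must be less than WIDTH_MAN")
    if width_exp + width_man >= 64:
        raise ValueError("WIDTH_EXP + WIDTH_MAN must be less than 64")
    # Bias = 2^(WIDTH_EXP - 1) - 1
    return 2 ** (width_exp - 1) - 1

print(check_widths(8, 23))    # 127
print(check_widths(11, 52))   # 1023
```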
ALTFP_MULT Signals
Port Name  Required  Description 

clock  Yes  Clock input to the IP core. 
clk_en  No  Clock enable. Allows multiplication to take place when asserted high. When this signal is asserted low, no multiplication occurs and the outputs remain unchanged.
aclr  No  Asynchronous clear. The source is asynchronously reset when aclr is asserted high.
dataa[]  Yes  Floating-point input data to the multiplier. The MSB is the sign, the next MSBs are the exponent, and the LSBs are the mantissa. This input port size is the total width of sign bit, exponent bits, and mantissa bits.
datab[]  Yes  Floating-point input data to the multiplier. The MSB is the sign, the next MSBs are the exponent, and the LSBs are the mantissa. This input port size is the total width of sign bit, exponent bits, and mantissa bits.
Port Name  Required  Description 

result[]  Yes  Output port for the multiplier. The floating-point result after rounding. The MSB is the sign, the next MSBs are the exponent, and the LSBs are the mantissa.
overflow  No  Overflow port for the multiplier. Asserted when the result of the multiplication, after rounding, exceeds or reaches infinity. Infinity is defined as a number in which the exponent exceeds 2^{WIDTH_EXP} - 1.
underflow  No  Underflow port for the multiplier. Asserted when the result of the multiplication (after rounding) is 0 while none of the inputs to the multiplication is 0, or asserted when the result is a denormalized number. 
zero  No  Zero port for the multiplier. Asserted when the value of result[] is 0. 
nan  No  NaN port for the multiplier. This port is asserted when an invalid multiplication occurs, such as the multiplication of infinity and zero. In this case, a NaN value is the output generated at the result[] port. The multiplication of any value and NaN produces NaN. 
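The bit layout described for dataa[], datab[], and result[] (sign in the MSB, then the exponent, then the mantissa) can be illustrated by splitting a port value into its fields. This Python sketch is only an illustration; the function names are hypothetical and the widths default to the single-precision values.

```python
def port_width(width_exp=8, width_man=23):
    """Total port width: one sign bit plus exponent and mantissa bits."""
    return 1 + width_exp + width_man

def split_fields(word, width_exp=8, width_man=23):
    """Split a port value into (sign, exponent, mantissa) bit fields."""
    mantissa = word & ((1 << width_man) - 1)
    exponent = (word >> width_man) & ((1 << width_exp) - 1)
    sign = (word >> (width_exp + width_man)) & 1
    return sign, exponent, mantissa

# Double-precision infinity: exponent field is all ones, mantissa is zero.
sign, exponent, mantissa = split_fields(0x7FF0000000000000, 11, 52)
print(sign, hex(exponent), hex(mantissa))   # 0 0x7ff 0x0
```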
ALTFP_SQRT
This IP core performs square root calculation based on the input provided. You can use the ports and parameters available to customize the ALTFP_SQRT IP core according to your application.
ALTFP_SQRT Features
 Square root functions.
 Optional exception handling output ports such as zero, overflow, and nan.
Output Latency
Precision  Mantissa Width  Latency (in clock cycles) 

Single  23  16, 28 
Double  52  30, 57 
Single-extended  31  20, 36
32  20, 37  
33  21, 38  
34  21, 39  
35  22, 40  
36  22, 41  
37  23, 42  
38  23, 43  
39  24, 44  
40  24, 45  
41  25, 46  
42  25, 47  
43  26, 48  
44  26, 49  
45  27, 50  
46  27, 51  
47  28, 52  
48  28, 53  
49  29, 54  
50  29, 55  
51  30, 56 
ALTFP_SQRT Truth Table
DATA[]  SIGN BIT  RESULT[]  NaN  Overflow  Zero 

Normal  0  Normal  0  0  0 
Denormal  0/1  Zero  0  0  1 
Positive Infinity  0  Infinity  0  1  0 
Negative Infinity  1  All 1’s  1  0  0 
Positive NaN  0  All 1’s  1  0  0 
Negative NaN  1  All 1’s  1  0  0 
Zero  0/1  Zero  0  0  1 
Normal  1  All 1’s  1  0  0 
ALTFP_SQRT Resource Utilization and Performance
Device Family  Precision  Output Latency  Adaptive Look-Up Tables (ALUTs)  Dedicated Logic Registers (DLRs)  Adaptive Logic Modules (ALMs)  f_{MAX} (MHz)

Stratix IV  Single  28  502  932  528  472
Stratix IV  Double  57  2,177  3,725  2,202  366
ALTFP_SQRT Design Example: Square Root of Single-Precision Format Numbers
ALTFP_SQRT Design Example: Understanding the Simulation Results
These figures show the expected simulation results in the ModelSim-Altera software.
This design example implements a floating-point square root function for single-precision format numbers with all the exception output ports instantiated. The output ports include overflow, zero, and nan.
The output latency is 28 clock cycles. Every square root computation generates the output result 28 clock cycles later.
Time  Event 

0 ns, startup  Output value: All values seen on the output port before the 28th clock cycle are merely due to the behavior of the system during startup and should be disregarded. 
2 000 ns  data[] value: 2D0B 496Ah. The data input is a normal number.
84 000 ns  Output value: 363C D4EBh. The square root computation of a normal input results in a normal output.
14 000 ns  data[] value: 0000 0000h.
96 000 ns  Output value: 0000 0000h. Exception handling ports: zero asserts. The square root computation of zero results in a zero.
23 000 ns  data[] value: 7F80 0000h. The input is infinity.
105 000 ns  Output value: 7F80 0000h. Exception handling ports: overflow asserts.
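The input and output words in this table are IEEE 754 single-precision bit patterns, so the three results can be cross-checked in software. The following Python sketch is only an illustration; the helper names are hypothetical, and the normal case is compared with a tolerance rather than bit for bit, because the exact rounding of the IP core is not modeled here.

```python
import math
import struct

def bits_to_float(word):
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

def float_to_bits(value):
    """Return the 32-bit single-precision pattern of a value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

# Normal input: compare the software square root with the listed output.
expected = bits_to_float(0x363CD4EB)
root = math.sqrt(bits_to_float(0x2D0B496A))
rel_err = abs(root - expected) / expected

# Zero and infinity inputs map to zero and infinity.
zero_bits = float_to_bits(math.sqrt(bits_to_float(0x00000000)))
inf_bits = float_to_bits(math.sqrt(bits_to_float(0x7F800000)))
print(rel_err < 1e-6, f"{zero_bits:08X}", f"{inf_bits:08X}")
```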
ALTFP_SQRT Signals
Port Name  Required  Description 

clock  Yes  Clock input to the IP core. 
clk_en  No  Clock enable that allows square root operations when the port is asserted high. When the port is asserted low, no operation occurs and the outputs remain unchanged. 
aclr  No  Asynchronous clear. When the aclr port is asserted high, the function is asynchronously reset. 
data[]  Yes  Floating-point input data. The MSB is the sign, the next MSBs are the exponent, and the LSBs are the mantissa. This input port size is the total width of sign bit, exponent bits, and mantissa bits.
Port Name  Required  Description 

result[]  Yes  Square root output port for the floating-point result. The MSB is the sign, the next MSBs are the exponent, and the LSBs are the mantissa. The size of this port is the total width of the sign bit, exponent bits, and mantissa bits.
overflow  Yes  Overflow port. Asserted when the result of the square root (after rounding) exceeds or reaches infinity. Infinity is defined as a number in which the exponent exceeds 2^{WIDTH_EXP} - 1.
zero  Yes  Zero port. Asserted when the value of the result[] port is 0. 
nan  Yes  NaN port. Asserted when an invalid square root occurs, such as negative numbers or NaN inputs. 
ALTFP_SQRT Parameters
Parameter Name  Type  Required  Description 

WIDTH_EXP  Integer  Yes  Specifies the precision of the exponent. If this parameter is not specified, the default is 8. The bias of the exponent is always set to 2^{(WIDTH_EXP - 1)} - 1, that is, 127 for the single-precision format and 1023 for the double-precision format. The value of the WIDTH_EXP parameter must be 8 for the single-precision format, 11 for the double-precision format, and a minimum of 11 for the single-extended precision format. The value of the WIDTH_EXP parameter must be less than the value of the WIDTH_MAN parameter, and the sum of the WIDTH_EXP and WIDTH_MAN parameters must be less than 64.
WIDTH_MAN  Integer  Yes  Specifies the value of the mantissa. If this parameter is not specified, the default is 23. When the WIDTH_EXP parameter is 8 and the floating-point format is single-precision, the WIDTH_MAN parameter value must be 23. Otherwise, the value of the WIDTH_MAN parameter must be a minimum of 31. The value of the WIDTH_MAN parameter must be greater than the value of the WIDTH_EXP parameter. The sum of the WIDTH_EXP and WIDTH_MAN parameters must be less than 64.
ROUNDING  String  Yes  Specifies the rounding mode. The default value is TO_NEAREST. Other rounding modes are not supported.
PIPELINE  Integer  Yes  Specifies the number of clock cycles for the square root results of the result[] port. Values are WIDTH_MAN + 5 and ((WIDTH_MAN + 5)/2) + 2, where the result of the division is truncated to an integer.
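Reading the second PIPELINE expression as (WIDTH_MAN + 5)/2 + 2 with the division truncated reproduces every row of the ALTFP_SQRT output latency table. The following Python sketch shows the arithmetic; the function name is hypothetical.

```python
def sqrt_pipeline_options(width_man):
    """Return the two PIPELINE values (reduced, full) for a mantissa width."""
    full = width_man + 5
    reduced = full // 2 + 2   # integer division truncates the fraction
    return reduced, full

for width_man in (23, 31, 32, 40, 51, 52):
    print(width_man, sqrt_pipeline_options(width_man))
```

For example, a mantissa width of 23 gives 16 and 28, and a width of 52 gives 30 and 57, matching the single-precision and double-precision rows.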
ALTFP_EXP IP Core
This IP core performs exponential calculation based on the input provided.
ALTFP_EXP Features
 Exponential value of a given input.
 Optional exception handling output ports such as zero, overflow, underflow, and nan.
Output Latency
Precision  Mantissa Width  Latency (in clock cycles) 

Single  23  17 
Double  52  25 
Single-extended  31 – 38  22
39 – 52  25 
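Because the ALTFP_EXP latency is fixed by the mantissa width, the table above can be expressed as a simple lookup. The following Python sketch is a convenience illustration only; the function name is hypothetical, and widths outside the listed ranges are rejected.

```python
def exp_latency(width_man):
    """Return the ALTFP_EXP latency in clock cycles for a mantissa width."""
    if width_man == 23:            # single precision
        return 17
    if 31 <= width_man <= 38:      # single-extended, narrower mantissas
        return 22
    if 39 <= width_man <= 52:      # single-extended (39 to 51) and double (52)
        return 25
    raise ValueError("unsupported mantissa width: %d" % width_man)

print(exp_latency(23), exp_latency(38), exp_latency(52))   # 17 22 25
```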
ALTFP_EXP Truth Table
DATA[]  Calculation  RESULT[]  NaN  Overflow  Underflow  Zero

Normal  e^{data}  Normal  0  0  0  0
Normal  e^{data}  Infinity  0  1  0  0
Normal (numbers of small magnitude)  e^{data}  1  0  0  1  0
Normal (negative numbers of large magnitude)  e^{data}  0  0  0  1  0
Denormal  e^{0}  1  0  0  0  0
Zero  e^{0}  1  0  0  0  0
Infinity (+)  e^{+∞}  Infinity  0  0  0  0
Infinity (-)  e^{-∞}  0  0  0  0  1
NaN  —  NaN  1  0  0  0
ALTFP_EXP Resource Utilization and Performance
Device Family  Precision  Output Latency  Adaptive Look-Up Tables (ALUTs)  Dedicated Logic Registers (DLRs)  Adaptive Logic Modules (ALMs)  18-bit DSP  f_{MAX} (MHz)

Stratix IV  Single  17  631  521  448  19  284
Stratix IV  Double  25  4,104  2,007  2,939  46  279
ALTFP_EXP Design Example: Exponential of Single-Precision Format Numbers
ALTFP_EXP Design Example: Understanding the Simulation Results
These figures show the expected simulation results in the ModelSim-Altera software.
This design example implements a floating-point exponential for the single-precision format numbers. The optional input ports (clk_en and aclr) and all four exception handling output ports (nan, overflow, underflow, and zero) are enabled.
For single-precision format numbers, the latency is fixed at 17 clock cycles. Therefore, every exponential operation outputs the results 17 clock cycles later.
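The event times listed for this example are consistent with a 5 ns clock whose first rising edge falls at 2.5 ns. Neither figure is stated in the text, so both are assumptions inferred from the listed times. Under those assumptions, the following Python sketch relates an input time to the time at which its result appears; the function name is hypothetical.

```python
import math

def output_time_ns(input_time_ns, latency_cycles, period_ns=5.0, first_edge_ns=2.5):
    """Time at which the result for an input applied at input_time_ns appears.

    Assumes the input is sampled on the first rising clock edge at or after
    it changes, and that the result is visible on the latency-th edge
    counted from (and including) that sampling edge."""
    k = max(0, math.ceil((input_time_ns - first_edge_ns) / period_ns))
    sample_edge_ns = first_edge_ns + k * period_ns
    return sample_edge_ns + (latency_cycles - 1) * period_ns

for t in (0, 30, 60, 90, 120):
    print(t, output_time_ns(t, 17))
```

With a latency of 17 cycles, inputs applied at 0 ns and 30 ns produce outputs at 82.5 ns and 112.5 ns.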
Time  Event 

0 ns, startup  data[] value: 1A03 568Ch. Output value: An undefined value is seen on the result[] port, which is ignored. All values seen on the output port before the 17th clock cycle are merely due to the behavior of the system during startup and should be disregarded.
82.5 ns  Output value: 3F80 0000h. As the input value of 1A03568Ch is a very small number, it is seen as a value that is approaching zero, and the result approaches 1 (which is represented by 3F800000h). Exponential operations carried out on numbers of very small magnitudes result in a 1 and assert the underflow flag. Exception handling ports: underflow asserts.
30 ns  data[] value: F3FC DEFFh. This is a normal negative value of a very large magnitude.
112.5 ns  Output value: 0000 0000h. The outcome of exponential operations on negative numbers of very large magnitudes approaches zero. Exception handling ports: underflow remains asserted.
60 ns  data[] value: 7F80 0000h. This is a positive infinite value.
142.5 ns  Output value: 7F80 0000h. The operation on positive infinite values results in infinity. Exception handling ports: underflow deasserts, overflow asserts.
90 ns  data[] value: 7FC0 0000h. This is a NaN.
172.5 ns  Output value: 7FC0 0000h. The exponential of a NaN results in a NaN. Exception handling ports: nan asserts.
120 ns  data[] value: C1D4 49BAh. This is a normal value.
202.5 ns  Output value: 2C52 5981h. The result is a normal value. Exception handling ports: nan deasserts.
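The input and output words in this table can be cross-checked against a software exponential. The following Python sketch is an illustration only: the helper names are hypothetical, it uses the double-precision math.exp and then rounds to single precision, and it compares the normal case with a tolerance because the internal rounding of the IP core is not modeled.

```python
import math
import struct

def bits_to_float(word):
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

def float_to_bits(value):
    """Return the 32-bit single-precision pattern of a value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

# Very small magnitude: the result rounds to 1 (3F800000h).
tiny_bits = float_to_bits(math.exp(bits_to_float(0x1A03568C)))
# Negative value of very large magnitude: the result is 0.
large_neg_bits = float_to_bits(math.exp(bits_to_float(0xF3FCDEFF)))
# Positive infinity stays infinity; NaN stays NaN.
inf_bits = float_to_bits(math.exp(bits_to_float(0x7F800000)))
nan_result = math.exp(bits_to_float(0x7FC00000))
# Normal value: compare with the listed output within a tolerance.
expected = bits_to_float(0x2C525981)
rel_err = abs(math.exp(bits_to_float(0xC1D449BA)) - expected) / expected

print(f"{tiny_bits:08X}", f"{large_neg_bits:08X}", f"{inf_bits:08X}")
print(math.isnan(nan_result), rel_err < 1e-5)
```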
ALTFP_EXP Signals
Port Name  Required  Description 

aclr  No  Asynchronous clear. When the aclr port is asserted high, the function is asynchronously reset.
clk_en  No  Clock enable. When the clk_en port is asserted high, an exponential value operation takes place. When this signal is asserted low, no operation occurs and the outputs remain unchanged.
clock  Yes  Clock input to the IP core.
data[]  Yes  Floating-point input data. The MSB is the sign, the next MSBs are the exponent, and the LSBs are the mantissa. This input port size is the total width of the sign bit, exponent bits, and mantissa bits.
Port Name  Required  Description 

result[]  Yes  The floating-point exponential result of the value at data[]. The MSB is the sign, the next MSBs are the exponent, and the LSBs are the mantissa. The size of this port is the total width of the sign bit, exponent bits, and mantissa bits.
overflow  No  Overflow exception output. Asserted when the result of the operation (after rounding) is infinite. 
underflow  No  Underflow exception output. Asserted when the result of the exponential approaches 1 (from numbers of very small magnitude), or when the result approaches 0 (from negative numbers of very large magnitudes). 
zero  No  Zero exception output. Asserted when the value in the result[] port is zero. 
nan  No  NaN exception output. Asserted when an invalid operation occurs. Any operation involving NaN also asserts the nan port. 
ALTFP_EXP Parameters
Parameter Name  Type  Required  Description 

WIDTH_EXP  Integer  Yes  Specifies the precision of the exponent. If this parameter is not specified, the default is 8. The bias of the exponent is always set to 2^{(WIDTH_EXP - 1)} - 1, that is, 127 for the single-precision format and 1023 for the double-precision format. The value of the WIDTH_EXP parameter must be 8 for the single-precision format, 11 for the double-precision format, and a minimum of 11 for the single-extended precision format. The value of the WIDTH_EXP parameter must be less than the value of the WIDTH_MAN parameter, and the sum of the WIDTH_EXP and WIDTH_MAN parameters must be less than 64.
WIDTH_MAN  Integer  Yes  Specifies the value of the mantissa. If this parameter is not specified, the default is 23. When the WIDTH_EXP parameter is 8 and the floating-point format is single-precision, the WIDTH_MAN parameter value must be 23. Otherwise, the value of the WIDTH_MAN parameter must be a minimum of 31. The value of the WIDTH_MAN parameter must be greater than the value of the WIDTH_EXP parameter. The sum of the WIDTH_EXP and WIDTH_MAN parameters must be less than 64.
PIPELINE  Integer  Yes  Specifies the amount of latency, expressed in clock cycles, used in the ALTFP_EXP IP core. Acceptable pipeline values are 17, 22, and 25 cycles of latency. Create the ALTFP_EXP IP core with the MegaWizard Plug-In Manager to calculate the value for this parameter.
ROUNDING  String  Yes  Specifies the rounding mode. The default value is TO_NEAREST. Other rounding modes are not supported. 
ALTFP_INV IP Core
This IP core performs the function of 1/a where a is the given input.
ALTFP_INV Features
 Inverse value of a given input.
 Optional exception handling output ports such as zero, division_by_zero, underflow, and nan.
Output Latency
The output latency options for the ALTFP_INV IP core differ depending on the precision selected, the width of the mantissa, or both.
Precision  Mantissa Width  Latency (in clock cycles) 

Single  23  20 
Double  52  27 
Single-extended  31 – 39  20
40 – 52  27 
ALTFP_INV Truth Table
DATA[]  SIGN BIT  RESULT[]  Underflow  Zero  Division_by_zero  NaN 

Normal  0/1  Normal  0  0  0  0 
Normal  0/1  Denormal  1  1  0  0 
Normal  0/1  Infinity  0  0  0  0 
Normal  0/1  Zero  1  1  0  0 
Denormal  0/1  Infinity  0  0  1  0 
Zero  0/1  Infinity  0  0  1  0 
Infinity  0/1  Zero  0  1  0  0 
NaN  X  NaN  0  0  0  1 
ALTFP_INV Resource Utilization and Performance
Device Family  Precision  Output Latency  Adaptive Look-Up Tables (ALUTs)  Dedicated Logic Registers (DLRs)  Adaptive Logic Modules (ALMs)  18-bit DSP  f_{MAX} (MHz)

Stratix IV  Single  20  401  616  373  16  412
Stratix IV  Double  27  939  1,386  912  48  203
ALTFP_INV Design Example: Inverse of Single-Precision Format Numbers
This design example uses the ALTFP_INV IP core to compute the inverse of single-precision format numbers. This example uses the parameter editor in the Quartus II software.
ALTFP_INV Design Example: Understanding the Simulation Results
These figures show the expected simulation results in the ModelSim-Altera software.
This design example implements a floating-point inverse for single-precision format numbers. The optional input ports (clk_en and aclr) and all four exception handling output ports (division_by_zero, nan, zero, and underflow) are enabled.
The latency is fixed at 20 clock cycles; therefore, every inverse operation outputs results 20 clock cycles later.
This table lists the inputs and corresponding outputs obtained from the simulation in the waveforms.
Time  Event 

0 ns, startup  data[] value: 34A2 E42Fh. Output value: An undefined value is seen on the result[] port, which is ignored. All values seen on the output port before the 20th clock cycle are merely due to the behavior of the system during startup and should be disregarded.
97.5 ns  Output value: 4A49 2A2Fh. Exception handling ports: division_by_zero deasserts. The inverse of a normal number results in a normal value.
10 ns  data[] value: 7F80 0000h. This is an infinity value.
107.5 ns  Output value: 0000 0000h. Exception handling ports: zero asserts. The inverse of an infinity value produces a zero.
60 ns  data[] value: 7FC0 0000h. This is a NaN.
157.5 ns  Output value: 7FC0 0000h. Exception handling ports: nan asserts. The inverse of a NaN results in a NaN.
70 ns  data[] value: 0000 1000h. This is a denormal number.
167.5 ns  Output value: 7F80 0000h. Exception handling ports: nan deasserts, division_by_zero asserts. Denormal numbers are forced to zero; therefore, the inverse results in infinity.
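The four cases in this table can be reproduced with a small behavioral model. The following Python sketch is an assumption-laden illustration, not the IP core's algorithm: the function names are hypothetical, denormal inputs are treated as zero as the table describes, and the normal case is compared with a tolerance because the core's rounding is not modeled.

```python
import math
import struct

def bits_to_float(word):
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

def float_to_bits(value):
    """Return the 32-bit single-precision pattern of a value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

def inv_model(word):
    """Software model of 1/a for a single-precision input word."""
    x = bits_to_float(word)
    if math.isnan(x):
        return x                               # NaN in, NaN out
    if (word >> 23) & 0xFF == 0:
        # Exponent field of zero: zero or denormal, both treated as zero.
        return math.copysign(math.inf, x)      # division by zero
    return 1.0 / x                             # 1/infinity gives zero

expected = bits_to_float(0x4A492A2F)
rel_err = abs(inv_model(0x34A2E42F) - expected) / expected
inf_case = float_to_bits(inv_model(0x7F800000))
denormal_case = float_to_bits(inv_model(0x00001000))
nan_case = inv_model(0x7FC00000)
print(rel_err < 1e-5, f"{inf_case:08X}", f"{denormal_case:08X}", math.isnan(nan_case))
```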
Ports
Port Name  Required  Description 

aclr  No  Asynchronous clear. When the aclr port is asserted high, the function is asynchronously cleared. 
clk_en  No  Clock enable. When the clk_en port is asserted high, an inversion value operation takes place. When this signal is asserted low, no operation occurs and the outputs remain unchanged.
clock  Yes  Clock input to the IP core.
data[]  Yes  Floating-point input data. The MSB is the sign, the next MSBs are the exponent, and the LSBs are the mantissa. This input port size is the total width of the sign bit, exponent bits, and mantissa bits.
Port Name  Required  Description 

result[]  Yes  The floating-point inverse result of the value at the data[] input port. The MSB is the sign, the next MSBs are the exponent, and the LSBs are the mantissa. The size of this port is the total width of the sign bit, exponent bits, and mantissa bits.
underflow  No  Underflow exception output. Asserted when the result of the inversion (after rounding) is a denormalized number.
zero  No  Zero exception output. Asserted when the value at the result[] port is a zero.
division_by_zero  No  Division-by-zero exception output. Asserted when the denominator input is a zero.
nan  No  NaN exception output. Asserted when an invalid inversion occurs, such as the inversion of NaN. In this case, a NaN value is output to the result[] port. Any operation involving NaN also asserts the nan port. 
Parameters
Parameter Name  Type  Required  Description 

WIDTH_EXP  Integer  Yes  Specifies the precision of the exponent. If this parameter is not specified, the default is 8. The bias of the exponent is always set to 2^{(WIDTH_EXP - 1)} - 1, that is, 127 for the single-precision format and 1023 for the double-precision format. The value of the WIDTH_EXP parameter must be 8 for the single-precision format, 11 for the double-precision format, and a minimum of 11 for the single-extended precision format. The value of the WIDTH_EXP parameter must be less than the value of the WIDTH_MAN parameter, and the sum of the WIDTH_EXP and WIDTH_MAN parameters must be less than 64.
WIDTH_MAN  Integer  Yes  Specifies the value of the mantissa. If this parameter is not specified, the default is 23. When the WIDTH_EXP parameter is 8 