|CFD||Computational Fluid Dynamics, a numerical analysis method for solving conjugate heat transfer problems.|
|CTM||Compact Thermal Model, a geometric model that is used as an input to a CFD tool.|
|EPE||Early Power Estimator, a tool that estimates the power consumption of the FPGA device.|
|FPGA||Field Programmable Gate Array|
|HBM||High Bandwidth Memory|
|IHS||Integrated Heat Spreader - case of an Intel® Stratix® 10 FPGA.|
|MCM||Multi-Chip Module - an integrated circuit (IC) with more than one die.|
|SCM||Single Chip Module|
|TCASE||Integrated Heat Spreader or Case Temperature. The case temperature of a component is measured with an attached heat sink. This temperature is measured at the top geometric center of the package case/die.|
|TDP||Thermal Design Power, the power dissipated in a die that is used for thermal analysis purposes.|
|TA||Ambient Temperature, measured locally surrounding the FPGA. The ambient temperature should be measured just upstream of a passive heat sink or at the fan inlet for an active heat sink.|
|TCORE||Core Fabric Die Temperature|
|TJ-MAX||Maximum Junction Temperature, a maximum allowable absolute temperature rating of the device or a targeted value.|
|TTDP||Total Thermal Design Power, the power dissipated in the device that is used for thermal analysis purposes.|
|TIM||Thermal Interface Material|
|TSD||Temperature Sensor Diode|
The Early Power Estimator (EPE) is a tool that estimates the power consumption of an FPGA device early in the design process. It allows you to enter the relevant information for a specific FPGA design and obtain the power and thermal design information needed for electrical and thermal design. The data provided to the EPE is divided into two categories, general and thermal. Both affect the overall power dissipation of each die and the thermal characteristics of the package used for system thermal modeling. The necessary inputs to the EPE are listed below.
- General information
- FPGA package size
- FPGA core fabric size and grade
- Transceiver type, protocol, grade and placement per transceiver die
- Utilization of FPGA hardware blocks
- Clock rates, toggle rates and frequencies
- HBM specification
- Thermal information
- Ambient air temperature (TA) of the design
- Maximum allowable junction temperature (TJ-MAX) of any die in the FPGA, at the provided TA
- Recommended power margin application
- Core Fabric Die or the main FPGA die: This is the die that contains the basic logic resources and it is provided in different sizes and grades. Each package can only have a single core fabric die.
- Transceiver Die: Transceiver dies are offered in three types: L-Tile, H-Tile and E-Tile. Packages with E-Tile are always equipped with one H-Tile. Each transceiver tile type supports certain protocols and transceiver speeds. Depending on the package size, an Intel® Stratix® 10 device can support between one and six transceiver dies and each die has 24 transceiver channels.
- HBM Die: This die is provided in two configurations, 4 high or 8 high, which refers to the number of memory dies stacked in each HBM. Not all Intel® Stratix® 10 packages have HBM; those that do can have either one or two HBMs.
|TA||Ambient temperature, measured locally surrounding the FPGA. Measure the ambient temperature just upstream of a passive heat sink or at the fan inlet for an active heat sink. This value affects the junction temperature of the main FPGA core fabric die and its power dissipation.|
|TJ-MAX||TJ-MAX is the maximum junction temperature value that the design allows for the given TA. For example, a design may allow the device maximum rated junction temperature at its maximum TA, but for a lower ambient temperature, the junction temperature requirements may be lower than the maximum rated value. These two cases require two sets of thermal entries to the EPE tool to determine the design parameters.|
|TCORE||Core fabric die temperature. The EPE tool evaluates the thermal design parameters over a range of TCORE values.|
|Power||The EPE tool reports the power dissipation of each die individually.|
|TTDP||Total Thermal Design Power is the total power dissipation of the device. The EPE tool reports TTDP for each main FPGA core fabric die temperature.|
|ΨJC||ΨJC is the thermal resistance between each die in the package and the center of the package IHS. An MCM such as the Stratix® 10 device has as many ΨJC values as there are dies in the package. For example, for a Stratix® 10 device that contains five dies, the EPE tool reports five ΨJC values. However, the focus of thermal design is always on the die with the maximum ΨJC, which is the one used for calculating TJ-MAX. The ΨJC value is calculated from the following equation: ΨJC = (TJ - TCASE) / TTDP|
|ΨCA||ΨCA is the other thermal resistance value reported by the EPE tool. It is the thermal resistance between the center of the package IHS and the ambient. ΨCA can be used as a figure of merit when assessing the cooling solution required for a design: the lower the ΨCA value, the more aggressive the cooling solution must be. The value of ΨCA is calculated from the following equation: ΨCA = (TCASE - TA) / TTDP|
|TCASE||Integrated heat spreader or case temperature is the temperature at the top center of the IHS. If the cooling solution maintains a TCASE equal to the TCASE value reported by the EPE tool, the maximum junction temperature equals the TJ-MAX value entered in the tool. A higher TCASE implies a TJ higher than TJ-MAX. Therefore, the goal of the cooling design should be to keep TCASE at or below the value reported by the EPE tool.|
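The two Ψ definitions in the table above can be rearranged to go from a junction temperature limit to a cooling target. The following is a minimal Python sketch of that arithmetic, not the EPE tool's internal method; the function names and all input values are illustrative assumptions rather than EPE output.

```python
def required_case_temperature(tj_max_c, ttdp_w, psi_jc_max):
    """Rearranged PsiJC definition: TCASE = TJ-MAX - TTDP * PsiJC."""
    return tj_max_c - ttdp_w * psi_jc_max


def psi_ca(tcase_c, ta_c, ttdp_w):
    """PsiCA = (TCASE - TA) / TTDP, in degC/W."""
    return (tcase_c - ta_c) / ttdp_w


# Illustrative inputs only (assumed values, not taken from an EPE run).
tj_max_c = 100.0     # allowed maximum junction temperature, degC
ta_c = 40.0          # ambient temperature, degC
ttdp_w = 120.0       # total thermal design power, W
psi_jc_max = 0.05    # largest PsiJC of any die, degC/W

tcase_limit_c = required_case_temperature(tj_max_c, ttdp_w, psi_jc_max)
psi_ca_required = psi_ca(tcase_limit_c, ta_c, ttdp_w)

print(f"Case temperature limit: {tcase_limit_c:.1f} degC")
print(f"Required PsiCA: {psi_ca_required:.3f} degC/W")
```

With these assumed inputs the case temperature limit is 94 °C and the cooling solution must achieve a ΨCA of 0.45 °C/W or lower.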
The Intel® Stratix® 10 FPGA thermal analysis requires the use of its CTMs in a Computational Fluid Dynamic (CFD) tool. The results of the CFD analysis are only valid to determine the core fabric power and IHS temperature. These values are used to determine the junction temperature of all the dies.
This methodology is used because the construction of the CTM does not capture the details of transceiver channel placement; therefore, the CTM cannot be used to predict the correct junction temperature of a transceiver die. The transceiver junction temperature is calculated from the total power dissipation, the IHS temperature, and the thermal resistance of each die, as covered in later sections.
- Icepak® from ANSYS
- Flotherm® from Mentor Graphics
- 6SigmaET® from Future Facilities
- Thermal Analysis® from SolidWorks
Each die in an Intel® Stratix® 10 FPGA device contains a Temperature Sensing Diode (TSD). Intel® provides a Temperature Sensor IP core to obtain the temperature of each die. However, with flexibility in Intel® Stratix® 10 devices, the location of hot spots on the transceiver die may vary based on your application, and it may not always be in the same location as the temperature sensor. Therefore, a temperature sensor may not report the actual temperature of the hot spot. The EPE calculates the offset values for each transceiver die and reports them on its Thermal worksheet. Addition of these values to the temperatures reported by the appropriate TSDs results in the correct values for the maximum junction temperature of each die. The accuracy of the TSD is +/- 5°C. Therefore, you may need to adjust the TJ-MAX for some designs to ensure the threshold temperature is never crossed.
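The offset and accuracy adjustments described above can be sketched in a few lines of Python. This is one conservative interpretation, assuming the full ±5 °C sensor accuracy is added on top of the EPE offset; the function names and the sample reading, offset, and limit are hypothetical.

```python
TSD_ACCURACY_C = 5.0  # +/- accuracy of the TSD stated in the text, degC


def estimated_hot_spot(tsd_reading_c, epe_offset_c):
    """Hot-spot estimate: TSD reading plus the offset from the EPE Thermal worksheet."""
    return tsd_reading_c + epe_offset_c


def worst_case_hot_spot(tsd_reading_c, epe_offset_c, accuracy_c=TSD_ACCURACY_C):
    """Pessimistic estimate that assumes the sensor reads low by its full accuracy."""
    return estimated_hot_spot(tsd_reading_c, epe_offset_c) + accuracy_c


def alarm_threshold(tj_max_c, epe_offset_c, accuracy_c=TSD_ACCURACY_C):
    """Highest TSD reading for which the worst-case hot spot stays at or below TJ-MAX."""
    return tj_max_c - epe_offset_c - accuracy_c


# Hypothetical values for illustration only.
reading_c = 80.0
offset_c = 6.0
tj_max_c = 95.0

print(estimated_hot_spot(reading_c, offset_c))   # nominal hot-spot estimate
print(worst_case_hot_spot(reading_c, offset_c))  # estimate including sensor accuracy
print(alarm_threshold(tj_max_c, offset_c))       # reading at which to raise an alarm
```

Setting the alarm on the raw TSD reading, rather than on the corrected temperature, keeps the guard band explicit when the offset changes with a new transceiver placement.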
Supply Design Information to EPE
This is the first step in the thermal design process of an Intel® Stratix® 10 device that provides the tool with the necessary data to estimate the power dissipation of each die. The inputs include the FPGA design information as well as the thermal design requirements of TA and TJ-MAX and power margin selection.
Obtain Thermal Design Parameters
The EPE tool provides the thermal design parameters. The power dissipation of the transceiver die is provided as a constant value, but the main core die power dissipation is provided as a function of its junction temperature and it should be used accordingly in the CFD analysis.
Obtain the applicable CTM for the CFD analysis. Each CTM is provided with the maximum number of dies possible in a package. Unused dies can be ignored and left in the model without affecting the end results.
Run CFD Analysis
Model the system in the CFD tool and apply all the applicable power values to the corresponding dies. The CFD solution provides the core die TDP and temperature and the TCASE. The transceiver and HBM die temperatures cannot be predicted by the CFD and are calculated manually.
Calculate Junction Temperatures and ΨCA
Junction temperatures of all the dies and ΨCA of the cooling solution are calculated using the following equations:
TJ = TCASE + TTDP * ΨJC
ΨCA = (TCASE - TA) / TTDP
You can verify the CFD modeling results by comparing the above calculated ΨCA with the value provided by the EPE tool for the corresponding TTDP. If the two values are the same, then the calculated TJ = TJ-MAX.
Use the EPE tool to estimate the power dissipation of the dies in an Intel® Stratix® 10 FPGA. An Excel spreadsheet provides the interface to the tool, and it contains multiple worksheets, each applicable to a part of the design. The EPE tool calculates thermal design parameters that are unique to each design. To activate the Thermal worksheet of the EPE, the following parameters need to be modified in the Main worksheet of the EPE:
- Set the Power Characteristics to Maximum.
- Set the Junction Temp Mode to Detailed Thermal Model.
This activates the Thermal worksheet of the EPE tool, and as a result, any changes made to the EPE affect the values in this worksheet. To obtain the correct thermal values for the analysis, you must enter all the necessary design information and settings in the subsequent worksheets of the EPE.
Selecting the device, package, and transceiver in the Main worksheet of EPE will enable selection of appropriate transceiver and HBM die types and counts in XCVR and HBM worksheets. In the XCVR worksheet you must specify placement of each transceiver in the exact tile and channel location (0-23) to be used in the design. This is necessary to obtain the correct power and thermal parameters. Similarly, in the HBM worksheet you must select the correct HBM and channel numbers (0-7) for your application.
For an example of transceiver placement refer to Figure 2 showing an Intel® Stratix® 10 device with 4 H-Tiles configured to use 54 transceiver channels placed in specific channel locations.
After you have entered all the design data and activated the Thermal worksheet, set the proper thermal variables in the Thermal worksheet.
- Apply Recommended Margin: Intel® recommends that you turn on the recommended margin to ensure sufficient cooling and account for approximations in power modeling.
- Ambient Temp, TA (°C): Temperature of the air or other coolant that flows over the heat sink.
- Max. Junction Temp, TJ-MAX (°C): Allowed maximum temperature of any die in the package, regardless of its type. TJ-MAX can be set to any value that the design requires, up to the maximum rating of the device.
Once you have entered the thermal settings, the EPE updates the power dissipation of all dies based on the required thermal solution. For example, if the maximum allowed junction temperature is 95°C, the EPE calculates a cooling solution that satisfies this requirement. That is, at least one die is operating at 95°C, while other dies are operating at lower temperatures due to their lower power consumption or power density.
The EPE also provides a solution table which consists of three rows and depicts three sets of solutions. The middle row (Operating Point) is the same as the above solution, and the other two rows represent solutions that are 5°C above and below the core operating temperature resulting from the design. Using this table, you can create the temperature dependent core die power curve which is used in the CFD modeling.
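One way to turn the three-row solution table into a temperature-dependent power curve is piecewise-linear interpolation. The sketch below assumes that approach; the `interpolate` helper is hypothetical and the table values are placeholders, not numbers from an actual EPE Thermal worksheet.

```python
def interpolate(x, points):
    """Piecewise-linear interpolation over (x, y) pairs sorted by ascending x.

    Raises ValueError when x lies outside the tabulated range, because the
    solution table gives no basis for extrapolation.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError(f"{x} is outside the tabulated range")


# Placeholder rows: (core die temperature in degC, core die power in W).
# The middle row stands for the Operating Point, the others for -5 and +5 degC.
core_power_curve = [(85.0, 90.0), (90.0, 94.0), (95.0, 99.0)]

power_at_88 = interpolate(88.0, core_power_curve)
print(f"Core power at 88 degC: {power_at_88:.1f} W")
```

Most CFD tools accept a temperature-dependent power source as a table of points, so the three rows can often be entered directly and the tool performs the equivalent interpolation.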
Reducing the thermal resistances of the package in each design improves the efficiency of the cooling system. One way to achieve this is to spread out the transceiver channels or use an extra transceiver tile to reduce the power density of a transceiver die. Targeted spreading can reduce ΨJC and increase ΨCA, thereby reducing the cooling requirement.
The Intel® Stratix® 10 FPGA thermal design parameters are unique for every project. They are mainly determined by the power, local power density, and power ratio of the dies. For this reason, any change to the design requires the design information to be updated in the EPE so that all the thermal parameters are recalculated.
In this section, we will demonstrate the necessary steps for the thermal analysis of an Intel® Stratix® 10 device by using an example.
Design Statement: Design a forced convection cooling system for an Intel® Stratix® 10 device as shown in Table 1 and the specified thermal requirements as shown in Table 2. Transceiver channel placement and HBM data are shown in Figure 1 and Figure 2. The core functionality and other activities are set such that the core die reaches a typical power for the Intel® Stratix® 10 FPGA.
|FPGA||Intel® Stratix® 10|
|Device Grade||Extended-1 Smart-VID|
|Number of Transceiver Channels||96|
|Number of HBM||2|
|Maximum Ambient Temperature, °C||35|
|Maximum Allowed Junction Temperature, °C||95|
The Main worksheet power values are associated with a function and not necessarily dissipated in the die providing the function. So the Thermal worksheet may show a different value for HBM than the Main worksheet. For thermal analysis, always use the power values in the Thermal worksheet.
After entering all the design data into the EPE and activating the Thermal worksheet, the following two tables are updated with all the thermal design parameters. For example, in this design the case temperature should be kept below 84 °C and the maximum ΨJC of any die is 0.067 °C/W.
The next step in the thermal analysis process is to create a CFD model of the system using the required CTM as shown in row 9 of the Thermal worksheet. In the CFD setup, the power dissipation of the transceivers and HBMs is set as fixed values and the power dissipation of the core is set as a temperature-dependent value from the first two columns of the solution table (see the "EPE Thermal Worksheet Solution Table").
The CFD setup for this example is shown below. The FPGA is set in a 120 x 35 mm duct with an airflow of 21 CFM. The extruded aluminum heat sink measures 100 x 100 x 30 mm and has 40 fins of 1 x 27 mm each. Air entering the duct is at 35 °C.
The CFD analysis provides the Intel® Stratix® 10 case and die temperatures.
The Intel® Stratix® 10 case temperature profile shown below indicates a maximum temperature of 83.6 °C (TCASE) which is less than the 84 °C required by the EPE. This means that the maximum junction temperature will also be less than the design limit of 95 °C.
The Intel® Stratix® 10 die temperature profile shown below is only valid for the core fabric die temperature and not the transceiver die temperatures.
Calculate the transceiver die temperatures manually as follows.
- Determine the maximum core fabric temperature calculated by CFD (90.26 °C from the "Die Temperature Profile from CFD Analysis" figure above). This value is the core die operating temperature or FPGA Core Junction Temperature.
- Using the "EPE Thermal Worksheet Solution Table" and the FPGA Core Junction Temperature (90.26 °C), linearly interpolate the Overall Total Power (TTDP) and ΨJC values.
|FPGA Core Junction Temperature (°C)||90.26|
|TTDP (W)||148|
|ΨJC, FPGA Core (°C/W)||0.052|
|ΨJC, HSSI_0_0 (°C/W)||0.046|
|ΨJC, HSSI_2_0 (°C/W)||0.069|
|ΨJC, HSSI_0_1 (°C/W)||0.045|
|ΨJC, HSSI_2_1 (°C/W)||0.032|
- Calculate the junction temperature (TJ) using the following equation:
TJ = TCASE + TTDP * ΨJC , where
- TCASE = maximum case temperature of 83.6 °C from the "Case Temperature Profile from CFD Analysis" figure above
- TTDP = 148 W from the table above
- ΨJC = for TJ_max, use the highest ΨJC value of any die which is 0.069 °C/W for the HSSI_2_0 transceiver die from the table above
Result: TJ_max = 83.6 °C + (148 W * 0.069 °C/W) ≈ 93.8 °C
Notice that the calculated HSSI_2_0 junction temperature (TJ_max) is almost 7 °C higher than the temperature calculated by CFD for this die. This is because CFD uses uniform power dissipation for the transceiver dies and, therefore, cannot calculate the local hot spots.
Other junction temperatures can be calculated in the same way.
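As a sketch of that calculation, the Python snippet below applies TJ = TCASE + TTDP * ΨJC to each transceiver die, using the case temperature, TTDP, and ΨJC values quoted in this example. The variable names are illustrative, and the results are rounded for display only.

```python
T_CASE_C = 83.6       # maximum case temperature from the CFD analysis, degC
TTDP_W = 148.0        # interpolated total thermal design power, W
T_J_LIMIT_C = 95.0    # maximum allowed junction temperature, degC

# Interpolated PsiJC per transceiver die, degC/W.
PSI_JC = {
    "HSSI_0_0": 0.046,
    "HSSI_2_0": 0.069,
    "HSSI_0_1": 0.045,
    "HSSI_2_1": 0.032,
}

# TJ = TCASE + TTDP * PsiJC for every die.
junction_temps = {die: T_CASE_C + TTDP_W * psi for die, psi in PSI_JC.items()}
hottest_die = max(junction_temps, key=junction_temps.get)

for die, tj in junction_temps.items():
    print(f"{die}: {tj:.1f} degC")
print(f"Hottest die: {hottest_die}, "
      f"margin to limit: {T_J_LIMIT_C - junction_temps[hottest_die]:.1f} degC")
```

The die with the largest ΨJC is always the hottest in this calculation, because TCASE and TTDP are common to all dies.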
In some designs, it might be possible to further reduce the transceiver temperatures by spreading the channels to reduce the power density. For example, in the above design the HSSI_2_0 transceiver has 8 high-speed transceiver channels that are laid out in half of the die. The effect of spreading these channels across the entire die area can be shown in the EPE by the following transceiver placement.
The new placement relaxes the cooling requirement from a ΨCA of 0.332 to 0.342 °C/W and now the core fabric die has the highest ΨJC. Repeating the CFD analysis using the original cooling solution with the new power dissipations results in the following IHS temperature results.
Calculating the new junction temperatures with the updated power values and CFD results:
HSSI_2_0 die temperature: TJ = 84 + (150 * 0.042) = 90.3 °C
Core fabric temperature: TJ = 84 + (150 * 0.052) = 91.8 °C
This example demonstrates that channel spreading can reduce the cooling requirement or result in lower junction temperatures for the same cooling solution.
As indicated previously, the temperature sensors are not always in the exact position of the hot spots on the transceivers. Depending on the transceiver placement, the EPE calculates an offset value that must be added to the field reading.
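The effect of spreading can be quantified by comparing the hottest die before and after the change. The following sketch uses the case temperatures, TTDP values, and ΨJC values quoted in this example and applies the same junction temperature formula to the core die and to HSSI_2_0 in both cases; the helper name is illustrative.

```python
T_J_LIMIT_C = 95.0  # maximum allowed junction temperature, degC


def junction_temp(tcase_c, ttdp_w, psi_jc):
    """TJ = TCASE + TTDP * PsiJC."""
    return tcase_c + ttdp_w * psi_jc


# Original placement: TCASE 83.6 degC, TTDP 148 W.
before = {
    "HSSI_2_0": junction_temp(83.6, 148.0, 0.069),
    "FPGA Core": junction_temp(83.6, 148.0, 0.052),
}
# Spread placement: TCASE 84 degC, TTDP 150 W.
after = {
    "HSSI_2_0": junction_temp(84.0, 150.0, 0.042),
    "FPGA Core": junction_temp(84.0, 150.0, 0.052),
}

peak_before = max(before.values())
peak_after = max(after.values())
peak_drop_c = peak_before - peak_after
hottest_after = max(after, key=after.get)

print(f"Peak before: {peak_before:.1f} degC, peak after: {peak_after:.1f} degC")
print(f"Peak reduction: {peak_drop_c:.1f} degC, hottest die after: {hottest_after}")
print(f"Margin to limit after spreading: {T_J_LIMIT_C - peak_after:.1f} degC")
```

With the same heat sink, the peak junction temperature drops by about 2 °C and the hottest die moves from HSSI_2_0 to the core fabric.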
The transceiver TSDs in the first example should report the following values:
TSD_HSSI_2_0 = 85.8 °C
TSD_HSSI_0_0 = 85.5 °C
TSD_HSSI_2_1 = 81.2 °C
TSD_HSSI_0_1 = 80 °C
Adding the offset values to these readings provides the actual temperatures shown below:
TJ_HSSI_2_0 = 85.8 + 8 = 93.8 °C
TJ_HSSI_0_0 = 85.5 + 5 = 90.5 °C
TJ_HSSI_2_1 = 81.2 + 7 = 88.2 °C
TJ_HSSI_0_1 = 80 + 10 = 90 °C
|January 2018||2018.01.26||Updated the app note to account for changes in the latest EPE with HBM and E-Tile updates|
|June 2017||2017.06.19||Added methodology to use the Thermal worksheet of the EPE tool|
|February 2017||2017.02.03||Initial release|