High-speed downlink packet access (HSDPA) is based on the evolution of W-CDMA technology and has been standardized in the 3GPP W-CDMA Release 5 specifications. Targeted at mobile multimedia applications, HSDPA is capable of achieving reduced delays and peak data rates up to 14 Mbps in the downlink (i.e., from the base station to the mobile terminal). This is made possible by the addition of a new high-speed downlink shared channel and three fundamental technologies that rely on the rapid adaptation of transmission parameters to the instantaneous channel conditions:
- Adaptive modulation and coding (AMC)
- Fast hybrid automatic repeat-request (ARQ)
- Fast scheduling
HSDPA Channel Coding Implementation Issues
HSDPA channel coding involves rate 1/3 turbo encoding and other functions, such as cyclic redundancy check (CRC), rate matching, and interleaving (shown in Figure 1).
Figure 1. Channel Coding Scheme in HSDPA
The turbo encoder consists of two recursive convolutional encoders and an internal interleaver. While the convolutional encoders are simple to implement in both hardware and software, the interleaver tends to be complex because of its variability. Any block size from 40 to 5,114 bits must be supported, and the block size can change every transmission time interval (TTI) of 2 ms, so the interleaver table must be recalculated each time the block size changes. This is a significant computational burden for a digital signal processor and adds to the latency, which is a critical parameter in HSDPA.
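The structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation. The constituent encoder polynomials (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3) are the ones commonly cited for the 3GPP turbo code and are not stated in this article. The function `toy_interleaver_table` is a hypothetical placeholder permutation, not the 3GPP internal interleaver, and trellis termination (tail bits) is omitted for brevity.

```python
import random


def rsc_parity(bits):
    """Parity stream of one 8-state recursive convolutional encoder.

    Assumed polynomials: feedback 1 + D^2 + D^3, feedforward 1 + D + D^3.
    """
    s1 = s2 = s3 = 0  # shift register; s1 holds the most recent value
    parity = []
    for u in bits:
        fb = u ^ s2 ^ s3             # feedback taps at D^2 and D^3
        parity.append(fb ^ s1 ^ s3)  # feedforward taps at 1, D and D^3
        s1, s2, s3 = fb, s1, s2
    return parity


def toy_interleaver_table(k):
    """Hypothetical stand-in permutation of 0..k-1 (NOT the 3GPP interleaver).

    It only illustrates that a new table is needed whenever k changes.
    """
    table = list(range(k))
    random.Random(k).shuffle(table)  # seeded, so the table is repeatable per k
    return table


def turbo_encode(bits):
    """Rate-1/3 output: systematic bit, parity 1, parity 2 for each input bit."""
    k = len(bits)
    if not 40 <= k <= 5114:
        raise ValueError("block size must be between 40 and 5,114 bits")
    table = toy_interleaver_table(k)  # rebuilt for every block size
    interleaved = [bits[i] for i in table]
    p1 = rsc_parity(bits)
    p2 = rsc_parity(interleaved)
    out = []
    for x, z1, z2 in zip(bits, p1, p2):
        out.extend((x, z1, z2))
    return out
```

The two convolutional encoders are only a handful of XOR operations per bit. The table lookup and, in a real system, the table generation are the parts that change with every block size.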
Instead of using a digital signal processor to perform this function, you can download blocks of data to a turbo encoder accelerator function implemented on an FPGA. This removes the need to calculate the look-up table (LUT) content for the interleaver, and also takes the highly repetitive encoding task off the digital signal processor, freeing up bandwidth for the other operations the digital signal processor has to perform.
Channel Coding Acceleration with Altera FPGAs
This section describes the efficient implementation of channel coding functions using Altera's low-cost Cyclone® FPGAs.
Integrated Channel Coding Solution
In addition to turbo encoding, other functions such as CRC generation, code-block segmentation, rate matching, interleaving, and symbol mapping can be efficiently implemented on a single Cyclone EP1C12 FPGA. This not only removes the computational burden for highly repetitive instructions from the digital signal processor, but also reduces the required data bus bandwidth. As data passes through the channel coding chain, shown in Figure 1, the number of bits increases. If data is downloaded at the very beginning of the chain, the smallest number of bits must be transferred from the digital signal processor to the accelerator. Table 1 lists the estimated number of logic elements (LEs) and memory bits required to implement each of the channel coding functions. The total computational requirement is well within the capacity of a single Cyclone EP1C12 device.
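To see why downloading data at the start of the chain minimizes bus traffic, the following sketch estimates how the bit count grows through the first stages. The constants (a 24-bit CRC, a maximum code block of 5,114 bits, and 12 tail bits per turbo-coded block) reflect a common reading of the 3GPP channel coding rules and are assumptions here, not figures from this article. The function name `bits_through_chain` is hypothetical, and rate matching and later stages are left out.

```python
import math

CRC_BITS = 24          # assumed CRC length attached to each transport block
MAX_CODE_BLOCK = 5114  # largest turbo code block size, in bits
MIN_CODE_BLOCK = 40    # smallest turbo code block size, in bits
TAIL_BITS = 12         # assumed trellis-termination bits per code block


def bits_through_chain(transport_block_bits):
    """Estimate the number of bits after CRC, segmentation and turbo coding."""
    after_crc = transport_block_bits + CRC_BITS
    num_blocks = math.ceil(after_crc / MAX_CODE_BLOCK)
    # Equal-size code blocks; filler bits pad up to the chosen block size.
    block_size = max(math.ceil(after_crc / num_blocks), MIN_CODE_BLOCK)
    after_turbo = num_blocks * (3 * block_size + TAIL_BITS)
    return {
        "input": transport_block_bits,
        "after_crc": after_crc,
        "code_blocks": num_blocks,
        "block_size": block_size,
        "after_turbo": after_turbo,
    }


if __name__ == "__main__":
    print(bits_through_chain(10000))
```

Under these assumptions, a 10,000-bit transport block becomes 30,096 bits after turbo encoding, so sending the uncoded block across the bus moves roughly one-third of the data.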
|Table 1. Computational Requirements for Integrated Solution|
|Function||Logic Elements (LEs)||Memory Bits|
|Code block segmentation||300||0|
|Physical layer hybrid-ARQ functionality||1,400||30,000|
|Physical channel segmentation||100||0|
|High-speed downlink shared channel interleaving||500||30,000|
|Constellation rearrangement for 16QAM||100||0|
|Physical channel mapping||50||0|
|All functions together||8,630||100,000|
Adaptive Parameter Calculations on Processors
The physical layer hybrid-ARQ functionality involves performing rate matching in two stages. Implementing the two stages involves parameter calculations that determine whether puncturing or repetition is needed, and to what extent. In addition, other variable parameters, such as the block size of the turbo encoder and parameters for physical channel segmentation, must be computed. The algebraic computations involved in these parameter calculations can be efficiently implemented on the flexible Nios® II embedded processor and the dual-core ARM® Cortex™-A9 MPCore™ hard processor. This gives you the flexibility and portability of high-level software design, while maintaining the performance benefits of parallel hardware operations in FPGAs.
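As an illustration of the kind of parameter calculation involved, the sketch below decides between puncturing and repetition from the input and target lengths, then applies an error-accumulator pattern in the style of 3GPP rate matching. It is a simplified single-stage sketch under stated assumptions: the choices `e_ini = e_plus = len(bits)` and `e_minus = |delta|` are illustrative, whereas real HSDPA derives these values from the redundancy version and applies the procedure in two stages. The function name `rate_match` is hypothetical.

```python
def rate_match(bits, target_len):
    """Puncture or repeat bits so that the output has target_len entries.

    Simplified sketch: e_ini = e_plus = len(bits), e_minus = |delta|.
    """
    x = len(bits)
    if target_len < 0 or (x == 0 and target_len != 0):
        raise ValueError("cannot reach the requested length")
    delta = target_len - x        # < 0: puncture, > 0: repeat, 0: pass through
    if delta == 0:
        return list(bits)

    e_plus = x
    e_minus = abs(delta)
    e = x                         # initial error value (assumed e_ini)
    out = []
    for b in bits:
        e -= e_minus
        if delta < 0:
            if e <= 0:
                e += e_plus       # this bit is punctured (dropped)
            else:
                out.append(b)
        else:
            out.append(b)
            while e <= 0:         # repeat as often as the error term requires
                out.append(b)
                e += e_plus
    return out
```

For example, `rate_match(list(range(10)), 7)` drops three evenly spread positions and returns `[0, 1, 2, 4, 5, 7, 8]`. The sign and size of `delta` are exactly the "necessity and extent" that the processor must compute before the hardware can apply the pattern.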
Altera FPGA Coprocessor Features
Altera has developed design tools and methodologies that enable you to develop FPGA coprocessing solutions using Altera's Stratix® II, Stratix, and Cyclone devices. Altera® FPGA coprocessors interface with a wide range of digital signal processors and general-purpose processors, providing increased system performance and lower system costs. The high-level architecture for hardware acceleration using Altera FPGA coprocessors with Texas Instruments (TI) digital signal processors is illustrated in Figure 2. The hardware accelerators are direct memory access (DMA)-driven via the TI external memory interface (EMIF), and the data is buffered using first-in first-out (FIFO) buffers.
The overall architecture flexibility of Altera FPGA coprocessors enables a system definition that can be relatively tightly coupled to the master CPU or a loosely coupled data-processing plane that has only minimal set-up and status interaction with the master CPU. This wide variation in capabilities makes Altera FPGA coprocessors suitable for dealing with systems with a wide range of performance and flexibility requirements.
Figure 2. Altera FPGA Coprocessor Example
Altera Advantage for HSDPA
This section outlines the many advantages of using Altera solutions to implement HSDPA.
A high-end digital signal processor typically costs around $130, and the turbo encoding process alone takes up 30 to 40 percent of its resources. By comparison, a single Cyclone EP1C12 device can perform CRC, turbo encoding, rate matching, interleaving, and quadrature amplitude modulation, and costs around one-fifth the price of a high-end digital signal processor (10K-unit pricing).
Table 2 gives an example of the cost reduction that can be achieved by performing just the turbo encoder accelerator function on the Cyclone platform, as opposed to a high-end digital signal processor.
|Table 2. FPGA Accelerator: Cost Analysis Example|
|Encoder||Cyclone Device||High-End Digital Signal Processor|
|14.4-Mbps turbo encoder||
|58-Mbps turbo encoder||
- The source for the high-end digital signal processor is the TI website. The 9.7 cycles per bit does not include the calculations necessary for the interleaver table setup information when the block size changes every 2 ms.
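The cycles-per-bit figure in the note above can be turned into a rough processor cycle budget. The sketch below simply multiplies cycles per bit by the bit rate. It assumes the 9.7 cycles per bit applies equally at both data rates named in Table 2 and, as the note says, excludes interleaver table setup. The function name `required_mhz` is hypothetical.

```python
def required_mhz(cycles_per_bit, rate_mbps):
    """Processor cycles per second needed, in MHz (cycles/bit x Mbit/s)."""
    return cycles_per_bit * rate_mbps


if __name__ == "__main__":
    for rate in (14.4, 58.0):
        print(f"{rate} Mbps -> {required_mhz(9.7, rate):.1f} MHz of DSP cycles")
```

Under these assumptions, the 14.4-Mbps encoder needs roughly 140 MHz of processor cycles and the 58-Mbps encoder roughly 563 MHz, before any interleaver setup work is counted.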
The channel coding process involves bitwise operations. This leads to inefficient use of resources when implemented with digital signal processors, which have fixed data widths. You can customize the Cyclone device's M4K memory blocks and the Nios II processor by using different data widths, coefficient widths, and precision choices as needed, providing an optimal digital signal processing implementation for the channel coding application.
Altera's DSP Builder and Qsys system integration development tools and Quartus® II software enable you to easily build and interface Altera FPGA coprocessing blocks with standard processors. You do not need to have a background in register transfer level (RTL) design and do not need to make any changes to the software development environment or the digital signal processor platform.