- Fast and Scalable Baseband Signal Processing for Massive MU-MIMO Systems
Achieving high spectral efficiency in realistic massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems requires computationally complex algorithms for data detection in the uplink (users transmit to the base station) and beamforming in the downlink (the base station transmits to users). Traditional detection and beamforming algorithms, such as linear MMSE or ZF methods, are designed to be executed on centralized computing hardware at the base station (BS). For systems with hundreds or thousands of antennas, this centralized approach both results in prohibitive complexity and generates raw baseband data rates that exceed the limits of current interconnect technology and chip I/O interfaces. This project proposes the following complementary schemes to break these bottlenecks and enable fast and scalable massive MIMO baseband processing:
(1) Approximated and Interpolated Processing
For a base station with tens to hundreds of antennas, where centralized processing is still manageable, we solve the detection and beamforming problems by numerical approximation and frequency-domain interpolation, which significantly reduces the computational complexity while maintaining MMSE/ZF error-rate performance. We also demonstrate the hardware efficiency of the approximated processing algorithms using parallel GPU implementations.
[J1] Implicit vs. Explicit Approximate Matrix Inversion for Wideband Massive MU-MIMO Data Detection, JSPS 2017
[C1] Accelerating Massive MIMO Uplink Detection on GPU for SDR Systems, DCAS 2015
[T1] GPU Accelerated Reconfigurable Detector and Precoder for Massive MIMO SDR Systems, M.S. Thesis
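As an illustration of approximated processing, the following minimal sketch replaces the exact matrix inverse in linear MMSE detection with a truncated Neumann series around the diagonal of the Gram matrix. This is one well-known approximation and is an assumption here; the text above does not state which approximations are used in the listed publications. The function name `neumann_mmse_detect`, the i.i.d. Gaussian channel, and all parameter values are hypothetical.

```python
import numpy as np

def neumann_mmse_detect(H, y, noise_var, num_terms=3):
    """Approximate x = (H^H H + N0 I)^{-1} H^H y with a truncated Neumann series.

    Illustrative sketch only: splits A = D + E (diagonal plus off-diagonal part)
    and sums the first `num_terms` terms of sum_k (-D^{-1} E)^k D^{-1} b.
    """
    A = H.conj().T @ H + noise_var * np.eye(H.shape[1])  # regularized Gram matrix
    b = H.conj().T @ y                                   # matched-filter output
    d_inv = 1.0 / np.real(np.diag(A))                    # inverse of the diagonal part D
    E = A - np.diag(np.diag(A))                          # off-diagonal part
    x = d_inv * b                                        # zeroth-order term D^{-1} b
    for _ in range(num_terms - 1):
        x = d_inv * (b - E @ x)                          # each pass adds one more series term
    return x

# Hypothetical test setup: 256 BS antennas, 8 single-antenna users.
rng = np.random.default_rng(0)
B, U, noise_var = 256, 8, 0.1
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)
x_true = rng.choice([-1, 1], U) + 1j * rng.choice([-1, 1], U)  # QPSK-like symbols
noise = np.sqrt(noise_var / 2) * (rng.standard_normal(B) + 1j * rng.standard_normal(B))
y = H @ x_true + noise

x_exact = np.linalg.solve(H.conj().T @ H + noise_var * np.eye(U), H.conj().T @ y)
x_approx = neumann_mmse_detect(H, y, noise_var, num_terms=4)
rel_err = np.linalg.norm(x_approx - x_exact) / np.linalg.norm(x_exact)
```

The loop never forms the inverse explicitly; it only needs matrix-vector products, which is what makes this style of approximation attractive when the number of BS antennas is much larger than the number of users and the Gram matrix is strongly diagonally dominant.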
(2) Decentralized Processing
To further scale up the system to support hundreds to thousands of BS antennas, we propose novel decentralized baseband processing architectures that partition the BS antenna array into clusters, each associated with independent radio-frequency chains, analog and digital modulation circuitry, and computing hardware. For such architectures, we develop novel decentralized data detection and beamforming algorithms that access only local channel-state information and require low communication bandwidth among the clusters. We study the associated trade-offs between error-rate performance, computational complexity, interconnect bandwidth, and achievable rates, and we demonstrate the scalability and hardware efficiency of our solutions for massive MU-MIMO systems with hundreds to thousands of BS antennas using reference implementations on GPU, Xeon Phi, and FPGA clusters. The figure above illustrates the decentralized processing architecture.
[J2] Decentralized Baseband Processing for Massive MU-MIMO Systems, JETCAS 2017
[J3] Decentralized Equalization with Feedforward Architectures for Massive MU-MIMO, TSP 2018
[C2] Decentralized Beamforming for Massive MU-MIMO on a GPU Cluster, GlobalSIP 2016
[C3] Decentralized Data Detection for Massive MU-MIMO on a Xeon Phi Cluster, Asilomar 2016
[C4] On the Achievable Rates of Decentralized Equalization in Massive MU-MIMO Systems, ISIT 2017
[C5] Decentralized Equalization for Massive MU-MIMO on FPGA, Asilomar 2017
[C6] Feedforward Architectures for Decentralized Precoding in Massive MU-MIMO Systems, Asilomar 2018
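The decentralized idea described above can be sketched in a few lines. In this minimal example, each antenna cluster uses only its local channel matrix and local receive samples to compute a small Gram matrix and matched-filter vector, and a fusion node sums these statistics and solves a single small system. This is one simple feedforward-style variant chosen for illustration; it is an assumption and not necessarily the algorithms of the publications listed above. The names `local_stage` and `fuse_and_equalize` and all parameter values are hypothetical.

```python
import numpy as np

def local_stage(H_c, y_c):
    """Per-cluster work using only local CSI H_c and local receive samples y_c."""
    gram_c = H_c.conj().T @ H_c   # local Gram matrix, size U x U
    mf_c = H_c.conj().T @ y_c     # local matched-filter output, length U
    return gram_c, mf_c

def fuse_and_equalize(local_results, noise_var):
    """Fusion node: sum the per-cluster statistics and solve one U x U system."""
    G = sum(g for g, _ in local_results)
    z = sum(m for _, m in local_results)
    return np.linalg.solve(G + noise_var * np.eye(G.shape[0]), z)

# Hypothetical setup: 256 BS antennas split into 4 clusters, 8 users.
rng = np.random.default_rng(0)
B, U, C, noise_var = 256, 8, 4, 0.1
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)
x_true = rng.choice([-1, 1], U) + 1j * rng.choice([-1, 1], U)
noise = np.sqrt(noise_var / 2) * (rng.standard_normal(B) + 1j * rng.standard_normal(B))
y = H @ x_true + noise

# Each cluster sees only its own rows of H and y.
local_results = [local_stage(H_c, y_c)
                 for H_c, y_c in zip(np.split(H, C), np.split(y, C))]
x_dec = fuse_and_equalize(local_results, noise_var)

# Centralized linear MMSE reference for comparison.
x_cen = np.linalg.solve(H.conj().T @ H + noise_var * np.eye(U), H.conj().T @ y)
```

Because the Gram matrix and matched-filter output are sums over antennas, this particular variant reproduces the centralized MMSE estimate while each cluster forwards only a U x U matrix and a length-U vector instead of its raw antenna samples.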
We note that our proposed approximation, interpolation, and decentralization schemes are complementary and can be combined for the efficient design and implementation of practical massive MU-MIMO systems.
- Efficient Digital Predistortion Algorithms and Parallel Designs on Embedded Processors
Modern radio transceivers target low design cost and high power efficiency, but they suffer from analog RF and digital baseband imperfections, such as power amplifier (PA) nonlinearities, in-phase/quadrature (I/Q) imbalance, and local oscillator (LO) leakage, which result in intermodulation distortion and spurious spectral emissions. One obvious way to reduce unwanted emissions is to back off the transmit power from the PA saturation region, known as maximum power reduction (MPR) in the 3GPP LTE context, but this significantly sacrifices transmit efficiency and range. Digital predistortion (DPD) is an alternative that suppresses spurious emissions by preprocessing the I/Q samples at baseband so that the distortion is cancelled once the signal passes through the PA. DPD in turn calls for low-complexity algorithms and flexible, efficient implementations that keep transceiver design costs low. This project proposes both novel, efficient DPD algorithms and high-performance implementations on modern embedded parallel processors, such as mobile multi-core CPUs and GPUs, and SoC FPGAs:
(1) Full-band Digital Predistortion
Full-band DPD seeks to linearize the full composite transmit signal. We develop a customized software-defined radio (SDR) platform using a WARP v3 radio board and general-purpose processors, and we experimentally monitor the DPD suppression effect by integrating our embedded parallel DPD implementations into the SDR platform. The figure above shows our experimental setup.
[J1] Parallel Digital Predistortion Design on Mobile GPU and Embedded Multicore CPU for Mobile Transmitters, JSPS 2017
[C1] Mobile GPU Accelerated Digital Predistortion on a Software-defined Mobile Transmitter, GlobalSIP 2015
[C2] A High Performance GPU-based Software-defined Basestation, Asilomar 2014
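To make the full-band idea concrete, the sketch below trains a polynomial predistorter with the indirect learning approach: a post-inverse of the PA is fitted by least squares and then applied in front of the PA. The memoryless third-order PA model `pa_model`, its coefficient, the polynomial order, and the test signal are all assumptions made for illustration; they are not taken from the publications listed above, and the sketch ignores memory effects and the parallel hardware mapping.

```python
import numpy as np

def pa_model(x, a3=-0.1 + 0.03j):
    """Hypothetical memoryless PA: unit linear gain plus a third-order term."""
    return x + a3 * x * np.abs(x) ** 2

def basis(v, order=5):
    """Odd-order polynomial basis [v, v|v|^2, v|v|^4, ...] up to `order`."""
    return np.stack([v * np.abs(v) ** (2 * k) for k in range((order + 1) // 2)], axis=1)

def train_dpd(x, order=5):
    """Indirect learning: least-squares fit of a post-inverse mapping the PA
    output back to the PA input; the same coefficients are then used as the
    predistorter."""
    y = pa_model(x)
    w, *_ = np.linalg.lstsq(basis(y, order), x, rcond=None)
    return w

def nmse_db(ref, sig):
    """Normalized mean squared error of `sig` relative to `ref`, in dB."""
    return 10 * np.log10(np.sum(np.abs(sig - ref) ** 2) / np.sum(np.abs(ref) ** 2))

# Hypothetical training signal: random phase, amplitude uniform in [0, 0.8].
rng = np.random.default_rng(1)
n = 4096
x = rng.uniform(0, 0.8, n) * np.exp(2j * np.pi * rng.uniform(0, 1, n))

w = train_dpd(x)
nmse_without = nmse_db(x, pa_model(x))            # PA driven directly
nmse_with = nmse_db(x, pa_model(basis(x) @ w))    # PA driven by the predistorted signal
```

Comparing `nmse_with` against `nmse_without` shows how much closer the predistorted PA output is to the ideal linear output under this toy model.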
(2) Sub-band Digital Predistortion
Sub-band DPD seeks to suppress spurious emissions in non-contiguous transmission scenarios by concentrating the linearization effort on the most critical intermodulation distortion band rather than on the main component carrier bands. We develop novel low-complexity sub-band DPD algorithms and demonstrate their effectiveness on the WARP v3 radio board with real-time implementations on its SoC FPGA.
[J2] Low-Complexity Sub-band Digital Predistortion for Spurious Emission Suppression in Noncontiguous Spectrum Access, TMTT 2016
[C3] Sub-band Digital Predistortion for Noncontiguous Transmissions: Algorithm Development and Real-Time Prototype Implementation, Asilomar 2015
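The sub-band principle can be illustrated with a toy example: two non-contiguous component carriers pass through a PA with a third-order nonlinearity, and a single complex coefficient scales a cancellation signal injected at one third-order intermodulation (IM3) sub-band. Everything in this sketch is an assumption for illustration, including the PA model, the carrier placement in FFT bins, and the one-shot least-squares estimate of the coefficient; it is not the algorithm of the publications listed above, and it targets only one of the two IM3 sub-bands.

```python
import numpy as np

def pa_model(x, a3=-0.1 + 0.03j):
    """Hypothetical memoryless PA: unit linear gain plus a third-order term."""
    return x + a3 * x * np.abs(x) ** 2

def carrier(rng, n, center_bin, half_width):
    """Band-limited random component carrier built directly in the frequency domain."""
    spec = np.zeros(n, dtype=complex)
    bins = np.arange(center_bin - half_width, center_bin + half_width + 1) % n
    spec[bins] = rng.standard_normal(bins.size) + 1j * rng.standard_normal(bins.size)
    return np.fft.ifft(spec)

def band_power(sig, center_bin, half_width):
    """Signal power inside the FFT bins center_bin +/- half_width."""
    spec = np.fft.fft(sig)
    bins = np.arange(center_bin - half_width, center_bin + half_width + 1) % sig.size
    return np.sum(np.abs(spec[bins]) ** 2)

rng = np.random.default_rng(2)
n, f0, bw = 4096, 300, 40                      # FFT size, carrier offset, half bandwidth (bins)
c1, c2 = carrier(rng, n, +f0, bw), carrier(rng, n, -f0, bw)
scale = 0.3 / np.sqrt(np.mean(np.abs(c1 + c2) ** 2))
c1, c2 = scale * c1, scale * c2                # composite signal has RMS amplitude 0.3
x = c1 + c2

basis = c1 ** 2 * np.conj(c2)                  # third-order IM term centered at bin 3*f0
y0 = pa_model(x)                               # PA output without DPD
alpha = -np.vdot(basis, y0) / np.vdot(basis, basis)  # least-squares cancellation coefficient
y1 = pa_model(x + alpha * basis)               # PA output with the sub-band injection

im3 = 3 * f0
suppression_db = 10 * np.log10(band_power(y0, im3, 3 * bw) / band_power(y1, im3, 3 * bw))
```

Only one complex coefficient and one basis signal are needed here, which hints at why restricting the linearization to a single sub-band can be much cheaper than predistorting the full composite signal.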
- Joseph Cavallaro, Professor, Rice University, USA
- Christoph Studer, Assistant professor, Cornell University, USA
- Tom Goldstein, Assistant professor, University of Maryland, USA
- Mikko Valkama, Professor, Tampere University of Technology, Finland
- Markku Juntti, Professor, University of Oulu, Finland
- Jani Boutellier, Assistant professor, Tampere University of Technology, Finland
- Michael Wu, Staff design engineer, Xilinx, USA
- Guohui Wang, Research engineer, Snapchat, USA
- Bei Yin, Staff engineer, Qualcomm, USA
- Aida Vosoughi, Senior engineer, Oracle, USA
- Chance Tarver, Graduate Assistant, Rice University, USA