Deep Reinforcement Learning-Based Resource Allocation in Massive MIMO NOMA Systems

doi:10.5121/ijcnc.2025.17601

Volume 17, Number 6

Authors

Pham Hoai An ¹,², Nguyen Dung ¹,², Nguyen Thi Xuan Uyen ¹,², Nguyen Thai Cong Nghia ¹,² and Ngo Minh Nghia ¹,²
¹ VNUHCM - University of Science, Vietnam, ² Vietnam National University, Vietnam

Abstract

Massive MIMO systems with preconfigured spatial beams efficiently serve near-field (NF) users, while far-field (FF) users can be multiplexed on the same beams using non-orthogonal multiple access (NOMA). To capture propagation realistically, the spherical wave model (SWM) is employed for NF channels and the plane wave model (PWM) for FF channels, reflecting the distinct near- and far-field regions. Conventional optimization approaches such as successive convex approximation (SCA) and branch-and-bound (BB) suffer from local optimality or prohibitive complexity, whereas recent advances in deep learning have enabled scalable and adaptive solutions for wireless resource allocation. On this basis, a resource allocation strategy is developed using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, in which the base station acts as an agent that dynamically adjusts power and allocation coefficients to maximize the sum throughput of FF users. Simulation results show that the proposed RL-based method can approach, and in some cases match, deterministic SCA at high SNR, while consistently outperforming randomly initialized SCA in medium-to-high SNR regimes. Compared to optimization-based baselines, the TD3 approach eliminates iterative problem reformulation, reduces computational complexity, and provides stronger adaptability to dynamic channels and user mobility.
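
For readers unfamiliar with TD3, the following is a minimal sketch of the critic target used by the generic TD3 algorithm (clipped double-Q learning with target policy smoothing). It is not the authors' implementation. The function name `td3_target`, the toy actor and critics, the action range [0, 1] for power and allocation coefficients, and the hyperparameter values are all assumptions chosen for illustration.

```python
import numpy as np

def td3_target(reward, next_state, done, actor_target, critic1_target,
               critic2_target, gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_low=0.0, act_high=1.0, rng=None):
    """Generic TD3 critic target; all argument names are hypothetical."""
    rng = np.random.default_rng() if rng is None else rng
    # Target policy smoothing: perturb the target action with clipped noise.
    next_action = actor_target(next_state)
    noise = rng.normal(0.0, noise_std, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    # Keep the action inside the assumed valid range for the coefficients.
    next_action = np.clip(next_action + noise, act_low, act_high)
    # Clipped double-Q: take the smaller of the two target critic estimates.
    q_min = np.minimum(critic1_target(next_state, next_action),
                       critic2_target(next_state, next_action))
    # No bootstrapping on terminal transitions (done == 1).
    return reward + gamma * (1.0 - done) * q_min

# Toy stand-ins (assumptions): a constant actor and two simple critics.
def toy_actor(state):
    return np.full((state.shape[0], 2), 0.5)

def toy_critic_a(state, action):
    return action.sum(axis=1)

def toy_critic_b(state, action):
    return action.sum(axis=1) + 1.0

states = np.zeros((3, 4))            # batch of 3 placeholder states
rewards = np.array([1.0, 0.0, 2.0])  # e.g. FF sum throughput as reward
dones = np.array([0.0, 0.0, 1.0])

# With the smoothing noise disabled, the result is deterministic.
targets = td3_target(rewards, states, dones, toy_actor,
                     toy_critic_a, toy_critic_b, noise_std=0.0)
print(targets)
```

In a resource allocation setting like the one described, the state would typically encode channel information and the action the power and allocation coefficients, but that mapping is an assumption here and is not specified by the abstract.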

Keywords

Deep Reinforcement Learning, Massive MIMO, NOMA, Resource Allocation, TD3.
