DEEP REINFORCEMENT LEARNING AND HYPER HEURISTICS APPLIED TO RESOURCE ALLOCATION IN 6G COMMUNICATIONS SYSTEMS WITH D2D COMMUNICATIONS AND SENSING
Deep Reinforcement Learning, PPO, Hyper Heuristics, Communications and Sensing
This work proposes a strategy for jointly performing spectrum allocation and power control in 5G and future-generation mobile communication systems with integrated sensing. The target application is an Industry 4.0 scenario: an industrial environment with primary communications, D2D communications, and sensors. The proposed resource allocation solution combines two state-of-the-art techniques: Deep Reinforcement Learning (DRL) and Hyper-Heuristics (HH). The first algorithm of the joint strategy performs power control using neural networks trained with DRL. The second algorithm, which completes the strategy, allocates the available spectrum using HHs in conjunction with DRL algorithms. The joint strategy had two main objectives: to protect primary communications by reducing their outage rate, thereby ensuring communication quality; and to protect the system’s sensors by reducing the sensor outage rate, thereby keeping the detection probability above a predefined threshold. As a secondary objective, the strategy sought to maximize the transmission rate of D2D communications. Among the state-of-the-art algorithms compared for power control, Proximal Policy Optimization (PPO) performed best. Operating on its own, without the spectrum allocation algorithm, in one Resource Block (RB), PPO reduced the primary communications outage rate from 64.35% to 11.75%, reduced the sensor outage rate from 38.5% to 4.4%, and increased the SNIR of D2D communications from -25.6 dB to -7.5 dB, relative to the results obtained by a random algorithm.
For the complete strategy, in which the DRL and HH algorithms perform both power control and spectrum allocation, the results in systems with multiple RBs showed that, compared with resource allocation based on random choices, the joint strategy reduced the primary communications outage rate from 65.8% to 13.3%, reduced the sensor outage rate from 48.1% to 3.3%, and increased the SNIR of D2D communications from -24.3 dB to -11.2 dB. The algorithm also proved scalable to systems with varying numbers of communications, sensors, and RBs, making it applicable to different system configurations.