In this paper, we study dynamic robust power allocation strategies under imperfect channel state information at the transmitters. Assuming that the payoff functions are unknown to the transmitters, we propose a heterogeneous Delayed COmbined fully DIstributed Payoff and Strategy Reinforcement Learning scheme (Delayed-CODIPAS-RL), in which each transmitter learns its payoff function as well as its associated optimal strategies in the long term. We show that equilibrium power allocations can be obtained using the multiplicative weighted imitative CODIPAS-RL and the Bush-Mosteller-based CODIPAS-RL. We also show almost sure convergence to the set of global optima for specific scenarios.
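To make the idea of combined payoff and strategy learning concrete, the following is a minimal sketch for a single transmitter choosing among a few discrete power levels. It is not the paper's algorithm: the update rule is one common multiplicative-weight (imitative) form, and the names `codipas_step` and `simulate`, the learning rates `lam` and `nu`, the Gaussian noise model, and the fixed feedback `delay` are all assumptions made for illustration. The heterogeneous case, in which different transmitters run different learning schemes, is not covered.

```python
import random
from collections import deque


def codipas_step(x, r_hat, action, payoff, lam, nu):
    """One combined update (illustrative form, assumed here).

    x      : current mixed strategy over power levels (sums to 1)
    r_hat  : current payoff estimates, one per power level
    action : index of the power level whose payoff was observed
    payoff : noisy payoff observed for that action
    lam    : strategy learning rate (assumed parameter)
    nu     : payoff learning rate (assumed parameter)
    """
    # Strategy learning: reweight each action by (1 + lam) ** estimated payoff,
    # using the estimates held before this observation, then renormalize.
    weights = [xi * (1.0 + lam) ** ri for xi, ri in zip(x, r_hat)]
    total = sum(weights)
    x_new = [w / total for w in weights]

    # Payoff learning: move only the estimate of the observed action
    # toward the observed payoff.
    r_new = list(r_hat)
    r_new[action] += nu * (payoff - r_hat[action])
    return x_new, r_new


def simulate(mean_payoffs, steps=2000, delay=2, lam=0.05, nu=0.1, seed=0):
    """Run the sketch with payoffs that arrive `delay` steps late."""
    rng = random.Random(seed)
    n = len(mean_payoffs)
    x = [1.0 / n] * n
    r_hat = [0.0] * n
    pending = deque()  # (action, payoff) pairs not yet delivered
    for _ in range(steps):
        action = rng.choices(range(n), weights=x)[0]
        noisy = mean_payoffs[action] + rng.gauss(0.0, 0.1)
        pending.append((action, noisy))
        if len(pending) > delay:
            past_action, past_payoff = pending.popleft()
            x, r_hat = codipas_step(x, r_hat, past_action, past_payoff, lam, nu)
    return x, r_hat


if __name__ == "__main__":
    strategy, estimates = simulate([0.2, 0.5, 0.9])
    print("strategy:", strategy)
    print("payoff estimates:", estimates)
```

The transmitter never evaluates its payoff function directly. It only uses the delayed, noisy payoff of the action it actually played, which is the setting the abstract describes as unknown payoff functions under imperfect channel state information.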
Keywords
- Combined learning
- Heterogeneous learning
- Power allocation
ASJC Scopus subject areas
- Computer Networks and Communications