This paper aims to help bridge the gap between existing theoretical results on distributed radio resource allocation policies based on equilibria in games (which assume complete information and rational players) and the practical design of signal processing algorithms for self-configuring wireless networks. To this end, the framework of learning theory in games is exploited. A new learning algorithm that relies only on mild information assumptions at the transmitters is presented. This algorithm has attractive convergence properties not available in standard reinforcement learning algorithms; in addition, it allows each transmitter to learn both its optimal strategy and the expected utility of each of its actions. A detailed convergence analysis is conducted. In particular, a framework is provided for studying heterogeneous wireless networks in which transmitters do not learn at the same rate. The proposed algorithm, which applies to any wireless network satisfying the stated information assumptions, is applied to multiple access channels to provide numerical results.
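As a rough illustration of what it means for a transmitter to learn its strategy and the expected utilities of its actions at the same time, the Python sketch below implements one possible scheme of this kind; it is not the algorithm of this paper. Each transmitter observes only the realized utility of the action it played, updates a running estimate of that action's expected utility, and moves its mixed strategy toward a Boltzmann-Gibbs (softmax) response to those estimates. Everything specific here is an assumption made for illustration: two transmitters, a hypothetical set of power levels as actions, Rayleigh fading, a toy utility (single-user-decoding rate minus a linear power price), and per-transmitter step sizes of the form 1/n^a with hypothetical exponents, so that the two transmitters learn at different rates.

```python
import numpy as np


def boltzmann_gibbs(values, temperature):
    """Softmax (Boltzmann-Gibbs) map from estimated utilities to a mixed strategy."""
    z = (values - values.max()) / temperature  # shift by the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def toy_mac_utility(i, actions, powers, gains, noise=1.0, price=0.1):
    """Hypothetical utility of transmitter i in a two-user MAC:
    single-user-decoding rate minus a linear price on transmit power."""
    j = 1 - i
    sinr = gains[i] * powers[actions[i]] / (noise + gains[j] * powers[actions[j]])
    return np.log2(1.0 + sinr) - price * powers[actions[i]]


def joint_learning(n_iter=20000, powers=(0.0, 1.0, 2.0, 4.0),
                   rate_exponents=((0.6, 0.8), (0.7, 0.9)),
                   temperature=0.05, seed=0):
    """Jointly learn utility estimates u_hat[i, k] and mixed strategies pi[i, k].

    rate_exponents[i] = (a_u, a_s): transmitter i uses step sizes 1/n**a_u for
    utility estimation and 1/n**a_s for the strategy (hypothetical values).
    Exponents in (0.5, 1] satisfy the usual stochastic-approximation conditions
    (steps sum to infinity, squared steps are summable); a_u < a_s makes the
    utility estimates evolve on a faster time scale than the strategies.
    """
    rng = np.random.default_rng(seed)
    powers = np.asarray(powers, dtype=float)
    k = len(powers)
    u_hat = np.zeros((2, k))          # estimated expected utility of each action
    pi = np.full((2, k), 1.0 / k)     # mixed strategies, initially uniform
    for n in range(1, n_iter + 1):
        actions = [rng.choice(k, p=pi[i]) for i in range(2)]
        gains = rng.exponential(1.0, size=2)  # Rayleigh fading power gains (assumption)
        for i in range(2):
            a_u, a_s = rate_exponents[i]
            lam = 1.0 / n ** a_u      # utility-learning rate of transmitter i
            mu = 1.0 / n ** a_s       # strategy-learning rate of transmitter i
            r = toy_mac_utility(i, actions, powers, gains)  # only this scalar is observed
            a = actions[i]
            u_hat[i, a] += lam * (r - u_hat[i, a])
            # Convex combination of two distributions, so pi stays a distribution.
            pi[i] += mu * (boltzmann_gibbs(u_hat[i], temperature) - pi[i])
            pi[i] /= pi[i].sum()      # guard against floating-point drift
    return u_hat, pi
```

A typical call is `u_hat, pi = joint_learning()`, after which `pi[i]` is transmitter i's learned mixed strategy and `u_hat[i]` its estimated utility per power level. Using a different pair of exponents per transmitter is a simple way to mimic a heterogeneous network; the temperature controls how sharply each strategy concentrates on the actions with the highest estimated utility.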