TY - GEN
T1 - Distributed strategic learning with application to network security
AU - Zhu, Quanyan
AU - Tembine, Hamidou
AU - Başar, Tamer
PY - 2011
Y1 - 2011
AB - We consider in this paper a class of two-player nonzero-sum stochastic games with incomplete information. We develop fully distributed reinforcement learning algorithms, which require for each player a minimal amount of information regarding the other player. At each time, each player can be in an active mode or in a sleep mode. If a player is in an active mode, she updates her strategy and estimates of unknown quantities using a specific pure or hybrid learning pattern. We use stochastic approximation techniques to show that, under appropriate conditions, the pure or hybrid learning schemes with random updates can be studied using their deterministic ordinary differential equation (ODE) counterparts. Convergence to state-independent equilibria is analyzed under specific payoff functions. Results are applied to a class of security games in which the attacker and the defender adopt different learning schemes and update their strategies at random times.
UR - http://www.scopus.com/inward/record.url?scp=80053140894&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053140894&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:80053140894
SN - 9781457700804
T3 - Proceedings of the American Control Conference
SP - 4057
EP - 4062
BT - Proceedings of the 2011 American Control Conference, ACC 2011
T2 - 2011 American Control Conference, ACC 2011
Y2 - 29 June 2011 through 1 July 2011
ER -