TY - GEN
T1 - Adaptive Honeypot Engagement Through Reinforcement Learning of Semi-Markov Decision Processes
AU - Huang, Linan
AU - Zhu, Quanyan
N1 - Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
AB - A honeynet is a promising active cyber defense mechanism. It reveals fundamental Indicators of Compromise (IoCs) by luring attackers to conduct adversarial behaviors in a controlled and monitored environment. Active interaction in the honeynet brings a high reward but also introduces high implementation costs and risks of adversarial honeynet exploitation. In this work, we apply an infinite-horizon Semi-Markov Decision Process (SMDP) to characterize the stochastic transitions and sojourn times of attackers in the honeynet and to quantify the reward-risk trade-off. In particular, we design adaptive long-term engagement policies shown to be risk-averse, cost-effective, and time-efficient. Numerical results demonstrate that our adaptive engagement policies quickly attract attackers to the target honeypot and engage them long enough to obtain valuable threat information, while keeping the penetration probability low. The results also show that the expected utility is robust against attackers with a wide range of persistence and intelligence. Finally, we apply reinforcement learning to the SMDP to address the curse of modeling. Under a prudent choice of the learning rate and exploration policy, we achieve fast and robust convergence of the optimal policy and value.
KW - Active defense
KW - Honeynet
KW - Reinforcement learning
KW - Risk quantification
KW - Semi-Markov decision processes
UR - http://www.scopus.com/inward/record.url?scp=85074777610&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074777610&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-32430-8_13
DO - 10.1007/978-3-030-32430-8_13
M3 - Conference contribution
AN - SCOPUS:85074777610
SN - 9783030324292
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 196
EP - 216
BT - Decision and Game Theory for Security - 10th International Conference, GameSec 2019, Proceedings
A2 - Alpcan, Tansu
A2 - Vorobeychik, Yevgeniy
A2 - Baras, John S.
A2 - Dán, György
PB - Springer
T2 - 10th International Conference on Decision and Game Theory for Security, GameSec 2019
Y2 - 30 October 2019 through 1 November 2019
ER -