TY - CHAP
T1 - Introduction to Partially Observed MDPs
AU - Zhu, Quanyan
AU - Xu, Zhiheng
N1 - Publisher Copyright:
© 2020, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - In our cross-layer design, we use different models to capture the properties of different layers. As stated in Chap. 8, an MDP model can capture the dynamics of the cyber layer. However, that model assumes the defender can observe the cyber state at each cyber time instant. In real applications, it is challenging to obtain full information about the cyber state directly, so an MDP cannot capture the defender's incomplete knowledge of the cyber states. In this chapter, we introduce the Partially Observed Markov Decision Process (POMDP) to capture this uncertainty. In a POMDP, instead of observing the state, we receive an observation whose distribution depends on the state. We use this information to build a Hidden Markov Model (HMM) filter, which constructs a belief over the states. Based on this belief, we seek an optimal policy that minimizes an expected cost.
AB - In our cross-layer design, we use different models to capture the properties of different layers. As stated in Chap. 8, an MDP model can capture the dynamics of the cyber layer. However, that model assumes the defender can observe the cyber state at each cyber time instant. In real applications, it is challenging to obtain full information about the cyber state directly, so an MDP cannot capture the defender's incomplete knowledge of the cyber states. In this chapter, we introduce the Partially Observed Markov Decision Process (POMDP) to capture this uncertainty. In a POMDP, instead of observing the state, we receive an observation whose distribution depends on the state. We use this information to build a Hidden Markov Model (HMM) filter, which constructs a belief over the states. Based on this belief, we seek an optimal policy that minimizes an expected cost.
UR - http://www.scopus.com/inward/record.url?scp=85096373017&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096373017&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-60251-2_10
DO - 10.1007/978-3-030-60251-2_10
M3 - Chapter
AN - SCOPUS:85096373017
T3 - Advances in Information Security
SP - 139
EP - 145
BT - Advances in Information Security
PB - Springer
ER -