## Abstract

Time-average Markov decision processes (MDPs) with finite state and action spaces are considered. It is shown that the state space has a natural partition into strongly communicating classes and a set of states that are transient under every stationary policy. For every policy, each associated recurrent class must be a subset of one of the strongly communicating classes; moreover, there exists a stationary policy whose recurrent classes are strongly communicating. A polynomial-time algorithm is given to determine the partition. The decomposition theory is then used to investigate MDPs with a sample-path constraint. For MDPs with arbitrary recurrent structure, it is shown that an ε-optimal stationary policy exists for every ε > 0 if and only if a feasible policy exists. Verifiable conditions are given for the existence of an optimal stationary policy.
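The abstract does not spell out the paper's polynomial-time algorithm, but the partition it describes is closely related to the maximal end-component decomposition used elsewhere in the MDP literature: a set of states is a candidate recurrent class for some stationary policy exactly when every state in it has an action whose support stays inside the set, and the set is strongly connected under those actions. The sketch below is a minimal illustration of that idea, not the paper's method; the `mdp[s][a]` successor-set encoding and all names are assumptions for illustration.

```python
def sccs(states, succ):
    """Kosaraju's algorithm: strongly connected components of a digraph.

    `succ[s]` is the set of successors of state `s` (all within `states`).
    """
    seen, order = set(), []

    def dfs(root, adj, out):
        # Iterative depth-first search, appending nodes in post-order.
        stack = [(root, iter(adj[root]))]
        seen.add(root)
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(adj[w])))
                    break
            else:
                stack.pop()
                out.append(v)

    for s in states:
        if s not in seen:
            dfs(s, succ, order)
    rev = {s: set() for s in states}
    for u in states:
        for v in succ[u]:
            rev[v].add(u)
    seen.clear()
    comps = []
    for s in reversed(order):  # second pass on the reversed graph
        if s not in seen:
            comp = []
            dfs(s, rev, comp)
            comps.append(set(comp))
    return comps


def decompose(mdp):
    """Split states into candidate strongly communicating classes and
    states transient under every stationary policy (hypothetical sketch).

    `mdp[s][a]` is the set of states reachable from `s` under action `a`
    with positive probability.
    """
    enabled = {s: {a: set(t) for a, t in acts.items()} for s, acts in mdp.items()}
    changed = True
    while changed:
        changed = False
        states = set(enabled)
        succ = {s: set().union(*enabled[s].values()) & states for s in states}
        for comp in sccs(states, succ):
            for s in comp:
                for a in list(enabled[s]):
                    if not enabled[s][a] <= comp:
                        del enabled[s][a]   # action can leak out of the component
                        changed = True
                if not enabled[s]:
                    del enabled[s]          # no action keeps s inside: transient
                    changed = True
    states = set(enabled)
    succ = {s: set().union(*enabled[s].values()) & states for s in states}
    return sccs(states, succ), set(mdp) - states


if __name__ == "__main__":
    mdp = {1: {"a": {2}, "b": {1}},
           2: {"a": {1}},
           3: {"a": {1, 3}}}   # from 3, the only action risks falling into {1, 2}
    classes, transient = decompose(mdp)
    # classes == [{1, 2}], transient == {3}
```

Each pruning round removes at least one state-action pair, so the loop runs at most polynomially many times, matching the complexity claim in the abstract.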

| Original language | English (US) |
|---|---|
| Pages (from-to) | 2264-2269 |
| Number of pages | 6 |
| Journal | Proceedings of the IEEE Conference on Decision and Control |
| State | Published - 1987 |

## ASJC Scopus subject areas

- Control and Systems Engineering
- Modeling and Simulation
- Control and Optimization