## Abstract

Time-average Markov decision processes (MDPs) with finite state and action spaces are considered. It is shown that the state space has a natural partition into strongly communicating classes and a set of states that are transient under every stationary policy. For every policy, each associated recurrent class must be a subset of one of the strongly communicating classes; moreover, there exists a stationary policy whose recurrent classes are strongly communicating. A polynomial-time algorithm is given to determine the partition. The decomposition theory is then used to investigate MDPs with a sample-path constraint. For MDPs with arbitrary recurrent structure, it is shown that an ε-optimal stationary policy exists for every ε > 0 if and only if a feasible policy exists. Verifiable conditions are given for the existence of an optimal stationary policy.
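The abstract does not spell out the paper's polynomial-time algorithm, but the partition it describes is closely related to the maximal end-component decomposition used elsewhere in the MDP literature: a set of states is a candidate recurrent class for some stationary policy exactly when every state in it has an action whose support stays inside the set, and the set is strongly connected under those actions. The sketch below is a minimal illustration of that idea, not the paper's method; the `mdp[s][a]` successor-set encoding and all names are assumptions for illustration.

```python
def sccs(states, succ):
    """Kosaraju's algorithm: strongly connected components of a digraph.

    `succ[s]` is the set of successors of state `s` (all within `states`).
    """
    seen, order = set(), []

    def dfs(root, adj, out):
        # Iterative depth-first search, appending nodes in post-order.
        stack = [(root, iter(adj[root]))]
        seen.add(root)
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(adj[w])))
                    break
            else:
                stack.pop()
                out.append(v)

    for s in states:
        if s not in seen:
            dfs(s, succ, order)
    rev = {s: set() for s in states}
    for u in states:
        for v in succ[u]:
            rev[v].add(u)
    seen.clear()
    comps = []
    for s in reversed(order):  # second pass on the reversed graph
        if s not in seen:
            comp = []
            dfs(s, rev, comp)
            comps.append(set(comp))
    return comps


def decompose(mdp):
    """Split states into candidate strongly communicating classes and
    states transient under every stationary policy (hypothetical sketch).

    `mdp[s][a]` is the set of states reachable from `s` under action `a`
    with positive probability.
    """
    enabled = {s: {a: set(t) for a, t in acts.items()} for s, acts in mdp.items()}
    changed = True
    while changed:
        changed = False
        states = set(enabled)
        succ = {s: set().union(*enabled[s].values()) & states for s in states}
        for comp in sccs(states, succ):
            for s in comp:
                for a in list(enabled[s]):
                    if not enabled[s][a] <= comp:
                        del enabled[s][a]   # action can leak out of the component
                        changed = True
                if not enabled[s]:
                    del enabled[s]          # no action keeps s inside: transient
                    changed = True
    states = set(enabled)
    succ = {s: set().union(*enabled[s].values()) & states for s in states}
    return sccs(states, succ), set(mdp) - states


if __name__ == "__main__":
    mdp = {1: {"a": {2}, "b": {1}},
           2: {"a": {1}},
           3: {"a": {1, 3}}}   # from 3, the only action risks falling into {1, 2}
    classes, transient = decompose(mdp)
    # classes == [{1, 2}], transient == {3}
```

Each pruning round removes at least one state-action pair, so the loop runs at most polynomially many times, matching the complexity claim in the abstract.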

| Original language | English (US) |
|---|---|
| Pages (from-to) | 2264-2269 |
| Number of pages | 6 |
| Journal | Proceedings of the IEEE Conference on Decision and Control |
| State | Published - 1987 |

## ASJC Scopus subject areas

- Control and Systems Engineering
- Modeling and Simulation
- Control and Optimization