Abstract
Time-average Markov decision processes with finite state and action spaces are considered. Several definitions of variability are introduced and compared. It is shown that a stationary policy maximizes one of these criteria, namely, the expected long-run average variability. An algorithm that produces such an optimal stationary policy is given.
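The abstract does not spell out which variability definition the optimal stationary policy maximizes, so the following is only a hedged illustration, not the paper's algorithm: a minimal simulation that, for a small finite MDP under a fixed stationary policy, estimates the long-run average reward and one plausible variability measure (the time-average squared deviation of the reward from the running average). The MDP, the policy, and the variability definition here are all hypothetical examples.

```python
import random

# Hypothetical two-state, two-action MDP (illustration only, not from the paper).
# transitions[s][a] = probabilities of moving to states 0 and 1.
transitions = {
    0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
    1: {0: [0.5, 0.5], 1: [0.7, 0.3]},
}
rewards = {0: {0: 1.0, 1: 4.0}, 1: {0: 0.0, 1: 2.0}}
policy = {0: 1, 1: 0}  # a fixed stationary (deterministic) policy


def simulate(steps=200_000, seed=0):
    """Estimate long-run average reward and a sample variability measure
    (time-average squared deviation from the running average reward)."""
    rng = random.Random(seed)
    s, total, total_sq_dev = 0, 0.0, 0.0
    for t in range(1, steps + 1):
        a = policy[s]
        r = rewards[s][a]
        total += r
        avg = total / t                  # running average reward
        total_sq_dev += (r - avg) ** 2   # deviation from the running average
        # sample the next state from the transition distribution
        s = rng.choices([0, 1], weights=transitions[s][a])[0]
    return total / steps, total_sq_dev / steps


avg_reward, variability = simulate()
print(avg_reward, variability)
```

Comparing such estimates across candidate stationary policies is one crude way to see why maximizing an expected long-run average variability criterion is a well-posed optimization over the finite set of deterministic stationary policies.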
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 1261-1262 |
| Number of pages | 2 |
| Journal | Proceedings of the IEEE Conference on Decision and Control |
| Volume | 2 |
| State | Published - 1989 |
| Event | Proceedings of the 28th IEEE Conference on Decision and Control, Part 2 (of 3) - Tampa, FL, USA, Dec 13-15, 1989 |
ASJC Scopus subject areas
- Control and Systems Engineering
- Modeling and Simulation
- Control and Optimization