Abstract
In this paper we address the following basic feasibility problem for infinite-horizon Markov decision processes (MDPs): can a policy be found that achieves a specified value (target) of the long-run limiting average reward at a specified probability level (percentile)? Related optimization problems of maximizing the target for a specified percentile, and vice versa, are also considered. We present a complete (and discrete) classification of both the maximal achievable target levels and their corresponding percentiles. We also provide an algorithm for computing a deterministic policy corresponding to any feasible target-percentile pair. Next we consider similar problems for an MDP with multiple rewards and/or constraints. This case presents some difficulties and leads to several open problems. An LP-based formulation provides constructive solutions for most cases.
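To make the feasibility question concrete, the target-percentile condition can be sketched as follows; the notation ($v$ for the target, $\alpha$ for the percentile, $r$, $X_t$, $A_t$ for rewards, states, and actions) is illustrative and not taken verbatim from the paper:

```latex
% A minimal sketch of the target-percentile feasibility condition for a policy \pi
% starting from state x; the notation (v, \alpha, r, X_t, A_t) is assumed here and
% is not the paper's own.
\[
  P_x^{\pi}\!\left(
    \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} r(X_t, A_t) \;\ge\; v
  \right) \;\ge\; \alpha ,
\]
% i.e., under policy \pi the long-run limiting average reward reaches the target v
% with probability at least \alpha.
```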
| | |
|---|---|
| Original language | English (US) |
| Pages (from-to) | 2-10 |
| Number of pages | 9 |
| Journal | IEEE Transactions on Automatic Control |
| Volume | 40 |
| Issue number | 1 |
| State | Published - Jan 1995 |
ASJC Scopus subject areas
- Control and Systems Engineering
- Computer Science Applications
- Electrical and Electronic Engineering