Abstract
We consider the Markov decision problem of finding a policy that maximizes the long-run average reward subject to K constraints on long-run average costs. We show that there exists an optimal policy whose degree of randomization is at most K; consequently, it is never necessary to randomize in more than K states. An algorithm employing linear programming is shown to produce an optimal policy with this limited-randomization property.
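The notion of degree of randomization can be made concrete with a short Python sketch. It assumes the linear program is posed over an occupation measure x(s, a), the long-run fraction of time spent in state s taking action a, which is a common formulation for average-reward constrained problems, and that an optimal x has already been computed. The abstract does not spell out the LP. This formulation, the function names, and the definition used here (the sum over states of the number of actions taken with positive probability, minus one) are all assumptions. The sketch converts x into a stationary randomized policy and reports which states randomize.

```python
from collections import defaultdict

# Hypothetical helpers (not from the paper). x maps (state, action) pairs to
# occupation-measure mass, assumed to come from an optimal LP solution.


def policy_from_occupation(x, tol=1e-12):
    """Return pi[s][a] = x(s, a) / sum_b x(s, b) for states with positive mass."""
    totals = defaultdict(float)
    for (s, _a), mass in x.items():
        if mass > tol:
            totals[s] += mass
    policy = {}
    for (s, a), mass in x.items():
        if mass > tol:  # drop actions with (numerically) zero mass
            policy.setdefault(s, {})[a] = mass / totals[s]
    return policy


def degree_of_randomization(policy):
    """Assumed definition: sum over states of (actions used with positive prob. - 1)."""
    return sum(len(actions) - 1 for actions in policy.values())


def randomized_states(policy):
    """States in which more than one action is used with positive probability."""
    return sorted(s for s, actions in policy.items() if len(actions) > 1)


# Hypothetical occupation measure for a 3-state, 2-action model.
x = {(0, "a"): 0.375, (0, "b"): 0.125, (1, "a"): 0.25, (2, "a"): 0.0, (2, "b"): 0.25}
pi = policy_from_occupation(x)
print(pi)                           # {0: {'a': 0.75, 'b': 0.25}, 1: {'a': 1.0}, 2: {'b': 1.0}}
print(degree_of_randomization(pi))  # 1
print(randomized_states(pi))        # [0]
```

In this hypothetical example only state 0 randomizes, so the degree of randomization is 1, which is consistent with the bound for K = 1. States with zero occupation are omitted because x carries no information about them; any action can be assigned there.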
Field | Value |
---|---|
Original language | English (US) |
Pages | 649-651 |
Number of pages | 3 |
State | Published - 1986 |
ASJC Scopus subject areas
- General Engineering