We consider the Markov decision problem of finding a policy to maximize the long-run average reward subject to K long-run average cost constraints. We show that there exists an optimal policy with a degree of randomization less than or equal to K. Consequently, it is never necessary to randomize in more than K states. An algorithm employing linear programming is shown to produce an optimal policy with this limited-randomization property.
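The limited-randomization property can be illustrated with a minimal sketch. It assumes, as is common for constrained average-reward problems but is not confirmed by the abstract, that the linear program is posed over state-action frequencies x(s, a), and that the degree of randomization counts the actions used with positive probability beyond one per state. The helper names and the sample frequencies are hypothetical, chosen only for illustration with K = 1.

```python
from typing import Dict, List, Tuple

State = str
Action = str


def policy_from_frequencies(
    x: Dict[Tuple[State, Action], float], tol: float = 1e-9
) -> Dict[State, Dict[Action, float]]:
    """Normalize state-action frequencies x(s, a) into a policy pi(a | s).

    Assumption: x is a feasible LP solution, so frequencies are nonnegative.
    States with zero total frequency are left out of the returned policy.
    """
    totals: Dict[State, float] = {}
    for (s, _a), mass in x.items():
        totals[s] = totals.get(s, 0.0) + mass

    policy: Dict[State, Dict[Action, float]] = {}
    for (s, a), mass in x.items():
        if mass > tol:  # keep only actions chosen with positive probability
            policy.setdefault(s, {})[a] = mass / totals[s]
    return policy


def degree_of_randomization(policy: Dict[State, Dict[Action, float]]) -> int:
    """Count the actions in use beyond one per state (assumed definition)."""
    return sum(len(actions) - 1 for actions in policy.values())


def randomized_states(policy: Dict[State, Dict[Action, float]]) -> List[State]:
    """List the states in which more than one action has positive probability."""
    return [s for s, actions in policy.items() if len(actions) > 1]


# Hypothetical frequencies for a 3-state, 2-action problem with K = 1.
x_example = {
    ("s0", "a"): 0.5,
    ("s0", "b"): 0.0,
    ("s1", "a"): 0.125,
    ("s1", "b"): 0.125,
    ("s2", "a"): 0.0,
    ("s2", "b"): 0.25,
}

pi = policy_from_frequencies(x_example)
print(pi)
print("degree of randomization:", degree_of_randomization(pi))
print("randomized states:", randomized_states(pi))
```

In this example only state s1 mixes two actions, so the degree of randomization is 1, which matches the bound K = 1. The sketch does not solve the linear program itself; it only shows how a solution would be read as a policy under the stated assumptions.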
|Original language||English (US)|
|Number of pages||3|
|State||Published - 1986|