Abstract
This article addresses the Markov decision problem with long-run average reward $V_u$ when there is a global constraint to be satisfied: $I_u \le \alpha$, where $I_u$ is also a long-run average. Using Lagrange multiplier techniques, existence of an optimal stationary policy is proven. Unlike the unconstrained theory, optimal stationary policies are in general randomized. Structural properties of an optimal policy are determined and the corresponding dynamic programming equations are derived.
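To illustrate why a constrained optimum may need randomization, here is a minimal sketch on a hypothetical one-state, two-action chain. It is not taken from the article. The rewards, costs, the bound `ALPHA`, and all function names are assumptions chosen for illustration, and the reward is assumed to be maximized. With a single state, the long-run averages reduce to per-step expectations under the mixing probability `p`.

```python
# Hypothetical toy problem (not from the article): one state, two actions.
# Action 0 earns reward 1 and incurs cost 1; action 1 earns 0 and costs 0.
REWARDS = (1.0, 0.0)
COSTS = (1.0, 0.0)
ALPHA = 0.5  # assumed bound on the long-run average cost


def averages(p):
    """Long-run average reward and cost when action 0 is chosen with probability p."""
    v = p * REWARDS[0] + (1 - p) * REWARDS[1]
    i = p * COSTS[0] + (1 - p) * COSTS[1]
    return v, i


def lagrangian(p, lam, alpha=ALPHA):
    """Reward penalized by the multiplier times the constraint violation."""
    v, i = averages(p)
    return v - lam * (i - alpha)


def best_feasible(candidates, alpha=ALPHA):
    """Return (p, reward) for the feasible candidate with the largest average reward."""
    best = None
    for p in candidates:
        v, i = averages(p)
        if i <= alpha + 1e-12 and (best is None or v > best[1]):
            best = (p, v)
    return best


# Deterministic stationary policies correspond to p = 0 or p = 1.
deterministic = best_feasible([0.0, 1.0])
# Randomized stationary policies, searched on a coarse grid of mixing probabilities.
randomized = best_feasible([k / 100 for k in range(101)])

print("best deterministic:", deterministic)
print("best randomized:", randomized)
```

In this toy case the only feasible deterministic policy always picks action 1, while mixing the two actions equally meets the constraint with equality and earns more. At the multiplier value 1 the Lagrangian takes the same value for every `p`, so the unconstrained Lagrangian problem is indifferent between the actions and the constraint fixes the mixing probability.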
Original language | English (US) |
---|---|
Pages | 175-179 |
Number of pages | 5 |
State | Published - 1984 |
ASJC Scopus subject areas
- General Engineering