Abstract
Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary processes, uniformization can be accepted as valid only for simple policies. We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDP, in which stationary (randomized) policies appear naturally.
Original language | English (US) |
---|---|
Pages (from-to) | 644-656 |
Number of pages | 13 |
Journal | Journal of Applied Probability |
Volume | 24 |
Issue number | 3 |
State | Published - 1987 |
ASJC Scopus subject areas
- Statistics and Probability
- General Mathematics
- Statistics, Probability and Uncertainty