Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary processes, uniformization can be accepted as valid only for simple policies. We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDPs, in which stationary (randomized) policies appear naturally.
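The discrepancy described above can be illustrated with a small numerical sketch. The model below is a hypothetical toy example, not one taken from the paper: a two-state process with exponential sojourn times, in which state 0 offers two actions (each with its own exit rate and reward rate) and state 1 offers a single action with zero reward. The function names, rates, and rewards are all assumptions chosen for illustration. In the original process the action is drawn once on entry to state 0 and held until the next real transition; in the naively uniformized chain it is assumed to be redrawn at every tick of the uniformizing clock, including virtual jumps.

```python
def smdp_average_reward(p, q, r, mu):
    """Long-run average reward of the original process (renewal-reward argument).

    p[i]: probability of choosing action i on entry to state 0
    q[i]: exit rate from state 0 under action i
    r[i]: reward rate in state 0 under action i
    mu:   exit rate from state 1 (reward rate there is 0)
    The chosen action is held until the next real transition.
    """
    reward_per_visit = sum(pi * ri / qi for pi, qi, ri in zip(p, q, r))
    time_in_state0 = sum(pi / qi for pi, qi in zip(p, q))
    return reward_per_visit / (time_in_state0 + 1.0 / mu)


def uniformized_average_reward(p, q, r, mu, lam):
    """Long-run average reward of the naively uniformized chain.

    lam is the uniformization rate (must dominate every exit rate).
    The action is redrawn at every clock tick, so virtual jumps
    (self-transitions) carry action changes.
    """
    if lam < max(max(q), mu):
        raise ValueError("uniformization rate must dominate all exit rates")
    leave0 = sum(pi * qi for pi, qi in zip(p, q)) / lam  # P(0 -> 1) per tick
    leave1 = mu / lam                                    # P(1 -> 0) per tick
    pi0 = leave1 / (leave0 + leave1)                     # stationary mass of state 0
    mean_reward_rate = sum(pi * ri for pi, ri in zip(p, r))
    # Each tick lasts 1/lam on average, so reward per unit time is pi0 * mean rate.
    return pi0 * mean_reward_rate


if __name__ == "__main__":
    q, r, mu, lam = [1.0, 4.0], [1.0, 0.0], 1.0, 4.0
    for label, p in [("simple", [1.0, 0.0]), ("randomized", [0.5, 0.5])]:
        original = smdp_average_reward(p, q, r, mu)
        uniformized = uniformized_average_reward(p, q, r, mu, lam)
        print(label, original, uniformized)
```

Under these assumptions the two averages coincide for the simple policy and differ for the randomized one, because redrawing the action at virtual jumps changes both the effective exit rate and the reward accrued per visit to state 0.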
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty