UNIFORMIZATION FOR SEMI-MARKOV DECISION PROCESSES UNDER STATIONARY POLICIES.

Frederick J. Beutler, Keith W. Ross

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary processes, uniformization can be accepted as valid only for simple policies. We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDP, in which stationary (randomized) policies appear naturally.

    Original language: English (US)
    Pages (from-to): 644-656
    Number of pages: 13
    Journal: Journal of Applied Probability
    Volume: 24
    Issue number: 3
    State: Published - 1987

    ASJC Scopus subject areas

    • Statistics and Probability
    • General Mathematics
    • Statistics, Probability and Uncertainty
