Markov decision processes with sample path constraints: the communicating case

Keith W. Ross, Ravi Varadarajan

    Research output: Contribution to journal › Article › peer-review

    Abstract

    We consider time-average Markov Decision Processes (MDPs), which accumulate a reward and a cost at each decision epoch. A policy meets the sample-path constraint if the time-average cost is below a specified value with probability one. The optimization problem is to maximize the expected average reward over all policies that meet the sample-path constraint. The sample-path constraint is compared with the more commonly studied constraint of requiring the average expected cost to be less than a specified value. Assuming that a policy exists that meets the sample-path constraint, we establish that there exist nearly optimal stationary policies for communicating MDPs. A parametric linear programming algorithm is given to construct nearly optimal stationary policies. The discussion relies on well-known results from the theory of stochastic processes and linear programming.
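    The abstract contrasts the sample-path constraint with the more commonly studied expected-average-cost constraint and mentions a parametric linear programming construction of nearly optimal stationary policies. As a rough, hypothetical illustration only (not the paper's parametric algorithm and not the sample-path formulation), the sketch below solves the standard occupation-measure LP for an expected-average-cost-constrained MDP on a small made-up example and reads off a randomized stationary policy. All data, and the use of scipy.optimize.linprog, are assumptions for illustration.

import numpy as np
from scipy.optimize import linprog

# Hypothetical constrained average-reward MDP: 2 states, 2 actions.
S, A = 2, 2
# P[s, a, s'] = transition probability from state s under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # from state 0
    [[0.5, 0.5], [0.1, 0.9]],   # from state 1
])
r = np.array([[1.0, 2.0], [0.5, 3.0]])   # rewards r(s, a)
c = np.array([[0.0, 2.0], [1.0, 4.0]])   # costs   c(s, a)
alpha = 1.5                               # expected-average-cost budget

# Decision variables: occupation measure x(s, a), flattened to length S*A.
n = S * A
idx = lambda s, a: s * A + a

# Objective: maximize expected average reward  =>  minimize -r.
obj = -r.reshape(n)

# Balance (stationarity) constraints: for every state s',
#   sum_a x(s', a) - sum_{s, a} x(s, a) P(s' | s, a) = 0,
# plus the normalization sum_{s, a} x(s, a) = 1.
A_eq = np.zeros((S + 1, n))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, idx(s, a)] -= P[s, a, sp]
    for a in range(A):
        A_eq[sp, idx(sp, a)] += 1.0
A_eq[S, :] = 1.0
b_eq = np.zeros(S + 1)
b_eq[S] = 1.0

# Expected-average-cost constraint: sum_{s, a} x(s, a) c(s, a) <= alpha.
A_ub = c.reshape(1, n)
b_ub = np.array([alpha])

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * n, method="highs")
x = res.x.reshape(S, A)

# Recover a randomized stationary policy pi(a | s) = x(s, a) / sum_a x(s, a);
# states with zero occupation get an arbitrary (uniform) action choice.
denom = x.sum(axis=1, keepdims=True)
pi = np.divide(x, denom, out=np.full_like(x, 1.0 / A), where=denom > 0)
print("optimal expected average reward:", -res.fun)
print("randomized stationary policy:\n", pi)

    This captures only the expected-average-cost version of the problem; the paper's contribution is the probability-one (sample-path) constraint for communicating MDPs, for which the abstract states that nearly optimal stationary policies exist and can be constructed by a parametric LP.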

    Original language: English (US)
    Pages (from-to): 780-790
    Number of pages: 11
    Journal: Operations Research
    Volume: 37
    Issue number: 5
    DOIs
    State: Published - 1989

    ASJC Scopus subject areas

    • Computer Science Applications
    • Management Science and Operations Research
