## Abstract

We consider time-average Markov Decision Processes (MDPs), which accumulate a reward and cost at each decision epoch. A policy meets the sample-path constraint if the time-average cost is below a specified value with probability one. The optimization problem is to maximize the expected average reward over all policies that meet the sample-path constraint. The sample-path constraint is compared with the more commonly studied constraint of requiring the average expected cost to be less than a specified value. Assuming that a policy exists that meets the sample-path constraint, we establish that there exist nearly optimal stationary policies for communicating MDPs. A parametric linear programming algorithm is given to construct nearly optimal stationary policies. The discussion relies on well known results from the theory of stochastic processes and linear programming.

Original language | English (US) |
---|---|

Pages (from-to) | 780-790 |

Number of pages | 11 |

Journal | Operations Research |

Volume | 37 |

Issue number | 5 |

DOIs | |

State | Published - 1989 |

## ASJC Scopus subject areas

- Computer Science Applications
- Management Science and Operations Research