Percentile Performance Criteria For Limiting Average Markov Decision Processes

Jerzy A. Filar, Dmitry Krass, Keith W. Ross

    Research output: Contribution to journal › Article › peer-review

    Abstract

    In this paper we address the following basic feasibility problem for infinite-horizon Markov decision processes (MDPs): can a policy be found that achieves a specified value (target) of the long-run limiting average reward at a specified probability level (percentile)? Related optimization problems of maximizing the target for a specified percentile and vice versa are also considered. We present a complete (and discrete) classification of both the maximal achievable target levels and of their corresponding percentiles. We also provide an algorithm for computing a deterministic policy corresponding to any feasible target-percentile pair. Next we consider similar problems for an MDP with multiple rewards and/or constraints. This case presents some difficulties and leads to several open problems. An LP-based formulation provides constructive solutions for most cases.
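To make the feasibility question concrete, the sketch below checks one target-percentile pair for a single fixed policy. It is not the algorithm from the paper: it assumes the policy is already chosen, that the induced Markov chain has only absorbing states as recurrent classes (so the limiting average reward on a sample path equals the reward of the absorbing state reached), and the names `limiting_average_distribution`, `percentile_feasible`, and the example chain are hypothetical.

```python
def limiting_average_distribution(P, rewards, start, absorbing,
                                  n_iter=10_000, tol=1e-12):
    """Distribution of the limiting average reward from `start`.

    Assumption: every recurrent class is a single absorbing state, so the
    limiting average equals the reward of the absorbing state reached.
    """
    n = len(P)
    mu = [0.0] * n
    mu[start] = 1.0
    for _ in range(n_iter):
        # Propagate the state distribution one step: mu <- mu P.
        mu = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Stop once almost no probability mass is left on transient states.
        if sum(mu[i] for i in range(n) if i not in absorbing) < tol:
            break
    dist = {}
    for s in absorbing:
        dist[rewards[s]] = dist.get(rewards[s], 0.0) + mu[s]
    return dist


def percentile_feasible(dist, target, alpha):
    """Return (is_feasible, prob) where prob = P(limiting average >= target)."""
    prob = sum(p for value, p in dist.items() if value >= target)
    return prob >= alpha, prob


# Hypothetical chain induced by one policy: state 0 is transient,
# states 1 and 2 are absorbing with rewards 1 and 5.
P = [
    [0.5, 0.35, 0.15],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
rewards = [0.0, 1.0, 5.0]

dist = limiting_average_distribution(P, rewards, start=0, absorbing={1, 2})
ok_low, prob = percentile_feasible(dist, target=5.0, alpha=0.25)
ok_high, _ = percentile_feasible(dist, target=5.0, alpha=0.5)
print(dist, ok_low, ok_high)
```

In this example the expected limiting average is 0.7 * 1 + 0.3 * 5 = 2.2, yet a target of 2.2 is reached with probability only about 0.3, which is the kind of gap between expectation and percentile that the percentile criterion is meant to expose.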

    Original language: English (US)
    Pages (from-to): 2-10
    Number of pages: 9
    Journal: IEEE Transactions on Automatic Control
    Volume: 40
    Issue number: 1
    State: Published - Jan 1995

    ASJC Scopus subject areas

    • Control and Systems Engineering
    • Computer Science Applications
    • Electrical and Electronic Engineering
