SAMPLE PATH THEORY FOR TIME-AVERAGE MARKOV DECISION PROCESSES.

Keith W. Ross, Ravi Varadarajan

    Research output: Contribution to journal › Conference article › peer-review

    Abstract

    Time-average Markov decision processes (MDPs) with finite state and action spaces are considered. It is shown that the state space has a natural partition into strongly communicating classes and a set of states that are transient under all stationary policies. For every policy, any associated recurrent class must be a subset of one of the strongly communicating classes; moreover, there exists a stationary policy whose recurrent classes are strongly communicating. A polynomial-time algorithm is given to determine the partition. The decomposition theory is then used to investigate MDPs with a sample-path constraint. For MDPs with arbitrary recurrent structure, it is shown that an ε-optimal stationary policy exists for each ε > 0 if and only if a feasible policy exists. Verifiable conditions are given for the existence of an optimal stationary policy.
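    The paper's own polynomial-time partition algorithm is not reproduced here. As an illustration only, the sketch below computes a decomposition of this flavor under one common reading: a strongly communicating class is a maximal set of states C such that every state in C has some action whose successors stay inside C, and C is strongly connected using only such actions (states pruned away are transient under every stationary policy). The transition encoding and all function names are this sketch's assumptions, not the paper's.

```python
from collections import defaultdict

def scc_map(nodes, edges):
    """Kosaraju's algorithm: map each node to the frozenset of its SCC."""
    rev = defaultdict(set)
    for u in nodes:
        for v in edges.get(u, ()):
            rev[v].add(u)
    order, seen = [], set()

    def visit(u):
        seen.add(u)
        for v in edges.get(u, ()):
            if v not in seen:
                visit(v)
        order.append(u)

    for u in nodes:
        if u not in seen:
            visit(u)
    comp, assigned = {}, set()
    for u in reversed(order):          # process in reverse finishing order
        if u in assigned:
            continue
        members, stack = set(), [u]
        while stack:                   # flood-fill the reversed graph
            v = stack.pop()
            if v in members:
                continue
            members.add(v)
            stack.extend(w for w in rev[v]
                         if w not in assigned and w not in members)
        members = frozenset(members)
        comp.update((v, members) for v in members)
        assigned |= members
    return comp

def decompose(P):
    """P maps (state, action) -> set of successors with positive probability.
    Returns (classes, transient): the strongly communicating classes and the
    states that are transient under every stationary policy (naive
    fixed-point iteration; polynomial, but not the paper's algorithm)."""
    states = {s for (s, _) in P} | {t for succ in P.values() for t in succ}
    acts = defaultdict(set)
    for (s, a) in P:
        acts[s].add(a)
    active = set(states)
    while True:
        changed = False
        # Drop actions whose support leaves the surviving state set,
        # then drop states left with no usable action.
        for s in list(active):
            for a in list(acts[s]):
                if not P[(s, a)] <= active:
                    acts[s].discard(a)
                    changed = True
            if not acts[s]:
                active.discard(s)
                changed = True
        if changed:
            continue
        # SCCs of the one-step graph induced by the remaining actions.
        edges = {s: set().union(*(P[(s, a)] for a in acts[s])) for s in active}
        comp = scc_map(active, edges)
        # Drop actions that can escape their own component.
        for s in active:
            for a in list(acts[s]):
                if not P[(s, a)] <= comp[s]:
                    acts[s].discard(a)
                    changed = True
        if not changed:
            return {comp[s] for s in active}, states - active

# Example: states 0 and 1 can be made recurrent together; state 2 must
# drift to the absorbing state 3, so it is transient under every policy.
P = {(0, 'stay'): {1}, (1, 'stay'): {0}, (0, 'leave'): {2},
     (2, 'go'): {3}, (3, 'loop'): {3}}
classes, transient = decompose(P)
# -> classes == {frozenset({0, 1}), frozenset({3})}, transient == {2}
```

    Each pass either removes an action or a state, so the loop terminates in polynomially many iterations, consistent with the polynomial-time claim in the abstract; the paper's actual algorithm may differ.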

    Original language: English (US)
    Pages (from-to): 2264-2269
    Number of pages: 6
    Journal: Proceedings of the IEEE Conference on Decision and Control
    State: Published - 1987

    ASJC Scopus subject areas

    • Control and Systems Engineering
    • Modeling and Simulation
    • Control and Optimization
