Programs are typically evaluated through the average treatment effect and its standard error: is the treatment effect positive, and is it statistically significant? In theory, however, programs should be evaluated in a decision framework, using social welfare functions and posterior predictive distributions for the outcomes of interest. This chapter discusses the use of stochastic dominance of predictive outcome distributions to rank programs. Under more restrictive parametric and functional-form assumptions, it also develops intuitive mean-variance tests for program evaluation that are consistent with the underlying decision problem. These concepts are applied to the GAIN and JTPA datasets.