Many analytical and empirical studies of software testing effectiveness have used the probability that a test set exposes at least one fault as the measure of effectiveness. That measure is useful for evaluating testing techniques when the goal of testing is to gain confidence that the program is free from faults. However, if the goal of testing is to improve the reliability of the program by discovering and removing the faults most likely to cause failures in the field, then the measure of test effectiveness must distinguish faults that are likely to cause failures from those that are unlikely to do so. Delivered reliability was previously introduced as a means of comparing testing techniques in that setting. This paper empirically compares the reliability delivered by three testing techniques: branch testing, the all-uses data flow testing criterion, and operational testing. The subject program is a moderate-sized C program (about 10,000 LOC) produced by professional programmers and containing naturally occurring faults.