TY - JOUR
T1 - All-uses vs mutation testing
T2 - An experimental comparison of effectiveness
AU - Frankl, Phyllis G.
AU - Weiss, Stewart N.
AU - Hu, Cang
N1 - Funding Information:
C. Hu was with Polytechnic University when this work was performed. The work of P. G. Frankl and C. Hu was supported in part by NSF Grant CCR-9206910. P. Frankl was partially supported by a Visiting Fellowship, grant GR/L00445, to the Centre for Software Reliability from the UK Engineering and Physical Sciences Research Council. S. N. Weiss's work was supported in part by NSF Grant CDA-9222720.
PY - 1997/9
Y1 - 1997/9
N2 - The effectiveness of a test data adequacy criterion for a given program and specification is the probability that a test set satisfying the criterion will expose a fault. Experiments were performed to compare the effectiveness of the mutation testing and all-uses test data adequacy criteria at various coverage levels, for randomly generated test sets. Large numbers of test sets were generated and executed, and for each, the proportion of mutants killed or def-use associations covered was measured. This data was used to estimate and compare the effectiveness of the criteria. The results were mixed: at the highest coverage levels considered, mutation was more effective than all-uses for five of the nine subjects, all-uses was more effective than mutation for two subjects, and there was no clear winner for two subjects. However, mutation testing was much more expensive than all-uses. The relationship between coverage and effectiveness for fixed-sized test sets was also explored and was found to be nonlinear and, in many cases, nonmonotonic.
AB - The effectiveness of a test data adequacy criterion for a given program and specification is the probability that a test set satisfying the criterion will expose a fault. Experiments were performed to compare the effectiveness of the mutation testing and all-uses test data adequacy criteria at various coverage levels, for randomly generated test sets. Large numbers of test sets were generated and executed, and for each, the proportion of mutants killed or def-use associations covered was measured. This data was used to estimate and compare the effectiveness of the criteria. The results were mixed: at the highest coverage levels considered, mutation was more effective than all-uses for five of the nine subjects, all-uses was more effective than mutation for two subjects, and there was no clear winner for two subjects. However, mutation testing was much more expensive than all-uses. The relationship between coverage and effectiveness for fixed-sized test sets was also explored and was found to be nonlinear and, in many cases, nonmonotonic.
UR - http://www.scopus.com/inward/record.url?scp=0031235549&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0031235549&partnerID=8YFLogxK
U2 - 10.1016/S0164-1212(96)00154-9
DO - 10.1016/S0164-1212(96)00154-9
M3 - Article
AN - SCOPUS:0031235549
SN - 0164-1212
VL - 38
SP - 235
EP - 253
JO - Journal of Systems and Software
JF - Journal of Systems and Software
IS - 3
ER -