TY - GEN
T1 - NetCheck
T2 - 11th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2014
AU - Zhuang, Yanyan
AU - Gessiou, Eleni
AU - Portzer, Steven
AU - Fund, Fraida
AU - Muhammad, Monzur
AU - Beschastnikh, Ivan
AU - Cappos, Justin
N1 - Funding Information: We thank our shepherd Dejan Kostic, Ulrike Stege for discussing ordering algorithms with us, and our reviewers for their invaluable feedback. This work was supported in part by the National Science Foundation through Awards 1223588 and 1205415, NSF Graduate Research Fellowship Award 1104522, the NYU WIRELESS research center and the Center for Advanced Technology in Telecommunications (CATT).
PY - 2014
Y1 - 2014
AB - This paper introduces NetCheck, a tool designed to diagnose network problems in large and complex applications. NetCheck relies on blackbox tracing mechanisms, such as strace, to automatically collect sequences of network system call invocations generated by the application hosts. NetCheck performs its diagnosis by (1) totally ordering the distributed set of input traces, and by (2) utilizing a network model to identify points in the totally ordered execution where the traces deviated from expected network semantics. Our evaluation demonstrates that NetCheck is able to diagnose failures in popular and complex applications without relying on any application- or network-specific information. For instance, NetCheck correctly identified the existence of NAT devices, simultaneous network disconnection/reconnection, and platform portability issues. In a more targeted evaluation, NetCheck correctly detects over 95% of the network problems we found from bug trackers of projects like Python, Apache, and Ruby. When applied to traces of faults reproduced in a live network, NetCheck identified the primary cause of the fault in 90% of the cases. Additionally, NetCheck is efficient and can process a GB-long trace in about 2 minutes.
UR - http://www.scopus.com/inward/record.url?scp=85013638751&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85013638751&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85013638751
T3 - Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2014
SP - 115
EP - 128
BT - Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2014
PB - USENIX Association
Y2 - 2 April 2014 through 4 April 2014
ER -