There are many online systems where millions of users post original content such as videos, reviews of items such as products, services and businesses, etc. While there are general rules for good behavior or even formal Terms of Service, there are still users who post content that is not suitable. Increasingly, online systems rely on other users who view the posted content to provide feedback. We study online systems where users report negative feedback, i.e., report abuse; these systems are quite distinct from much studied, traditional reputation systems that focus on eliciting popularity of content by various voting methods. The central problem that we study here is how to monitor the quality of negative feedback, that is, detect negative feedback which is incorrect, or perhaps even malicious. Systems address this problem by testing flags manually, which is an expensive operation. As a result, there is a tradeoff between the number of manual tests and the number of errors defined as the number of incorrect flags the monitoring system misses. Our contributions are as follows: • We initiate a systematic study of negative feedbacks systems. Our framework is general enough to be applicable for a variety of systems. In this framework, the number of errors the system admits is bounded over the worst case of adversarial users while simultaneously the system performs only small amount of manual testing for multitude of standard users who might still err while reporting. • Our main contribution is a randomized monitoring algorithm that we call Adaptive Probabilistic Testing (APT), that is simple to implement and has guarantees on expected number of errors. Even for adversarial users, the total expected error is bounded by εN over N flags for a given ε > 0. Simultaneously, the number of tests performed by the algorithm is within a constant factor of the best possible algorithm for standard users. • Finally, we present empirical study of our algorithm that shows its performance on both synthetic data and real data accumulated from a variety of negative feedback systems at Google. Our study indicates that the algorithm performs better than the analysis above shows.