This paper presents a study of the robustness and performance variability of general video game-playing agents. The agents analyzed include those that won the different legs of the 2014 and 2015 General Video Game AI (GVGAI) Competitions, and two sample agents distributed with the competition framework. Initially, these agents are run in four games and ranked according to the rules of the competition. Then, different modifications to the reward signal of the games are proposed, and noise is introduced in the actions executed by the controller, in its forward model, or in both. Results show that the modifications proposed here can produce a significant change in the rankings. This is an important result because it enables the set of human-authored games to be automatically expanded with parameter-varied versions that provide additional information and insight into the relative strengths of the agents under test. Results also show that some controllers perform well under almost all conditions, a testament to the robustness of the GVGAI benchmark.