Relay nodes pose a potential threat to networks because they are used in many malicious scenarios, such as stepping-stone attacks, botnet communication, and peer-to-peer streaming. Quick and accurate detection of relay nodes in a network can significantly improve security policy enforcement. Considerable prior work has addressed the problem of identifying relay flows active within a node, and several novel solutions have been proposed. However, these solutions require a number of comparisons that is quadratic in the number of flows. This paper investigates the related problem of identifying relay nodes, where a relay node is defined as a node in the network that has an active relay flow. The problem is formulated as a variance estimation problem, and a statistical approach is proposed to solve it. The proposed solution requires time and space linear in the number of flows and can therefore be employed in large-scale implementations. It can be used on its own to identify relay nodes, or as the first step of a scalable relay flow detection solution that applies known quadratic-time analysis techniques only to nodes that have already been detected as relay nodes. Experimental results show that the proposed scheme detects relay nodes even in the presence of intentional inter-packet delays and chaff packets, which adversaries introduce to defeat timing-based detection algorithms.
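
To make the idea of variance-based detection concrete, the following is a minimal sketch of one way such a test could be built. It is not taken from the paper and should not be read as the authors' algorithm. It assumes that time is divided into slots, that each flow is represented by the set of slots in which it carries packets, and that each flow is given a random weight of +1 or -1. The names `relay_score`, `random_flow`, and `demo` are hypothetical. For each slot, the weights of the active incoming flows and of the active outgoing flows are summed separately, and the per-slot products are accumulated into a statistic S. S has zero mean, and its variance equals the sum of squared slot overlaps over all incoming/outgoing flow pairs, so a single strongly correlated pair inflates the variance without any pair ever being compared directly.

```python
import random


def relay_score(incoming, outgoing, trials=200, seed=0):
    """Estimate Var[S] for S = sum over slots of A_t * B_t.

    A_t (B_t) is the sum of the random +/-1 weights of the incoming
    (outgoing) flows that are active in slot t. Assumption: a flow is
    a set of integer slot indices in which it carries packets.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        in_sum = {}   # slot -> signed sum over active incoming flows
        out_sum = {}  # slot -> signed sum over active outgoing flows
        for flow in incoming:
            sign = rng.choice((-1, 1))
            for slot in flow:
                in_sum[slot] = in_sum.get(slot, 0) + sign
        for flow in outgoing:
            sign = rng.choice((-1, 1))
            for slot in flow:
                out_sum[slot] = out_sum.get(slot, 0) + sign
        s = sum(v * out_sum.get(slot, 0) for slot, v in in_sum.items())
        total += s * s  # E[S] = 0, so the mean of S^2 estimates Var[S]
    return total / trials


def random_flow(rng, slots=2000, active=40):
    """Hypothetical synthetic flow: `active` distinct random slots."""
    return set(rng.sample(range(slots), active))


def demo():
    """Score a host without a relay flow and the same host with one."""
    rng = random.Random(1)
    incoming = [random_flow(rng) for _ in range(10)]
    outgoing = [random_flow(rng) for _ in range(10)]
    # Replace one outgoing flow with an exact copy of an incoming flow.
    relayed = outgoing[:-1] + [set(incoming[0])]
    return relay_score(incoming, outgoing), relay_score(incoming, relayed)


if __name__ == "__main__":
    plain, relay = demo()
    print(f"no relay: {plain:.1f}   with relay: {relay:.1f}")
```

Each trial touches every active slot of every flow once, so the cost grows with the number of trials times the total traffic volume rather than with the number of flow pairs. The synthetic relay here copies its source flow exactly; handling delays or chaff would require choices (slot width, thresholds) that this sketch does not attempt to make.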