Layered coding elegantly handles user bandwidth heterogeneity in video conferencing; however, it incurs rate and complexity overheads. An alternative is to partition the receivers into groups and use non-layered coding for each group. In this paper, we investigate how to maximize the received video quality for both systems under uplink and downlink capacity constraints, while limiting the number of hops that each video travels to at most two. Toward this end, we first show that any multicast tree is equivalent to a collection of depth-1 and depth-2 trees under outbound and inbound flow constraints. For the layered system, we propose an algorithm that simultaneously solves for the number of video layers and the rate and distribution tree of each layer. For the receiver partitioning system, we develop an algorithm that determines the receiver partitions and constructs the distribution tree for each group. Through a numerical comparison study, we show that the receiver partitioning system achieves significantly higher video quality than the layered system, owing to its higher coding efficiency.
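To make the constraint setting concrete, the following minimal sketch checks whether a given collection of distribution trees respects per-node uplink and downlink capacities and the two-hop limit. It is an illustration only, not the algorithm proposed in the paper. The tree representation (a rate, a root, and a parent-to-children map), the function names `tree_depth`, `link_usage`, and `is_feasible`, and all node names and rates in the example are assumptions introduced here.

```python
from collections import defaultdict


def tree_depth(root, edges):
    """Return the number of hops from the root to the deepest node.

    `edges` maps a node to the list of nodes it forwards to and is
    assumed to describe a tree (no cycles).
    """
    depth = 0
    frontier = [root]
    while True:
        next_frontier = [kid for node in frontier for kid in edges.get(node, [])]
        if not next_frontier:
            return depth
        depth += 1
        frontier = next_frontier


def link_usage(trees):
    """Sum the outbound and inbound rate of every node over all trees.

    Each tree is a tuple (rate, root, edges). A node spends `rate` of
    uplink per child and `rate` of downlink per tree it receives.
    """
    up = defaultdict(float)
    down = defaultdict(float)
    for rate, _root, edges in trees:
        for parent, kids in edges.items():
            up[parent] += rate * len(kids)
            for kid in kids:
                down[kid] += rate
    return up, down


def is_feasible(trees, uplink_cap, downlink_cap, max_hops=2):
    """Check the hop limit and the uplink/downlink capacity constraints."""
    if any(tree_depth(root, edges) > max_hops for _rate, root, edges in trees):
        return False
    up, down = link_usage(trees)
    uplink_ok = all(used <= uplink_cap[node] for node, used in up.items())
    downlink_ok = all(used <= downlink_cap[node] for node, used in down.items())
    return uplink_ok and downlink_ok


# Hypothetical example: node A's video is split over one depth-1 tree
# (A sends directly to B and C) and one depth-2 tree (A -> B -> C).
trees = [
    (300.0, "A", {"A": ["B", "C"]}),
    (200.0, "A", {"A": ["B"], "B": ["C"]}),
]
uplink_cap = {"A": 1000.0, "B": 250.0, "C": 100.0}
downlink_cap = {"A": 600.0, "B": 600.0, "C": 600.0}
feasible = is_feasible(trees, uplink_cap, downlink_cap)
```

In this example the relay B contributes part of its uplink to forward A's video, which is the role depth-2 trees play when the source's own uplink is the bottleneck.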