Network operators commonly use Traffic Engineering (TE) to configure routing in their networks so as to balance load and keep resource utilization high. Centralized TE can effectively improve network performance because it has a global view of the network, whereas distributed TE has been considered as an alternative for managing large-scale networks, which are usually partitioned into multiple regions. However, it is difficult for distributed TE to reach globally optimal performance, since each region makes its local routing decisions based only on partially observed network state. In this paper, we propose a novel distributed TE scheme called FedTe, which combines supervised learning with a collaborative approach to improve overall load balancing performance in multi-region networks. FedTe learns from the globally optimal routing strategy offline in a centralized manner and, once deployed in a distributed fashion, predicts the optimal distribution of cross-region traffic among regions in real time. Each region integrates the predicted cross-region traffic distribution with its measured local traffic to construct its optimal regional traffic matrix, which it then uses to perform intra-region TE optimization. FedTe also handles dynamic traffic variation and link failures by means of a two-layer hierarchical graph neural network architecture. To validate the proposed scheme, we evaluate FedTe on two real-world network topologies and a large-scale synthetic topology. Extensive evaluation results show that FedTe achieves near-optimal load balancing performance and outperforms state-of-the-art distributed TE approaches, with an average improvement of up to 28.9%.
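
As a rough illustration of the regional traffic matrix construction described above, the following minimal Python sketch adds a predicted cross-region demand onto measured local demands and then scores a fixed set of split paths by maximum link utilization. Everything here is an assumption made for illustration and is not taken from the paper: the function names `build_regional_tm` and `max_link_utilization`, the dictionary-based data layout, the toy three-node region, the hard-coded "prediction" (which stands in for the output of the learned model), and the choice of maximum link utilization as the load balancing metric.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
Demand = Dict[Pair, float]
# For each node pair: a list of (path, fraction of the demand sent on it).
Routing = Dict[Pair, List[Tuple[List[str], float]]]


def build_regional_tm(local_tm: Demand, predicted_cross: Demand) -> Demand:
    """Add predicted cross-region demands onto measured local demands."""
    regional = dict(local_tm)
    for pair, volume in predicted_cross.items():
        regional[pair] = regional.get(pair, 0.0) + volume
    return regional


def max_link_utilization(tm: Demand, routing: Routing,
                         capacity: Dict[Pair, float]) -> float:
    """Return the highest load/capacity ratio over all directed links."""
    load = {link: 0.0 for link in capacity}
    for pair, volume in tm.items():
        for path, fraction in routing[pair]:
            # Consecutive nodes on the path form its directed links.
            for link in zip(path, path[1:]):
                load[link] += volume * fraction
    return max(load[link] / capacity[link] for link in capacity)


# Toy region with three nodes; "a" and "c" are assumed to be border nodes.
capacity = {("a", "b"): 10.0, ("b", "c"): 10.0, ("a", "c"): 10.0}

# Locally measured intra-region demands (hypothetical values).
local_tm = {("a", "c"): 4.0, ("a", "b"): 2.0}

# Hypothetical model output: cross-region traffic entering at "a", leaving at "c".
predicted_cross = {("a", "c"): 6.0}

regional_tm = build_regional_tm(local_tm, predicted_cross)

# A fixed routing that splits a->c evenly over the direct and two-hop paths.
routing = {
    ("a", "c"): [(["a", "c"], 0.5), (["a", "b", "c"], 0.5)],
    ("a", "b"): [(["a", "b"], 1.0)],
}

mlu = max_link_utilization(regional_tm, routing, capacity)
print(regional_tm)
print(round(mlu, 3))
```

In this sketch the routing is fixed by hand; in an actual intra-region TE step, the split fractions would instead be chosen by an optimizer that minimizes the load balancing objective over the constructed regional traffic matrix.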