Modular construction has been adopted as an alternative to traditional construction processes to reduce environmental impact and construction waste and to cope with space constraints on dense urban construction sites. Furthermore, because modules are prefabricated in a controlled environment, modular construction lends itself to automation and optimization more readily than traditional construction. However, owing to the one-of-a-kind nature of construction projects, automation in construction is still in its infancy compared with manufacturing industries. Meanwhile, recent advances in technologies such as computer vision and deep learning provide opportunities to train machine intelligence to solve problems that could not be solved before. In this study, we propose an approach to automatically generate high-resolution synthetic training data for scene understanding in the modular construction context. Evaluation of the approach in testbed factory settings shows that we can systematically capture and label architecture, engineering, and construction (AEC) components such as walls and doors in RGB-D images, producing synthetic datasets for supervised learning applications in modular construction. The proposed method provides a mechanism for supplying the large-scale datasets that are necessary but currently missing for training scene understanding models in modular construction factories as modular projects and their workpieces change.