Recent studies have shown the benefits of using additional elevation data [e.g., digital surface model (DSM) or normalized DSM (nDSM)] for enhancing the performance of the semantic labeling of aerial images. However, previous methods mostly adopt 3-D elevation information as additional inputs, while, in many real-world applications, one does not have the corresponding DSM images at hand, and the spatial resolution of acquired DSM images usually does not match the aerial images. To alleviate this data constraint and also take advantage of 3-D elevation information, in this letter, a geometry-aware segmentation model is introduced to achieve accurate semantic labeling of aerial images via joint height estimation. Instead of using a single-stream encoder&x2013;decoder network for semantic labeling, we design a separate decoder branch to predict the height map and use the DSM images as side supervision to train this newly designed decoder branch. With the newly designed decoder branch, our model can distill the 3-D geometric features from 2-D appearance features under the supervision of ground-truth DSM images. Moreover, we develop a new geometry-aware convolution module that fuses the 3-D geometric features from the height decoder branch and the 2-D contextual features from the semantic segmentation branch. The fused feature embeddings can produce geometry-aware segmentation maps with enhanced performance. Our model is trained with DSM images as side supervision, while, in the inference stage, it does not require DSM data and directly predicts the semantic labels. Experiments on International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam data sets demonstrate the effectiveness of the proposed method for the semantic segmentation of aerial images.
- Data models
- Image segmentation
ASJC Scopus subject areas
- Geotechnical Engineering and Engineering Geology
- Electrical and Electronic Engineering